Client Background

Client: A leading hotels chain in the USA

Industry Type: Real Estate, Hospitality

Services: Hospitality

Organization Size: 1000+

Project Objective

To download the data from the servers using Cyberduck on a daily basis and perform data engineering on it.

Project Description

  1. First, download the property and forward files from the server.
  2. Second, create a new data set from the property master file, keeping only properties where Bedrooms is 5 or more, or Max Guests is 16 or more, and City is Sevierville, Pigeon Forge, or Gatlinburg.
  3. In the forward file, keep only rows with status = R and remove the rest.
  4. Finally, merge the forward file with the new data set on ‘Property ID’, i.e., keep only the forward rows whose ‘Property ID’ appears in the new data set, and add the City, Bedrooms, and Max Guests columns from the new data set to the forward file.
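The steps above can be sketched in pandas. The file paths, column names (‘Property ID’, ‘Bedrooms’, ‘Max Guests’, ‘City’, ‘Status’), and the precedence of the or/and conditions are assumptions for illustration; the client's actual headers may differ:

```python
import pandas as pd

# Cities named in the filtering condition.
CITIES = {"Sevierville", "Pigeon Forge", "Gatlinburg"}

def build_master_files(property_path, forward_path):
    prop = pd.read_csv(property_path)
    fwd = pd.read_csv(forward_path)

    # Step 2: (Bedrooms >= 5 OR Max Guests >= 16) AND City in the three
    # cities (the grouping of the or/and conditions is our reading).
    size_ok = (prop["Bedrooms"] >= 5) | (prop["Max Guests"] >= 16)
    prop_subset = prop[size_ok & prop["City"].isin(CITIES)]

    # Step 3: keep only forward rows with status "R".
    fwd = fwd[fwd["Status"] == "R"]

    # Step 4: inner merge on Property ID so only common IDs survive,
    # carrying City / Bedrooms / Max Guests over to the forward file.
    cols = ["Property ID", "City", "Bedrooms", "Max Guests"]
    forward_master = fwd.merge(prop_subset[cols], on="Property ID", how="inner")
    return prop_subset, forward_master
```

The inner merge implements the "keep only common ‘Property ID’" requirement directly, since rows without a match on either side are dropped.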

Our Solution

We created a Python script that performs these steps and produces the property and forward master files, which we deliver to the client on a weekly basis.

Project Deliverables

Two CSV files, a property master file and a forward master file, delivered weekly after the processing steps are applied.

Tools used

PyCharm, Power BI, Cyberduck, Microsoft Excel.

Language/techniques used

The Python programming language was used to write scripts that perform data manipulation across the different files.

Models used

SDLC is a process followed for a software project within a software organization. It consists of a detailed plan describing how to develop, maintain, replace, and alter or enhance specific software. The life cycle defines a methodology for improving the quality of the software and the overall development process.

We are using the Iterative Waterfall SDLC model because the software is developed in phases and we need feedback at every step of the project, so as to keep track of changes as they occur.

Figure 1: SDLC Iterative Waterfall Model

Skills used

Skills such as data pre-processing, data cleaning, and data manipulation were used in this project.
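A minimal sketch of the kind of cleaning applied to the raw files; the column names and the specific steps (deduplication, dropping rows with missing cities, coercing numeric types, normalizing city spellings) are illustrative assumptions, not the client's exact pipeline:

```python
import pandas as pd

def clean_property_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative pre-processing for a raw property file."""
    # Remove duplicate property rows, keeping the first occurrence.
    df = df.drop_duplicates(subset=["Property ID"])
    # Drop rows where the city is missing, since it drives the filter.
    df = df.dropna(subset=["City"])
    # Coerce Bedrooms to integers; unparseable values become 0.
    df["Bedrooms"] = pd.to_numeric(df["Bedrooms"], errors="coerce").fillna(0).astype(int)
    # Normalize city spellings, e.g. " gatlinburg " -> "Gatlinburg".
    df["City"] = df["City"].str.strip().str.title()
    return df
```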

Databases used

We used the traditional way of storing data, i.e., file systems.

Web Cloud Servers used

Cyberduck, a libre server and cloud storage browser for Mac and Windows with support for FTP, SFTP, WebDAV, Amazon S3, etc., was used in this project together with Amazon S3 servers.

What are the technical Challenges Faced during Project Execution?

  1. The data to be processed was very large, so space complexity was a challenge in this project.

How the Technical Challenges were Solved

  1. To address the space complexity, we first tried Power BI, but that introduced time complexity issues.
  2. We then processed the data in chunks, reducing the amount of data held in memory at once to avoid memory errors.
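Chunked processing can be sketched with pandas' `chunksize` option, which streams a large CSV through memory a block at a time. The file paths, column names, and chunk size here are assumptions for illustration:

```python
import pandas as pd

def filter_forward_in_chunks(src_path, out_path, chunksize=100_000):
    """Stream a large forward file in fixed-size chunks, keeping only
    rows with Status == "R", so the whole file never sits in memory."""
    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        kept = chunk[chunk["Status"] == "R"]
        # Append each filtered chunk; write the header only once.
        kept.to_csv(out_path, mode="w" if first else "a",
                    header=first, index=False)
        first = False
```

Because each chunk is filtered and written out before the next is read, peak memory use is bounded by the chunk size rather than the full file size.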

Project Snapshots