Client Background

Client: A leading eCommerce firm in the USA, Colombia, India, and Latin America

Gangala promotes local shops selling a wide variety of products at great prices. Easily find the best offers using our price comparison tool. It’s a WIN WIN for …

Industry Type: eCommerce

Services: e-commerce, retail business

Organization Size: 100+

Project Title

Gangala.in: An e-commerce site that gathers product data from various sources and presents it on a single platform

Project Objective

  • Provide up-to-date data for any given product on the website, along with 3-5 prices for that product from different sites, so the customer can compare and buy.

Project Description

A platform on which users can get price data for any product from multiple sites. The client provided us with raw data. We were tasked with building a pipeline for the data, building APIs to fetch and update product data such as price, and making sure that all the data is available for the front-end team to access.

Our Solution

We built a pipeline to process and clean the raw data provided by the client, and APIs to fetch the updated product data. Neo4j was used as the intermediary data store and MongoDB as our primary database. We also processed each product’s images, removing any unwanted text and adding the client’s watermark.
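
As an illustration of the cleaning step, here is a minimal sketch using only the csv and json packages from the tools list. The column names (name, site, price) and the rules for dropping a row are assumptions made for the example, not the client's actual schema, and the Neo4j and MongoDB writes are left out.

```python
import csv
import io
import json


def clean_record(raw):
    """Normalize one raw product row; return None if it is unusable."""
    # Field names here are hypothetical; the real raw data may differ.
    name = (raw.get("name") or "").strip()
    site = (raw.get("site") or "").strip().lower()
    price_text = (raw.get("price") or "").replace(",", "").strip()
    if not name or not site:
        return None
    try:
        price = float(price_text)
    except ValueError:
        return None
    if price <= 0:
        return None
    # Collapse repeated whitespace inside the product name.
    return {"name": " ".join(name.split()), "site": site, "price": price}


def clean_rows(csv_text):
    """Parse CSV text and keep only the rows that clean successfully."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = (clean_record(row) for row in reader)
    return [rec for rec in cleaned if rec is not None]


raw_csv = (
    "name,site,price\n"
    "  Acme   Kettle ,ShopA,\"1,299.00\"\n"
    "Acme Kettle,shopb,not available\n"
    ",shopc,999\n"
)
records = clean_rows(raw_csv)
print(json.dumps(records, indent=2))
```

In this sample only the first row survives: the second has a non-numeric price and the third has no product name.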

Project Deliverables

A fully updated database with current data on all products, each product having at least 3-5 prices from different sites.
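
One way to check this deliverable is to group price offers by product and count the distinct sites for each. The sketch below assumes a hypothetical offer shape (product_id, site, price) and treats 3 as the minimum and 5 as the maximum number of sites to show; it is an illustration, not the query used in the project.

```python
from collections import defaultdict

MIN_SITES = 3  # lower bound taken from the deliverable
MAX_SITES = 5  # upper bound taken from the deliverable


def prices_by_product(offers):
    """Group offers by product, keeping the lowest price seen per site."""
    grouped = defaultdict(dict)
    for offer in offers:
        site_prices = grouped[offer["product_id"]]
        site = offer["site"]
        if site not in site_prices or offer["price"] < site_prices[site]:
            site_prices[site] = offer["price"]
    return grouped


def comparison_list(site_prices):
    """Return up to MAX_SITES (site, price) pairs, cheapest first."""
    ranked = sorted(site_prices.items(), key=lambda item: (item[1], item[0]))
    return ranked[:MAX_SITES]


def under_covered(grouped):
    """Return sorted ids of products with fewer than MIN_SITES distinct sites."""
    return sorted(pid for pid, sites in grouped.items() if len(sites) < MIN_SITES)


# Hypothetical sample offers.
offers = [
    {"product_id": "p1", "site": "a", "price": 10.0},
    {"product_id": "p1", "site": "b", "price": 9.5},
    {"product_id": "p1", "site": "c", "price": 11.0},
    {"product_id": "p1", "site": "a", "price": 9.0},
    {"product_id": "p2", "site": "a", "price": 20.0},
]
grouped = prices_by_product(offers)
print(comparison_list(grouped["p1"]))
print(under_covered(grouped))
```

Products returned by under_covered would need more source data before they meet the 3-price minimum.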

Tools used

  • NumPy package
  • json package
  • csv package
  • concurrent.futures package (for multithreading)
  • py2neo package (to connect to Neo4j from Python)

Language/techniques used

  • Python 
  • Cypher Query Language (CQL)
  • APOC Queries

Databases and platforms used

  • Neo4j
  • MongoDB
  • Dataiku 
  • Odoo
  • DSS

Web Cloud Servers used

Linode cloud servers 

Technical Challenges Faced During Project Execution

We were asked to process 3 million products per day, which was a challenge because the VMs we used could not handle the load.

How the Technical Challenges were Solved

We overcame the challenge by processing the data asynchronously, which increased processing speed and also reduced costs for the client.
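
For scale, 3 million products per day works out to roughly 35 products per second sustained. The sketch below shows one way to spread that work across threads with the concurrent.futures package from the tools list. The function process_batch is a hypothetical stand-in for the real per-batch work, and the batch size and worker count are placeholder values, not the settings used in the project.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_batch(batch):
    """Hypothetical stand-in for the real work (clean, look up, store)."""
    return [{"id": product_id, "processed": True} for product_id in batch]


def process_all(product_ids, batch_size=1000, max_workers=8):
    """Process batches on a thread pool and return results in input order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        for batch_result in pool.map(process_batch, chunked(product_ids, batch_size)):
            results.extend(batch_result)
    return results


processed = process_all(list(range(10)), batch_size=3, max_workers=4)
print(len(processed))
```

Threads help most when each batch spends its time waiting on the network or the database. CPU-heavy steps such as image processing may be better served by a process pool.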

Project website URL

https://gangala.in/