Client Background

Client: A leading tech firm in the USA

Industry Type:  IT

Services: Consulting

Organization Size: 100+

The Problem

The client had a project that matched URLs against each other using the TF-IDF algorithm.

The script was throwing errors, and resolving them was the immediate ask.

The client also required us to adjust the script for better accuracy and faster computation.
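
For readers unfamiliar with the technique, TF-IDF matching of URLs can be sketched roughly as follows. This is a minimal standard-library illustration, not the client's script: the tokenization rule, the smoothed IDF formula, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `match_urls`) are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", url.lower())


def tfidf_vectors(docs):
    """Return one {token: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            tok: (cnt / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_urls(sources, targets):
    """Pair each source URL with its most similar target URL and the score."""
    docs = [tokenize(u) for u in sources + targets]
    vecs = tfidf_vectors(docs)
    src_vecs, tgt_vecs = vecs[:len(sources)], vecs[len(sources):]
    matches = {}
    for url, vec in zip(sources, src_vecs):
        scores = [cosine(vec, t) for t in tgt_vecs]
        best = max(range(len(targets)), key=scores.__getitem__)
        matches[url] = (targets[best], scores[best])
    return matches
```

Tokens that appear in every URL (such as `https` or the domain) receive a low weight, so the match is driven by the rarer path words.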

Our Solution

  1. Research and review the existing code
  2. Find and list the bugs
  3. Fix the bugs
  4. Identify and implement the best matching algorithm
  5. Check and compare the accuracy of the existing matching algorithm
  6. If the accuracy falls short, evaluate other solutions such as n-grams or fuzzy matching
  7. Meet the expected output
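
Steps 5 and 6 amount to scoring candidate matchers on a small labelled sample. The sketch below shows one way to do that. It is illustrative only: the standard-library `difflib.SequenceMatcher` stands in for RapidFuzz, the n-gram scorer is a plain Jaccard overlap on character trigrams, and the `accuracy` helper and its `expected` mapping are hypothetical names introduced for this example.

```python
from difflib import SequenceMatcher


def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of a and b."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_similarity(a, b):
    """Sequence-based similarity in [0, 1]; difflib stands in for RapidFuzz."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def accuracy(scorer, sources, targets, expected):
    """Fraction of sources whose top-scoring target equals the labelled one."""
    hits = 0
    for src in sources:
        best = max(targets, key=lambda tgt: scorer(src, tgt))
        hits += best == expected[src]
    return hits / len(sources)
```

Calling `accuracy(ngram_similarity, ...)` and `accuracy(fuzzy_similarity, ...)` on the same labelled pairs gives a like-for-like comparison before committing to one approach.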

Deliverables

  1. Fully functional code
  2. Solution & Documentation
  3. Support

Tools used

  1. Google Sheets
  2. Microsoft Excel
  3. Google Colaboratory

Language/techniques

Python 

Models used

  1. TF-IDF
  2. BERT
  3. N-grams
  4. Flair Embeddings
  5. RapidFuzz

Skills used

  1. Problem-solving
  2. Communication
  3. Data Modelling
  4. Data Pipelining
  5. Python Coding

Databases used

Google Sheets

Technical Challenges Faced During Project Execution

  1. The client's model, built on pretrained libraries, was fairly competent but contained bugs.
  2. Even with the bugs fixed, the accuracy of the client's models was shaken once they ran on a different set of input data.

How the Technical Challenges were Solved

  1. To overcome the bugs in the pretrained-model code, we wrote vanilla code that executes the same logic while fine-tuning the matching algorithm.
  2. Data pre-processing was done manually to transform every input into a more readable format before it went into the model, so as to get the best matching accuracy possible within the code's execution time.
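
As an illustration of the kind of pre-processing described in point 2, a URL can be reduced to a clean string of words before matching. This is a minimal sketch under assumptions: the specific cleaning rules (dropping the scheme, host, query string, and common file extensions) and the function name `normalize_url` are hypothetical and are not taken from the delivered notebook.

```python
import re
from urllib.parse import unquote, urlsplit


def normalize_url(url):
    """Reduce a URL to a lowercase, space-separated string of path words."""
    parts = urlsplit(url.strip())
    # Keep only the path; scheme, host, query and fragment are discarded.
    path = unquote(parts.path).lower()
    # Drop a trailing file extension such as .html or .php (assumed list).
    path = re.sub(r"\.(html?|php|aspx?)$", "", path)
    # Split on anything that is not a letter or digit (/, -, _, spaces).
    words = re.split(r"[^a-z0-9]+", path)
    return " ".join(w for w in words if w)
```

Normalizing both sides of the comparison the same way means the matcher scores the meaningful words rather than punctuation or tracking parameters.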

Business Impact

  1. Helped the client run the matching process with maximum accuracy and the lowest code cost by implementing the matching algorithm in vanilla code written from scratch.

Project website URL

https://colab.research.google.com/github/AjayBidyarthy/Daniel-Emery/blob/main/vanilla.ipynb#scrollTo=vPp14xj020RL

Contact Details

Here are my contact details:

Email: ajay@blackcoffer.com

Skype: asbidyarthy

WhatsApp: +91 9717367468

Telegram: @asbidyarthy 
