Client Background

Client: A leading retail firm in the USA

Industry Type: Retail

Services: Retail business, consumer services

Organization Size: 100+

The Problem

The goal was to use data ingested into Neo4j, together with its nodes, relationships, and their properties, to determine which nodes actually represent the same person. For example, the data contains Person nodes, and people may enter their names in different ways. Our main aim was to identify Person nodes that hold similar data and are in fact the same person; such a pair is represented as a perfect match between the nodes. This single-person view is referred to as the Golden Record.

Our Solution

To date, we have loaded the data into Neo4j and created relationships carrying a score property that defines the match strength. We defined criteria for deciding when two nodes represent the same person and, based on those criteria, created ‘perfect match’ and ‘probable match’ relationships.
The criteria use four properties: full name, address, driver’s license, and passport number. For each of these properties there are scored relationships between nodes, and those scores are used when creating the perfect match and probable match relationships.
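The classification described above can be sketched in Python as follows. This is only an illustration of the idea: the property keys, the thresholds, and the averaging rule are assumptions, since the actual scoring rules and Cypher queries are not given here.

```python
from typing import Mapping

# Hypothetical property keys for the four criteria named in the text.
PROPERTIES = ("full_name", "address", "drivers_license", "passport_number")

# Assumed thresholds; the real values used in the project are not stated.
PERFECT_THRESHOLD = 1.0    # every property must match exactly
PROBABLE_THRESHOLD = 0.75  # minimum average score for a probable match


def classify_match(scores: Mapping[str, float]) -> str:
    """Classify a pair of Person nodes from per-property relationship scores.

    `scores` maps a property key to a match score in [0, 1]. A property
    with no scored relationship between the two nodes counts as 0.
    """
    values = [scores.get(prop, 0.0) for prop in PROPERTIES]
    if all(value >= PERFECT_THRESHOLD for value in values):
        return "perfect match"
    if sum(values) / len(values) >= PROBABLE_THRESHOLD:
        return "probable match"
    return "no match"
```

For instance, a pair that scores 1.0 on all four properties is classified as a perfect match, while a pair that only shares a full name falls below the assumed probable-match threshold.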


We have also configured Graphlytics (a visualization tool) on the virtual machine; it connects to the Neo4j database and helps visualize the nodes and relationships.

We have also worked with algorithms from the GDS library in Neo4j to extract more information from the graph. The common neighbors algorithm was used to produce node-similarity scores: the higher the score, the more similar the two nodes. Other algorithms were tried as well, but they did not work on this data because all the properties are stored as strings.
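To make the common neighbors score concrete, here is a minimal pure-Python sketch of the idea: the score for two nodes is the number of nodes adjacent to both. It does not call the GDS library, and the toy graph and node names are made up for illustration.

```python
from typing import Dict, Hashable, Set


def common_neighbors(graph: Dict[Hashable, Set[Hashable]], a: Hashable, b: Hashable) -> int:
    """Return how many nodes are adjacent to both `a` and `b`."""
    return len(graph.get(a, set()) & graph.get(b, set()))


# Hypothetical adjacency map: each person is linked to the attribute
# nodes (address, license, passport) that it shares a relationship with.
graph = {
    "person_1": {"addr_A", "license_X"},
    "person_2": {"addr_A", "license_X", "passport_P"},
    "person_3": {"passport_P"},
}

score_12 = common_neighbors(graph, "person_1", "person_2")
score_13 = common_neighbors(graph, "person_1", "person_3")
```

Here `person_1` and `person_2` share two neighbors, so they score higher than `person_1` and `person_3`, which share none.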

We have resolved the issues Neo4j was facing when deleting a large set of data, and provided steps to recover Neo4j if it fails with an out-of-memory error.
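The fix that was applied is not described here. One common way to avoid running out of memory on large deletions is to delete in small batches, each in its own transaction, and the sketch below assumes that approach. The `run_query` callable is a hypothetical wrapper that executes a Cypher statement with parameters and returns the `deleted` count, and the batch size is an arbitrary choice.

```python
from typing import Callable, Dict

# Deletes at most $batch_size Person nodes per call and reports how many
# were removed. The Person label comes from the text above.
BATCH_DELETE_QUERY = (
    "MATCH (n:Person) "
    "WITH n LIMIT $batch_size "
    "DETACH DELETE n "
    "RETURN count(*) AS deleted"
)


def delete_in_batches(
    run_query: Callable[[str, Dict[str, int]], int],
    batch_size: int = 10_000,
) -> int:
    """Repeat the batched delete until nothing is left; return the total deleted.

    `run_query` is a hypothetical helper: it runs the query in its own
    transaction and returns the value of the `deleted` column.
    """
    total = 0
    while True:
        deleted = run_query(BATCH_DELETE_QUERY, {"batch_size": batch_size})
        if deleted == 0:
            break
        total += deleted
    return total
```

Keeping each batch in a separate transaction bounds the amount of state the database has to hold in memory at once, at the cost of more round trips.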

We have identified why the probable match and perfect match Cypher queries were not working as intended and proposed a solution.

Solution Architecture

Deliverables

  1. Created perfect match and probable match queries.
  2. Created queries that return nodes (even those without an associated relationship) together with their associated relationships.
  3. A Cypher query that returns the result as a JSON object that can be mapped to a Java object.
  4. A Cypher query that creates a relationship when a property has the same value on two nodes.
  5. A Cypher query that deletes one relationship from each bidirectional pair of relationships.
  6. Python code for a sample Neo4j query.
  7. Adjusted the perfect match and probable match queries so that they work with the current data.
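The logic behind deliverables 4 and 5 can be sketched in plain Python: group nodes by a property value, then emit each matching pair once so that only one direction of the relationship is kept. This is not the delivered Cypher; the node structure, the `passport_number` key, and the whitespace and case normalization are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def pairs_with_same_value(nodes: Dict[int, Dict[str, str]], prop: str) -> List[Tuple[int, int]]:
    """Return (id_a, id_b) pairs of nodes sharing the same value of `prop`.

    Each pair appears once with id_a < id_b, so a relationship built
    from it exists in a single direction only.
    """
    groups = defaultdict(list)
    for node_id, props in nodes.items():
        value = props.get(prop)
        if value:  # skip nodes where the property is missing or empty
            # Assumed normalization: ignore case and surrounding spaces.
            groups[value.strip().lower()].append(node_id)

    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Hypothetical sample data.
people = {
    1: {"passport_number": "P123"},
    2: {"passport_number": " p123 "},
    3: {"passport_number": "Z9"},
    4: {},
}
matches = pairs_with_same_value(people, "passport_number")
```

With this sample data, nodes 1 and 2 are paired once, and nodes 3 and 4 produce no pairs.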

Tools used

Neo4j

Language/techniques used

Cypher Query Language

Models used

The common neighbors algorithm

Skills used

CQL

Databases used

Neo4j

Contact Details

Here are my contact details:

Email: ajay@blackcoffer.com

Skype: asbidyarthy

WhatsApp: +91 9717367468

Telegram: @asbidyarthy 

For project discussions and daily updates, would you like to use Slack, Skype, Telegram, or WhatsApp? Please recommend what would work best for you.