Client Background
Client: A leading retail firm in the USA
Industry Type: Retail
Services: Retail business, consumer services
Organization Size: 100+
The Problem
To use data ingested into Neo4j and use the nodes and relationships with its properties to determine which nodes are actually the same person. For eg: we have Person nodes in the data, now people might enter their names in different ways. Our main aim is to identify Person nodes that may have similar data and are actually the same person. This will be represented as a perfect match between the nodes. This single-person view is referred to as the Golden Record
Our Solution
Till date, we have loaded data into Neo4j and created relationships with score property which defines match strength. We have created some criterias by which we can determine what constitutes two nodes being the same and then based on them created ‘perfect match’ and ‘probable match’.
We have considered four properties for our criteria – full name, address, driver’s license, and passport number. We have relationships between nodes for these properties with scores, we use these in our perfect match and probable match creation.
We have also configured Graphlytics (a viz software) in the virtual machine which connects to the neo4j database and helps vizualize the nodes and relationships.
We have also worked on some algorithms using the GDS library in neo4j to produce more information on the graph, the common neighbors algorithm was used to produce scores based on node similarity and the higher the score the higher the similarity. Other algorithms were tried as well but since all the properties are of String format it did not work on it.
We have Resolved issues neo4j is facing when deleting a Large set of data and Provided steps to recover neo4j if it fails by going OutofMemory.
We have figured out the issues with the probable and perfect match cypher queries not working as intended and proposed a solution.
Solution Architecture
Deliverables
- Created Perfect match and probable match queries.
- Created queries that return the nodes (even if it does not have associated relationship) and it’s associated relationship.
- A cypher query that return the result as a json object that can be mapped into a java oject.
- A cypher query that will create the relationship if two node’s properties have same value.
- A cypher query that will delete one relationship from bidirectional relationship.
- A python code for a sample neo4j query
- Adjust the perfect and probable match queries so it would work for current data.
Tools used
Neo4j
Language/techniques used
Cypher Query Language
Models used
The common neighbors algorithm
Skills used
CQL
Databases used
Neo4j
Contact Details
Here are my contact details:
Email: ajay@blackcoffer.com
Skype: asbidyarthy
WhatsApp: +91 9717367468
Telegram: @asbidyarthy
For project discussions and daily updates, would you like to use Slack, Skype, Telegram, or Whatsapp? Please recommend, what would work best for you.