Client Background
Client: A leading real estate and financing firm worldwide
Industry Type: Real Estate
Products & Services: Infrastructure Development, Financing, Real Estate
Organization Size: 10000+
The Problem
The client needed a user-friendly data analysis tool that interprets natural language queries and returns insightful analyses of CSV data. The tool should let users gain valuable insights without technical expertise. Key functionalities include data exploration, trend identification, pattern recognition, and anomaly detection, all presented in a comprehensible format. The tool must also handle CSV datasets efficiently while keeping its analyses accurate and reliable.
Our Solution
- Data Ingestion and Conversion:
CSV data is acquired from a source (local file system, cloud storage, etc.).
The data is then converted into a pandas DataFrame using the read_csv() function or similar methods provided by the pandas library.
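The ingestion step can be sketched as follows. This is a minimal illustration, not the project's code: the `load_csv` helper, the sample columns, and the sample values are all hypothetical, and an in-memory string stands in for a file path or cloud storage URL.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk or in cloud storage.
SAMPLE_CSV = (
    "project,city,booking_date,amount\n"
    "Alpha,Mumbai,2023-05-10,12500000\n"
    "Beta,Pune,2023-11-02,8000000\n"
    "Gamma,Mumbai,2024-02-15,15000000\n"
)


def load_csv(source, **read_csv_kwargs) -> pd.DataFrame:
    """Read CSV data from a path, URL, or file-like object into a DataFrame."""
    return pd.read_csv(source, **read_csv_kwargs)


df = load_csv(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.dtypes)
```

Because `pd.read_csv` accepts paths, URLs, and file-like objects alike, the same helper would work for a local file or an uploaded file object.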
- Data Cleaning:
Data cleaning operations are performed on the DataFrame so that it is a suitable input for the Pandas Agent. These may include:
Converting column data types
Handling duplicates
Removing unnecessary columns, etc.
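A cleaning pass along these lines might look like the sketch below. The `clean_dataframe` function, its parameters, and the example columns are assumptions made for illustration; the actual cleaning rules depend on the dataset.

```python
import pandas as pd


def clean_dataframe(df, date_columns=(), numeric_columns=(), drop_columns=()):
    """Return a cleaned copy of df: drop columns, remove duplicates, fix dtypes."""
    # Remove columns that are not needed for analysis (missing names are ignored).
    cleaned = df.drop(columns=list(drop_columns), errors="ignore")
    # Remove exact duplicate rows and renumber the index.
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)
    # Convert column data types; unparseable values become NaT / NaN.
    for col in date_columns:
        cleaned[col] = pd.to_datetime(cleaned[col], errors="coerce")
    for col in numeric_columns:
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    return cleaned


# Hypothetical raw data with a duplicate row, string amounts, and an unused column.
raw = pd.DataFrame(
    {
        "project": ["Alpha", "Alpha", "Beta"],
        "booking_date": ["2023-05-10", "2023-05-10", "2023-11-02"],
        "amount": ["12500000", "12500000", "8000000"],
        "internal_id": [1, 1, 2],
    }
)
cleaned = clean_dataframe(
    raw,
    date_columns=["booking_date"],
    numeric_columns=["amount"],
    drop_columns=["internal_id"],
)
print(cleaned.dtypes)
```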
- Initialization of Langchain’s Pandas Agent:
Langchain’s Pandas Agent is initialized with the necessary parameters. These parameters include:
System prompt: A custom prompt provided by the user or defined in the application.
Temperature: A parameter controlling the randomness of the model’s outputs.
Model: The specific model or model configuration to be used by the agent.
Other relevant parameters based on the requirements and capabilities of the agent.
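The initialization described above could be sketched as follows. This assumes recent releases of the `langchain-experimental` and `langchain-openai` packages; argument names such as `allow_dangerous_code` and `agent_type` vary between versions, so treat them as assumptions to check against the installed release. The `build_agent` helper and the prompt wording are hypothetical, not the project's actual prompt.

```python
import pandas as pd

# Hypothetical system prompt; the wording used in the project is not given in this write-up.
SYSTEM_PROMPT = (
    "You are a data analyst working with a pandas DataFrame named `df`. "
    "Answer in plain language for a non-technical reader. "
    "Use the Indian financial year (April to March) when reporting quarters, "
    "report currency in Indian rupees, and never use exponential notation."
)


def build_agent(df: pd.DataFrame, api_key: str, model: str = "gpt-4",
                temperature: float = 0.0):
    """Create a Pandas Agent bound to df (requires the LangChain packages)."""
    # Imported lazily so this module loads even where LangChain is not installed.
    from langchain_experimental.agents import create_pandas_dataframe_agent
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model, temperature=temperature, api_key=api_key)
    return create_pandas_dataframe_agent(
        llm,
        df,
        prefix=SYSTEM_PROMPT,        # custom system prompt
        agent_type="openai-tools",   # assumption: tool-calling style agent
        verbose=False,
        allow_dangerous_code=True,   # the agent executes generated Python code
    )
```

The agent runs model-generated Python against the DataFrame, so it is best used only with trusted data and in a sandboxed environment.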
- Integration with Pandas DataFrame:
The cleaned DataFrame from the previous steps is passed to the Pandas Agent as its input. It holds the structured data that the agent queries.
- Natural Language Query Interpretation:
The user interacts with the system by posing queries in natural language.
Langchain’s Pandas Agent interprets these queries using a GPT-4 backend and converts them into executable operations on the DataFrame.
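Passing a question to the agent might look like the sketch below. The `ask` helper is hypothetical. It assumes the agent exposes an `invoke` method that takes a dict with an `"input"` key and returns a dict with an `"output"` key, which is how LangChain agent executors behave in recent versions. A stand-in `EchoAgent` is used here so the example runs without an API key.

```python
def ask(agent, question: str) -> str:
    """Send a natural language question to the agent and return its answer text."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    result = agent.invoke({"input": question})
    return str(result["output"])


class EchoAgent:
    """Stand-in for a real agent; it only echoes the question back."""

    def invoke(self, payload):
        return {"output": f"received: {payload['input']}"}


print(ask(EchoAgent(), "Which city had the highest total bookings?"))
```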
- DataFrame Operations:
The Pandas Agent executes the operations needed on the DataFrame. These operations may include:
Filtering: Selecting rows or columns based on specified criteria.
Aggregation: Computing summary statistics or aggregating data based on groups.
Transformation: Modifying data in the DataFrame (e.g., adding or removing columns, changing data types).
Joining/Merging: Combining multiple DataFrames based on common keys or indices.
Sorting: Arranging rows or columns in a specified order.
Other pandas DataFrame operations as required by the user queries.
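For reference, the operations listed above correspond to pandas calls like the ones in this sketch. The agent generates such code itself at query time; the DataFrames, column names, and values here are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
sales = pd.DataFrame(
    {
        "project": ["Alpha", "Beta", "Gamma", "Alpha"],
        "city": ["Mumbai", "Pune", "Mumbai", "Mumbai"],
        "amount": [12_500_000, 8_000_000, 15_000_000, 4_500_000],
    }
)
managers = pd.DataFrame({"city": ["Mumbai", "Pune"], "manager": ["Asha", "Ravi"]})

# Filtering: keep rows that match a condition.
mumbai = sales[sales["city"] == "Mumbai"]

# Aggregation: total amount per city.
total_by_city = sales.groupby("city")["amount"].sum()

# Transformation: add a derived column (amount in crore, 1 crore = 10,000,000).
sales["amount_crore"] = sales["amount"] / 1e7

# Joining/Merging: attach the manager for each city.
with_manager = sales.merge(managers, on="city", how="left")

# Sorting: largest bookings first.
ranked = with_manager.sort_values("amount", ascending=False)

print(total_by_city)
print(ranked)
```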
- Delivery to End User:
The processed output is delivered to the end user through the Streamlit user interface.
The user can review the insights provided by the system and further refine their queries if needed.
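A Streamlit front end for this flow could be sketched roughly as below. The layout, labels, and the `answer_question(df, question)` hook are assumptions for illustration; in a real app that hook would wrap the Pandas Agent. Streamlit is imported inside the function so the module can be loaded without it.

```python
import pandas as pd


def summarize_upload(df: pd.DataFrame) -> str:
    """Short description of the uploaded data, shown under the uploader."""
    return f"{len(df)} rows, {len(df.columns)} columns"


def run_app(answer_question):
    """Render the UI. answer_question(df, question) -> str is supplied by the caller."""
    import streamlit as st  # lazy import; requires the streamlit package

    st.title("CSV Data Analysis Assistant")
    uploaded = st.file_uploader("Upload a CSV file", type="csv")
    if uploaded is None:
        st.info("Upload a CSV file to begin.")
        return

    df = pd.read_csv(uploaded)
    st.caption(summarize_upload(df))
    st.dataframe(df.head())

    question = st.text_input("Ask a question about the data")
    if question:
        with st.spinner("Analyzing..."):
            st.write(answer_question(df, question))
```

To try it, one would save this as a script such as `app.py`, add a final line that calls `run_app(...)` with a suitable function, and start it with `streamlit run app.py`. Each new question reruns the script, which is how the user refines a query.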
Solution Architecture
Deliverables
Data Analysis Tool with Streamlit frontend.
Tech Stack
- Tools used
- Langchain, OpenAI GPT-4 API
- Language/techniques used
- Python
- Models used
- Pandas Agent, GPT-4
- Skills used
- Python, Streamlit, Streamlit cloud deployment, Langchain
- Web Cloud Servers used
- Streamlit cloud
What are the technical Challenges Faced during Project Execution
The tool had to follow Indian conventions for financial year quarters and currency, and to display human-readable values instead of exponential (scientific) notation.
How the Technical Challenges were Solved
The challenge was solved by lowering the temperature of the Pandas Agent to 0 and writing a custom system prompt that biases the model strongly toward answers in the desired format.
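The conventions that such a prompt has to enforce can also be written down as plain Python, which may help when checking the agent's answers. This is not how the project solved the problem (that was done through the prompt and temperature setting); the helper names below are hypothetical. The sketch assumes the Indian financial year runs from April to March and that amounts use lakh/crore digit grouping.

```python
from datetime import date


def indian_fy_quarter(d: date) -> tuple[str, str]:
    """Return (financial year label, quarter) for a date, with the FY starting in April."""
    start_year = d.year if d.month >= 4 else d.year - 1
    quarter = (d.month - 4) % 12 // 3 + 1  # Apr-Jun -> 1, ..., Jan-Mar -> 4
    return f"FY{start_year}-{(start_year + 1) % 100:02d}", f"Q{quarter}"


def format_inr(amount: float) -> str:
    """Format an amount in rupees with Indian digit grouping and no exponent."""
    sign = "-" if amount < 0 else ""
    whole, frac = f"{abs(amount):.2f}".split(".")
    if len(whole) > 3:
        head, tail = whole[:-3], whole[-3:]
        groups = []
        # After the last three digits, digits are grouped in pairs.
        while len(head) > 2:
            groups.insert(0, head[-2:])
            head = head[:-2]
        if head:
            groups.insert(0, head)
        whole = ",".join(groups + [tail])
    return f"{sign}₹{whole}.{frac}"


print(indian_fy_quarter(date(2024, 2, 15)))  # ('FY2023-24', 'Q4')
print(format_inr(1.5e7))                     # ₹1,50,00,000.00
```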
Business Impact
The user was able to get data analysis insights without expertise in Python, pandas, or the other tools used in data analysis, in a fraction of the time that manual analysis would have taken.
Project Snapshots
- Frontend Streamlit Interface
- IDE Environment
Project website url
URL: https://app-test-pandas-agent-vjbjfjkmxfrvhkhc455p4k.streamlit.app/
(Non-Functional due to the expiry of OpenAI API Key)
Project Video
Link: https://www.loom.com/share/c2099f20e9214e18a2125f5b2fde794c?sid=faa8cc4b-001c-4c51-926c-6a551dfb7c63
Important Links
Video Demo: https://www.loom.com/share/c2099f20e9214e18a2125f5b2fde794c?sid=faa8cc4b-001c-4c51-926c-6a551dfb7c63
URL to test App: https://app-test-pandas-agent-vjbjfjkmxfrvhkhc455p4k.streamlit.app/
Project Success Story: https://docs.google.com/document/d/17VZukkZW6LsXVmb6IDIZWpp61sRQY_cE/edit?usp=sharing&ouid=111848530990018600604&rtpof=true&sd=true
Solution Diagram: https://drive.google.com/file/d/16T56xrxBHioAIRnoA0EmHlSdMcmzEWP3/view?usp=sharing
This project was done by the Blackcoffer Team, a Global IT Consulting firm.
Contact Details
This solution was designed and developed by Blackcoffer Team
Here are my contact details:
Firm Name: Blackcoffer Pvt. Ltd.
Firm Website: www.blackcoffer.com
Firm Address: 4/2, E-Extension, Shaym Vihar Phase 1, New Delhi 110043
Email: ajay@blackcoffer.com
Skype: asbidyarthy
WhatsApp: +91 9717367468
Telegram: @asbidyarthy