Client Background

  • Client Name: A leading IT consulting firm in the USA
  • Industry Type: Technology 
  • Products & Services: ERP Software Solutions and Back Office Services 
  • Organization Size: 100+

The Problem

  • The client had a large, complex ERP database.
  • Business owners and leaders couldn’t directly explore or  talk to their data.
  • Insights were locked behind:
    • Static reports
    • Complex SQL queries
    • Technical teams acting as intermediaries

The client wanted to make their ERP data truly accessible to business users, in a way that felt:

  • Interactive – ask questions in plain English
  • Self-service – no need to know SQL or table structures
  • Visual – be able to see tables/charts, not just raw numbers

A need to connect the business owners with their data was required where a chatbot that sits on top of their ERP database and lets business owners ask, explore, and visualize their data in a simple, natural way would be more flexible and easier.

Our Solution

We built CLAI Data Assistant, a production-ready ERP SQL chatbot that sits between business users and the database.

Key capabilities:

  • Natural language to SQL
    • Users type questions in normal English.
    • The assistant converts them into validated SQL queries and runs them on the ERP MSSQL database.
  • Smart intent detection & routing
    • Distinguishes greetings, casual chat, vague questions, business questions, and forecasting questions.
    • Routes each message to the right engine – SQL query engine or forecasting engine.
  • Business-ready answers, not just raw tables
    • Raw query results are summarized into concise business insights (top customers, trends, balances, etc.).
    • Large result sets are handled safely and summarized instead of dumping thousands of rows.
  • Forecasting engine for “next year” questions
    • For questions like “What will my revenue be next year?” the system pulls multi-year history, structures it, and passes it to a prediction engine to produce sensible future estimates.
  • Streamed, chat-style UX
    • Answers appear in a modern chat interface with streaming responses, making the assistant feel responsive and interactive.
    • The end result is a single chat interface where business owners can explore their ERP data in real-time without touching SQL or navigating complex menus.

Solution Architecture

 Interface & Access Layer

User Interaction Layer

  • Entry point for all user queries.
  • Accepts natural language inputs for historical analysis, forecasting, or general questions.

API Gateway

  • Central access point to the analytics system.
  • Handles request routing, authentication, and initial request validation.
  • Ensures all traffic flows through a controlled interface before reaching core services.

Session & Governance Layer

Session Initialization

  • Establishes conversation context.
  • Maintains session-level state for continuity across interactions.

Token Budget Management

  • Monitors and controls token usage per request.
  • Ensures system efficiency and prevents excessive computational cost.

Intent Classification Engine

  • Core decision-making component.
  • Classifies incoming queries into:
    • Historical Data Query
    • Forecast Query
    • General Query
  • Routes the request to the appropriate processing pipeline.

Data & Intelligence Processing Layer

This is the analytical core of the system.


A. Historical Query Pipeline

SQL Query Generation

  • Converts natural language questions into structured SQL queries.

Query Validation

  • Validates syntax, schema compatibility, and query safety before execution.

Database Interaction

  • Executes validated queries on the database.

Result Processing

  • Cleans, structures, and prepares raw query results.

Insight Generation

  • Produces contextual business insights from retrieved data.

B. Forecast Query Pipeline

Latest Data Detection

  • Identifies the most recent available dataset required for forecasting.

Data Aggregation

  • Aggregates and prepares historical data for model consumption.

Forecast Computation

  • Applies forecasting logic or models to generate predictions.

Predictive Insight Generation

  • Converts forecast outputs into business-readable insights.

C. General Query Pipeline

Direct AI Response

  • For non-analytical or informational queries.
  • Generates a direct AI-driven response without database interaction.

 Presentation & Response Layer

Visualization & Formatting

  • Consolidates outputs from all pipelines.
  • Formats responses into:
    • Structured text
    • Tables
    • Charts (if applicable)
  • Ensures consistency in output presentation regardless of pipeline.

Streaming Response to User

  • Streams responses incrementally back to the user.
  • Improves perceived performance and user experience.

Deliverables

  • Updated SSL Certifications
    Renewed and configured SSL certificates to ensure secure communication and uninterrupted service availability.
  • Implemented Token Usage Monitoring & Cost Estimation
    Developed a function to:
    • Track tokens used per question per day
    • Calculate total tokens consumed
    • Estimate cost per request for better monitoring and cost control
  • Enhanced Prediction Stability & Token Management
    Added a keep-alive counter to prevent grouped prediction failures and implemented a token limiter function to avoid exceeding token limits.
  • Enhanced Question Logging & Storage Mechanism
    • Updated the API to store user questions in the database
    • Modified logging to maintain only one log entry per question
    • Ensured each user can view only the questions they have personally saved
  • Integrated Client Credential Fetching API
    • Created a new API to fetch database credentials and user ID from the client’s API
    • Modified the system to automatically store fetched DB credentials securely in the environment configuration
  • Implemented Session-Based Architecture for Chatbot
    • Added session_id support
    • Created a centralized session manager
    • Removed global mutable state and migrated to session-scoped storage
    • Scoped database access per session with secure credential handling
    • Integrated OpenAI thread management
    • Completed testing and validation of session-based implementation
  • Refinement & Code Optimization Improvements 
    • Implemented TOON techniques in prediction_engine.py to reduce token consumption and optimize API usage
    • Modified conversational responses to display only required disclaimer messages
    • Added safety layer to prevent users from directly submitting SQL queries
    • Updated APIs to pass user_id instead of session_id
    • Made external API link dynamic for future database integrations
    • Tested and validated multi-database routing with new database
    • Implemented dynamic current-year detection based on latest available data and added forecast disclaimer messaging
    • Improved entity-summary mapping by associating each entity exclusively with its own summary
  • Configured Auto-Restart for Backend & Frontend Services
    Setting up PM2  to ensure uptime and system stability, particularly in cases of unexpected terminal closures.
  • Testing & Validation of ColdLion Application
    Conducted complete testing and validation to ensure overall system stability, correctness, and readiness for deployment.

Tech Stack

  • Framework used
  • FastAPI (backend API framework)
  • LangChain (SQL agent orchestration)
  • OpenAI Assistants API (tool calling & routing)
  • Uvicorn (ASGI server)
  • Language/techniques used
  • Python (backend, orchestration logic)
  • JavaScript / React / Next.js (frontend)
  • Tailwind CSS (UI styling)
  • REST APIs & JSON
  • Models used
  • OpenAI GPT-4o (core reasoning, SQL generation, summaries)
  • OpenAI GPT-4o-mini (fast intent classification & keyword matching to improve latency)
  • Skills used
  • Solution & data architecture for ERP systems
  • Secure database integration (read-only MSSQL access)
  • LLM prompt design and tool-calling workflows
  • SQL optimization and validation
  • Error handling, retries, and latency optimization
  • Frontend UX for chat-based analytics tools
  • Databases used
  • Microsoft SQL Server (ERP transactional database – Sales, AR, AP, Orders, Items, Inventory, Amazon tables)
  • Web Cloud Servers used
  • AWS Windows Server 
  • Application accessed via RDP and browser using internal IP / future subdomain

What are the technical Challenges Faced during Project Execution

  1. Converting free-form English into safe SQL
    1. Users phrase questions in many different ways, including typos and partial context.
    2. Needed to ensure the LLM does not generate unsafe or heavy queries.
  2. Choosing the right tables automatically
    1. The ERP database has many tables, including logs and system tables.
    2. Mapping user intent (“open AR”, “top customers”, “Amazon shipments”) to the correct subset of tables was non-trivial.
  3. Distinguishing between historical vs forecast questions
  4. Handling vague or conversational queries gracefully
    1. Users sometimes type “show me something” or ask general knowledge questions.
    2. The system needed to respond helpfully.
  5. Latency & user experience
    1. Calling LLMs, generating SQL, executing it on MSSQL, and then summarizing –  all within a reasonable response time..

How the Technical Challenges were Solved

  1. Strict intent classification & routing layer
    1. Implemented a dedicated intent classifier (greeting / conversational / vague / business / forecast).
    2. Only business/forecast messages reach the SQL or prediction engine, reducing mistakes and load.
  2. AI-assisted keyword → table mapping + schema cache
    1. Built a schema cache that scans all tables and filters out logs/system tables.
    2. Combined an AI keyword matcher with a curated mapping (e.g., “AR”, “aging”, “open AR” → ARDocument, ARBatchDetail, Customer).
    3. This ensured the LLM only sees relevant tables for each question.
  3. Dedicated prediction tool for future questions
    1. Registered a separate run_prediction_tool in the Assistant, with clear routing rules for future-looking questions (forecast, predict, next year, 2026+).
    2. This cleanly separated “read existing data” vs “forecast future data”.
  4. Safe SQL validation and retry logic
    1. Every generated SQL query is passed through a validation layer that blocks non-SELECT statements.
    2. If a query fails due to missing columns/tables, the system can rebuild schema cache and retry.
  5. Handling vague & conversational inputs
    1. For vague business queries (“show me data”), the assistant replies with a clarification prompt offering options (Customers, Sales, AR, etc.).
    2. For general knowledge questions, it answers briefly and reminds the user that its main role is to assist with their business data.
  6. Latency optimization
    1. Used GPT-4o-mini for lightweight tasks like intent classification and keyword matching.
    2. Cached schema and minimized context to reduce token size.
    3. Implemented streaming responses so users see answers as they are generated.

Business Impact

The assistant delivers clear value:

  • Easier Access to the users data which helps in Faster decision-making. 
  • Reduced dependency on SQL skills
  • Lower risk of unsafe database access
    • All queries are read-only and automatically validated.
    • No direct SQL access is given to end users.
  • Scalable foundation for future features
    • The architecture is flexible enough to add:
      • Rich visual dashboards
      • More forecasting models
      • Multi-tenant support
      • Additional databases or modules

Project Snapshots

Project website url

https://ai.coldlion.com:8501/chat?userid=CLUser&serialno=0KWJKSMHb9dYPueHEjMYzrjJ4u7yaIX5YAmvFJpTFADloXjLvLZI3rvYsnByGuCa

Project Video

https://app.screencastify.com/watch/zvXeHMhcOvFuohX94XwC?checkOrg=7b3668bb-d7fc-4a60-87c4-180b4485ff5b