How we built an agentic AI pipeline for SA Glass that ingests any order format — hand-sketched drawings, WhatsApp photos, Excel sheets, CAD files, or typed text — and produces a validated Proforma Invoice ready for ERP import.
SA Glass, a glass manufacturing company based in Pune, receives customer orders in a wide variety of formats: hand-drawn dimension sketches sent via WhatsApp, scanned handwritten order forms, Excel sheets with varying column structures, typed PDFs, and occasionally digital CAD drawings. Every incoming order had to be read, interpreted, manually re-keyed into their system, and converted into a Proforma Invoice (PI) before production could begin.
A single order took 25–40 minutes of staff time to process. During high-demand periods, the backlog stretched to 2–3 days — meaning customers waited days for a PI that should take minutes, and production planning couldn't start until the invoice was issued. The bottleneck wasn't capacity; it was the sheer variety of formats that made automation seem impossible.
The challenge wasn't OCR — it was understanding. A handwritten sketch showing glass dimensions with annotations, arrows, and shorthand requires a system that can reason about the document, not just read the pixels. Rules-based automation had been tried and abandoned; the format variation defeated every template-matching approach they'd tested.
We built a multi-agent workflow that breaks the problem into discrete reasoning steps, each handled by a specialised agent with a specific role. No single model tries to do everything. The agents hand off structured data to one another and escalate to a human reviewer only when a field's confidence falls below a defined threshold.
Handwritten order forms: scanned or photographed forms, any handwriting style
Hand-drawn sketches: glass specs with measurements and annotations
Excel sheets: variable column layouts, merged cells, informal headers
Typed PDFs and text: digital text orders in any layout or template
CAD drawings: DXF or image exports of engineering dimension drawings
WhatsApp photos: camera-phone images of physical order documents
Identifies the input type (sketch, form, spreadsheet, CAD, photo), applies the appropriate normalisation strategy, and routes to the correct extraction path. Handles orientation correction, deskew, and noise reduction before any extraction happens.
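As an illustration only (this is not the production intake agent), the classify-then-route step can be sketched as a lookup from input type to a normalisation plan. The extension map, the `InputType` names, and every step name other than orientation correction, deskew, and noise reduction are assumptions made for this example. A real intake agent would also inspect the content itself, since a file extension cannot tell a sketch from a scanned form.

```python
from enum import Enum
from pathlib import Path


class InputType(Enum):
    SPREADSHEET = "spreadsheet"
    CAD = "cad"
    TEXT_DOCUMENT = "text_document"
    IMAGE = "image"  # sketches, scanned forms, and phone photos


# Hypothetical extension map, for illustration only.
EXTENSION_MAP = {
    ".xlsx": InputType.SPREADSHEET,
    ".xls": InputType.SPREADSHEET,
    ".csv": InputType.SPREADSHEET,
    ".dxf": InputType.CAD,
    ".pdf": InputType.TEXT_DOCUMENT,
    ".txt": InputType.TEXT_DOCUMENT,
    ".jpg": InputType.IMAGE,
    ".jpeg": InputType.IMAGE,
    ".png": InputType.IMAGE,
}

# Normalisation plan per input type. Only the image steps are named in the
# text above; the other step names are placeholders.
NORMALISATION_STEPS = {
    InputType.IMAGE: ["orientation_correction", "deskew", "noise_reduction"],
    InputType.CAD: ["render_to_image"],
    InputType.SPREADSHEET: ["unmerge_cells", "detect_header_row"],
    InputType.TEXT_DOCUMENT: ["extract_text_layer"],
}


def classify(filename: str) -> InputType:
    """Map a file name to an input type, rejecting unknown formats."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_MAP:
        raise ValueError(f"Unsupported input format: {filename!r}")
    return EXTENSION_MAP[suffix]


def plan(filename: str) -> tuple[InputType, list[str]]:
    """Return the input type and the normalisation steps to run before extraction."""
    input_type = classify(filename)
    return input_type, NORMALISATION_STEPS[input_type]
```

Rejecting unknown formats explicitly, rather than guessing, keeps unsupported files out of the extraction path.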
A vision-language model reads the normalised input and extracts all order fields — dimensions (length, width, thickness, shape), quantity, glass type, finish, special requirements, delivery details, and customer reference. For sketches, it interprets dimension lines, annotation arrows, and shorthand notation the same way an experienced estimator would.
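One way to make the extraction output usable by downstream agents is to have the model reply in JSON and parse that reply into a typed structure. The sketch below assumes such a JSON reply; the class names, field names, and units are hypothetical and chosen to mirror the fields listed above, not SA Glass's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExtractedLine:
    length_mm: float
    width_mm: float
    thickness_mm: float
    quantity: int
    glass_type: str
    shape: str = "rectangle"  # assumed default when the document gives no shape
    finish: Optional[str] = None


@dataclass
class ExtractedOrder:
    customer_reference: Optional[str]
    lines: list[ExtractedLine]
    special_requirements: list[str] = field(default_factory=list)
    delivery_details: Optional[str] = None


def parse_extraction(raw_json: str) -> ExtractedOrder:
    """Parse the model's JSON reply into a typed order.

    ExtractedLine(**item) raises TypeError when a required field is missing
    or an unexpected field appears, which acts as a basic schema check.
    """
    data = json.loads(raw_json)
    lines = [ExtractedLine(**item) for item in data["lines"]]
    if not lines:
        raise ValueError("extraction returned no line items")
    return ExtractedOrder(
        customer_reference=data.get("customer_reference"),
        lines=lines,
        special_requirements=data.get("special_requirements", []),
        delivery_details=data.get("delivery_details"),
    )
```

Failing loudly on a malformed reply means a bad extraction is caught at the hand-off rather than inside the invoice.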
Checks extracted fields against SA Glass's product catalogue (glass types, thickness ranges, available finishes), validates dimensional feasibility, flags impossible measurements, and cross-references against the customer master for pricing tier and credit terms. Each field gets a confidence score.
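A minimal sketch of this kind of business-rule check is shown below, assuming a small in-memory catalogue. The catalogue contents, the maximum sheet size, and the rule that a failed check drops a field's confidence to zero are all assumptions for illustration. The customer-master lookup is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalogue: glass type -> thicknesses offered, in mm.
CATALOGUE = {
    "clear float": {4, 5, 6, 8, 10, 12},
    "toughened": {6, 8, 10, 12},
}
MAX_SHEET_MM = (3660, 2440)  # assumed largest sheet (long side, short side)


@dataclass
class FieldResult:
    name: str
    value: object
    confidence: float  # 0.0 to 1.0
    issue: Optional[str] = None


def validate(order: dict, extraction_confidence: dict) -> list[FieldResult]:
    """Attach a confidence score to every field; failed rules score 0.0."""
    issues = {}

    offered = CATALOGUE.get(order["glass_type"])
    if offered is None:
        issues["glass_type"] = "glass type not in catalogue"
    elif order["thickness_mm"] not in offered:
        issues["thickness_mm"] = "thickness not offered for this glass type"

    long_side = max(order["length_mm"], order["width_mm"])
    short_side = min(order["length_mm"], order["width_mm"])
    dimension_issue = None
    if short_side <= 0:
        dimension_issue = "dimensions must be positive"
    elif long_side > MAX_SHEET_MM[0] or short_side > MAX_SHEET_MM[1]:
        dimension_issue = "exceeds maximum sheet size"
    if dimension_issue:
        issues["length_mm"] = issues["width_mm"] = dimension_issue

    return [
        FieldResult(
            name=name,
            value=value,
            # A failed rule overrides whatever the extraction step reported.
            confidence=0.0 if name in issues else extraction_confidence.get(name, 0.0),
            issue=issues.get(name),
        )
        for name, value in order.items()
    ]
```

Scoring per field rather than per order is what allows a reviewer to be shown only the fields that failed.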
If all fields clear the confidence threshold, the order proceeds directly to PI generation. If any field is uncertain, the routing agent surfaces only those fields — highlighted in context — to a human reviewer. The reviewer sees the original document alongside the extracted data and corrects specific fields rather than re-keying the whole order.
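The gating rule described here can be sketched in a few lines. The default threshold of 0.90, the per-field overrides, and the function names are assumptions for this example; the real thresholds were tuned with the operations team.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.90  # assumed value, for illustration only


def fields_for_review(
    confidences: dict,
    threshold: float = DEFAULT_THRESHOLD,
    overrides: Optional[dict] = None,
) -> list:
    """Return the names of fields whose confidence is below their threshold."""
    overrides = overrides or {}
    return [
        name
        for name, score in confidences.items()
        if score < overrides.get(name, threshold)
    ]


def route(
    confidences: dict,
    threshold: float = DEFAULT_THRESHOLD,
    overrides: Optional[dict] = None,
) -> tuple:
    """Auto-approve when every field clears its threshold; otherwise send
    only the uncertain fields to a human reviewer."""
    flagged = fields_for_review(confidences, threshold, overrides)
    if flagged:
        return "human_review", flagged
    return "auto_approve", []
```

For example, `route({"length_mm": 0.98, "width_mm": 0.97, "thickness_mm": 0.62})` returns `("human_review", ["thickness_mm"])`, so the reviewer is asked about a single field.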
Produces a formatted PI in SA Glass's standard template — line items, pricing, GST calculation, payment terms, and delivery schedule — and pushes it to their ERP system via API. The PI is also emailed to the customer automatically with the order reference attached.
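The arithmetic behind the line items and GST can be sketched as follows. This assumes area-based pricing per square metre and a single GST rate passed in by the caller; both are assumptions for illustration, as are the function names. `Decimal` is used so that currency values round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

PAISE = Decimal("0.01")  # round currency values to two decimal places


def line_amount(length_mm: int, width_mm: int, quantity: int,
                rate_per_sqm: Decimal) -> Decimal:
    """Price one line item by area (assumed pricing model)."""
    area_sqm = Decimal(length_mm) * Decimal(width_mm) / Decimal(1_000_000)
    return (area_sqm * quantity * rate_per_sqm).quantize(PAISE, rounding=ROUND_HALF_UP)


def pi_totals(lines: list, gst_rate: Decimal) -> dict:
    """Sum the line items, then apply GST to the taxable subtotal."""
    subtotal = sum((line_amount(**line) for line in lines), Decimal("0"))
    gst = (subtotal * gst_rate).quantize(PAISE, rounding=ROUND_HALF_UP)
    return {"subtotal": subtotal, "gst": gst, "total": subtotal + gst}


# Made-up example order; the rates and the 18% GST rate are placeholders.
example_lines = [
    {"length_mm": 1200, "width_mm": 800, "quantity": 10, "rate_per_sqm": Decimal("950")},
    {"length_mm": 600, "width_mm": 400, "quantity": 4, "rate_per_sqm": Decimal("1400")},
]
totals = pi_totals(example_lines, gst_rate=Decimal("0.18"))
# subtotal 10464.00, gst 1883.52, total 12347.52
```

In a real PI, the rate would come from the customer's pricing tier and the tax treatment from the applicable GST rules rather than from constants.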
Enterprise AI without governance is a liability. We built the review workflow as a first-class part of the system, not an afterthought.
High-confidence extractions proceed without human intervention. Average PI generation time: 38 seconds from document receipt.
Ambiguous sketches or unusual specs go to a reviewer, who sees only the highlighted uncertain fields. Average review time: 3 minutes, not 35.
Catalogued 6 months of historical orders across all format types. Identified 11 distinct input patterns and the failure modes of prior automation attempts.
Multimodal extraction agent built and tested on 400 historical orders. Confidence scoring calibrated against ground-truth PIs from the same period.
Business-rule validation connected to live catalogue. Confidence thresholds tuned to balance automation rate vs. error rate with the operations team.
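The trade-off between automation rate and error rate can be explored with a simple threshold sweep over labelled historical extractions. The sketch below is illustrative: the sample data is made up, the function names are hypothetical, and picking the lowest threshold that meets an error target is one possible selection rule rather than the procedure used on this project.

```python
def sweep(samples: list, thresholds: list) -> list:
    """For each threshold, report (threshold, automation_rate, error_rate).

    samples holds (confidence, was_correct) pairs from labelled history.
    automation_rate is the share of samples at or above the threshold;
    error_rate is the share of those auto-approved samples that were wrong.
    """
    rows = []
    for t in thresholds:
        auto = [ok for conf, ok in samples if conf >= t]
        automation_rate = len(auto) / len(samples)
        error_rate = auto.count(False) / len(auto) if auto else 0.0
        rows.append((t, automation_rate, error_rate))
    return rows


def pick_threshold(samples: list, thresholds: list, max_error_rate: float):
    """Return the lowest threshold that automates something and stays within
    the error target, or None if no threshold qualifies."""
    for t, automation_rate, error_rate in sweep(samples, sorted(thresholds)):
        if automation_rate > 0 and error_rate <= max_error_rate:
            return t
    return None


# Made-up toy data for illustration, not SA Glass results.
SAMPLES = [
    (0.99, True), (0.97, True), (0.95, True), (0.92, False),
    (0.90, True), (0.85, True), (0.80, False), (0.70, False),
]
```

With this toy data, a 5% error target selects a threshold of 0.95, while a looser 25% target selects 0.90 and automates more of the samples.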
PI template engine, ERP API integration, customer email automation, and the human review UI shipped and tested with the SA Glass team in parallel.
Within four weeks of going live, SA Glass had processed over 800 orders through the pipeline. The production planning team reported that PI backlog — previously 2–3 days during peak periods — had dropped to same-day for all auto-approved orders. The review queue for flagged orders clears in under an hour, compared to the previous overnight backlog.
We don't publish model names or integration specifics — those are a competitive advantage for SA Glass. The structural pattern, however, applies to any document-heavy manufacturing or B2B workflow.
A frontier multimodal model handles extraction across all input types — the same model reads a typed PDF and a hand-sketched drawing. Domain-specific prompting guides it to SA Glass's product vocabulary and notation conventions.
A lightweight orchestration layer routes documents between agents, manages state between steps, and enforces the confidence-gating logic. Each agent has a defined input schema, output schema, and escalation condition.
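To make the pattern concrete, here is a minimal sketch of such an orchestration loop. It assumes each agent declares the state keys it needs and the keys it must produce, and that shared state is a plain dictionary; the `Agent` class, its field names, and the status strings are hypothetical and do not describe the production layer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    requires: frozenset  # input schema: state keys that must already exist
    provides: frozenset  # output schema: keys the agent must return
    run: Callable[[dict], dict]
    should_escalate: Callable[[dict], bool]  # escalation condition


def run_pipeline(agents: list, initial_state: dict) -> dict:
    """Run agents in order, checking each contract and stopping on escalation."""
    state = dict(initial_state)  # copy so the caller's dict is not mutated
    for agent in agents:
        missing = agent.requires - set(state)
        if missing:
            raise ValueError(f"{agent.name}: missing inputs {sorted(missing)}")

        output = agent.run(state)
        absent = agent.provides - set(output)
        if absent:
            raise ValueError(f"{agent.name}: missing outputs {sorted(absent)}")

        state.update(output)
        if agent.should_escalate(state):
            # Stop here and hand the accumulated state to a human reviewer.
            state["status"] = f"escalated:{agent.name}"
            return state

    state["status"] = "completed"
    return state
```

Checking the input and output contract at every hand-off means a misbehaving agent fails at its own step instead of corrupting a later one.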
The validation agent queries SA Glass's product catalogue and customer master in real time rather than working from a static snapshot. Price changes, new product lines, and customer updates are reflected immediately, with no pipeline retraining.
A lightweight web UI that shows the original document alongside extracted fields, with uncertain values highlighted in context. Corrections feed back into the confidence model to improve routing over time.
We build agentic document AI pipelines for manufacturing, logistics, and B2B services — any input format, ERP-ready output, human-in-the-loop where it matters. Start with a free 30-minute scoping call.
Talk to our team →
We respond within 24 hours. First call is free.