Generative AI Engineer · Pune, India

Ravindra Kupatkar.

Building Dream @LumrexAI

Building production-grade agentic systems, RAG pipelines & LLM safety architectures across banking, automotive & pharmaceutical domains. Not demos.

LumrexAI
TECHNOLOGIES
2+
Years in Gen AI
5
Production Systems
PoCs Built
100k+
Records / Month
LangGraph · LangChain · CrewAI · Microsoft Presidio · GCP Vertex AI · AWS Bedrock · SageMaker · RAG Pipelines · LLM Fine-tuning · Multi-Agent Orchestration · Google Model Armor · Redis Cache · n8n · MCP · a2a · Hugging Face
01 — Philosophy

The laws
I build by.

These aren't principles from a paper — they're scars from real production systems that broke, were fixed, and survived.

— 01

LLMs are non-deterministic. Your system cannot be.

Every LLM call is wrapped in typed validation, retries, and schema enforcement. The AI is a powerful component — not the controller.
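In code, that wrapping looks something like the sketch below: a Pydantic schema the response must satisfy, and a retry loop around the non-deterministic call. `Answer`, its fields, and `llm_call` are illustrative names, not the production interface (Pydantic v2 assumed).

```python
import json
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    """Schema every LLM response must satisfy before it leaves the system."""
    text: str
    confidence: float

def call_with_validation(llm_call, max_retries: int = 3) -> Answer:
    """Wrap a non-deterministic LLM call in schema enforcement plus retries.

    `llm_call` is any callable returning a raw JSON string (hypothetical here).
    """
    last_err = None
    for _ in range(max_retries):
        try:
            # Invalid JSON or a missing field both raise ValidationError
            return Answer.model_validate_json(llm_call())
        except (ValidationError, json.JSONDecodeError) as err:
            last_err = err  # malformed output: retry instead of propagating
    raise RuntimeError(f"LLM failed schema after {max_retries} tries: {last_err}")
```

The point of the pattern: the caller only ever sees a typed `Answer` or a controlled failure, never raw model text.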

— 02

Guardrails are the architecture, not an afterthought.

Presidio PII detection runs before any data reaches an LLM. Unmasking only happens post-compliance verification. Zero exposure by design.
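The mask-then-verified-unmask flow can be sketched as below. In the real system Microsoft Presidio does the entity detection; the single SSN regex here is only a stand-in so the flow is runnable, and the placeholder format is invented for this sketch.

```python
import re

# Stand-in detector; production uses Presidio's AnalyzerEngine, not regex.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace each PII hit with a placeholder, remembering the mapping."""
    mapping: dict[str, str] = {}
    def swap(m: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"
        mapping[token] = m.group(0)
        return token
    return SSN.sub(swap, text), mapping

def unmask(text: str, mapping: dict[str, str], compliant: bool) -> str:
    """Restore raw values only after the compliance check passes."""
    if not compliant:
        return text  # placeholders stay in place: zero raw exposure
    for token, raw in mapping.items():
        text = text.replace(token, raw)
    return text
```

Only the masked string ever reaches the LLM; the mapping lives outside the model boundary.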

— 03

Every system must have a fallback path.

DLQs, circuit breakers, human review queues. When an LLM call fails — and it will — the system degrades predictably, not catastrophically.
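A minimal circuit breaker shows the shape of "degrades predictably": after repeated failures the LLM call is skipped entirely and the fallback (a cached answer, a human review queue) is served. Thresholds and the class itself are illustrative, not the production implementation.

```python
import time

class CircuitBreaker:
    """Fail fast after repeated LLM-call errors so the system degrades
    predictably instead of piling up timeouts."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()      # circuit open: degrade, don't crash
            self.failures = 0          # cool-down elapsed: half-open retry
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            self.opened_at = time.monotonic()
            return fallback()
```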

— 04

Latency is a product decision, not a technical one.

Took CortexIQ from 7s to 3s by mapping the agent dependency graph and parallelizing independent nodes. Users feel the difference.
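The parallelization idea, in miniature: nodes with no data dependency run concurrently instead of sequentially. Node names and timings below are illustrative stand-ins, not the CortexIQ code.

```python
import asyncio

async def intent_node(msg: str) -> str:
    await asyncio.sleep(0.05)      # stands in for an LLM call
    return "refill_request"

async def clarification_node(msg: str) -> str:
    await asyncio.sleep(0.05)
    return "no clarification needed"

async def run_turn(msg: str) -> list[str]:
    # Sequential would take ~0.10s; gather takes ~0.05s, because the
    # dependency graph shows these nodes don't need each other's output.
    return await asyncio.gather(intent_node(msg), clarification_node(msg))
```

The same reasoning scales: map the full agent dependency graph first, then parallelize every independent cut.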

— 05

Observability from the first commit.

Token cost, latency, confidence scores — logged from day one. Not retrofitted after something breaks at 2 AM.
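"Logged from day one" can be as cheap as a decorator on every model call. The `usage` response field and `fake_llm` are hypothetical shapes for this sketch.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.metrics")

def observed(fn):
    """Log latency and token cost for every LLM call, from the first commit.
    Assumes the wrapped call returns a dict carrying a `usage` token count."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        response = fn(*args, **kwargs)
        latency_ms = (time.perf_counter() - start) * 1000
        log.info("call=%s latency_ms=%.1f tokens=%s",
                 fn.__name__, latency_ms, response.get("usage", "n/a"))
        return response
    return wrapper

@observed
def fake_llm(prompt: str) -> dict:
    return {"text": "grounded answer", "usage": 128}
```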

— 06

Build for the regulated domain, not the demo.

Banking, pharma, automotive — these domains don't tolerate hallucinations. RLHF-ready loops and compliance outputs are the minimum.

02 — Tech Stack

The skills behind production systems.


Every item below has been used in a shipped, production system — selected for reliability, safety, and real-world scale.

AI & Machine Learning
Generative AI · Large Language Models (LLM) · RAG · AI Agents · Deep Learning · Machine Learning · Natural Language Processing · LLM Fine-Tuning (GPT-4, PaLM-2, Llama) · SLM Fine-Tuning (Qwen2.5-14b-Instruct) · Prompt Engineering · Testing AI Systems
Frameworks & Libraries
LangGraph · LangChain · CrewAI · Agentic AI Workflows · Multi-Agent Workflows · Microsoft Presidio (PII Guardrails) · Hugging Face Models · TensorFlow · PyTorch
Languages
Python
Cloud & DevOps
GCP Vertex AI · AWS Bedrock · AWS SageMaker · AWS Lambda · Google Model Armor · Docker · Kubernetes (K8s) · CI/CD Pipelines · Grafana
Data & Tools
n8n Workflows · MCP · a2a Protocols · Redis Cache · Vector Databases
Methodologies
Analytical / Critical Thinking · Agile Development · Clean Code
03 — Systems Thinking

How my AI systems
actually work.

Real architecture flows — not marketing diagrams. Each node is a production component.

User Query → embed → Query Embedding (text-embedding-3) → search → Vector DB (Pinecone / FAISS) → top-k → Hybrid Retrieval (BM25 + Semantic) → Context Assembly (512-token chunks) → scan → Presidio PII Guard (mask before LLM) → infer → LLM Inference (Bedrock / Vertex AI) → validate → Output Validator (Pydantic schema) → Confidence ≥ 0.80?
  yes → Grounded Response (with citations)
  no → Human Review Queue (DLQ escalation)
Retrieval: Hybrid BM25 + semantic. Chunk 512 tokens, 50-token overlap. Cross-encoder reranking on top-20 candidates.  |  Safety: Presidio masks SSN, DOB, MRN before LLM.  |  Fallback: Confidence < 0.80 → human review. Lambda timeout → DLQ + async retry.
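One common way to fuse the BM25 and semantic rankings is reciprocal rank fusion. The production pipeline's exact weighting isn't specified here, so treat this as a sketch of the idea rather than the deployed scoring.

```python
def reciprocal_rank_fusion(keyword_ranked: list[str],
                           semantic_ranked: list[str],
                           k: int = 60) -> list[str]:
    """Fuse two document rankings: each list position contributes
    1 / (k + rank), so documents near the top of either list win."""
    scores: dict[str, float] = {}
    for ranking in (keyword_ranked, semantic_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused top-20 then goes to the cross-encoder reranker, which is where the real precision gain comes from.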
User Message (pharmacy domain) → detect → Presidio PII Scan (SSN · DOB · MRN · Drug IDs) → mask → PII Masked Input (zero raw exposure) → classify → Intent Classifier (LangGraph node 1) → Redis Session State (AgentState · conversation history) → clarify → Clarification Agent (LangGraph node 2) → ground → Contextual Grounding (RAG retrieval) → generate → Response Generator (LangGraph node 4) → Compliance Verifier (LangGraph node 5) → Compliant?
  yes → Controlled Unmask (post-verification only) → Safe Response + Feedback (RLHF-ready rating capture)
Latency: 7s → 3s by parallelizing intent + clarification nodes in LangGraph.  |  Memory: MongoDB → Redis migration for 2× session retrieval speed.  |  RLHF: Feedback data model structured for future reward modeling from day one.
NHTSA Complaint (unstructured narrative text) → clean → Text Preprocessing (normalization · dedup) → tokenize → Domain Tokenizer (automotive vocabulary) → encode → Feature Encoding (fine-tuned embeddings) → Fine-tuned Classifier (SageMaker endpoint · 92% acc.) → multi-label → 30+ Domain Labels (suspension · engine · brakes…) → score → Confidence Scoring (per-label probabilities) → Confidence ≥ threshold?
  yes → Structured Output (NHTSA taxonomy · auto-filed)
  no → Manual Review (low-confidence queue)
Why fine-tune: Prompt-only → 73%. Fine-tuned on automotive domain vocab → 92%.  |  Scale: Batch endpoints on SageMaker — 100k+ records/month.  |  Multi-label: Single complaint can map to multiple issue domains simultaneously.
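The multi-label routing step looks roughly like this: per-label probabilities are thresholded independently, so one complaint can carry several issue domains, and anything with no confident label drops to the manual queue. The production threshold isn't published; 0.5 is a placeholder.

```python
def route_complaint(label_probs: dict[str, float],
                    threshold: float = 0.5) -> tuple[str, list[str]]:
    """Per-label thresholding for multi-label classification."""
    labels = [lbl for lbl, p in label_probs.items() if p >= threshold]
    if not labels:
        return "manual_review", []       # low-confidence queue
    return "auto_filed", sorted(labels)
```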
Raw Input (any domain) → detect → PII Entity Detection (Microsoft Presidio) → PII Found? yes → mask → Anonymised Input (zero raw exposure) → Policy Guardrails (content + intent check) → pass → LLM Inference (masked context only) → validate → Schema Validation (Pydantic strict mode) → Confidence ≥ 0.85? yes → Safe Output (controlled unmask) → Compliance Audit Log (immutable · timestamped · hashed)
PII Entities: SSN, DOB, MRN, names, addresses, drug IDs — masked before any LLM call.  |  Threshold: 0.85 confidence for pharmacy safety (vs 0.80 for RAG).  |  Audit: Every transformation logged to immutable S3 with timestamp, hash, and operator ID.
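An "immutable · timestamped · hashed" audit record can be sketched as a hash-chained entry: each record hashes its own contents plus the previous entry's hash, so tampering anywhere breaks the chain. Field names are illustrative; the real log ships to S3.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_entry(operator_id: str, transformation: str, payload: str,
                prev_hash: str = "0" * 64) -> dict:
    """Build one tamper-evident audit record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "operator": operator_id,
        "transformation": transformation,
        # Hash the payload rather than storing raw (possibly PII) text.
        "payload_sha256": hashlib.sha256(payload.encode()).hexdigest(),
        "prev_hash": prev_hash,
    }
    # Entry hash covers every field above, chaining to the previous entry.
    record["entry_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record
```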
04 — Selected Work

Systems built.
Impact measured.

Each entry shows the real engineering decision — not just what was built, but why the architecture choices were made.

01
CortexIQ — Pharmacy-Safe Multi-Agent System
LangGraph·Presidio·Redis·RLHF-ready
Pharma · 2024–25
+
Problem

Pharmacy domain needed LLM-powered dialogue under simultaneous PII/PHI, hallucination, and regulatory constraints — no framework handled all three in a multi-agent flow.

Architecture

Defense-in-depth: Presidio masking → LangGraph agent flow (intent → clarification → grounding → generation → compliance) → controlled unmask. Redis session state.

Key Decision

MongoDB → Redis for AgentState gave 2× session retrieval speed. RLHF-ready feedback model built from day one — enables reward modeling without rearchitecting.

7s→3s
Latency Reduced
0
PHI Exposures
100%
Compliance Coverage
02
Contract Risk Detection System
GCP Vertex AI·Document AI·RAG
Banking
+
Problem

Legal teams manually reviewing large-scale banking documents for compliance risks — high error rate under deadline pressure, regulatory exposure when risks were missed.

Architecture

Vertex AI extraction → semantic RAG → risk classification with citation grounding. Every flagged risk traces to a source document — no black-box decisions for legal teams.

Key Decision

Hybrid retrieval (keyword + semantic) for legal precision. Pure cosine similarity missed technical clause matches. Thresholds calibrated on historical risk labels by domain experts.

90%+
Detection Accuracy
70%
Faster Review
Full
Audit Trail
03
NHTSA Complaint Categorization System
LLM Fine-tuning·SageMaker·Multi-label
Automotive
+
Problem

Thousands of unstructured NHTSA safety complaints required consistent categorization into 30+ issue domains for regulatory reporting — impossible at scale manually.

Architecture

Fine-tuned multi-label classifier on automotive vocabulary. Batch pipeline on SageMaker endpoints. Output aligned to NHTSA taxonomy with confidence scoring per label.

Key Decision

Prompt-only: 73% accuracy. Fine-tuned: 92%. Training cost justified at 100k+ records/month. Batch over real-time for cost efficiency at scale.

92%
Accuracy
100k+
Records/Month
30+
Issue Domains
04
Banking RAG Intelligent Search Platform
RAG·Vector DB·GCP
Banking
+
Problem

Fragmented banking product information across documents. Query resolution was slow, inconsistent, and dependent on human subject-matter experts for every answer.

Architecture

RAG pipeline over banking corpus. Hybrid retrieval for precision. Responses grounded with citations. Confidence gating routes low-certainty queries to human agents.

Key Decision

Citation grounding non-negotiable for banking — hallucination without attribution is a compliance failure, not just a quality issue. Every answer traces to a source document.

Retrieval Accuracy
Resolution Time
Full
Traceability
05 — War Stories

What broke &
what I learned.

Most engineers hide their failures. I document them — because real credibility is built on post-mortems, not highlight reels.

01
Multi-Agent Latency

7-Second Responses in CortexIQ

Sequential agent execution looked fine in testing. Under real conversational load, each agent waiting for the previous made the UX feel broken. Users dropped queries mid-flow.

Mapped full dependency graph in LangGraph. Parallelized independent nodes — intent + clarification ran simultaneously. Added Redis session caching. Final: ~3s avg response time.

02
RAG Hallucination Spike

High Retrieval Score, Wrong Documents

Cosine similarity returned confident scores for mismatched context. LLM answered confidently with wrong information — dangerous in regulated banking where users act on answers.

Switched to hybrid retrieval (BM25 + semantic). Added cross-encoder reranking. Implemented citation grounding validation — answer not traceable to a source chunk doesn't go out.
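The grounding-validation rule ("an answer not traceable to a source chunk doesn't go out") can be illustrated with a crude lexical check. Production would use a cross-encoder for this; token overlap and the 0.6 threshold below are stand-ins for the sketch.

```python
def grounded(answer_sentences: list[str], source_chunks: list[str],
             min_overlap: float = 0.6) -> bool:
    """Reject an answer if any sentence has no supporting source chunk."""
    for sentence in answer_sentences:
        tokens = set(sentence.lower().split())
        best = max(
            (len(tokens & set(chunk.lower().split())) / max(len(tokens), 1)
             for chunk in source_chunks),
            default=0.0,
        )
        if best < min_overlap:
            return False  # at least one claim is untraceable: block it
    return True
```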

03
PII Architecture Gap

PHI Reaching LLM in Early Build

Early regex-based masking had gaps — composite patient codes bypassed sanitization. Pre-launch QA caught it before production, but the near-miss was sobering.

Replaced all custom regex with Microsoft Presidio as sole masking authority. Made PII scanning the mandatory first node — if it fails, request is rejected. No exceptions.

04
Model Drift

Output Quality Dropped Without Code Changes

Six weeks post-deployment, NHTSA classifier output format drifted. No code changes. Root cause: provider silently updated the underlying model version affecting token distributions.

Pydantic schema validation on every LLM response. Automated regression tests run daily against production endpoints. Schema mismatch triggers an alert before users are affected.

05
Wrong Database

MongoDB for Conversational State

AgentState stored in MongoDB added round-trip overhead per agent step. In a 7-node pipeline, small latencies compound. Fine in design — painful in production profiling.

Migrated to Redis. Key insight: conversational memory is ephemeral and latency-sensitive — MongoDB is wrong for this. Redis key structure optimized to eliminate JSON deserialization overhead.

06 — Writing

AI that drives impact
at production scale.

Technical writing on agentic AI, LLM safety, and high-impact production systems on Medium. No content marketing — just engineering depth.

07 — Experience

Where I've built at scale.


Every system listed under Projects was built here. One company, two-plus years, five production systems across three regulated domains.

Jun 2023 — Present
Generative AI Engineer
Tata Consultancy Services (TCS) · Pune
  • Designed CortexIQ — pharmacy-safe multi-agent system (LangGraph + Presidio + Redis). Reduced latency from 7s to 3s, zero PHI exposures.
  • Architected LLM-powered contract risk detection on GCP Vertex AI — 90%+ accuracy, 70% faster review cycle across large-scale legal documents.
  • Built RAG intelligent search platform for banking products, improving retrieval accuracy and customer query resolution time significantly.
  • Developed NHTSA complaint classification — 92% accuracy across 30+ automotive issue domains at 100k+ records/month on SageMaker.
  • Engineered agentic LangChain/LangGraph workflows: multi-step reasoning with intent → retrieval → generation → validation pipelines.
  • Implemented RLHF-ready feedback mechanism — structured for future fine-tuning and reward modeling without rearchitecting.
Certifications
Applying AI Principles with Google Cloud — Google
Agentic AI with LangChain and LangGraph — IBM / Coursera
Agentic AI with LangGraph, CrewAI, and AutoGen — IBM / Coursera
Advanced RAG with Vector Databases and Retrievers — IBM / Coursera
08 — Let's Talk

Let's build AI
that scales.

Send Me An Email →

Open to Senior Generative AI, Agentic AI Architect, and AI Engineering roles. If you're building AI that needs to make real impact — let's talk.

Ravindra Kupatkar · Generative AI Engineer · Pune, India
Available for Senior Roles