AI Assistant for Hydrogen & Renewable Energy Teams

We built a multi-agent RAG system that serves as an always-available domain expert for technical teams in hydrogen and renewable energy. It cut information retrieval from hours to seconds and, on evaluated queries, returned no hallucinated answers, because every response is grounded in retrieved documents.

85% faster analysis through automated SQL queries and visualization generation

Information retrieval reduced from hours to seconds

100% domain accuracy on evaluated queries, with every response grounded in retrieved documents

Critical insights unlocked from previously inaccessible PDF and image-embedded documents

The Problem

Technical teams at a renewable energy and hydrogen operator spent an inordinate amount of time searching for information before making engineering and regulatory decisions. Documentation was fragmented across multiple repositories, file formats, and systems. Answering a critical question about a specific process parameter, a regulatory threshold, or a historical equipment failure meant manually searching dozens of documents, often for several hours.

The volume and technical density of hydrogen and renewable energy documentation made the problem structural, not just inconvenient. Regulations, engineering specs, maintenance records, and vendor documentation exist as PDFs, scanned documents, and embedded tables — formats that standard search tools cannot interrogate intelligently.

The Constraints

Domain accuracy was non-negotiable. In hydrogen and renewable energy operations, incorrect information has physical consequences. An AI system providing a wrong regulatory threshold or a misquoted safety parameter could drive bad engineering decisions. Hallucination was not an acceptable failure mode — the system had to know what it did not know.

Heterogeneous document formats. The knowledge base included scanned PDFs, image-embedded tables, handwritten field notes digitized to image files, and structured SQL databases with operational history. A retrieval system that could only handle clean text would miss a substantial portion of the available knowledge.

Conversation continuity. Engineers ask follow-up questions. A system that answered each question in isolation, without retaining context from prior exchanges, would force users to re-establish context constantly, defeating the purpose of an intelligent assistant.

Our Approach

We built a multi-agent Retrieval-Augmented Generation (RAG) system using LangChain and OpenAI GPT-4, with specialized agents handling distinct query types in a coordinated pipeline.

The knowledge base construction phase used OCR (pytesseract) to extract text from image-embedded documents and scanned PDFs, making previously inaccessible content retrievable. All documents were chunked, embedded, and stored in ChromaDB for semantic similarity search at query time.
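
As an illustration of the chunking step, here is a minimal sketch of splitting extracted text into overlapping windows before embedding. The function name, the 500-character window, and the 50-character overlap are assumptions made for this example; the project's actual splitter and parameters are not documented here.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults, not the project's values.
    In a pipeline like the one described, `text` would be the OCR output,
    e.g. from pytesseract.image_to_string(...) for a scanned page.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, which tends to help semantic retrieval. Each chunk would then be embedded and added to the vector store.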

Four specialized agents handle the retrieval and reasoning pipeline:

Router agent classifies each query and dispatches it to the appropriate specialist
Document search agent performs semantic retrieval from the ChromaDB vector store
SQL agent queries structured operational databases and generates the corresponding visualizations
Synthesis agent integrates outputs across sources into a coherent, cited response
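
The dispatch pattern behind the router can be sketched as follows. This is a simplified stand-in: in the system described, classification is done by the language model, whereas the keyword heuristic, the agent functions, and all names below are hypothetical and exist only to show the control flow.

```python
from typing import Callable, Dict, Tuple

# Hypothetical hints suggesting a query needs structured operational data.
SQL_HINTS = ("average", "trend", "plot", "chart", "how many", "per month")


def classify(query: str) -> str:
    """Stand-in for the LLM-based classifier: returns 'sql' or 'documents'."""
    q = query.lower()
    return "sql" if any(hint in q for hint in SQL_HINTS) else "documents"


def document_agent(query: str) -> str:
    # Placeholder: a real agent would run semantic search on the vector store.
    return f"[documents] passages retrieved for: {query}"


def sql_agent(query: str) -> str:
    # Placeholder: a real agent would generate and run SQL, then build a chart.
    return f"[sql] query result and chart for: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "documents": document_agent,
    "sql": sql_agent,
}


def route(query: str) -> Tuple[str, str]:
    """Classify the query and dispatch it to the matching specialist."""
    label = classify(query)
    return label, AGENTS[label](query)
```

For example, `route("Plot average stack temperature per month")` is dispatched to the SQL agent, while a question about a permit threshold goes to document search. The synthesis step, which merges specialist outputs into one cited response, is omitted from this sketch.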

The system maintains conversation history within each session, allowing engineers to ask follow-up questions without re-stating context. A relevance filter prevents the system from attempting to answer questions outside its knowledge domain — it returns “I don’t have sufficient information on this” rather than hallucinating an answer.
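
A relevance filter and session history of this kind could look roughly like the sketch below. It assumes the retriever returns (passage, distance) pairs where a lower distance means a closer match; the 0.35 cutoff, the class and function names, and the reply format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FALLBACK = "I don't have sufficient information on this"


@dataclass
class Session:
    """Keeps the question/answer turns of one conversation."""
    history: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, question: str, answer: str) -> None:
        self.history.append((question, answer))

    def context(self, max_turns: int = 3) -> str:
        """Render the most recent turns for inclusion in the next prompt."""
        recent = self.history[-max_turns:]
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in recent)


def filter_relevant(hits: List[Tuple[str, float]], max_distance: float = 0.35) -> List[str]:
    """Keep only passages whose distance is within the (assumed) cutoff."""
    return [text for text, distance in hits if distance <= max_distance]


def answer(session: Session, question: str, hits: List[Tuple[str, float]],
           max_distance: float = 0.35) -> str:
    """Refuse when nothing relevant was retrieved; otherwise answer from sources."""
    passages = filter_relevant(hits, max_distance)
    if not passages:
        reply = FALLBACK
    else:
        # Placeholder for the LLM call that would write the cited answer.
        reply = "Based on retrieved sources: " + " | ".join(passages)
    session.record(question, reply)
    return reply
```

The key design choice is that the refusal is decided before any text generation happens: if no retrieved passage clears the cutoff, the model is never asked to produce an answer. The cutoff itself would need tuning against a set of in-domain and out-of-domain questions.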

FastAPI serves the conversational interface; the entire system is containerized in Docker and deployed on Azure.

The Outcome

Information retrieval time dropped from hours to seconds for standard technical queries
Analysis speed improved by 85% through automated SQL query generation and visualization
Domain accuracy held at 100% on evaluated queries — the RAG architecture eliminated hallucinations by grounding every response in retrieved documents
Knowledge access expanded to previously inaccessible document formats (scanned PDFs and image-embedded content), enlarging the queryable knowledge base

Team

Engagement: 3 months, 3 engineers (1 AI/ML, 1 backend, 1 data engineering).

Stack: Python, LangChain, OpenAI GPT-4, FastAPI, ChromaDB, pytesseract, Azure, Docker