The Challenge
Expert supervision is a bottleneck. Technicians waste hours cross-referencing dense paper manuals to identify specific components in complex machinery.

An AI maintenance assistant that uses object detection and LLMs to answer common maintenance questions.
I built an AR-capable AI assistant that acts as a real-time supervisor. It uses computer vision to track components and projects contextual instructions directly onto the hardware, while an LLM handles natural language Q&A for hands-free troubleshooting.
The system drastically cut training time and cognitive load for new technicians. The combination of spatial AR overlays and contextual LLM reasoning showed that automated visual guidance can stand in for static manuals.
The system follows a distributed client-server model. The Raspberry Pi functions as the edge device, managing the camera input, calibration, and projector output for the AR overlay. The backend consists of a Dockerized server environment that hosts the computation-heavy SSD Object Detection model and LLM logic. Python scripts facilitate real-time network communication, transmitting images for inference and returning bounding box coordinates and textual guidance to the edge device for immediate visualization.
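The Pi-to-server exchange described above can be sketched as a small framing protocol. This is an illustrative example, not the project's actual wire format: it assumes JSON payloads with a 4-byte length prefix, which is a common choice for streaming detections over a raw socket.

```python
import json
import struct

def encode_message(payload: dict) -> bytes:
    """Serialize a payload as JSON, prefixed with a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def decode_message(data: bytes) -> dict:
    """Inverse of encode_message: read the length prefix, then parse the JSON body."""
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4 : 4 + length].decode("utf-8"))

# Example server reply carrying detections and guidance back to the edge device
# (labels, coordinates, and text are hypothetical).
reply = {
    "detections": [
        {"label": "valve", "box": [120, 80, 260, 210], "score": 0.91},
    ],
    "guidance": "Loosen the retaining bolt before removing the valve cover.",
}
wire = encode_message(reply)
print(decode_message(wire) == reply)  # True
```

The length prefix lets the edge device read exactly one complete message per inference round before drawing the overlay, avoiding partial-read bugs on the socket.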
Fine-tuned SSD Object Detection on a custom-curated dataset of maintenance tasks.
Integration of a Raspberry Pi edge device for real-time image capture and visual feedback.
Dockerized architecture ensuring consistent deployment across edge (Pi) and server environments.
Python scripts coordinate the Raspberry Pi and the server end to end.
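On the server side, SSD output is typically cleaned up before being sent to the edge device. The following is a minimal sketch of that post-processing step, score filtering followed by greedy non-maximum suppression; the thresholds and detection dicts are illustrative, not the project's actual values.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, score_thr=0.5, iou_thr=0.45):
    """Keep confident boxes, dropping near-duplicates of a higher-scoring box."""
    dets = sorted((d for d in detections if d["score"] >= score_thr),
                  key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) < iou_thr for k in kept):
            kept.append(d)
    return kept

raw = [
    {"label": "bolt", "box": [10, 10, 50, 50], "score": 0.9},
    {"label": "bolt", "box": [12, 11, 52, 49], "score": 0.6},   # near-duplicate
    {"label": "valve", "box": [100, 100, 160, 160], "score": 0.3},  # low score
]
print([d["score"] for d in nms(raw)])  # [0.9]
```

Only the surviving boxes are serialized and sent to the Pi, which keeps the overlay uncluttered and the network payload small.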
Developed the Python backend for the Server and the Raspberry Pi.
Added image validity checks, camera calibration, a network-wide server discovery scanner, and projector-based visualization of the object detections.
Researched the viability of implementing thesis findings versus training a new scene graph generation model.
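The calibration step above ultimately produces a mapping from camera pixels to projector pixels so that overlays land on the physical component. A common way to model this for a roughly planar work surface is a 3x3 homography; the sketch below applies such a matrix in pure Python (the matrix itself would be estimated once during calibration, e.g. with OpenCV's findHomography, and the values here are purely illustrative).

```python
def apply_homography(H, x, y):
    """Map a camera pixel (x, y) to projector coordinates via the 3x3 matrix H."""
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xp / w, yp / w  # divide out the projective scale

# Illustrative calibration result: a uniform 2x scale from camera to projector.
H = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]

print(apply_homography(H, 100, 50))  # (200.0, 100.0)
```

Mapping all four corners of a detection box through this function gives the quadrilateral the projector should draw, which is why the calibration only needs to run once per camera/projector setup.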