The Challenge
Traditional industrial maintenance relies on dense printed manuals and on-site supervision by experts, which imposes significant cognitive load on field technicians and delays their onboarding.

An interactive industrial maintenance assistant leveraging real-time edge computer vision and large language models for localized repair guidance.
Developed an augmented reality (AR)-assisted maintenance environment that provides real-time guidance. The system tracks machine components with computer vision and projects step-by-step assembly instructions directly onto the workspace, while an integrated LLM answers technicians' spoken questions hands-free.
Significantly reduced onboarding time and cognitive load for technicians. The prototype demonstrated that combining spatial augmented reality overlays with a semantic LLM assistant offers a superior, hands-free alternative to traditional printed manuals.
The system follows a distributed client-server model. The Raspberry Pi acts as the edge device, handling camera input, calibration, and projector output for the augmented reality overlay. The backend is a Dockerized server environment that hosts the computationally heavy SSD object detection model and the LLM logic. Custom Python workflows coordinate real-time network communication: the edge device transmits frames for inference, and the server returns bounding box coordinates alongside textual guidance, which the edge device immediately maps onto the projected overlay.
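Because TCP delivers a byte stream with no message boundaries, the frame round trip described above needs some framing. The sketch below assumes a simple length-prefixed format (a 4-byte big-endian length, a JSON header, then a length-prefixed binary payload such as a JPEG frame); the source does not document the actual protocol, and the helper names `send_message` and `recv_message` are hypothetical.

```python
import json
import socket
import struct

# Hypothetical wire format (an assumption, not the project's documented protocol):
#   [4-byte big-endian header length][JSON header][4-byte big-endian payload length][payload]
_LEN = struct.Struct(">I")


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may legally return fewer than requested."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock: socket.socket, header: dict, payload: bytes = b"") -> None:
    """Send a JSON header plus an optional binary payload (e.g. an encoded camera frame)."""
    header_bytes = json.dumps(header).encode("utf-8")
    sock.sendall(_LEN.pack(len(header_bytes)) + header_bytes
                 + _LEN.pack(len(payload)) + payload)


def recv_message(sock: socket.socket) -> tuple[dict, bytes]:
    """Receive one framed message and return (header, payload)."""
    (header_len,) = _LEN.unpack(_recv_exact(sock, _LEN.size))
    header = json.loads(_recv_exact(sock, header_len).decode("utf-8"))
    (payload_len,) = _LEN.unpack(_recv_exact(sock, _LEN.size))
    payload = _recv_exact(sock, payload_len)
    return header, payload
```

In this sketch the edge device would send `{"type": "frame", "frame_id": ...}` with the JPEG bytes as payload, and the server would reply with a header carrying the boxes, labels, and guidance text and an empty payload. Keeping metadata in JSON and pixels as raw bytes avoids base64-inflating every frame.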
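Mapping a returned bounding box from camera pixels to projector pixels is commonly handled with a planar homography estimated during calibration. This sketch assumes the workspace is roughly planar and that calibration yields four camera/projector point correspondences (for example, by projecting known markers and locating them in the camera image); the source does not specify its calibration method, and all function names here are hypothetical. It solves the standard 8-unknown linear system with plain Gaussian elimination to stay dependency-free; OpenCV's `cv2.findHomography` and `cv2.perspectiveTransform` do the same job more robustly.

```python
def solve_linear(a, b):
    """Solve a @ x = b (square system) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_points(src, dst):
    """Estimate the 3x3 homography H (with h33 = 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1), likewise.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through H, dividing by the projective coordinate w."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def box_to_projector(h, box):
    """Map a camera-space box (x1, y1, x2, y2) to the projector-space quad of its corners."""
    x1, y1, x2, y2 = box
    return [apply_homography(h, p) for p in [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]]
```

The corners are mapped individually because a perspective warp generally turns an axis-aligned box into a general quadrilateral, which the projector output would then draw as a polygon.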
SSD object detection model fine-tuned on a custom dataset of maintenance tasks.
Integration of a Raspberry Pi for real-time component tracking and visual feedback.
Dockerized architecture ensuring consistent deployment across edge and server environments.
Custom Python workflows to coordinate edge camera capture, frame transport, and server-side model inference.
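One way to coordinate capture and transport on the edge device is to decouple them with a single-slot buffer that always holds only the newest frame, so a slow network round trip never lets stale frames pile up behind it. This is a sketch under that assumption; the source does not describe how its workflows were structured, and the `LatestFrame` class is hypothetical.

```python
import threading
from typing import Any, Optional


class LatestFrame:
    """Thread-safe single-slot buffer: writers overwrite, readers get the newest frame."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Any = None
        self._seq = 0  # bumped on every put, so readers can skip frames they already sent

    def put(self, frame: Any) -> None:
        """Replace the stored frame (called by the capture thread) and wake any reader."""
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def get_newer_than(self, last_seq: int,
                       timeout: Optional[float] = None) -> tuple[int, Any]:
        """Block until a frame newer than last_seq exists and return (seq, frame).

        Returns (last_seq, None) if the timeout expires first.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._seq > last_seq, timeout=timeout):
                return last_seq, None
            return self._seq, self._frame
```

In use, a capture thread would call `put()` for every camera frame while a transport thread loops on `get_newer_than(seq)` and sends whatever it receives for inference. Frames captured during an in-flight request are simply overwritten, which trades completeness for lower overlay latency.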
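Before a transported frame reaches the detector, cheap structural checks can reject empty, oversized, or truncated payloads. The sketch below assumes frames travel as JPEG bytes and only checks the start-of-image and end-of-image markers; the source does not say what its image validation covered, and `validate_jpeg_frame` and its size limit are hypothetical. A fuller check would decode the image, for example with Pillow's `Image.verify()`.

```python
JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def validate_jpeg_frame(data: bytes, max_bytes: int = 5_000_000) -> tuple[bool, str]:
    """Return (ok, reason) after lightweight structural checks on an encoded frame."""
    if not data:
        return False, "empty frame"
    if len(data) > max_bytes:
        return False, "frame too large"
    if not data.startswith(JPEG_SOI):
        return False, "missing JPEG start-of-image marker"
    if not data.endswith(JPEG_EOI):
        return False, "missing JPEG end-of-image marker (possibly truncated)"
    return True, "ok"
```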
Fine-tuned an SSD object detection model on a custom dataset of maintenance tasks.
Developed the Python backend for both the server and the edge device.
Added image validation, camera calibration, network-wide server scanning, and AR overlay visualization using a projector.
Analyzed options for extending the system with spatial reasoning, comparing geometric heuristics computed from bounding boxes against training an end-to-end scene graph generation model.
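Network-wide server scanning lets the edge device locate the inference server without a hard-coded address. A minimal sketch, assuming the server listens on a known TCP port and that a successful TCP connect is enough to identify it (a real implementation would likely also exchange a handshake message); the port, subnet, and function names are hypothetical.

```python
import ipaddress
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def _port_open(host: str, port: int, timeout: float) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan_for_server(hosts: Iterable[str], port: int, timeout: float = 0.3,
                    workers: int = 64) -> List[str]:
    """Probe hosts concurrently; return those accepting connections on `port`, in input order."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda h: _port_open(h, port, timeout), hosts))
    return [h for h, ok in zip(hosts, results) if ok]


def subnet_hosts(cidr: str) -> List[str]:
    """Expand a CIDR block such as '192.168.1.0/24' into its usable host addresses."""
    return [str(ip) for ip in ipaddress.ip_network(cidr, strict=False).hosts()]
```

For example, `scan_for_server(subnet_hosts("192.168.1.0/24"), port=5000)` would probe the 254 hosts of a typical home or lab subnet. Running the probes in a thread pool keeps a full /24 scan to roughly a few timeout periods rather than 254 of them in sequence.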
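The bounding-box heuristic option can be illustrated with a few geometric predicates that derive coarse spatial relations ("inside", "overlaps", "left_of", "above", and so on) directly from detector output. This is a hypothetical sketch of that approach, not the analysis itself; the relation names and both thresholds are assumptions.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    x1: float
    y1: float
    x2: float
    y2: float  # image coordinates: y grows downward

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 if both are empty)."""
    inter = intersection_area(a, b)
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def relations(a: Box, b: Box, contain_thresh: float = 0.9,
              overlap_thresh: float = 0.1) -> List[str]:
    """Heuristic spatial relations of box `a` with respect to box `b`."""
    rels = []
    # Containment: most of a's area lies within b (e.g. a bolt on a cover plate).
    if a.area > 0 and intersection_area(a, b) / a.area >= contain_thresh:
        rels.append("inside")
    elif iou(a, b) >= overlap_thresh:
        rels.append("overlaps")
    # Strict separation along each axis.
    if a.x2 <= b.x1:
        rels.append("left_of")
    elif a.x1 >= b.x2:
        rels.append("right_of")
    if a.y2 <= b.y1:
        rels.append("above")
    elif a.y1 >= b.y2:
        rels.append("below")
    return rels
```

Heuristics like these are transparent and need no extra training data, but they only see 2D box geometry; they cannot capture relations such as "attached to" or occlusion order, which is the kind of gap an end-to-end scene graph model would aim to close.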