The Challenge
Building domain-specific semantic graphs by hand is slow and labor-intensive, while standard text-to-graph pipelines ignore both the spatial layout encoded in object coordinates and the multi-view visual data that are critical in industrial automation and maintenance environments.
The Approach
Designed and developed an automated extraction pipeline that parses the pixel coordinates of detected objects, translates them into deterministic spatial relations (such as "left of", "inside of", or "above"), and populates formal OWL ontologies using owlready2. Additionally, implemented a multi-view spatial fusion algorithm that consolidates 2D geometric inputs from multiple views into a cohesive 3D semantic model.
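A minimal sketch of how pixel coordinates could be turned into relations like those named above, assuming axis-aligned bounding boxes in image coordinates (origin at the top-left, y growing downward). The `BBox` class, the relation names (`leftOf`, `rightOf`, `above`, `below`, `insideOf`), the box-containment test used for "inside of", and the 10-pixel default threshold are hypothetical choices for illustration, not details taken from the thesis. Directional relations compare box centers and only fire when the offset exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned box in pixel coordinates (origin top-left, y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def is_inside(inner: BBox, outer: BBox) -> bool:
    # Containment: every edge of `inner` lies within `outer`.
    return (outer.x_min <= inner.x_min and inner.x_max <= outer.x_max
            and outer.y_min <= inner.y_min and inner.y_max <= outer.y_max)


def spatial_relations(a: BBox, b: BBox, threshold: float = 10.0) -> list[str]:
    """Return the relations that hold from `a` to `b` (relation names and threshold are assumptions)."""
    # Containment takes precedence; directional relations are skipped for nested boxes
    # (a design choice of this sketch).
    if is_inside(a, b):
        return ["insideOf"]
    (ax, ay), (bx, by) = a.center, b.center
    relations = []
    # Horizontal: centers must differ by more than the threshold.
    if bx - ax > threshold:
        relations.append("leftOf")
    elif ax - bx > threshold:
        relations.append("rightOf")
    # Vertical: a smaller y value means higher in the image.
    if by - ay > threshold:
        relations.append("above")
    elif ay - by > threshold:
        relations.append("below")
    return relations


if __name__ == "__main__":
    pump = BBox("pump", 100, 20, 180, 70)
    valve = BBox("valve", 10, 50, 40, 80)
    gauge = BBox("gauge", 110, 30, 130, 45)
    print(spatial_relations(valve, pump))  # ['leftOf', 'below']
    print(spatial_relations(gauge, pump))  # ['insideOf']
```

The threshold acts as a dead zone: two objects whose centers are nearly aligned receive no horizontal (or vertical) relation rather than a brittle one. Because the rules are pure functions of the coordinates, the same input always yields the same relations, which is what makes this kind of translation deterministic.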
The Impact
Demonstrated that a deterministic geometric-to-semantic translation pipeline can match or exceed the accuracy of large end-to-end language models on structured spatial reasoning in industrial contexts. The thesis and its defense received a grade of 1.4, reflecting its methodological rigor and structured LLM evaluation.
System Architecture
The system takes annotated image data in tabular form containing bounding-box annotations. It computes the center point of each bounding box and applies threshold-based logic to determine the spatial relationships between objects. It then uses owlready2 to generate standards-compliant RDF triples. The output was evaluated against four LLMs (DeepSeek-R1, DeepSeek-V3, Llama 3.1, and Qwen 2.5) using dedicated metrics for correctness and completeness.
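The section does not define the correctness and completeness metrics. One plausible reading, assumed here purely for illustration, treats each output as a set of (subject, relation, object) triples: correctness is the share of predicted triples that also appear in a reference set (precision), and completeness is the share of reference triples that were predicted (recall). The function names, the empty-set convention, and the case/whitespace normalization are hypothetical.

```python
# Hypothetical triple type: (subject, relation, object).
Triple = tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and strip each part so formatting differences do not count as errors."""
    subject, relation, obj = (part.strip().lower() for part in triple)
    return (subject, relation, obj)


def correctness_completeness(predicted: set[Triple], reference: set[Triple]) -> tuple[float, float]:
    """Return (correctness, completeness), assumed here to be precision and recall over triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in reference}
    matched = len(pred & ref)
    # Convention chosen for this sketch: an empty set scores 0.0 instead of raising ZeroDivisionError.
    correctness = matched / len(pred) if pred else 0.0
    completeness = matched / len(ref) if ref else 0.0
    return correctness, completeness


if __name__ == "__main__":
    reference = {
        ("valve", "leftOf", "pump"),
        ("valve", "below", "pump"),
        ("gauge", "insideOf", "pump"),
    }
    llm_output = {
        ("Valve", "leftOf", "Pump"),  # matches after normalization
        ("gauge", "above", "pump"),   # not in the reference
    }
    print(correctness_completeness(llm_output, reference))  # (0.5, 0.3333333333333333)
```

Normalizing before comparison keeps trivial formatting differences in free-text LLM output, such as `Pump` versus `pump`, from counting as errors; a real evaluation would likely need more tolerant matching as well, for example mapping synonymous relation names onto one vocabulary.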