SecretLLM - Cultural QA System

Optimizing Llama-3-8B for cultural reasoning during a two-month project at TU Dresden.

Overview

Developed for the "Behind the Secrets of Large Language Models" module at TU Dresden, this project addresses the "cultural gap" in standard LLMs. I engineered a question-answering system using Meta-Llama-3-8B that improves accuracy on cultural tasks. While Supervised Fine-Tuning (SFT) helped with format alignment, implementing a Dynamic Retrieval-Augmented Generation (RAG) system yielded the most significant gains, increasing Short Answer Question (SAQ) accuracy by 0.16 over the baseline. The full project report is available in the GitHub repository.

Tech Stack

Python · Transformers · Sentence Transformers · PEFT / LoRA · Llama 3 8B · Hugging Face · WandB · BitsAndBytes

The Challenge

Standard Large Language Models often exhibit Western-centric biases and lack the nuance required for specific cultural queries. Additionally, the 8B parameter model struggled with strict output formatting (JSON) and zero-shot reasoning for complex Short Answer Questions (SAQ), often leading to "instruction drift".

The Solution

I moved from complex JSON prompts to simplified natural language instructions and implemented a "Dynamic Few-Shot" RAG framework. By converting Multiple Choice data into Short Answer pairs, I created an augmented knowledge base. For every incoming query, the system retrieves the top k = 3 semantically similar examples and injects them into the prompt, grounding the model in relevant cultural context.
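The retrieval step can be sketched as follows. In the real pipeline the embeddings come from `all-MiniLM-L6-v2`; here `embed` is any function mapping text to a vector, so only the top-k selection and prompt-assembly logic is shown:

```python
def build_few_shot_prompt(query, corpus, embed, k=3):
    """Rank (question, answer) pairs by cosine similarity to the query
    and prepend the k best matches as in-context examples."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    query_vec = embed(query)
    ranked = sorted(corpus,
                    key=lambda qa: cosine(embed(qa[0]), query_vec),
                    reverse=True)
    blocks = [f"Q: {q}\nA: {a}" for q, a in ranked[:k]]
    blocks.append(f"Q: {query}\nA:")  # the actual query goes last
    return "\n\n".join(blocks)
```

In the full system, the returned string is fed to the quantized Llama-3-8B model; grounding the model in retrieved examples is what drives the SAQ accuracy gain.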

System Architecture

The solution utilizes the Llama-3-8B model with 4-bit quantization (NF4) for efficiency. It employs `all-MiniLM-L6-v2` for semantic embedding and retrieval. The pipeline includes a data augmentation stage where training data is stripped of options to create direct QA pairs. Inference uses Greedy Search to ensure deterministic and concise outputs required by the evaluation script.
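The quantization and decoding setup described above might look like the following configuration sketch (the generation limit is illustrative, and loading requires GPU access to the gated model weights):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization via BitsAndBytes for memory-efficient loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Greedy search: do_sample=False makes decoding deterministic,
# which keeps outputs concise and reproducible for the evaluator
inputs = tokenizer("Q: ...\nA:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
```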

Key Features

Dynamic RAG

Retrieval system that injects semantically relevant "in-context" examples for each specific query, boosting SAQ accuracy by 0.16 over the baseline.

Data Augmentation

Automated pipeline to transform MCQ datasets into SAQ pairs, effectively doubling the training resources for the retrieval corpus.
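The augmentation step might look like the sketch below; the field names (`options`, `answer_idx`) are assumptions about the dataset schema, not the project's actual keys:

```python
def mcq_to_saq(example):
    """Strip the options from a multiple-choice item, keeping the
    correct option's text as a direct short answer."""
    answer_text = example["options"][example["answer_idx"]]
    return {"question": example["question"], "answer": answer_text}

def augment_corpus(mcq_items):
    # Each converted pair is added to the retrieval corpus alongside
    # the original SAQ data, roughly doubling the available examples.
    return [mcq_to_saq(item) for item in mcq_items]
```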

Efficient Fine-Tuning

Used LoRA (Low-Rank Adaptation) and quantization to fine-tune the 8B model on limited hardware, optimizing for task alignment.
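The fine-tuning setup can be sketched with PEFT's `LoraConfig`; the rank, alpha, and target modules below are common illustrative defaults, not the exact values from the report:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative LoRA hyperparameters -- the report's exact values may differ
lora_config = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only adapter weights are trainable
```

Combined with 4-bit quantization, this keeps the trainable footprint small enough to fine-tune the 8B model on limited hardware.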

Ablation Studies

Evaluated external internet search (DuckDuckGo), discovering that "clean" internal data outperforms noisy web results for this specific domain.

Development Timeline

Dec 2025

Setup & Training

Setting up the environment with Transformers/PEFT and executing initial Fine-Tuning runs.

Dec 2025

RAG Implementation

Developing the Dynamic Few-Shot logic and constructing the augmented knowledge base.

Dec 2025

Evaluation

Running ablation studies on decoding strategies and external search integration.

Jan 2026

Report Submission

Finalizing the project report and analysis of the 0.16 accuracy gain.