The Challenge
Large language models struggle with geographic and cultural nuance due to representation gaps in their pre-training corpora. Additionally, smaller open-weights models (such as those with 8 billion parameters) demonstrate instruction drift and unstable formatting compliance when presented with complex zero-shot formatting instructions.
The Approach
Replaced complex formatting constraints with natural instruction layouts and built a dynamic few-shot retrieval framework. By structuring the reference corpus into clean query-response pairs, the pipeline retrieves and inserts the top three most semantically relevant context exemplars dynamically during inference.
The Impact
The pipeline combining parameter-efficient fine-tuning (PEFT using LoRA) and dynamic RAG significantly outperformed baseline models, satisfying strict verification metrics and securing a 16% absolute accuracy improvement on short-answer evaluation suites.
System Architecture
The solution utilizes the Llama-3-8B model with 4-bit quantization for efficiency. It employs the MiniLM-L6-v2 model for semantic embedding and retrieval. The pipeline includes a data augmentation stage where training data is stripped of options to create direct query-response pairs. Inference uses Greedy Search to ensure deterministic and concise outputs required by the evaluation script.