The Challenge
Standard Large Language Models often exhibit Western-centric biases and lack the nuance required for culturally specific queries. Additionally, the 8B-parameter Llama-3 model struggled with strict output formatting (JSON) and zero-shot reasoning for complex Short Answer Questions (SAQ), often leading to "instruction drift".
The Approach
I stripped away complex JSON prompts in favor of natural-language instructions and built a "Dynamic Few-Shot" RAG framework. By transforming multiple-choice datasets into direct QA pairs, the system dynamically retrieves the three most relevant examples and injects them into every prompt.
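The retrieval step described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the QA bank entries are invented placeholders, and a bag-of-words cosine similarity stands in for the `all-MiniLM-L6-v2` sentence embeddings used in the real system.

```python
from collections import Counter
from math import sqrt

# Toy QA bank standing in for the augmented training set (entries are hypothetical).
QA_BANK = [
    ("What dish is traditionally served at Lunar New Year in Vietnam?", "Banh chung."),
    ("Which festival marks the end of Ramadan?", "Eid al-Fitr."),
    ("What is the traditional Korean alphabet called?", "Hangul."),
    ("Which Japanese holiday celebrates children on May 5th?", "Kodomo no Hi."),
]

def embed(text):
    """Bag-of-words stand-in for the all-MiniLM-L6-v2 sentence embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(query, k=3):
    """Retrieve the top-k most similar QA pairs and inject them as few-shot examples."""
    q_vec = embed(query)
    ranked = sorted(QA_BANK, key=lambda qa: cosine(q_vec, embed(qa[0])), reverse=True)
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in ranked[:k])
    return f"{shots}\n\nQ: {query}\nA:"

print(build_prompt("What food is eaten at Lunar New Year?"))
```

Because retrieval runs per query, each prompt carries examples tailored to that question rather than a fixed few-shot block.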
The Impact
The dual approach of LoRA fine-tuning and Dynamic RAG decisively outperformed the zero-shot baseline. The system produced the logically sound, culturally accurate answers required by the strict academic evaluator, resulting in a 16% accuracy gain.
System Architecture
The solution uses the Llama-3-8B model with 4-bit quantization (NF4) for efficiency, and `all-MiniLM-L6-v2` for semantic embedding and retrieval. The pipeline includes a data-augmentation stage in which training data is stripped of its multiple-choice options to create direct QA pairs. Inference uses greedy search decoding to ensure the deterministic, concise outputs required by the evaluation script.
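The data-augmentation stage could look something like the sketch below, which converts a multiple-choice record into a direct QA pair. The record's field names (`question`, `options`, `answer`) and the option format are assumptions for illustration, not the actual dataset schema.

```python
# Hypothetical multiple-choice record; field names and option format are assumed.
mc_item = {
    "question": "Which spice gives Korean kimchi its characteristic heat?",
    "options": ["A. Wasabi", "B. Gochugaru", "C. Paprika", "D. Sichuan pepper"],
    "answer": "B",
}

def to_qa_pair(item):
    """Strip the option list, keeping only the question and the gold answer text."""
    letter = item["answer"]
    answer_text = next(
        opt.split(". ", 1)[1]
        for opt in item["options"]
        if opt.startswith(letter + ".")
    )
    return {"question": item["question"], "answer": answer_text}

print(to_qa_pair(mc_item))
```

Training on such pairs teaches the model to answer directly instead of picking a letter, which is what the SAQ evaluator expects.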