Math SLM (SFT + DPO)

Two-stage LoRA fine-tuning (SFT, then DPO) of DeepSeek-R1-Distill-Qwen-7B. SFT gained 6.4 pp across four math benchmarks; DPO turned out to be a no-op, bottlenecked by its configuration. The whole pipeline ran end-to-end on 8× H100 for about $93.

Read the writeup.

© Copyright 2026 Debtirtha Saha. Built with Jekyll and the al-folio theme. Hosted on GitHub Pages.