Ukiyo-e Haiku VLM (planned)
A vision-language model that writes haiku about Japanese ukiyo-e woodblock prints: a SigLIP vision encoder feeding a Qwen LLM, LoRA fine-tuned on print images from the Met Museum Open Access API.
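LoRA leaves the pretrained weight W frozen and learns a low-rank update, so a layer computes W x + (alpha / r) * B (A x). Below is a minimal pure-Python sketch of that idea. The rank `r`, scale `alpha`, and the `LoRALinear` class are hypothetical illustrations, not this project's settings. B starts at zero, so the adapted layer initially behaves exactly like the frozen one.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Hypothetical LoRA layer: frozen W plus trainable (alpha / r) * B @ A.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
    """

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                # frozen pretrained weight, never updated
        self.scale = alpha / r    # LoRA scaling factor
        # A gets a small random init; B is zero so the update starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path
        update = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B are trained: r * d_in + d_out * r values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a square 896 x 896 projection with r = 8, the adapter trains 2 * 8 * 896 = 14,336 values against 802,816 frozen ones (about 1.8%), which is what keeps a small budget plausible. In practice a library such as Hugging Face PEFT (`LoraConfig`, `get_peft_model`) would apply this to the LLM's attention projections rather than hand-written code.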
Pipeline:
- Vision encoder. SigLIP, frozen.
- LLM. Qwen (small), LoRA-adapted.
- Bridge. Projection layer from SigLIP embeddings into Qwen’s embedding space.
- Data. Ukiyo-e print images from the Met Museum Open Access API, paired with generated haiku captions.
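The bridge is what turns image features into something the LLM can read as tokens. A minimal pure-Python sketch follows, assuming a two-layer MLP with GELU (one common projector design; the source only says "projection layer") and assuming projected image tokens are simply prepended to the text token embeddings. `Projector` and `build_inputs` are hypothetical names. With SigLIP frozen and the LLM LoRA-adapted, this projector would presumably be the part trained in full.

```python
import math
import random

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def linear(W, b, x):
    """Affine map: W @ x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_linear(d_in, d_out, rng):
    bound = 1.0 / math.sqrt(d_in)
    W = [[rng.uniform(-bound, bound) for _ in range(d_in)] for _ in range(d_out)]
    return W, [0.0] * d_out

class Projector:
    """Hypothetical MLP: vision width d_vision -> LLM embedding width d_text."""

    def __init__(self, d_vision, d_text, seed=0):
        rng = random.Random(seed)
        self.W1, self.b1 = init_linear(d_vision, d_text, rng)
        self.W2, self.b2 = init_linear(d_text, d_text, rng)

    def __call__(self, patch):
        hidden = [gelu(v) for v in linear(self.W1, self.b1, patch)]
        return linear(self.W2, self.b2, hidden)

def build_inputs(patch_embeddings, text_embeddings, projector):
    """Project each vision patch, then prepend the results to the text embeddings."""
    image_tokens = [projector(p) for p in patch_embeddings]
    return image_tokens + text_embeddings

# Toy sizes: 9 patches of width 6, 5 text tokens of width 4.
proj = Projector(d_vision=6, d_text=4)
patches = [[0.1 * i] * 6 for i in range(9)]
text = [[0.0] * 4 for _ in range(5)]
sequence = build_inputs(patches, text, proj)
print(len(sequence), len(sequence[0]))
```

At real scale the shapes are larger: if, for example, the checkpoints were SigLIP base patch16-224 and Qwen2.5-0.5B (both assumptions), each image would yield 14 x 14 = 196 patch vectors of width 768, projected to Qwen's hidden size of 896.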
Phase 2 of a VLM-from-scratch effort; Phase 1 (a nanoVLM build) must be completed first. Budget: ~$15-25.
A writeup will follow once the model is trained.