Tech

Developer demonstrates low-cost fine-tuning of local LLMs to replicate 1990s technical writing style

A weekend-long project reveals that small open-source models can be trained to mimic specific historical documentation conventions, though human oversight remains essential for production use.

Author
Owen Mercer
Markets and Finance Editor
Published
Draft
Source: Hacker News · original
Experiment utilising Bitsavers archive and QLoRA adapters shows Qwen models outperform Llama in style retention

A developer has shown that open-source large language models can be fine-tuned to reproduce the structure and voice of 1990s Microsoft technical documentation. The experiment drew on a corpus of more than 37 million words from the Bitsavers archive, covering out-of-print materials published between 1977 and 2005. Using Quantized Low-Rank Adaptation (QLoRA) adapters on cloud GPU infrastructure, the author was able to impart a period-specific style to small models at low cost, with total training expenses of approximately $50.

The experiment focused on two instruct models: Llama 3.1 8B and Qwen 2.5 7B. The training data went through a two-stage cleaning process. Python scripts first removed OCR artifacts, and a classification pass using the OpenRouter-hosted gemma-4-26b model then filtered the text for intelligibility. This second stage cost an additional $8. The final dataset comprised 192,456 examples in JSONL format, with text chunks capped at approximately 512 tokens to optimise training efficiency.
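The article does not reproduce the author's scripts, but the general shape of such a pipeline can be sketched. The example below is a minimal illustration only: the clean-up rules, the function names and the `{"text": ...}` record layout are assumptions, and whitespace-separated words stand in for model tokens, which a real pipeline would count with the model's own tokenizer.

```python
import json
import re

MAX_TOKENS = 512  # approximate cap cited in the article; words stand in for tokens here


def clean_ocr(text: str) -> str:
    """Apply a few illustrative OCR clean-ups (hypothetical rules, not the author's)."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)      # rejoin words hyphenated across lines
    text = re.sub(r"[^\x09\x0A\x20-\x7E]", "", text)   # drop control and non-ASCII debris
    text = re.sub(r"[ \t]+", " ", text)                # collapse runs of spaces and tabs
    return text.strip()


def chunk_words(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def to_jsonl(chunks: list[str]) -> str:
    """Serialise chunks as JSONL: one JSON object per line (assumed {"text": ...} layout)."""
    return "\n".join(json.dumps({"text": chunk}) for chunk in chunks)


# A tiny made-up sample with a broken hyphenation, a stray NUL byte and extra spaces.
sample = "The MS-DOS oper-\nating   system\x00 loads COMMAND.COM at startup."
print(to_jsonl(chunk_words(clean_ocr(sample), max_tokens=4)))
```

Stripping every non-ASCII character is a blunt choice that suits English-language scans but would damage other corpora. The second, model-based intelligibility filter described above is not shown here.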

Results indicated a distinct divergence in performance between the two model families. Qwen models demonstrated superior retention of the target 1990s voice, producing period-structured documents that included formal headings and specific syntactic conventions. In contrast, Llama models struggled with the style transfer, with the Llama Instruct 40k variant producing what the author described as "bland marketing prose." The author attributed this deficiency to the heavy reinforcement learning from human feedback (RLHF) applied to the Llama models, which may have hindered the adoption of the older, less polished writing style.

Parameter testing showed that adapter rank and the number of training epochs were both critical to the outcome. Adapters with a rank of 8, which have fewer degrees of freedom, committed more readily to the fiction established by the training data than those with a rank of 16. However, combining a rank 16 adapter with only one epoch resulted in frequent hallucinations, suggesting that higher expressiveness without sufficient reinforcement leads to instability. The base Llama model, which lacks instruction tuning, failed entirely, generating raw corpus text rather than coherent responses.
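To make the "degrees of freedom" point concrete: a LoRA adapter of rank r on a weight matrix of shape d_out × d_in adds two small matrices, one r × d_in and one d_out × r, so the number of trainable parameters grows linearly with the rank. The sketch below works through that arithmetic. The 4096 × 4096 matrix size is an illustrative assumption, not a figure from the experiment.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 4096  # illustrative square projection size (an assumption, not from the article)

for rank in (8, 16):
    added = lora_trainable_params(HIDDEN, HIDDEN, rank)
    share = added / (HIDDEN * HIDDEN)
    print(f"rank={rank}: {added} adapter parameters ({share:.2%} of the frozen matrix)")
```

Doubling the rank from 8 to 16 doubles the adapter's trainable parameters for every adapted matrix, which is consistent with the article's account that the larger adapter needed more training to settle.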

The author concluded that fine-tuning is a viable and inexpensive way to create specialised drafting tools, but that the results are an aid to human writers rather than a replacement for them. The fine-tuned models showed the same lack of judgement as their non-fine-tuned counterparts and required significant steering. The author noted that finding the right balance of parameters remains time-consuming, but the results indicate that local, style-specific models are feasible for tasks such as style review and document drafting.
