There's a fine-tune of Qwen3 4B called "Jan Nano" that I started playing with yesterday, which is basically just fine-tuned to be more inclined to look things up via web searches than to answer them "off the dome". It's not good-good, but it does seem to have a much lower effective hallucination rate than other models of its size.
It seems like maybe similar approaches could be used for coding tasks, especially with tool calls for reading man pages, info pages, running `tldr`, specifically consulting Stack Overflow, etc. Some of the recent small MoE models from Chinese companies are significantly smarter than models like Qwen 4B, but run about as quickly, so maybe on systems with high RAM or high unified memory, even with middling GPUs, they could be genuinely useful for coding if they are made to be avoid doing anything without tool use.
It seems like maybe similar approaches could be used for coding tasks, especially with tool calls for reading man pages, info pages, running `tldr`, specifically consulting Stack Overflow, etc. Some of the recent small MoE models from Chinese companies are significantly smarter than models like Qwen 4B, but run about as quickly, so maybe on systems with high RAM or high unified memory, even with middling GPUs, they could be genuinely useful for coding if they are made to be avoid doing anything without tool use.