
Training a tiny LLM for fun using Rust/Candle. I constantly tweak stuff, keep track of results in a spreadsheet, and work on generating a bigger corpus with LLMs. It's a project for fun, so I don't care about finding actual human-generated text; I'd rather craft data in the format I want using LLMs. Probably not the best practice, but I can sleep properly despite doing that.

My favorite output so far: I asked it what life was and, in a random stroke of genius, it answered plainly: "It is."

It's able to answer simple questions where the answer is in the question with up to 75% accuracy. Example success: 'The car was red. Q: What was red? ' |> 'the car' - Example failure: 'The stars twinkled at night. Q: What twinkled at night? ' |> 'the night'.

So nothing crazy, but I'm learning and having fun. My current corpus is ~17 MB of stories, generated encyclopedia content, JSON examples, etc. The JSON content is new as of this weekend and the model is pretty bad at it so far, but I'm curious to see if I can get it somewhere interesting in the next few weeks.

https://github.com/antoineMoPa/rust-text-experiments


I also loved Zig when manually typing code, but I increasingly use AI to write my code, even in personal projects. In that context, I'd rather use Rust more, since the AI takes care of the complex syntax anyway. Also, the Rust ecosystem is bigger, so I'd rather stick to that community.

> Developers are not Idiots

I'm often distracted and AIs are idiots, so a stricter language can keep both me and AIs from doing extra dumb stuff.


> I'm often distracted

I really appreciate this in my role, where I have an office right next to the entrance to the building. I get walk-ins all of the time. When my door is closed, I get knocks on the door all of the time. Both AI and strict languages are great tools in my environment, where focus for me is as abundant as water in a desert.


It would be fun to compare with inference providers (groq/vertex ai, etc.).


Yes, going to add that.


I'm making a tiny tiny LLM in Rust (using Candle) to teach myself AI: https://github.com/antoineMoPa/rust-text-experiments


What I don't get about attention is why it would be necessary when a fully connected layer can also "attend" to all of the input. With very small datasets (think 0 - 500 tokens), I found that attention makes training longer and results worse. I guess the benefits show up with much larger datasets. Note that I'm an AI noob just doing some personal AI projects, so I'm not exactly a reference.


A fully connected layer has different weights for each feature (or each position in the input, in your formulation). So the word "hello" would be treated completely differently if it appeared in position 15 vs. position 16, for example.

Attention, by contrast, would treat those two occurrences similarly, with the only difference coming from the positional encoding, so you can learn generalized patterns more easily.
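A tiny sketch of that difference (toy numbers and weights, not from either project): a dense layer over a flattened sequence has separate weights per position, while raw attention scores depend only on token content.

```python
# Toy illustration: position-specific dense weights vs. position-agnostic
# attention scores. All vectors/weights here are made up for the example.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two-position sequence, 2-dim token embeddings.
hello = [1.0, 0.0]
world = [0.0, 1.0]

# Dense layer: a distinct weight vector for each input position.
w_pos0 = [0.5, -0.2]   # applied to whatever token sits at position 0
w_pos1 = [-0.3, 0.9]   # different weights for position 1

out_a = dot(w_pos0, hello) + dot(w_pos1, world)  # "hello world"
out_b = dot(w_pos0, world) + dot(w_pos1, hello)  # "world hello"
print(out_a != out_b)  # True: the dense layer treats each position differently

# Attention: the score is query . key, which ignores position entirely
# (positional encodings, omitted here, are the only way position re-enters).
query = [1.0, 1.0]
score_hello_pos0 = dot(query, hello)  # "hello" at position 0
score_hello_pos1 = dot(query, hello)  # "hello" at position 1: same score
print(score_hello_pos0 == score_hello_pos1)  # True
```

So the same token gets the same attention score wherever it appears, which is what lets the learned pattern generalize across positions.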


I think that this is the explanation I needed, thanks!


This is the case with most clever neural architectures: in theory, you could always replace them with dense layers that would perform better given enough resources/training, but that's just it. Efficiency matters (number of parameters, training data, training time, FLOPs), and dense layers aren't as efficient (to put it mildly).

You have seen this play out on a small scale, but if you calculate the size of the dense layers needed to even theoretically replicate a big attention layer (or even a convolution), to say nothing of the data needed to train them without the help of the architecture's inductive bias, you will see that the clever architectures are quite necessary at scale.
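To put rough numbers on it (illustrative sizes I picked, not figures from the thread): a single-head attention layer needs four d x d projection matrices, while a dense layer over the flattened sequence connects every input unit to every output unit.

```python
# Back-of-the-envelope parameter counts; d and n are assumed example sizes.
d = 512    # embedding dimension
n = 1024   # sequence length

attention_params = 4 * d * d        # W_Q, W_K, W_V, W_O projections
dense_params = (n * d) * (n * d)    # fully connected over the flat sequence

print(attention_params)                   # 1048576  (~1M)
print(dense_params)                       # 274877906944  (~275B)
print(dense_params // attention_params)   # 262144x more parameters
```

And the dense version is locked to one sequence length, on top of being several orders of magnitude larger.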


Attention grows dynamically with the input size; MLPs don't, since their input size is fixed.


It's not clear from the landing page whether it's a Git hosting platform, Mercurial hosting, or an entirely new VCS. I wish it were clearer (looking at the README, it seems it's indeed a Git hosting platform).

I don't really care about the governance model as a user seeing this landing page for the first time, so I wonder why it's so prominent, vs telling me what the actual product is.


The governance model is the reason for its existence: Gitea went proprietary (stupidly), so this is the rebel alliance being proud of being the rebel alliance.


Did anyone build a text-to-splat 3D generation model? Seems like it would be pretty straightforward? Should make it really easy to generate assets for video games.

EDIT: yep - https://gsgen3d.github.io/


Aren't Gaussian splats incompatible with most common game development styles? No shadows and no rigging/animation are the main issues. Or maybe I'm misunderstanding / behind on the research; please correct me.


You are right that many game engine features cannot be used yet, especially relighting and reflections. But there are cases where game engines (like Unreal Engine 5) are used, for example in virtual production with LED screens, where a photorealistic background is needed (3D Gaussians do look more realistic and are cheaper to produce than a comparable scene made of polygons).

SuperSplat is actually a game engine (but for the web).


This looks like a promising tool to (also) generate fully generative 4D multi-views, which can then be used to generate 3D-GS. Their pipeline also supports animated objects, camera zoom, and pan, and they benchmark their results against 3D-GS. The code is unfortunately not yet published; I can't wait to try it. https://gen-x-d.github.io/


Nope! I'm also not convinced by it.


To me, this looks like something chatgpt would write.


I am surprised I had to scroll down this far to find someone making this point. In addition to being the obvious joke in this situation, the message was so dull, generic, and "this incredible journey" that I instinctively began to skim before finishing the second paragraph.


Or, like, any PR person from the past... forever.


As an Albanian, I can confirm she wrote it herself (obviously with the help of ChatGPT): no finesse or other writing flourishes.


It was not written by her; it was written by the other side's lawyers.


A tiny tiny LLM (essentially removing the "Large" part of "Large Language Model"). I taught a neural network to memorize Wikipedia articles (actually, just one Wikipedia article, about horses) and throw it back as-is by predicting the next token (when given the first token).

https://github.com/antoineMoPa/tfjs-text-experiment/blob/mai...
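The mechanism can be sketched in miniature with a bigram lookup table standing in for the network (toy text I made up, not the actual article or the linked tfjs code):

```python
# Toy sketch of "memorize a text, replay it via next-token prediction".
# A bigram table plays the role of the neural net here; a small NN overfit
# on one article ends up encoding much the same next-token mapping.
from collections import defaultdict

# Hypothetical stand-in for the Wikipedia horse article.
text = "horses are large domesticated one-toed hoofed mammals"
tokens = text.split()

# "Training": record which token follows each token.
nxt = defaultdict(list)
for a, b in zip(tokens, tokens[1:]):
    nxt[a].append(b)

# "Inference": given the first token, greedily predict the rest.
out = [tokens[0]]
while nxt.get(out[-1]):
    out.append(nxt[out[-1]][0])

print(" ".join(out))  # reproduces the text verbatim
```

This only replays the text exactly because each token here has a unique successor; with repeated words you'd need more context per prediction, which is where a real model earns its keep.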

