
Training a tiny LLM for fun using Rust/Candle. I constantly tweak stuff, keep track of results in a spreadsheet, and work on generating a bigger corpus with LLMs. It's a project for fun, so I don't care about finding actual human-generated text; I'd rather craft data in the format I want using LLMs. Probably not the best practice, but I can sleep properly despite doing that.

My favorite output so far: I asked it what life was and, in a random stroke of genius, it answered plainly: "It is."

It's able to answer simple questions where the answer is in the question with up to 75% accuracy. Example success: 'The car was red. Q: What was red? ' |> 'the car' - Example failure: 'The stars twinkled at night. Q: What twinkled at night? ' |> 'the night'.

So nothing crazy, but I'm learning and having fun. My current corpus is ~17 MB of stories, generated encyclopedia content, JSON examples, etc. The JSON content is new as of this weekend and the model is pretty bad at it so far, but I'm curious to see if I can get it somewhere interesting in the next few weeks.

https://github.com/antoineMoPa/rust-text-experiments


I also loved Zig when manually typing code, but I increasingly use AI to write my code, even in personal projects. In that context, I'd rather use Rust more, since the AI takes care of the complex syntax anyway. Also, the Rust ecosystem is bigger, so I'd rather stick to that community.

> Developers are not Idiots

I'm often distracted and AIs are idiots, so a stricter language can keep both me and AIs from doing extra dumb stuff.


> I'm often distracted

I really appreciate this in my role, where I have an office right next to the entrance to the building. I get walk-ins all of the time. When my door is closed, I get knocks on the door all of the time. Both AI and strict languages are great tools in my environment, where focus for me is as abundant as water in a desert.


It would be fun to compare with inference providers (groq/vertex ai, etc.).


Yes, going to add that.


I'm making a tiny tiny LLM in Rust (using Candle) to teach myself AI: https://github.com/antoineMoPa/rust-text-experiments


What I don't get about attention is why it would be necessary when a fully connected layer can also "attend" to all of the input. With very small datasets (think 0 - 500 tokens), I found that attention makes training longer and results worse. I guess the benefits show up with much larger datasets. Note that I'm an AI noob just doing some personal AI projects, so I'm not exactly a reference.


A fully connected layer has different weights for each feature (or each position in the input, in your formulation). So the word "hello" would be treated completely differently if it appeared in position 15 vs. position 16, for example.

Attention, by contrast, would treat those two occurrences similarly, with the only difference coming from the positional encoding, so you can learn generalized patterns more easily.
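A tiny sketch of that difference (toy numbers and weights, not from either project): a dense layer over a flattened sequence has separate weights per position, while raw attention scores depend only on token content.

```python
# Toy illustration: position-specific dense weights vs. position-agnostic
# attention scores. All vectors/weights here are made up for the example.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two-position sequence, 2-dim token embeddings.
hello = [1.0, 0.0]
world = [0.0, 1.0]

# Dense layer: a distinct weight vector for each input position.
w_pos0 = [0.5, -0.2]   # applied to whatever token sits at position 0
w_pos1 = [-0.3, 0.9]   # different weights for position 1

out_a = dot(w_pos0, hello) + dot(w_pos1, world)  # "hello world"
out_b = dot(w_pos0, world) + dot(w_pos1, hello)  # "world hello"
print(out_a != out_b)  # True: the dense layer treats each position differently

# Attention: the score is query . key, which ignores position entirely
# (positional encodings, omitted here, are the only way position re-enters).
query = [1.0, 1.0]
score_hello_pos0 = dot(query, hello)  # "hello" at position 0
score_hello_pos1 = dot(query, hello)  # "hello" at position 1: same score
print(score_hello_pos0 == score_hello_pos1)  # True
```

So the same token gets the same attention score wherever it appears, which is what lets the learned pattern generalize across positions.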


I think that this is the explanation I needed, thanks!


This is the case with most clever neural architectures: in theory, you could always replace them with dense layers that would perform better given enough resources/training, but that's just it. Efficiency matters (number of parameters, training data, training time, FLOPs), and dense layers aren't as efficient (to put it mildly).

You have seen this play out on a small scale, but if you calculate the size of the dense layers needed to even theoretically replicate a big attention layer (or even a convolution), to say nothing of the data needed to train them without the help of the architecture's inductive bias, you will see that the clever architectures are quite necessary at scale.
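To put rough numbers on it (illustrative sizes I picked, not figures from the thread): a single-head attention layer needs four d x d projection matrices, while a dense layer over the flattened sequence connects every input unit to every output unit.

```python
# Back-of-the-envelope parameter counts; d and n are assumed example sizes.
d = 512    # embedding dimension
n = 1024   # sequence length

attention_params = 4 * d * d        # W_Q, W_K, W_V, W_O projections
dense_params = (n * d) * (n * d)    # fully connected over the flat sequence

print(attention_params)                   # 1048576  (~1M)
print(dense_params)                       # 274877906944  (~275B)
print(dense_params // attention_params)   # 262144x more parameters
```

And the dense version is locked to one sequence length, on top of being several orders of magnitude larger.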


Attention grows dynamically with the input size; MLPs don't, since their input size is fixed.


It's not clear from the landing page whether it's a Git hosting platform, Mercurial hosting, or an entirely new VCS. I wish it were clearer (looking at the README, it seems it's indeed a Git hosting platform).

I don't really care about the governance model as a user seeing this landing page for the first time, so I wonder why it's so prominent, vs telling me what the actual product is.


The governance model is the reason for its existence: Gitea went proprietary (stupidly), so this is the rebel alliance being proud of being the rebel alliance.


Did anyone build a text-to-splat 3D generation model? Seems like it would be pretty straightforward? Should make it really easy to generate assets for video games.

EDIT: yep - https://gsgen3d.github.io/


Aren't Gaussian splats incompatible with most common game development styles? No shadows and no rigging/animation are the main issues. Or maybe I'm misunderstanding / behind on the research; please correct me.


You are right that many game engine features cannot be used yet, especially relighting and reflections. But there are cases where game engines (like Unreal Engine 5) are used, for example in virtual production with LED screens, where a photorealistic background is needed (3D Gaussians do look more realistic and are cheaper to produce than a comparable scene made of polygons).

SuperSplat is actually a game engine (but for the web).


This looks like a promising tool to (also) generate fully generative 4D multi-views, which can then be used to generate 3D-GS. Their pipeline also supports animated objects, camera zoom, and pan, and they benchmark their results against 3D-GS. The code is unfortunately not yet published; I can't wait to try it. https://gen-x-d.github.io/


Nope! I'm also not convinced by it.


To me, this looks like something chatgpt would write.


I am surprised I had to scroll down this far to find someone making this point. In addition to being the obvious joke in this situation, the message was so dull, generic, and "this incredible journey" that I instinctively began to skim before finishing the second paragraph.


Or, like, any PR person from the past... forever.


As an Albanian, I can confirm she wrote it herself (obviously with the help of ChatGPT): no finesse or other writing flourishes.


It was not written by her; it was written by the other side's lawyers.


A tiny tiny LLM (essentially removing the "Large" part of "Large Language Model"). I taught a neural network to memorize Wikipedia articles (actually, just one Wikipedia article, about horses) and throw it back as-is by predicting the next token (when given the first token).

https://github.com/antoineMoPa/tfjs-text-experiment/blob/mai...
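The mechanism can be sketched in miniature with a bigram lookup table standing in for the network (toy text I made up, not the actual article or the linked tfjs code):

```python
# Toy sketch of "memorize a text, replay it via next-token prediction".
# A bigram table plays the role of the neural net here; a small NN overfit
# on one article ends up encoding much the same next-token mapping.
from collections import defaultdict

# Hypothetical stand-in for the Wikipedia horse article.
text = "horses are large domesticated one-toed hoofed mammals"
tokens = text.split()

# "Training": record which token follows each token.
nxt = defaultdict(list)
for a, b in zip(tokens, tokens[1:]):
    nxt[a].append(b)

# "Inference": given the first token, greedily predict the rest.
out = [tokens[0]]
while nxt.get(out[-1]):
    out.append(nxt[out[-1]][0])

print(" ".join(out))  # reproduces the text verbatim
```

This only replays the text exactly because each token here has a unique successor; with repeated words you'd need more context per prediction, which is where a real model earns its keep.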

