GaggiX's comments | Hacker News

The C compiler written by Claude a few months ago was able to compile a hello world.

The main problem, I think, was that it was extremely slow.


Using larger contexts often costs more through the APIs or consumes more of your quota, but this is becoming less of a problem as models adopt cleverer attention mechanisms rather than full attention on all layers.

You can look at https://sebastianraschka.com/llm-architecture-gallery/ to see how much things have changed.
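
For intuition, here's a minimal sketch (plain NumPy, purely illustrative) of why sliding-window layers are cheaper than full attention: the number of attended positions grows quadratically under a full causal mask but only linearly once a window is applied.

  import numpy as np

  def causal_mask(n):
      # full attention: token i attends to all j <= i, O(n^2) positions total
      return np.tril(np.ones((n, n), dtype=bool))

  def sliding_window_mask(n, w):
      # windowed attention: token i attends to the last w tokens only,
      # O(n * w) positions total, so cost stays linear in context length
      i = np.arange(n)[:, None]
      j = np.arange(n)[None, :]
      return (j <= i) & (j > i - w)

  print(causal_mask(6).sum())             # 21 attended positions
  print(sliding_window_mask(6, 3).sum())  # 15, and the gap widens as n grows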


This is also something of a non-issue, because as context grows and attention gets diluted, the models perform worse. It'll cost Anthropic more to run your 900k-context session, yes, but it's in your interest not to have a 900k session in the first place.


You’re right about performance degradation, but good luck trying to sell that as a product.

You can drive this car, but the last mile of this trip will use as much gas as the first 20 miles.

I think it's in Anthropic's interest to keep this fact hidden from the CEOs who push for AI adoption.


First G-shock G-LIDE*


At 4-bit quantization it should already fit quite nicely.


Unfortunately not with a reasonable context length.


I've got 139k context with the UD-Q4_K_XL quant on a 4090, with the q8_0 KV cache (ctk/ctv). Could probably squeeze out a little more, but that's enough for me for the moment.
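
For anyone wanting the actual flags, the llama.cpp invocation looks roughly like this (llama-server; the model filename is a placeholder, and -ngl 99 just means "all layers on GPU"):

  llama-server -m model-UD-Q4_K_XL.gguf \
      -c 139264 -ngl 99 -fa \
      -ctk q8_0 -ctv q8_0

Note that quantizing the V cache requires flash attention, hence -fa.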


Hey, buddy! Can I bum a command line arg list off ya?


The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low, even at BF16 precision.
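
Hand-wavy sketch of why (a simplified single-head delta rule, not the exact Gated DeltaNet formulation): a linear-attention layer keeps a fixed-size state matrix per head instead of a K/V list that grows with the sequence, so only the standard-attention layers contribute a growing cache.

  import numpy as np

  d_k, d_v = 64, 128
  S = np.zeros((d_v, d_k))    # fixed-size state; replaces a growing KV cache

  def deltanet_step(S, q, k, v, beta, g):
      k = k / np.linalg.norm(k)    # DeltaNet works with normalized keys
      pred = S @ k                 # what the state currently stores for key k
      S = g * S + beta * np.outer(v - pred, k)  # decay, then write the correction
      return S, S @ q              # updated state and this token's output

  for _ in range(10_000):
      q, k, v = np.random.randn(d_k), np.random.randn(d_k), np.random.randn(d_v)
      S, o = deltanet_step(S, q, k, v, beta=0.5, g=0.99)
  print(S.shape)                   # (128, 64) no matter how long the sequence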


It really depends on what you think a reasonable context length is, but I can get 50k-60k on a 4090.
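
The back-of-the-envelope formula for full-attention models is: KV cache bytes = 2 (one K and one V) x layers x kv_heads x head_dim x context x bytes per element. With made-up but typical hyperparameters:

  # hypothetical model: 48 layers, 8 KV heads (GQA), head_dim 128
  layers, kv_heads, head_dim, ctx = 48, 8, 128, 60_000
  elems = 2 * layers * kv_heads * head_dim * ctx
  print(elems * 2 / 1024**3)   # fp16 cache: ~11.0 GiB
  print(elems * 1 / 1024**3)   # q8_0 cache (~1 byte/elt): ~5.5 GiB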


Fil-C is much slower; there's no free lunch. If you want the language to be both fast and memory safe, you need to add restrictions that allow proper static analysis of the code.


Much better idea to just buy oil and gas from Russia /s


Better than nuclear power plants getting hit by drones.

We will have Chernobyl around for longer than we'll have the dependency on Russian oil and gas.


There are plenty of good models on OpenRouter that are very cheap; maybe it's time to experiment with alternatives.


what are some of them?


MiniMax M2.7, MiMo-V2-Pro, GLM-5, GLM5-turbo, Kimi K2.5, DeepSeek V3.2, Step 3.5 Flash (this last one is particularly cheap while still being powerful).


Can't judge the quality of the comparison, but I'd start from https://arena.ai/leaderboard/code and maybe OpenRouter's rankings.


Kimi K2


>so despite the name it is probably best compared with the 8B/9B

It runs much faster than a standard 8B/9B model; the name comes from the fact that it uses per-layer embeddings (PLE).
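
Rough intuition for the accounting (illustrative numbers, not the actual model card figures): the per-layer embedding tables are large, but only one row per token is needed at each layer, so they can be streamed from slow memory while just the core transformer stays resident.

  vocab, layers, ple_dim = 256_000, 30, 256
  core_params = 4.0e9                       # hypothetical core transformer
  ple_params = vocab * layers * ple_dim     # ~2.0e9, can live in slow memory
  print((core_params + ple_params) / 1e9)   # ~6.0B parameters on disk
  print(core_params / 1e9)                  # ~4.0B resident in fast memory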


There is also: https://github.com/linto-ai/whisper-timestamped

It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping (DTW) to the cross-attention weights.
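
A toy sketch of the DTW part, assuming a (text tokens x audio frames) cross-attention matrix; the cheapest monotonic path through it aligns each token to a span of frames (illustrative implementation, not whisper-timestamped's actual code):

  import numpy as np

  def dtw_path(cost):
      # cost[i, j]: cost of aligning text token i to audio frame j
      # (e.g. negative cross-attention weight)
      n, m = cost.shape
      D = np.full((n + 1, m + 1), np.inf)
      D[0, 0] = 0.0
      for i in range(1, n + 1):
          for j in range(1, m + 1):
              D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
      path, i, j = [], n, m            # backtrack the monotonic alignment
      while i > 0 and j > 0:
          path.append((i - 1, j - 1))
          i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)], key=lambda p: D[p])
      return path[::-1]

  attn = np.random.rand(5, 20)         # toy (tokens x frames) attention weights
  path = dtw_path(-attn)               # high attention = low alignment cost
  # earliest frame on the path for each token ~ the token's start time
  print({tok: frame for tok, frame in reversed(path)})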


Just a warning that plain WhisperX is more accurate and Whisper-timestamped has many weird quirks.


I will copy the supermarket and paste it somewhere else.

I'm also going to download a car.

