Just adding for context that I use Gemini Ultra, and across all models from Gemini 3.1 Pro to Claude Opus 4.6 I have never hit 429s. Hitting model quota limits is also incredibly rare and only happens if I am trying to run 3 projects at once. While I'm not the biggest agentic coding fan, I have been toying with these tools and have been running them for at least 7-8 hours a day, if not longer.
The amount of goal post shifting is so amusing to see. Yes, sure this was probably not an "important" or a particularly "challenging" problem which had been open for a while. Sure, maybe it remained open because it didn't get enough eyeballs from the right people to care about spending time on it.
Yes, there is too much overhyping and we are all somewhat tired of it. Still, if someone 10 years ago had told me we would get "AI" to a stage where it can solve olympiad-level problems and win gold medals at the IMO, and do so not from very structured input but from our complex, messy human natural language, while also interpreting (for varying meanings of "interpreting") image and video data in almost real time, I would have called them nuts and called this, on such a timescale, sci-fi. So some part of me feels it is crazy how quickly we have normalized this new reality.
A reality where we are debating whether the problem solved by the automated model using formal verification was too easy.
Don't get me wrong, I am not saying any of this means we get AGI or something or even if we continue to see improvements. We can still appreciate things. It doesn't need to be a binary. What a time to be alive!
> The amount of goal post shifting is so amusing to see
Can you be specific about the goal posts being shifted? Like the specific comments you're referring to here. Maybe I'm just falling for the bait, but non-specific claims like this seem designed just to annoy while offering nothing specific to converse about.
I got to the end of your comment and counting all the claims you discounted, the only goal post I see left is that people aren't using a sufficiently excited tone while sifting fact from hype? A lot of us follow this work pretty closely and don't feel the need to start every post with "there is no need for excitement to abate, still exciting! but...".
> I am not saying any of this means we get AGI or something or even if we continue to see improvements. We can still appreciate things. It doesn't need to be a binary.
You'll note, however, that the hype guys happily include statements like "Vibe proving is here" in their posts with no nuance, all binary. Why not call them out?
Well, there's a comment here saying "I won't consider it 'true' AI until it solves all millennium problems"... That goalpost seems to define AI as not only human level but superhuman level (e.g. 1 in a million level intellect or harder).
Except nobody ever actually considered the "Turing test" to be anything other than a curiosity in the early days of a certain branch of philosophy.
If the Turing test is a goal, then we passed it 60 years ago and AGI has been here since the LISP days. If the Turing test is not a goal (which is the correct interpretation), nobody should care what a random nobody thinks about an LLM "passing" it.
"LLMs pass the Turing test so they are intelligent (or whatever)" is not a valid argument, full stop, because "the Turing test" was never a real thing meant to actually tell the difference between human intelligence and artificial intelligence. It was never formalized and never evaluated for its ability to do so. The entire point of the Turing test was to be part of a conversation about thinking machines in a world where that was an interesting proposition.
The only people who ever took the Turing test as a "goal" were the misinformed public. Again, that interpretation of the Turing test has been passed by things like ELIZA and Markov chain based IRC bots.
Yeah, I agree. As a mathematician it's easy to say that it's not a super hard proof. But then again, the first proofs found by running AI on a large bucket of open problems were always going to be easy proofs of not-very-worked-on problems. The fact that no one did it before them definitely shows real progress being made. When you're an expert, it's easy to lose track of the fact that things you consider trivial and things you consider very hard may in fact be a very small distance apart in the grand scheme of things (relative to entities whose strengths differ by orders of magnitude).
I am rather pro-AI in general, but I just can't imagine what I would have thought in 2015 if you had told me that we would have AI that could solve an Erdős problem from natural language but couldn't answer my work emails.
It actually doesn't help me at all at anything at my job and I really wish it could.
That isn't really goal post moving as much as a very strange shape to the goal post.
My memory of the recent history of the web isn't as cynical. I am not denying the perverse incentives involved, but at least for Facebook and (maybe) Instagram we had the Graph API, which was accessible and provided a decent amount of functionality and access, but was curtailed after public backlash over misuse such as Cambridge Analytica. Similarly, Twitter provided decent APIs whose limits end consumers wouldn't exhaust easily, and in the year (or two) before the Musk purchase it even gave academia API access equivalent to its enterprise offerings for free. But there too, bots and the like were at least used as the public justification for curtailing many features.
Now none of this is meant to excuse the behaviour of all these large platforms for all the terrible practices they engage in. But at the same time, we never figured out how to safely deal with the power exposed by these APIs.
And recently API access has been a way to obtain AI training data, which user-content sites don't want to allow, since they want to sell it for money. See the Reddit API fiasco too.
Yes, it's just open weights. However, they do allow commercial use, with stipulations to do no harm, follow the law and not circumvent the guardrails in place. For commercial use I would worry about the revocable nature of the license more than anything else.
I have been on an M1 MacBook Pro since launch. I love the hardware, easily my favourite device I have ever owned, but coming from being a Linux person, macOS has always been the Faustian bargain. I spend a lot of time SSHed into more GPU-capable Linux machines for most of my work and thus get an escape, but after driving a friend's Linux machine I started looking for a way to daily drive one. I tried Asahi Linux and also looked at some non-Apple machines, including Snapdragon X Elite ones, but so far I haven't found anything with good battery life and decent Linux driver support.
So far Asahi Linux, with its reduced battery life, seems to be the best bet.
I don't mind tinkering. I love tinkering. I am not looking for "just works" but something which I could get to work after putting in the hours. If someone has suggestions please share.
Edit: Sorry to go somewhat off topic.
Stay away from ARM laptops and SoCs, they aren't there yet when it comes to Linux. If you like to tinker, go for it, but expect hardware to just not work, or worse, you'll get stuck on a kernel fork that never gets updated.
If you want a good Linux machine, buy one from a vendor that explicitly sells and supports machines with Linux on them.
IMO you can tinker as much as you want without forcing hardware compatibility issues upon yourself in order to have something to tinker with.
The Thinkpad x13s is more-or-less there. I've been using it as my primary machine (and laptop) for the last month, and it 'just works'. All day battery life, fanless so it's dead silent, and a crisp screen with decent DPI. KDE and Vivaldi run as fast as my i7-13700 desktop.
That seems to be the conclusion I have been trying to avoid. With Graviton and other Arm-based Linux servers making up a good bulk of my work, I hoped I wouldn't have to worry about multi-architecture Docker builds. Ah well.
Any suggestions for something well built but lightweight and that one could figure out how to get 8+ hours of actual daily usage battery life on?
I've had a great experience with my Framework 13 (AMD), although I usually get 4-5 hours of battery life, so not quite the full 8 hours you're looking for.
I have tried multiple Framework devices, as a partner firm uses them. Good devices, and I want to support what they are doing, but yeah, battery life is really lacking for my use case.
Others have mentioned thinkpads and in my experience the better ones all get 8h+, just stay away from the X1 carbon (my current work machine) with hybrid nvidia graphics. Those have problems of not turning off the external GPU and sucking the battery empty, but that isn't just a Linux problem it seems from lots of forum posts.
It has been a while since I daily drove one, but my old laptop had an NVIDIA hybrid setup and it was possible to get power management to work decently with it, though I might just have been lucky with the configuration. Thanks for the heads-up.
A recent ThinkPad with one of the latest AMD Ryzen U CPUs should have a very decent battery life. You just need some custom udev rules to set the right power saving states for different devices. Powertop should make this straightforward. IMHO, this is a great compromise, because you stay on x86_64 and Linux, you get within 3/4 of ARM's power efficiency, and hardware support is perfect. I've squeezed more than 11 hours from some models.
One thing that is often discounted is that Safari is a marvel of power efficiency, which adds to the efficiency of Apple M chips. IMHO, there should be dedicated Chromium and Firefox builds with compile flags and options that optimize efficiency. To counter that, running a barebones Linux setup is a good option. Keeping your CPU wakeups/s low lets you cross the 10 hour barrier.
I use a Chromium-based browser and applications, so sadly I don't get to enjoy the power efficiency gains of Safari.
Great to know this about your experience with ThinkPads. Given your familiarity with multiple such devices, would you be able to recommend a starting point for someone who wants something light, with budget not being a constraint?
The new x13 Gen 6 AMD with a U-series CPU and, ideally, configured with the 70Whr battery looks very nice. Should be less than 1 kg, similar to a MacBook Air.
I've used the previous generation, and it's possible to get 10 hours of use on battery with some tweaks.
It's a very popular model, so it's well tested and everything should just work with the latest kernel.
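For a rough sense of what those numbers mean in power terms, here is a back-of-the-envelope sketch. The 70 Wh capacity is the configuration mentioned above; the 10 hour target and the function names are illustrative assumptions, and real draw varies a lot with workload, screen brightness and battery wear.

```python
def average_draw_watts(capacity_wh: float, target_hours: float) -> float:
    """Average power budget (W) needed to last target_hours on capacity_wh."""
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return capacity_wh / target_hours


def estimated_runtime_hours(capacity_wh: float, draw_watts: float) -> float:
    """Runtime (h) for a given average draw; ignores wear and reserve capacity."""
    if draw_watts <= 0:
        raise ValueError("draw_watts must be positive")
    return capacity_wh / draw_watts


# Nominal 70 Wh pack, 10 hour target: the whole system has to average 7 W.
budget = average_draw_watts(70.0, 10.0)

# The same pack at an assumed 14 W average draw lasts about half as long.
heavier = estimated_runtime_hours(70.0, 14.0)
```

The point is only that a 10 hour day on a pack this size leaves a single-digit watt budget for the entire machine, which is why idle wakeups and device power states matter so much.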
ASUS ProArt P16. I never want another machine. Slender, stiff, machined out of something expensive feeling. Everything works on 6.16, 4k OLED display, wonderful keyboard. Solid RDNA unit, NVIDIA card alongside.
With a clean hyprland setup, light as a feather, battery lasts forever unless you run it hard.
Will try to procure one for testing. It's a bit on the larger side for me though. How much battery life do you get on your average daily workload and what kind of workload is it? If you feel comfortable sharing.
Agreed about macs feeling bloated. I will not get overdramatic by calling latency in many functions unbearable but it is certainly quite high.
I like Framework devices but, sadly, I haven't used one with battery life that works for me. I can get a bit more than 5 hours on a MacBook under Asahi Fedora with my usual usage.
x86 Thinkpads + Fedora work great. Hardware support out of the box is almost perfect (I would say perfect because I don’t recall anything not working, but I may be missing something). In fact, Thinkpads used to have Fedora as an OS option, which is why I think the support is so good.
Outside of that, maybe something like System76. They advertise 14h for one of their models.
x86 Thinkpads is what I am gravitating towards as well.
Regarding System76, I have heard really good things about their workstations but not about their laptops. Have you used them? I do recommend Pop!_OS as a first distro to anyone getting started with Linux, though.
I have not used their laptops. You’ll find the usual complaints about their fit and finish on the web, mostly because they are rebranded Clevos. But I think, everything else considered, they get positive reviews outside of that.
I spent years fussing about getting all of my APIs to fit the definition of REST and to do HATEOAS properly. I spent way too much time trying to make everything conform to an action on a resource. Now, don't get me wrong. It is quite helpful to try to model things as stateless resources with a limited set of actions on them, and to think about idempotency for specific actions in ways I don't think we did properly in the SOAP days (at least I didn't). And in many cases it led to less brittle interfaces which were easier to reason about.
I still like REST and try to use it as much as I can when developing interfaces, but I am not beholden to it. There are many cases which are not resources or are not stateless. Sure, you can find some obtuse way to make them resources, but that at times leads to bad abstractions that don't convey the vocabulary of the underlying system, which over time creates a rift in context between the interface and the underlying logic. Or we end up exposing underlying implementation details because they are easier to model as resources.
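To make the idempotency point concrete, here is a minimal sketch of the kind of thinking I mean. Everything in it is hypothetical: `OrderStore`, the `/orders` routes in the docstrings and the idempotency key handling are illustrations, not any particular framework's API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple


@dataclass
class OrderStore:
    """Hypothetical in-memory stand-in for an /orders resource collection."""
    orders: Dict[int, dict] = field(default_factory=dict)
    # Maps a client-supplied idempotency key to the order id it created.
    seen_keys: Dict[str, int] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def put(self, order_id: int, body: dict) -> Tuple[int, dict]:
        """PUT /orders/{id}: replaces the full state, so a retry changes nothing."""
        status = 200 if order_id in self.orders else 201
        self.orders[order_id] = dict(body)
        return status, self.orders[order_id]

    def post(self, body: dict, idempotency_key: str) -> Tuple[int, dict]:
        """POST /orders: creates a resource; the key makes a retried request safe."""
        if idempotency_key in self.seen_keys:
            # Retried request: return the original result, create nothing new.
            order_id = self.seen_keys[idempotency_key]
            return 200, {"id": order_id, **self.orders[order_id]}
        order_id = next(self._ids)
        while order_id in self.orders:  # skip ids already taken via PUT
            order_id = next(self._ids)
        self.orders[order_id] = dict(body)
        self.seen_keys[idempotency_key] = order_id
        return 201, {"id": order_id, **self.orders[order_id]}


store = OrderStore()
first = store.post({"item": "book"}, idempotency_key="key-1")
retry = store.post({"item": "book"}, idempotency_key="key-1")  # e.g. after a timeout
```

Here the retried POST returns the same order instead of creating a duplicate, while PUT needs no key at all because replacing the full state is naturally repeatable.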
RPi is a good option if one has lower RAM requirements, especially if you take into account the quality of the drivers and software support in general.
RPi can be a compelling option if you need lower power draw. It does take some effort to squeeze out power efficiency, but if the requirement can't be handled by a microcontroller then it is the most convenient off-the-shelf option.
For everything else, RPi isn't a very compelling option. Even for GPIO, during the RPi shortage I started experimenting with just using STM32 dev boards connected via USB to a NUC or an old PC, and it worked well. But I just prefer to use ESP8266 or ESP32 for those tasks most of the time. The bandwidth and latency of the USB or Wi-Fi link to the main device have been good enough not to be a concern for me, and I reckon that outside of very specific robotics cases they won't be for most people.
The CSI port is quite nice though, and there aren't many great alternatives.
I am curious how you envision such a platform being funded and operated. Is it government funded (which country? or a coalition?) and operated by an "independent" board? Or something more akin to the Wikimedia Foundation, where the public from around the world funds the endowment and that, combined with volunteers, helps run the platform?