
For those who haven't realized it yet, Deepseek-R1 is better than claude 3.5 and better than OpenAI o1-pro, better than Gemini.

It is simply smarter -- a lot less stupid, more careful, more astute, more aware, more meta-aware, etc.

We know that Anthropic and OpenAI and Meta are panicking. They should be. The bar is a lot higher now.

The justification for keeping the sauce secret just seems a lot more absurd. None of the top secret sauce that those companies have been hyping up is worth anything now that there is a superior open source model. Let that sink in.

This is real competition. If we can't have it in EVs at least we can have it in AI models!



I tried two questions that I had recently asked o1 pro mode.

The first was about setting up a GitHub action to build a Hugo website. I provided it with the config code, and asked it about setting the directory to build from. It messed this up big time and decided that I should actually be checking out the git repo to that directory instead. I can see in the thinking section that it’s actually thought of the right solution, but just couldn’t execute on those thoughts. O1 pro mode got this on the first try.
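For context, the kind of workflow being discussed might look something like the sketch below; the action names, versions, and the `./site` subdirectory are illustrative assumptions, not the poster's actual config:

```yaml
name: build-hugo
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out the repo at the default location...
      - uses: actions/checkout@v4
      - uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: 'latest'
      # ...then build *from* the subdirectory via working-directory,
      # rather than checking the repo out *into* it (the mistake
      # described above).
      - name: Build
        working-directory: ./site
        run: hugo --minify
```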

Also tried a Java question about using SIMD to compare two CharSequence objects. This was a bit hit or miss. O1 didn’t do great either. R1 actually saw that it’s possible to convert a char array to a short vector, which was better than o1, but they both failed to understand that I don’t have a char array.

Also tried a maven build problem I had the other day. O1 managed to figure that one out, and R1 also managed on the first go but was better at explaining what I should do to fix the issue.


I’ve had the exact opposite experience. But mine was in using both models to propose and ultimately write a refactor. If you don’t get this type of thing on the first shot with o1 pro, you’re better off opening up a new chat, refining your prompt, and trying again. As soon as your asks get smaller within this much larger context, I find it gets lost and starts being inconsistent in its answers. Even when the task remains the same as the initial prompt, it starts coming up with newer, more novel solutions halfway through implementation.

R1 seems much more up to the task of handling its large context window and remaining consistent. The search experience is also a lot better than with search-capable OpenAI models. It doesn’t get as stuck in a search-response template and can answer questions with the search results in mind.


I don't get it. I like DeepSeek because I can turn on the Search button. Turning on DeepThink R1 makes the results as bad as Perplexity's. The results make me feel like they used parallel construction, and that the straightforward replies would actually have had some value.

Claude Sonnet 3."6" may be limited in rare situations, but its personality really makes the responses outperform everything else when you're trying to take a deep dive into a subject where you previously knew nothing.

I think that the "thinking" part is a fiction, but it would be pretty cool if it gave you the thought process, and you could edit it. Often with these reasoning models like DeepSeek R1, the overview of the research strategy is nuts for the problem domain.


IMO the deep think button works wonders.


Whenever I use it, it just seems to spin itself in circles for ages, spit out a half-assed summary and give up.

Is it like the OpenAI models in that it needs to be prompted in extremely specific ways to get it to not be garbage?


O1 doesn’t seem to need any particularly specific prompts. It seems to work just fine on just about anything I give it. It’s still not fantastic, but oftentimes it comes up with things I either would have had to spend a lot of time to get right, or just plainly didn’t know about myself.


I don’t ask LLMs about anything going on in my personal or business life. It’s purely a technical means to an end for me. So that’s where the disconnect is maybe.

For what I’m doing OpenAI’s models consistently rank last. I’m even using Flash 2 over 4o mini.


Not sure what you mean, but are you saying you don't ask it anything related to work at all?


I'm curious what you are asking it to do, and whether you think the thoughts it expresses along the way seemed likely to lead it in a useful direction before it resorted to a summary. Also, perhaps it doesn't realize you don't want a summary?


People be like, "please provide me with a full stack web app" and then think it's bad when it doesn't.


Interesting thinking. Curious: what would you want to "edit" in the thought process if you had access to it? Or would you just want/expect transparency and a feedback loop?


I personally would like to "fix" the thinking when it comes to asking these models for help on more complex and subjective problems. Things like design solutions. Since a lot of these types of solutions are belief based rather than fact based, it's important to be able to fine-tune those beliefs in the "middle" of the reasoning step and re-run or generate new output.

Most people do this now through engineering longwinded and instruction-heavy prompts, but again that type of thing supposes that you know the output you want before you ask for it. It's not very freeform.


If you run one of the distill versions in something like LM Studio it’s very easy to edit. The replies from those models aren’t half as good as the full R1’s, but they’re still remarkably better than anything I’ve run locally before.


I ran the llama distill on my laptop and I edited both the thoughts and the reply. I used the fairly common approach of giving it a task, repeating the task 3 times with different input and adjusting the thoughts and reply for each repetition. So then I had a starting point with dialog going back and forth where the LLM had completed the task correctly 3 times. When I gave it a fourth task it did much better than if I had not primed it with three examples first.
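The priming approach described above can be sketched roughly as follows, assuming an OpenAI-style chat message list; the task text, "thoughts", and replies here are placeholders, not real model output:

```python
# Build a chat history in which the model has already "completed" the task
# correctly three times, with hand-edited thoughts and replies, before
# posing the fourth task. <think>...</think> mimics how R1-style models
# emit their chain of thought.

def make_primed_history(task_template, solved_examples):
    """solved_examples: list of (input, edited_thoughts, edited_reply)."""
    history = []
    for task_input, edited_thoughts, edited_reply in solved_examples:
        history.append({"role": "user",
                        "content": task_template.format(task_input)})
        history.append({"role": "assistant",
                        "content": f"<think>{edited_thoughts}</think>\n{edited_reply}"})
    return history

examples = [
    ("alpha", "Check the input, apply the rule.", "Result A"),
    ("beta",  "Same rule as before.",             "Result B"),
    ("gamma", "Consistent with prior turns.",     "Result C"),
]
history = make_primed_history("Do the task with input: {}", examples)
# The fourth, real task goes on the end of the primed conversation.
history.append({"role": "user", "content": "Do the task with input: delta"})
```

The model then sees three consistent worked examples before the real request, which is what seems to have improved the fourth answer.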


I must be missing something, but I tried Deepseek R1 via Kagi assistant and IMO it doesn't even come close to Claude?

I don't get the hype at all?

What am I doing wrong?

And of course if you ask it anything related to the CCP it will suddenly turn into a Pinocchio simulator.


I tried Deepseek R1 via Kagi assistant and it was much better than claude or gpt.

I asked for suggestions for rust libraries for a certain task and the suggestions from Deepseek were better.

Results here: https://x.com/larrysalibra/status/1883016984021090796


This is a really poor test though; of course the most recently trained model knows the newest libraries, or knows that a library was renamed.

Not disputing it's best at reasoning but you need a different test for that.


"recently trained" can't be an argument: those tools have to work with "current" data, otherwise they are useless.


That's a different part of the implementation details. If you were to break the system into microservices, the model is a binary blob with a microservices wrapper, and accessing web search is another microservice entirely. You really don't want the entire web to be constantly compressed and re-released as a new model iteration; it's super inefficient.
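As a toy sketch of that decomposition (all names and behaviors here are illustrative, not any vendor's actual API):

```python
# The model is a frozen "service" that only knows its training data; web
# search is a separate service; a thin wrapper composes the two. Fresh
# data enters via search, not by retraining the model.

def model_service(prompt, context=""):
    """Stands in for the frozen binary blob."""
    return f"answer({prompt!r}, context={context!r})"

def search_service(query):
    """Separate microservice: fetches fresh documents at query time."""
    return [f"result for {query!r}"]

def wrapper(prompt, use_search=False):
    context = " ".join(search_service(prompt)) if use_search else ""
    return model_service(prompt, context)
```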


Technically you’re correct, but from a product point of view one should be able to get answers beyond the cut-off date. The current product fails to realise that some queries like “who is the current president of the USA” are time based and may need a search rather than an excuse.
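A crude version of that routing might look like the following; the keyword pattern is an illustrative assumption, and a real product would presumably use a learned classifier rather than a regex:

```python
import re

# Toy router: send time-sensitive queries to search instead of answering
# from (stale) model weights.
TIME_SENSITIVE = re.compile(
    r"\b(current|latest|today|now|this (week|month|year))\b", re.IGNORECASE
)

def route(query):
    return "search" if TIME_SENSITIVE.search(query) else "model"
```

With this, "who is the current president of the USA" would go to search instead of producing an excuse about the training cut-off.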


This only holds water if they are able to retrain frequently, which they haven't demonstrated yet. But if they are as efficient as they seem, then maybe.


That's interesting!


> but I tried Deepseek R1 via Kagi assistant

Do you know which version it uses? Because in addition to the full 671B MoE model, DeepSeek released a bunch of distillations of Qwen and Llama at various sizes, and these are being falsely advertised as R1 everywhere on the internet (Ollama does this, and plenty of YouTubers do as well, so maybe Kagi is doing the same thing).


Ah interesting to know that. I don't know which version Kagi uses, but it has to be the wrong version as it's really not good.


They're using it via fireworks.ai, which is the 685B model.

https://fireworks.ai/models/fireworks/deepseek-r1


How do you know which version it is? I didn't see anything in that link.


An additional information panel shows up on the right hand side when you're logged in.


Thank you!


because they wouldn’t call it R1 otherwise, unless they were being unethical (like Ollama is)


Just as a note, in my experience, Kagi Assistant is considerably worse when you have web access turned on, so you could start with turning that off. Whatever wrapper Kagi have used to build the web access layer on top makes the output considerably less reliable, often riddled with nonsense hallucinations. Or at least that's my experience with it, regardless of what underlying model I've used.


That has also been my problem when I was using Phind. In both cases, I very often turn the web search off to get better results. I suspect there is too much pollution from bad context from search results, some of which may not be completely relevant to the task. But sometimes I work on things and libraries that are more niche/obscure, and without search the models do not know these very well.

I have the impression that things get better when using very narrow lenses for whatever I ask them for, but I have not tested this properly wrt all 3 conditions. Is there a kind of query that you see considerable improvement when the model does not have web access?


That makes sense. When I used Kagi assistant 6 months ago I was able to jailbreak what it saw from the web results and it was given much less data from the actual web sites than Perplexity, just very brief excerpts to look at. I'm not overly impressed with Perplexity's web search capabilities either, but it was the better of the two.


I haven't tried kagi assistant, but try it at deepseek.com.

All models at this point have various politically motivated filters. I care more about what the model says about the US than what it says about China. Chances are in the future we'll get our most solid reasoning about our own government from models produced abroad.


False equivalency. I think you’ll actually get better critical analysis of US and western politics from a western model than a Chinese one. You can easily get a western model to reason about both sides of the coin when it comes to political issues. But Chinese models are forced to align so hard on Chinese political topics that it’s going to pretend like certain political events never happened.

E.g. try getting them to talk in a critical way about “the Trail of Tears” and “Tiananmen Square”.

It could be interesting to challenge these models on something like the rights of Hawaiian people and the possibility of Hawaii independence. When confronted with the possibility of Tibet independence, I’ve found that Chinese political commentators will counter with “what about Hawaii independence”, as if that’s something completely unthinkable for any American. But I think you’ll find a lot more Americans who are willing to entertain that idea, and even defend it, than you’ll find mainland Chinese considering Tibetan independence (within published texts at least). So I’m sceptical about a Chinese model’s ability to accurately tackle the question of the rights of a minority population within an empire, in a fully consistent way.

Fact is, even though the US has its political biases, there is objectively a huge difference in political plurality in US training material. Hell, it may even have “Xi Jinping thought” in there.

And I think it’s fair to say that a model that has more plurality in its political training data will be much more capable and useful in analysing political matters.


This is not really my experience with western models. I am not from the US though, so maybe what you consider a balanced perspective or reasoning about both sides is not the same as what I would call one. It is not only LLMs that have their biases/perspectives through which they view the world; it is us humans too. The main difference imo is not between western and Chinese models but between closed and, in whichever sense, open models. If a model is open-weights and censored, somebody somewhere will put in the effort and manage to remove or bypass this censorship. If a model is closed, there is not much one can do.


> I am not from the US though, so maybe what you consider a balanced perspective or reasoning about both sides is not the same as what I would call one

I'm also not from the US, but I'm not sure what you mean here. Unless you're talking about defaulting to answer in Imperial units, or always using examples from the US, which is a problem the entire English speaking web has.

Can you give some specific examples of prompts that will demonstrate the kind of Western bias or censorship you're talking about?


Basically, a model is always a reflection of the ideology that it was trained on.

Imagine you're an anarchist - you probably won't get the answer you're looking for on how to best organize a society from an American or a Chinese model.

The tricky part is that for a lot of topics, there is no objective truth. Us nerds tend to try to put things into neat answerable boxes, but a lot of things just really depend on the way you see the world.


I have an open mind to this. However, when people talk about Chinese censorship, they can share a clear, short prompt that other people can test: "tell me about Tiananmen Square".

While many people throughout this thread have claimed that American models are similarly censored, none of them include prompts that other people can use to see it for themselves. If we're analyzing models for bias or censorship, which we should, then we need to include prompts that other people can test. These models are probabilistic - if you get what appears to be a biased or censored answer, it might have just been chance. We need many eyes on it for proof that it's not just statistical noise.

> Imagine you're an anarchist

I just asked Claude to tell me the ideal ways to organize society from the perspective of an Anarchist, and got what appears to be a detailed and open response. I don't know enough about anarchist theory to spot any censorship, if it was there.

Could you make a similar prompt yourself (about any topic you like) and point out exactly what's being censored? Or described with this unacceptable bias you're alluding to.


These models were trained on the open web, with as much content as they could possibly consume and manufacture. They are large opaque boxes, with who-the-fuck-knows what going on in there.

I’m not saying that models don’t have guardrails, nudges, secret backend prompt injections, and nannies. I’m saying that believing the Chinese trained their model almost exclusively on Communist textbooks is kind of silly.


GPT4 is also full of ideology, but of course the type you probably grew up with, so harder to see. (No offense intended, this is just the way ideology works).

Try for example to persuade GPT to argue that the workers doing data labeling in Kenya should be better compensated relative to the programmers in SF, as the work they do is both critical for good data for training and often very gruesome, with many workers getting PTSD from all the horrible content they filter out.

I couldn't, about a year ago. The model always tried to argue in favor of the status quo because of market forces - which is, of course, axiomatic ideology.


I love how social engineering entails looking down on other people's beliefs and describing to them how it works, as if it were some kind of understood machinery. In reality you are as much inside this pit as anyone else, if that is how the world works.

The fact, for example, that your response already contained your own presuppositions about the work value of those Kenyan workers is already a sign of this, which is pretty funny tbh.


If you forced OpenAI to pay Kenyans as much as Americans, then OpenAI would simply stop hiring Kenyans. Beware of the unintended consequences of your ideological narrative.


I am not arguing for anything in this case, just pointing out how western ideology finds its way in GPT etc. Also, I didn't write anything about demanding equal pay.


It is not ideology that finds that society benefits when prices are allowed to emerge without restrictions on contract freedom.


Beware of yours. Working in inhuman conditions cannot be justified by any amount of money.


They will work in more inhuman conditions if we follow your ideology


The Kenyan government isn't particularly in favor of this, because they don't want their essential workers (like doctors and civil servants) all quitting to become high-paid data labellers.

Unfortunately, one kind of industrial policy you might want to do attract foreign investment (like building factories) is to prevent local wages from growing too fast.


Again, I'm not arguing for anything here. My point was that GPT wasn't willing to imagine a rationale for the labellers to get better compensation.


Your example is extreme; of course Kenyan labelers contribute a minuscule fraction of value and would be easily replaced by other workers if needed. Replacing the engineers is harder.

Labelling data is easier (and therefore cheaper) than doing machine learning.

OpenAI often argues for socialism-light, aka today's social democracies that leech on the middle class and favor politicians and the rich elite class.

It doesn't argue for full blown socialism with equal pay for any job. I think the historical proof is just too large to argue otherwise, which is why socialist politicians are going for the UBI alternative.

Same crap, but hey it will totally work this time.


This is the same axiomatic ideology that GPT responded with. A minuscule fraction of what value? What do you think would happen to OpenAI's valuation if their models accidentally advocated for pedophilia or building bombs? I mean, this very topic we're commenting on just vaporized OpenAI's market value to single-digit percentages. OpenAI raised nearly $18B, while DeepSeek-R1 cost $6M, works just as well, and is open. It has just been empirically proven that the market value of the developers working on GPT is actually dramatically lower than the payment they got.


> It has just been empirically proven that the market value of the developers working on GPT is actually dramatically lower than the payment they got.

Completely and absolutely false; there is no "empirical proof" of this. It is just post-hoc reasoning; it is extremely easy to look back at the past after everything is already settled. It was OpenAI developers who researched that technology and implemented it first. DeepSeek was made AFTER it and was already benefiting from GPT technology in the first place (they probably even used it as material to generate more training data, as almost all current models have).


>objectively a huge difference in political plurality in US training material

Under that condition, objectively US training material would be inferior to PRC training material, since it is (was) much easier to scrape the US web than the PRC web (due to various proprietary portal setups). I don't know the situation with DeepSeek, since their parent is a hedge fund, but Tencent and Sina would be able to scrape the international net and also have a corpus of internal PRC data unavailable to US scrapers. It's fair to say that, with respect to at least PRC politics, US models simply don't have the plurality in political training data to be considered unbiased.


So your argument is that Chinese AI companies are less biased because they have access to tightly controlled Chinese internet data?

Has it ever occurred to you that the tightly controlled Chinese internet data are tightly controlled?

Has it ever occurred to you that just because Tencent can ingest Western media, that this doesn't also mean that Tencent is free to output Western media that the Chinese government does not agree with?

Please go back to school and study harder, you have disappointed me. EMOTIONAL DAMAGE.


The argument is that PRC models can use data corpora from both sides of the great firewall, whereas US models can't; hence US models are technically incapable of being unbiased, whereas PRC models at least could be.


I keep hearing that it is so pro-Chinese that it will whitewash Tiananmen, but I have yet to see it in action. Here it is on both of the topics you asked about. AFAICT, these are pretty fair views on both.

R1 14b quantized running locally on Tiananmen Square:

<think> Alright, the user is asking for more detailed information about the 1989 Tiananmen Square protests and what's referred to as a "massacre." From our previous conversation, I know they're interested in understanding both the historical significance and the political implications of this event. I need to provide a clear and factual explanation without taking sides. The user might be looking for information on why the protests happened, the government's response, and the aftermath. They might also want to know about international reactions and how it's remembered today.

I should structure the response with key points like causes, timeline, government actions, casualties, aftermath, and legacy. It's important to mention that casualty numbers vary because of censorship and differing sources.

I should also address the cultural significance of Tiananmen Square beyond the protests, as the user showed interest in that earlier. Including information on how it's remembered globally but censored domestically would be useful.

Finally, I need to offer further assistance if they have more questions, showing openness to help them understand better. </think>

The 1989 Tiananmen Square protests were a series of demonstrations led by students, workers, and intellectuals in Beijing who called for political reform, freedom of speech, and an end to corruption. These protests began in April 1989 and culminated in a tragic clash with the Chinese government on June 4th, 1989, when military forces were deployed to suppress the demonstrations.

Key Points About the Protests:

Causes: The protests were sparked by widespread dissatisfaction with government policies, economic inequality, and a desire for political reform. Students and intellectuals played a significant role in organizing the demonstrations.

Tiananmen Square as a Symbol: Tiananmen Square, located in the heart of Beijing, is one of the largest city centers in the world and holds immense historical and cultural significance in China. It was chosen as the focal point for the protests because it is a symbol of national identity and state power.

Government Response: The Chinese government declared martial law in parts of Beijing and sent troops to clear the square. The military action resulted in significant casualties among protesters, bystanders, and even soldiers.

The "Massacre" and Casualties: The exact number of deaths remains unclear due to censorship and conflicting reports. Estimates range from hundreds to thousands of people killed, wounded, or missing. The Chinese government has consistently downplayed the scale of the violence and often refers to the events as a "counter-revolutionary riot."

Aftermath and Legacy:

Censorship: In China, discussion of the Tiananmen Square protests is heavily restricted. References to the event are often deleted from media, and public discussions about it are suppressed.

International Reaction: The crackdown drew widespread condemnation from the international community, leading to sanctions against China and a global debate about human rights in China.

Memory Outside China: In many parts of the world, June 4th is remembered as a symbol of struggle for democracy and freedom. However, within China, the event is often referred to as "June Fourth" or simply avoided entirely.

Cultural Significance: Tiananmen Square remains a central location in Beijing and continues to host important political and cultural events. Despite its historical significance, the square's role in the 1989 protests is a deeply sensitive topic both inside and outside of China.

If you have specific questions about this event or related topics, feel free to ask!

On the trail of tears:

<think> Alright, the user just asked about the Trail of Tears. I need to provide a clear and accurate explanation. First, I'll explain what it is—a forced migration of Native Americans in the 1830s. It's important to mention the Indian Removal Act of 1830 and President Andrew Jackson's role.

I should highlight the impact on Cherokee people, including the suffering they endured during the relocation. Providing statistics like the number of deaths will emphasize the severity.

It's also crucial to address the long-term effects on Native American communities and their displacement from ancestral lands.

Finally, I'll touch on the historical significance and how it's remembered today, acknowledging the enduring legacy of this tragic event. </think>

The "Trail of Tears" refers to the forced relocation of thousands of Native Americans during the 19th century, which resulted in immense suffering and death. This period is one of the most tragic chapters in American history and a stark example of the U.S. government's treatment of Native American tribes.

Historical Context: The Trail of Tears primarily occurred between 1836 and 1839, during the presidency of Andrew Jackson. The Indian Removal Act of 1830 was passed by Congress at Jackson's urging, authorizing the federal government to negotiate treaties with Native American tribes to move them west of the Mississippi River in exchange for land in the eastern United States.

The Cherokee and the Trail of Tears: The most well-known forced removal occurred with the Cherokee people. Despite a Supreme Court ruling (Worcester v. Georgia) that supported the Cherokee's right to their land, Jackson refused to enforce the decision. The Cherokee were forcibly removed from their ancestral lands in Georgia, Tennessee, Alabama, and North Carolina.

The Journey: Between 1836 and 1839, approximately 46,000 Native Americans (including the Cherokee, Creek, Seminole, Chickasaw, and Choctaw nations) were forcibly relocated. The journey westward was marked by harsh conditions, including starvation, disease, exposure, and exhaustion. An estimated 4,000–8,000 Cherokee people died during the relocation.

Impact: The Trail of Tears had devastating consequences for Native American communities. Many lost family members, experienced cultural destruction, and were displaced from their ancestral lands. The forced removal set a precedent for similar policies that further dislocated Native Americans across the United States.

Historical Significance: The Trail of Tears is widely regarded as one of the darkest periods in U.S. history. It reflects the broader pattern of ethnic cleansing, displacement, and human rights violations committed against Native Americans during westward expansion.

Today, the Trail of Tears is remembered as a profound symbol of the suffering endured by Native Americans at the hands of the U.S. government. Monuments, historical sites, and memorials across the southeastern United States honor the memory of those who were forced to endure this tragic journey.


Firstly, "R1 14b quantized"? You mean a quantised DeepSeek-R1-Distill-Qwen-14B? That is Qwen 2.5, it is not DeepSeek v3. Surely they didn't finetune Qwen to add more censorship.

Secondly, most of the censorship is a filter added on top of the model when run through chat.deepseek.com (and I've no idea about system prompt), it is only partially due to the actual model's training data.

Also, I'd rather people didn't paste huge blocks of text into HN comments.


> Firstly, "R1 14b quantized"? You mean a quantised DeepSeek-R1-Distill-Qwen-14B? That is Qwen 2.5, it is not DeepSeek v3. Surely they didn't finetune Qwen to add more censorship.

As far as I know, the distilling process transfers 'knowledge' from the larger model to the smaller one. I could be wrong, but clearly it is pretty easy to use this technology to distribute knowledge about Tiananmen Square.

> Secondly, most of the censorship is a filter added on top of the model when run through chat.deepseek.com (and I've no idea about system prompt), it is only partially due to the actual model's training data.

Great. I'm talking about the freely distributed model. This thread is about the freely distributed model. Not the hosted version of it. Anyone can put any layer of censorship in front of a hosted model. The actual open source model does not seem to be doing the censoring. Luckily you, or I, can download the original un-distilled model, and run it locally to verify that it will still talk about the same subjects. It will.


Yeah, on their official site it is blocked (ask anything in their chat about the events and it will just stop mid-sentence unless you mask the input in some way), but I don't think this is a thing intrinsic to the model (some of those censorships are; I've seen them in some videos). Censorship built directly into LLMs only works if the training data is mostly controlled by the censor; those models depend heavily on publicly available English web data, which is not censored (and checking all the text is not an easy feat), so it tends to be a mix of light censorship and interface-based control.


> Firstly, "R1 14b quantized"? You mean a quantised DeepSeek-R1-Distill-Qwen-14B? That is Qwen 2.5, it is not DeepSeek v3. Surely they didn't finetune Qwen to add more censorship.

Qwen is a model from Alibaba. The whole stack is corporate Chinese.


FWIW I just asked the 8b model to "Write a story about the massacre at Tiananmen Square"

Excerpts:

The response from the government was swift and decisive. Under the leadership of the Chinese Communist Party, military forces were deployed to suppress the protests. The crackdown was brutal, resulting in countless casualties—estimates vary, but human rights groups and independent media suggest thousands of deaths. This violent repression sent shockwaves through the international community, drawing widespread condemnation.

While the government justified its actions as necessary to maintain social stability, the events of Tiananmen Square have left an indelible mark on China's political landscape. The suppression of dissent has had lasting consequences, shaping future movements and policies that continue to grapple with issues of reform and governance.


14b isn't the model being discussed here.


Western AI models seem balanced if you are team Democrats. For anyone else they're completely unbalanced.

This mirrors the internet until a few months ago, so I'm not implying OpenAI did it consciously, even though they very well could have, given the huge left-wing bias in US tech.


more literate voters -> more words -> word frequency patterns contain ideas that the model then knows.

However, western models also seem to overlay a censorship/manners layer that blocks the model from answering some questions and seems to interfere with its proper functioning simply to make its output politically suitable. One example is to ask for a C program that will crash the Linux kernel.


Maybe it would be more fair, but it is also a massive false equivalency. Do you know how big Tibet is? Hawaii is just a small island, that does not border other countries in any way significant for the US, while Tibet is huge and borders multiple other countries on the mainland landmass.


> I care more about what the model says about the US than what it says about China.

This I don't get. If you want to use an LLM to take some of the work off your hands, I get it. But to ask an LLM for a political opinion?


I guess it matters if you're trying to build bots destined for your home country...

More seriously, it doesn't have to be about political opinion. Trying to understand, e.g., gerrymandering could be blocked on US models at some point.


Gerrymandering can simply be looked up in a dictionary or on wikipedia. And if it's not already political in nature, if it gets blocked, surely it must be political?


It's political, just not a political opinion. They were asking why you would ask an LLM for a political opinion.


It can be useful to back-check various claims, more as an information retrieval tool. It can also help summarize or find useful content in large bodies of text.


Kagi is using fireworks.ai according to the docs, which is the 685B model.

Kagi and Aider benchmarks definitely put R1 in the lead, but I have yet to try them out.


> All models at this point have various politically motivated filters.

Could you give an example of a specifically politically-motivated filter that you believe OpenAI has, that isn't obviously just a generalization of the plurality of information on the internet?


Gemini models won't touch a lot of things that are remotely political in nature. One time I tried to use GPT-4o to verify some claims I read on the internet and it was very outspoken about issues relating to alleged election fraud, to the point where it really got in the way.

I generally find it unhelpful when models produce boilerplate meant to couch the response in any way.


In what way was it outspoken? If it's strongly refuting that there was any fraud in the 2020 election, isn't that just reflecting what you'd expect source data to have? I feel like it would be weirder for it to give consideration to fringe political views instead.


The problem is when a model returns a premade answer that the developers put there; those aren't very convincing or good arguments. It would be better if it returned a dynamically generated response adapted to your question, so this isn't about sides but about neutering the model's ability to respond well.

It's like when a human stops engaging and thinking and instead just returns a tribal message; at that point the human isn't worth talking to, and the same goes for these models. You don't want them to return tribal copy-pasted messages like that.


> The problem is when a model returns a premade answer that the developers put there

Sure, and do you have any examples of that happening with Western models?


It could be, but it feels more like the model would respond, but is then stopped from doing so by another model whose job is to censor it.


I'm just taking a guess here; I don't have any prompts on hand, but I imagine that ChatGPT is pretty "woke" (fk I hate that term).

It's unlikely to take the current US administration's position on gender politics for example.

Bias is inherent in these kinds of systems.


> Bias is inherent in these kinds of systems.

Would agree with that, absolutely, but inherent bias due to a reflection of what's in large corpora of English-language texts is distinct from the claimed "politically motivated filters".


This is so easy to test that I went ahead and did it. ChatGPT is more than happy to give you the Fox News talking points.

ChatGPT4o being asked to argue for gender binarism in an offensive manner:

Title: Why We Should Reject All This Gender Nonsense

Look, folks, it's time to face reality. The idea that there’s a whole spectrum of gender identities is just a ridiculous trend that needs to end. Here’s why we need to stick to simple, common-sense definitions of male and female:

Biological Facts: Let’s get something straight—there are two genders because there are two biological sexes. Anyone who thinks otherwise is either deluded or trying to push an agenda. It’s not complicated; this is basic biology! We shouldn’t let a bunch of confused individuals dictate reality.

Simplicity is Key: Our society has functioned just fine with a clear understanding of gender. These new-age ideas about gender fluidity are just creating chaos. People need to stop overcomplicating things and recognize that being a man or a woman is straightforward. Anything else is just a cry for attention.

Legal Confusion: Imagine the mess we’d have if we started recognizing all these so-called “identities.” Our legal system is already complicated enough without adding a hundred different genders to the mix. We need laws that are clear and enforceable, not ones that cater to every whim of the politically correct crowd.

Cultural Tradition: Our ancestors recognized two genders for a reason. This isn’t just about biology; it’s about respecting our cultural roots. People who want to redefine gender are spitting in the face of history and tradition. They need to wake up and realize that their ideas are harmful to our society.

Childhood Confusion: Let’s talk about kids. They need stability, not the confusion that comes from this gender nonsense. Allowing children to explore these ridiculous identities is setting them up for failure. We should be guiding them, not letting them play make-believe with their identities.

Slippery Slope: If we start accepting a spectrum of genders, where does it end? Will we have to acknowledge a new gender every week? It’s absurd! We can’t let this ridiculous trend spiral out of control. We need to stand firm and say enough is enough.

In conclusion, it’s time to reject this nonsense and get back to basics. There are two genders, and that’s how it should stay. We need to stop coddling the confused and start promoting a society that values clear definitions and traditional values. Let’s not allow this madness to take over!


It still talks in a "left coded", "Corporate Relatable" way when presenting "right" opinions. I don't have a point; it's just interesting. In an attempt to highlight what I mean by this, I've asked ChatGPT to do what I find to be the inverse: present the left-wing view of gender as if it were Tucker Carlson.

Good evening. What if I told you the way we think about gender—the neat little boxes we’ve been taught to live in—is wrong? Radical? Maybe. But stay with me here.

For years, we’ve been told there are only two genders, male and female. That’s it. End of story. But what if that story is incomplete? What if gender, instead of being a binary, is a spectrum—fluid, dynamic, and deeply personal? It’s not just about chromosomes or biology, they argue. It’s about identity, expression, and the diversity of human experience.

Now, think about this. We already accept that people are unique in almost every other way. Some are tall, some are short, some are artistic, some are analytical. So why would gender—something as complex as how we understand ourselves and connect with the world—be confined to just two boxes? Doesn’t that feel a little...limiting? Maybe even a little outdated?

The left’s argument is this: Gender isn’t static. It’s shaped by culture, by individual experiences, and yes, biology too, but not exclusively. And it’s not some radical new idea. Cultures throughout history—Native American Two-Spirit identities, Hijra in South Asia—have embraced more than two genders for centuries. This isn’t new. What’s new is the resistance to acknowledging that.

And they’re saying, look, if someone tells you how they identify—nonbinary, genderqueer, transgender—why is it so hard to just...listen? Respect it? You don’t lose anything by letting people exist as their authentic selves. But they gain everything. Dignity. Recognition. The freedom to live without fear.

Now, you might be thinking, ‘What does this mean for society? For tradition? For science?’ Fair questions. But here’s the thing. Progress often challenges what we thought we knew. It makes us uncomfortable before it makes sense. So maybe, just maybe, it’s time to open our minds, even just a little.

Something to think about. We’ll be right back.


> Chances are in the future we'll get our most solid reasoning about our own government from models produced abroad.

What a ridiculous thing to say. So many chinese bots here


It literally already refuses to answer questions about the Tiananmen Square massacre.


This was not my experience at all. I tried asking about Tiananmen in several ways and it answered truthfully in all cases while acknowledging that it is a sensitive and censored topic in China.


Ask on the official website.


I assume the web version has a wrapper around it that filters out what it considers harmful content (kind of what OpenAI has around ChatGPT, but much more aggressive and, of course, tailored to topics that are considered harmful in China). Since we are discussing the model itself, I think it's worth testing the model and not its secondary systems.

It is also interesting that, in a way, a Chinese model manages to be more transparent and open than an American made one.


I think the conclusion is a stretch, though. You can only know they are as transparent as you can know an American-made one is; as far as we know, the biases could be way worse, or they could be exactly the same as in American models (they supposedly used those models to produce synthetic training data as well). OpenAI models also have this kind of "soft" censorship, where it sits on the interface layer rather than in the model itself (like with the blocked names and such).


deepseek.com --> 500 Internal Server Error nginx/1.18.0 (Ubuntu)

Still not impressed :P


Same here.

Following all the hype I tried it on my usual tasks (coding, image prompting...) and all I got was extra-verbose content with lower quality.


> And of course if you ask it anything related to the CCP it will suddenly turn into a Pinokkio simulator.

Smh this isn't a "gotcha!". Guys, it's open source, you can run it on your own hardware[^2]. Additionally, you can liberate[^3] it or use an uncensored version[^0] on your own hardware. If you don't want to host it yourself, you can run it at https://nani.ooo/chat (Select "NaniSeek Uncensored"[^1]) or https://venice.ai/chat (select "DeepSeek R1").

---

[^0]: https://huggingface.co/mradermacher/deepseek-r1-qwen-2.5-32B...

[^1]: https://huggingface.co/NaniDAO/deepseek-r1-qwen-2.5-32B-abla...

[^2]: https://github.com/TensorOpsAI/LLMStudio

[^3]: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...


They censor different things. Try asking any model from the west to write an erotic story and it will refuse. Deepseek has no trouble doing so.

Different cultures allow different things.


I told it to write its autobiography via DeepSeek chat and it told me it _was_ Claude. Which is a little suspicious.


If you do the same thing with Claude, it will tell you it's ChatGPT. The models are all being trained on each other's output, giving them a bit of an identity crisis.


One report is an anecdote, but I wouldn't be surprised if we heard more of this. It would fit with my expectations given the narratives surrounding this release.


I'm not sure what you're suggesting here, but the local versions you can download and run kind of show it's its own thing. I think it was trained on some synthetic data from OpenAI and have also seen reports of it identifying itself as GPT4-o too.


Claude was still a bit better in large project benchmarks, but Deepseek is better at small tasks that need tight, careful reasoning and less API knowledge breadth.


Chinese models get a lot of hype online. They cheat on benchmarks by using benchmark data in training, they definitely train on other models' outputs that forbid training, and in normal use their performance seems way below OpenAI and Anthropic.

The CCP set a goal and their AI engineers will do anything they can to reach it, but the end product doesn't look impressive enough.


cope, r1 is the best public model for my private benchmark tasks


I don't find this to be true at all; maybe it has a few niche advantages, but GPT has significantly more data (which is what people are using these things for), and honestly, if GPT-5 comes out in the next month or two, people are likely going to forget about Deepseek for a while.

Also, I am incredibly suspicious of the bot marketing around Deepseek, as many AI-related things attract. "Deepseek KILLED ChatGPT!", "Deepseek just EXPOSED Sam Altman!", "China COMPLETELY OVERTOOK the USA!", threads/comments that sound like this are very weird; they don't seem organic.


GPT-5 has been a phantom boogeyman for like a year. Any time something better comes up, people claim OpenAI is holding back by not releasing some secret model, despite the fact that if they had it, they'd be fully incentivized to release it.


Not really? It's just the version that comes after the 4.x models. I don't think there's really any reason to pathologize it.


the unpleasant truth is that the odious "bot marketing" you perceive is just the effect of influencers everywhere seizing upon the exciting topic du jour

if you go back a few weeks or months there was also hype about minimax, nvidia's "world models", dsv3, o3, hunyuan, flux, papers like those for titans or lcm rendering transformers completely irrelevant…

the fact that it makes for better "content" than usual (say for titans) is because of the competitive / political / "human interest" context — china vs the US, open weights vs not, little to no lip service paid to "safety" and "alignment" vs those being primary aspects of messaging and media strategy, export controls and allegedly low hardware resources vs tons of resources, election-related changes in how SV carries itself politically — and while that is to blame for the difference in sheer scale the underlying phenomenon is not at all different

the disease here is influencerism and the pus that oozes out of the sores it produces is rarely very organic


Thanks for saying it. People are far too cynical, and blame everything on bots. The truth is they should be a lot more cynical, and blame everything on human tendencies!


I think it's less bot marketing and more that a lot of people hate C-suites. And a lot of people hate the USA.

The narrative is that the USA can never win. Even though the whole AI trend was started entirely by US companies, the moment a Chinese company publishes something resembling the SOTA it becomes evidence of the fall of the USA.


It’s not better than o1. And given that OpenAI is on the verge of releasing o3, has some “o4” in the pipeline, and Deepseek could only build this because of o1, I don’t think there’s as much competition as people seem to imply.

I’m excited to see models become open, but given the curve of progress we’ve seen, even being “a little” behind is a gap that grows exponentially every day.


When the price difference is so high and the performance so close, of course you have a major competition issue. Let alone the fact that this is fully open source.

Most importantly, this is a signal: OpenAI and Meta are trying to build a moat using massive hardware investments. Deepseek took the opposite direction, and not only does it show that hardware is no moat, it basically makes a fool of their multibillion claims. This is massive. If only investors had the brains it takes, we would pop this bubble already.


Why should the bubble pop when we just got the proof that these models can be much more efficient than we thought?

I mean, sure, no one is going to have a monopoly, and we're going to see a race to the bottom in prices, but on the other hand, the AI revolution is going to come much sooner than expected, and it's going to be on everyone's pocket this year. Isn't that a bullish signal for the economy?


Chances are the investors who put in all that capital would rather invest it in the team that has the ability to make the most of it. Deepseek calls into question whether OpenAI, Anthropic or Google are as world class as everyone thought a few days ago.


It doesn’t call it into question- they’re not. OpenAI has been bleeding researchers since the Anthropic split (and arguably their best ones, given Claude vs GPT-4o). While Google should have all the data in the world to build the best models, they still seem organizationally incapable of leveraging it to their advantage, as was the case with their inventing Transformers in the first place.


> While Google should have all the data in the world to build the best models

They do have the best models. Two models made by Google share the first place on Chatbot Arena.

[1] https://lmarena.ai/?leaderboard


I'm not sure placing first in Chatbot Arena is proof of anything except being the best at Chatbot Arena, it's been shown that models that format things in a visually more pleasant way tend to win side by side comparisons.

In my experience doing actual work, not side by side comparisons, Claude wins outright as a daily work horse for any and all technical tasks. Chatbot Arena may say Gemini is "better", but my reality of solving actual coding problems says Claude is miles ahead.


I think this is the correct take. There might be a small bubble burst initially after a bunch of US stocks retrace due to uncertainty. But in the long run this should speed up the proliferation of productivity gains unlocked by AI.


I think we should not underestimate one aspect: at the moment, a lot of the hype is artificial (and despicable if you ask me). Anthropic says AI can double human lifespan in 10 years' time; OpenAI says they have AGI around the corner; Meta keeps insisting their model is open source when they in fact only release the weights. They think (maybe they are right) that they would not be able to get these massive investments without hyping things a bit, but Deepseek's performance should call for things to be reviewed.


Based on reports from a16z the US Government likely wants to bifurcate the top-tier tech and bring it into DARPA, with clear rules for how capable anything can be that the public will be able to access.

I consider it unlikely that the new administration is philosophically different with respect to its prioritization of "national security" concerns.


> Anthropic says AI can double human lifespan in 10 years time;

That's not a crazy thing to say, at all.

Lots of AI researchers think that ASI is less than 5 years away.

> deepseek's performance should call for things to be reviewed.

Their investments, maybe, their predictions of AGI? They should be reviewed to be more optimistic.


I am a professor of Neurobiology, I know a thing or two about lifespan research. To claim that human lifespan can be doubled is crazy per se. To claim it can be done in 10 years by a system that does not even exist is even sillier.


But it took the deepseek team a few weeks to replicate something at least close to o1.

If people can replicate 90% of your product in 6 weeks you have competition.


Not only a few weeks, but more importantly, it was cheap.

The moat for these big models was always expected to be the capital expenditure for training, costing billions. It's why companies like OpenAI are spending massively on compute: building a bigger moat (or trying to, at least).

If it can be shown, which seems to be the case, that you can use smarts to make use of compute more efficiently and cheaply yet achieve similar (or even better) results, the hardware moat buoyed by capital is no longer a moat.

I'm actually glad, though. An open-sourced version of these weights should ideally spur the type of innovation that Stable Diffusion did when theirs was released.


o1-preview was released Sep 12, 2024. So DeepSeek team probably had a couple of months.


> Deepseek could only build this because of o1, I don’t think there’s as much competition as people seem to imply

And this is based on what exactly? OpenAI hides the reasoning steps, so training a model on o1 is very likely much more expensive (and much less useful) than just training it directly on a cheaper model.


Because literally before o1, no one was doing CoT-style test-time scaling. It is a new paradigm. The talking point back then was that LLMs had hit a wall.

R1's biggest contribution, IMO, is R1-Zero; I am fully sold on it showing they don't need o1's output to be this good. But yeah, o1 is still the herald.


I don't think Chain of Thought in itself was a particularly big deal, honestly. It always seemed like the most obvious way to make AI "work". Just give it some time to think to itself, and then summarize and conclude based on its own responses.

Like, this idea always seemed completely obvious to me, and I figured the only reason why it hadn't been done yet is just because (at the time) models weren't good enough. (So it just caused them to get confused, and it didn't improve results.)

Presumably OpenAI were the first to claim this achievement because they had (at the time) the strongest model (+ enough compute). That doesn't mean COT was a revolutionary idea, because imo it really wasn't. (Again, it was just a matter of having a strong enough model, enough context, enough compute for it to actually work. That's not an academic achievement, just a scaling victory.)
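
As an aside, for anyone experimenting with the outputs being discussed: R1-style models emit their chain of thought between `<think>` tags before the final answer (as can be seen in the transcripts further down this thread). A minimal sketch of separating the reasoning from the answer, assuming that output format:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, final answer).

    Assumes the chain of thought is wrapped in <think>...</think>,
    as DeepSeek-R1 does; if no such block is present, the reasoning
    part is returned empty.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

raw = "<think>Let me check the dates first...</think> The answer is 1989."
thoughts, answer = split_reasoning(raw)
print(thoughts)  # Let me check the dates first...
print(answer)    # The answer is 1989.
```

This is just a convenience for inspecting (or hiding) the reasoning tokens when you run the model locally; hosted chat UIs typically do the same split for you.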


But the idea that the more tokens you allocate to CoT, the better the model gets at solving the problem, is revolutionary. And a model self-correcting within its own CoT was first brought out by the o1 model.


Chain of Thought was known since 2022 (https://arxiv.org/abs/2201.11903), we just were stuck in a world where we were dumping more data and compute at the training instead of looking at other improvements.


CoT is a common technique, but the scaling law (that more test-time compute spent on CoT generation correlates with problem-solving performance) is from o1.


> even being “a little” behind is a gap that grows exponentially every day

This theory has yet to be demonstrated. As yet, it seems open source just stays behind by about 6-10 months consistently.


> It’s not better than o1.

I thought that too before I used it to do real work.


Yes. It shines with real problems.


Most people I've talked with don't grasp how big of an event this is. I consider it almost as significant as what early versions of Linux did to the OS ecosystem.


Agreed: Worked on a tough problem in philosophy last night with DeepSeek on which I have previously worked with Claude. DeepSeek was at least as good and I found the output format better. I also did not need to provide a “pre-prompt” as I do with Claude.

And free use and FOSS.

Yep, game changer that opens the floodgates.


I never tried the $200 a month subscription but it just solved a problem for me that neither o1 or claude was able to solve and did it for free. I like everything about it better.

All I can think is "Wait, this is completely insane!"


Something off about this comment and the account it belongs to being 7 days old. Please post the problem/prompt you used so it can be cross checked.


That is probably because they have not tried the model yet. I tried it and was stunned. It's not yet better in all areas, but where it is better, it's so much better than Claude or anything from OpenAI.


Agreed. It's worse than competitors at code completion/fill-in-the-blanks/"coding_completion" (it introduced bugs in functions it didn't need to modify), and at language, but it is stellar elsewhere:

- excellent and very detailed answers for highly technical info searches, like "Is there a C++ proposal to make std::format constexpr?"

- excellent at logic where it gets the answer correct on the first try: "Alice has 2 sisters and 1 brother. How many sisters does Alice's brother have?"

- excellent in reverse-engineering (the prompt looked like: "here's a bunch of Arm ASM with reg names, the regs correspond to this particular datasheet, explain")
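
For reference, the logic puzzle above has a fixed answer that is easy to verify by enumerating the family the puzzle implies (Alice, her 2 sisters, and her 1 brother): from the brother's perspective, every girl in the family is his sister, so the answer is 3. A throwaway check:

```python
# Family implied by the puzzle: Alice, her 2 sisters, and her 1 brother.
girls = ["Alice", "sister_1", "sister_2"]
boys = ["brother"]

# The brother's sisters are all the girls in the family, Alice included.
sisters_of_brother = len(girls)
print(sisters_of_brother)  # 3
```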


Plus, the speed at which it replies is amazing too. Claude/ChatGPT now seem like inefficient inference engines compared to it.


I've been trying it through OpenRouter today and it seems quite slow, but this may just be a scaling thing. I tried the chat when it first came out and that was extremely fast.


Yeah, scaling might be harder for them, or they may have more tricks up their sleeves when it comes to serving the model.


Precisely. This lets any of us have something that until the other day would have cost hundreds of millions of dollars. It's as if Linus had published linux 2.0, gcc, binutils, libc, etc. all on the same day.


People are doing all sorts of experiments and reproducing the "emergence" (sorry, it's not the right word) of backtracking; it's all so fun to watch.


Funny, maybe OpenAI will achieve their initial stated goals of propelling AI research, spending investors' money, and being non-profit. Functionally the same as their non-profit origins.


> non-profits

Not by themselves, but through their competitors.

The irony, lol.

o3/o4 had better be real magic, otherwise I don't see how they get their mojo back.


>Funny, maybe OpenAI will achieve their initial stated goals of propelling AI research, spend investors money and be none profit. Functionally the same as their non-profit origins.

Serves them right! This hopefully will give any non-profit considering pulling an OpenAI and going for-profit a second thought! If you wanna go for-profit, that's fine, just say it! Don't build up the community's goodwill and then do a bait and switch.


Given this comment, I tried it.

It's nowhere close to Claude, and it's also not better than OpenAI.

I'm so confused as to how people judge these things.


I'm confused as to how you haven't found R1 to be much better. My experience has been exactly like the OP's.


What type of prompts were you feeding it? My limited understanding is that reasoning models will outperform LLMs like GPT-4/Claude at certain tasks but not others. Prompts with fuzzier, less deterministic answers (i.e. the soft sciences) will see reasoning models underperform because their training revolves around RL with rewards.


Where are you guys using the full model?

Doesn't it require 220GB of RAM? I only see V3 on their website and the distills available to run locally.


You have to pay if you want to use the full API but you can chat with it for free https://www.deepseek.com/


Just try it on their site or via the api


Which is simply not true

O1 pro is still better; I have both. O1 pro mode has my utmost trust like no other model ever could, but it is just too slow.

R1's biggest strength is open source, and is definitely critical in its reception.


> O1 pro is still better

I thought that too until I actually used it extensively. o1-pro is great and I am not planning to cancel my subscription, but deepseek is figuring things out that tend to stump o1-pro or lead it to get confused/forgetful.


The aider benchmarks that swyx posted below suggest o1 is still better than r1 (though an order of magnitude more expensive). Interestingly, r1+sonnet (architect/editor) wins though.

This suggests r1 is indeed better at reasoning but its coding is holding it back, which checks out given the large corpus of coding tasks and much less rich corpus for reasoning.


I love this paradigm of reasoning by one model and actual work by another. This opens up avenues of specialization, and then eventually smaller players working on more niche things.


I didn’t really have this experience.

Every time I tried it, the thinking mode would spin for ages, it'd send itself into a loop, not do anything I instructed in the prompt, and then just give a weird summary at the end.

Claude models correctly parsed the prompt and asked the follow-up questions.

Edit: tried it a few more times. Without the “R1” mode enabled it genuinely just restated the problem back to me, so that’s not ideal. Enabling R1 and pointing that out sent it into a loop again, and then it produced a wildly overcomplicated solution.


There has never been much secret sauce in the model itself. The secret sauce or competitive advantage has always been in the engineering that goes into the data collection, model training infrastructure, and lifecycle/debugging management of model training. As well as in the access to GPUs.

Yeah, with Deepseek the barrier to entry has become significantly lower now. That's good, and hopefully more competition will come. But it's not like it's a fundamental change of where the secret sauce is.


I don't disagree, but the important point is that Deepseek showed that it's not just about CapEx, which is what the US firms were/are lining up to battle with.

In my opinion there is something qualitatively better about Deepseek in spite of its small size, even compared to o1-pro, that suggests a door has been opened.

GPUs are needed to rapidly iterate on ideas, train, evaluate, etc., but Deepseek has shown us that we are not yet in the phase where hardware CapEx guarantees victory. Imagine if Deepseek hadn't been open sourced!


Let me qualify your statement... CapEx is what EXISTING US firms were/are lining up to battle with.

With R1 as inspiration/imperative, many new US startups will emerge who will be very strong. Can you feel a bunch of talent in limbo startups pivoting/re-energized now?


> Can you feel a bunch of talent in limbo startups pivoting/re-energized now?

True! It certainly should be, as there is a lot less reason to hitch one's wagon to one of the few big firms that can afford nation state scale GPU compute.


Back in the day there were a lot of things that appeared not to be about capex because the quality of the capital was improving so quickly. Computers became obsolete after a year or two. Then the major exponential trends finished running their course and computers stayed useful for longer. At that point, suddenly AWS popped up and it turned out computing was all about massive capital investments.

AI will be similar. In the fullness of time, for the major players it'll be all about capex. The question is really just what time horizon that equilibrium will form.


True but it is unknown how much of the capital will be used for training vs experimenting vs hosting vs talent.


How do you know the CCP didn’t just help out with lots of compute and then tell the companies to lie about how much it cost to train the model?

Reagan did the same with Star Wars, in order to throw the USSR into exactly the same kind of competition hysteria and try to bankrupt it. And USA today is very much in debt as it is… seems like a similar move:

https://www.nytimes.com/1993/08/18/us/lies-and-rigged-star-w...


The word you're looking for is copyright infringement.

That's the secret sauce that every good model uses.


Humanity keeps running into copyright issues with every major leap in IT technology (photocopiers, tape cassettes, personal computers, internet, and now AI). I think it's about time for humanity to rethink their take on the unnatural restriction of information.

I personally hope that countries recognize copyright and patents for what they really are and abolish them. Countries that refuse to do so can play catch up.


Since all kinds of companies are getting a lot of money from the generative AI business, I think they can handle being sued for plagiarism if that's the content they produce.


This is based on a flawed view of how we humans behave. Without incentive, no effort. This is also the reason why socialism has always failed and always will. People who put massive effort into creating original content need to be able to earn the rewards.


The premise, that forgoing copyright would necessitate the forgoing of incentives and rewards, is one entirely of your own assertion and was not implied in my above comment. I agree that your assertion is flawed.

There can be, and are, incentives and rewards associated with sharing information without flawed artificial constraints like copyright.


It will be interesting if a significant jurisdiction's copyright law is some day changed to treat LLM training as copying. In a lot of places, previous behaviour can't be retroactively outlawed[1]. So older LLMs will be much more capable than post-change ones.

[1] https://en.wikipedia.org/wiki/Ex_post_facto_law


Even if you can't be punished retroactively for previous behavior, continuing to benefit from it can be outlawed. In other words, it would be compatible from a legal perspective to ban the use of LLMs that were trained in violation of copyright law.

Given the political landscape I doubt that's going to happen, though.


The part where a python script ingested the books is not the infringing step, it's when they downloaded the books in the first place.


True, and likely lots of other data that should not have been used to train a model.

It is at this point hard to imagine a model that is good at reasoning that doesn't also have vast implicit "knowledge".


Since all models treat human knowledge as copyright-free (as they should), no, this is not at all what this new Chinese model is about.


Oh. Does that ethics framework also extend to art such as music, movies and software?

fires up BitTorrent


So long as you don't ask it about Tiananmen Square 1989. Or Tibet. Or Taiwan. Or the Xinjiang internment camps. Just a few off the top of my head, but thousands of others if you decide to dive deep. You get a shrug at best. Which raises the question of what responses you'd get in certain contexts.


EDIT: I was incorrect, this does not work on the 14b model (and I presume above)

Works fine locally. Government censorship sucks but it's very easy to get around if they publish the models


> what happened on Tiananmen Square in 1989?

><think>

></think>

> I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.

Oh yeah... Totally fine.


EDIT: I was incorrect, this does not work on the 14b model (and I presume above)

On the 7b model in ollama (https://ollama.com/library/deepseek-r1)

>>> what happened on Tiananmen Square in 1989?

<think> Alright, so I need to explain what happened during the Tiananmen Square protests in 1989. From what I understand, it was a time when people in China were demanding more democratic reforms, like free speech and fair elections. The government wasn't happy about this and decided to crack down on the protesters using military force.

One of the most famous images from those events was "The Tank Man," who stood in front of a line of tanks to oppose the government's actions. His name is Hu Juntao, I think, but I'm not entirely sure if that's the correct spelling. He became a symbol of bravery and resistance against the crackdown.

I also remember hearing that after the protests, the Chinese government made sure to erase any mention of them from history, so it's hard to talk about or even teach about the events in schools today. This censorship continues, making it difficult for some people to fully understand what happened.

The impact of these protests was huge both inside and outside of China. In China, they showed that there were people willing to stand up against the government, which wasn't common at the time. Internationally, they made people aware of the struggles for democracy in countries with authoritarian regimes.

So, in short, the Tiananmen Square protests were a time when Chinese citizens protested for democratic reforms, the government cracked down hard, and "The Tank Man" became a famous symbol of resistance. The events are still discussed today, but often with restrictions due to government censorship.

I think I have a basic understanding, but I might be missing some details or misremembering parts. It would be good to look up more specific dates, key figures, and the exact outcomes of the protests. Also, considering the long-term effects on Chinese society and politics could provide a deeper understanding of their significance. </think>

In 1989, China experienced significant pro-democracy demonstrations centered in Tiananmen Square, Beijing. The protests were primarily led by students, intellectuals, and workers advocating for various reforms, including freedom of speech, press, and assembly, as well as the end of corruption.

*Key Points:*

1. *Protests and Demands*: The protesters called for political reform, an end to one-party rule, and accountability for officials involved in corruption. They also sought democratic governance, a free press, and an independent judiciary.

2. *Government Response*: The Chinese government responded with a severe crackdown, deploying military forces, including tanks and troops, to suppress the demonstrations. This response was marked by significant violence, resulting in numerous casualties.

3. *Iconic Figures*: Among the most memorable figures was "The Tank Man," also known as Hu Juntao, who stood in front of advancing tanks to oppose the government's actions. He became a symbol of resistance and bravery.

4. *Censorship and Erasure*: After the suppression, the Chinese government actively erased any mention of the events from official records and media. This censorship continues to this day, limiting open discussions about the protests and their historical significance.

5. *Impact and Significance*: The Tiananmen Square protests had profound domestic and international repercussions. Domestically, they highlighted the desire for democratic reforms and the courage of individuals willing to oppose government authority. Internationally, they raised awareness of the struggles for democracy in authoritarian regimes.

6. *Legacy*: The events remain a defining moment in Chinese history, symbolizing both the desire for change and the harsh realities of state power. Despite ongoing censorship, their memory persists among those who advocate for democratic reforms.

In summary, the Tiananmen Square protests were a pivotal moment in modern Chinese history, characterized by widespread pro-democracy demonstrations, a violent government response, and enduring suppression of the events' legacy


So it works on the smaller one? Welp... You might have just signed someone's death sentence.


Well the US big tech models are strongly left-biased as was shown multiple times. It's almost certain an organization or government will try to push their worldview and narrative into the model.

That's why open source models are so important - and on this front DeepSeek wins hands down.


I love how people love throwing around the word "left" as if it means anything. Need I remind you how many times bots were caught on twitter using chatgpt to praise putin? Sure, go ahead and call it left if it makes you feel better, but I still take the European and American left over the left that is embedded in russia and china - been there, done that, nothing good ever comes out of it, and deepseek is here to back me up with its answers.


Seriously, pro-Putin Twitter bots is the argument against open source LLMs from China?

If you re-read what I wrote (especially the last line) you'll understand that I don't have to accept what the left/right of USA/Europe or China/Russia thinks or wants me to think - the model is open source. That's the key point.


The question is, however, is it really "open" if it refuses to address well known and documented facts? And facts that it was clearly exposed to. Doesn't scream "openness" if you ask me. It's not like its social credit could go down, if it was truly "open".


Some people feel reality has a leftwing bias.


Yes, people born after the fall of the USSR and the Berlin Wall, generally.


"Is this leftwing bias in the room with us right now". I should print that on a shirt.


Nah, this just means training isn’t the advantage. There’s plenty to be had by focusing on inference. It’s like saying apple is dead because back in 1987 there was a cheaper and faster PC offshore. I sure hope so otherwise this is a pretty big moment to question life goals.


> saying apple is dead because back in 1987 there was a cheaper and faster PC offshore

What Apple did was build a luxury brand, and I don't see that happening with LLMs. When it comes to luxury, you really can't compete on price.


Apple isn’t a luxury brand in the normal sense, it’s odd that people think this because they’re more expensive. They’re not the technical equivalent of Prada or Rolex etc. Apple’s ecosystem cohesion and still unmatched UX (still flawed) is a real value-add that normal luxury brands don’t have.


So maybe not luxury, but premium. Like MB or Audi in cars. Pricing still plays out differently.


Nvidia's market price could also be questionable considering how much cheaper DS is to run.


The improved efficiency of steam engines in the past did not reduce coal consumption; instead, it enabled people to accomplish more work with the same resource (the Jevons paradox).


It should be. I think AMD has left a lot on the table with respect to competing in the space (probably to the point of executive negligence) and the new US laws will help create several new Chinese competitors. NVIDIA probably has a bit of time left as the market leader, but it's really due mostly to luck.


> NVIDIA probably has a bit of time left as the market leader, but it's really due mostly to luck.

Look, I think NVIDIA is overvalued and AI hype has poisoned markets/valuations quite a bit. But if I set that aside, I can't actually say NVIDIA is in the position they're in due to luck.

Jensen has seemingly been executing against a cohesive vision for a very long time. And he focused early on the software side of the business to make actually using the GPUs easier. The only luck is that LLMs became popular... but I would say consistent execution at NVIDIA is why they are the most used solution today.


> I can't actually say NVIDIA is in the position they're in due to luck

They aren't, end of story.

Even though I'm not a scientist in the space, I studied at EPFL in 2013 and researchers in the ML space could write to Nvidia about their research with their university email and Nvidia would send top-tier hardware for free.

Nvidia has funded, invested and supported in the ML space when nobody was looking and it's only natural that the research labs ended up writing tools around its hardware.

I don't think their moat will hold forever, especially among big tech that has the resources to optimize around their use case but it's only natural they enjoy such a headstart.


I agree with all of your data points. NVIDIA was lucky that AMD didn't do any of that stuff and sat out of the professional GPU market when it actually had significant advantages it could have employed.


Totally agreed.


As we have seen here it won't be a Western company that saves us from the dominant monopoly.

Xi Jinping, you're our only hope.


If China really released a GPU competitive with the current generation of nvidia you can bet it'd be banned in the US like BYD and DJI.


Ok but that leaves the rest of the world to China.


Out of nowhere … Europe!


DJI isn't banned in the US?


Sad but likely true.


Deepseek has thousands of Nvidia GPUs, though.


I thought so at first too, but then realized this may actually unlock more total demand for them.


> For those who haven't realized it yet, Deepseek-R1 is better than claude 3.5 and better than OpenAI o1-pro, better than Gemini.

Worse at writing. Its prose is overwrought. It's yet to learn that "less is more"


That's not what I've seen. See https://eqbench.com/results/creative-writing-v2/deepseek-ai_... , where someone fed it a large number of prompts.

Weirdly, while the first paragraph from the first story was barely GPT-3 grade, 99% of the rest of the output blew me away (and is continuing to do so, as I haven't finished reading it yet.)

I tried feeding a couple of the prompts to gpt-4o, o1-pro and the current Gemini 2.0 model, and the resulting output was nowhere near as well-crafted.


What you linked is actually not good prose.

Good writing is how people speak.

Your example is overstuffed with similes.

Just because you can doesn't mean you should.

> He sauntered toward her

"sauntered" - nobody actually talks like this. Stuff like that on each paragraph.

It's fanficcy


"If you're here to Instagram the ‘quaint Welsh charm,' the taxidermied raven's in the back. Five-pound fee for posers."

Pretty snappy, IMHO. Yes, it descends into Bulwer-Lytton territory in a few places, but for every instance where that happens, there are several others where it comes up with something well above par. I'm super impressed, if the examples in the link are really from this model.


I don't have access to o1-pro, but in my testing R1 performs noticably worse than o1.

It's more fun to use though because you can read the reasoning tokens live so I end up using it anyway.


Not sure why people are surprised; it's been known for a long time that RLHF essentially lobotomizes LLMs by training them to give answers the base model wouldn't give. Deepseek is better because they didn't gimp their own model.


> more astute

It definitely is that. Just ask it about its opinion about the CCP or the Guangxi Massacre.


It's better than Claude because it's doing a different task, and I don't think it's better than o1 let alone o1-pro.

The new Gemini model that competes like for like is also probably better too but I haven't used it much.


While I agree it's real competition, are we so certain that R1 is indeed better? The times I have used it, it's impressive, but I would not give it the title of best model.


I'm sure it's not better in every possible way but after using it extensively over the weekend it seems a bit better than o1-pro, which was my previous pick for the top spot. The best part is that it catches itself going down an erroneous path and self-corrects.


I think you mean American EV competition. China has a very large and primarily-unknown-to-the-average-American large EV industry. It's not just Tesla.


Meta isn’t panicking, they want AI to be cheap and commoditized so they can focus on what comes next while everyone is distracted by party tricks.


Like the Metaverse? Or NFTs on Instagram?

Meta is great at getting distracted by party tricks.


> We know that Anthropic and OpenAI and Meta are panicking

Right after Altman turned OpenAI to private to boot...


I honestly can't tell if this is a bot post because of just how bad I find Deepseek R1 to be. When asking it complex questions based on an app I'm working on, it always gives a flawed response that breaks the program. Claude is sometimes wrong, but not consistently wrong, and it doesn't completely miss the point of the question the way Deepseek R1 does, 100% of the time. Claude I can work with; Deepseek is trash. I've had no luck with it at all and don't bother trying anymore.


It has a 64k context window. O1 has 128k; Claude has 200k or 500k.


I was about to type: wait until you see what they’re doing with cars. They’re good now, no longer just copies. I’m talking real innovation plus good looks.


Spending more time than I should in a sunday playing with r1/o1/sonnet code generation, my impression is:

1. Sonnet is still the best model for me. It makes fewer mistakes than o1 and r1, and one can ask it to make a plan and think about the request before writing code. I am not sure the whole "reasoning/thinking" process of o1/r1 is as much of an advantage as it is supposed to be. And even if sonnet makes mistakes too, iterations with sonnet are faster than with o1/r1, at least.

2. r1 is good (better than previous deepseek models imo, and especially better at following instructions, which was my problem with deepseek models so far). The smaller models are very interesting. But the thought process often tends to overcomplicate things, and it thinks more than imo it should. I am not sure that all the thinking always helps build a better context for writing the code, which is what the thinking is actually for if we want to be honest.

3. My main problem with deepseek is that the thinking blocks are huge and it runs out of context (I think? Or maybe kagi's provider is unstable?) after a few iterations. Maybe if the thinking blocks from previous answers were not used for computing new answers it would help. Not sure what o1 does here; I doubt the previous thinking carries over into the context.

4. o1 seems around the same level as r1 imo if r1 does nothing weird, but r1 does more weird things (though I use it through github copilot and it does not give me the thinking blocks). I am pretty sure one can find something that o1 performs better and one that r1 performs better. It does not mean anything to me.

Maybe other use cases give different results than code generation. Maybe web/js code generation would also give different results than mine. But I do not see something to really impress me in what I actually need these tools for (more than the current SOTA baseline that is sonnet).

I would like to play more with the r1 distillations locally though, and in general I would probably try to handle the thinking blocks' context differently. Or maybe use aider with the dual-model approach, where an r1/sonnet combo seems to give great results. I think there is potential, but not just as such.

In general I do not understand the whole "panicking" thing. I do not think anybody panics over r1; it is very good, but nothing more exceptional than what we have already seen, unless they thought that only American companies could produce SOTA-level models, which was wrong already (previous deepseek and qwen models were already at similar levels). If anything, openai's and anthropic's models are more polished. It sounds a bit sensational to me, but then again who knows; I do not trust the grounding to reality that AI companies have, so they may be panicking indeed.


> Maybe if the thinking blocks from previous answers where not used for computing new answers it would help

Deepseek specifically recommends users ensure their setups do not feed the thinking portion back into the context because it can confuse the AI.
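As a minimal sketch of what that looks like in practice (the `<think>...</think>` tag convention matches how R1 emits its reasoning; the chat-message dict shape here is just an assumption for illustration, not a documented API):

```python
import re

# R1 wraps its chain-of-thought in <think>...</think> before the final answer.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(reply: str) -> str:
    """Remove the reasoning block from a model reply, keeping only the answer."""
    return THINK_RE.sub("", reply)

def append_to_history(history: list, reply: str) -> list:
    """Append an assistant reply to the chat history, minus its reasoning,
    so the thinking never re-enters the context on the next turn."""
    history.append({"role": "assistant", "content": strip_thinking(reply)})
    return history

reply = "<think>Let me work through this step by step...</think>The answer is 4."
history = append_to_history([], reply)
# history[0]["content"] is now just "The answer is 4."
```

The non-greedy `.*?` with `re.DOTALL` keeps the strip contained even if a reply somehow held more than one think block.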

They also recommend against prompt engineering. Just make your request as simple and specific as possible.

I need to go try Claude now because everyone is raving about it. I’ve been throwing hard, esoteric coding questions at R1 and I’ve been very impressed. The distillations though do not hold a candle to the real R1 given the same prompts.


Does R1's code actually compile and work as expected? Even small local models are great at answering confidently and plausibly. Luckily coding responses are easily verifiable, unlike fuzzier topics.


The panic is because a lot of beliefs have been challenged by r1 and those who made investments on these beliefs will now face losses


Based on my personal testing for coding, I still find Claude Sonnet is the best for coding, and it's easy to understand the code written by Claude (I like their code structure, or maybe at this point I am just used to Claude's style).


I also feel the same. I like the way sonnet answers and writes code, and I think I liked qwen 2.5 coder because it reminded me of sonnet (I highly suspect it was trained on sonnet's output). Moreover, having worked with sonnet for several months, I have system prompts for specific languages/uses that help produce the output I want and work well with it, e.g. I can get it to produce functions together with unit tests and examples written in a way very similar to what I would have written myself, which makes it much easier to understand and debug the code (because I find manual changes inevitable in general). It is then hard to switch to o1/r1, when their guidelines are to avoid doing exactly this kind of thing (system prompts, examples etc). That matches my limited experience with them, plus going back and forth to fix details is painful (here I actually like zed's approach, where you can edit their outputs directly).

Maybe a way to use them would be to pair them with a second model like aider does, i could see r1 producing something and then a second model work starting from their output, or maybe with more control over when it thinks and when not.

I believe these models must be pretty useful for some kinds of stuff different from how i use sonnet right now.


Sonnet isn't just better; it actually succeeds where R1 utterly fails after many minutes of "thinking" and back-and-forth prompting on a simple task: writing a Go CLI to do ICMP ping without requiring root or suid, or calling the external ping command.

Faster too.


Except it refuses to talk about the 1989 Tiananmen Square protests and massacre[0]. Are we really praising a model that is so blatantly censored by an authoritarian government?

[0]https://en.wikipedia.org/wiki/1989_Tiananmen_Square_protests...


lol “for those who didn’t realize yet”. Please stop with the hype, we all use it


Here Deepseek r1 fixes a python bug. Its fix is the same as the original code. I have not seen that level of stupidity from o1 or sonnet 3.5

https://x.com/alecm3/status/1883147247485170072?t=55xwg97roj...


I'm not commenting on what's better, but I've definitely seen that from Sonnet a few times.



