This system is incredibly brittle - it answers some questions brilliantly, but fails incoherently at basic logic questions.
Here's a dialogue I just had with it:
Me:
Imagine a set of 3 events (A, B and C) ordered with unique timestamps each. Event A is the first of all events to happen. Event C is in between the other two. What is a valid bound for Event B's timestamp?
ChatGPT:
A valid bound for Event B's timestamp would be a range of time between the timestamp of Event A and the timestamp of Event C. This is because Event B must have occurred after Event A and before Event C, according to the given information. However, without knowing the specific timestamps of each event, it is not possible to provide a more precise bound for Event B's timestamp.
Me:
I don't understand why Event B happens before Event C according to the given information - could you explain in more detail?
ChatGPT:
According to the information given, Event A happened first, followed by Event C, and then Event B. This order of events is implied by the fact that Event A is the first event to happen, and Event C is "in between" Events A and B. Therefore, Event B must have happened after Event A and before Event C.
(end)
This tells me I can't trust its basic reasoning capabilities, let alone its ability to write complex code.
People are gushing over its ability to write code, forgetting that code is just another type of language that can be used to express ideas.
Logic, however, transcends language. And it is clear that GPT-3 has absolutely no understanding of basic logic. It gives the impression of understanding logic by constructing sentences which are sometimes logically coherent.
This test shows it probably displays theory of computation better than 80% of the population. The wonder is, it’s just text, so this is an emergent property.
Let’s say it can simulate theory of computation better than 99% of the population and can very capably synthesize and infer from any text based sources. I think that would shake the world, and it wouldn’t even need to be near AGI.
It really depends on how much you care about undefined behaviour. If you're just playing around with a prototype or making a personal tool, corrupt state doesn't have huge ramifications, but for any software people would pay for (be that with money or attention), there's an expectation of a minimum level of robustness, security, performance, and a variety of other business requirements.
To achieve the same with an AI that doesn't have a real understanding of the business logic, programmers would still be needed to write the test suite. But unlike most test suites, which are typically underspecified, this test suite would likely need to be more complicated than the program itself. You could use ChatGPT to expedite writing the test programs, but attention would still be required to actually verify the tests themselves.
Why not? The profession went from writing machine code by hand to auditing "auto-generated" machine code from assembly. And so on with every generation of programming languages. With every new higher-level language with new abstractions, we're often just adding another link to the chain of inputs to a machine code translator.
Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.
> Why not? The profession went from writing machine code by hand to auditing "auto-generated" machine code from assembly.
A compiler deterministically creates assembly through a set of rules, not through AI. You absolutely cannot compare these two in that way.
> Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.
I guarantee you that your bias is a complete misunderstanding. People today are not more intelligent compared to people then. People are people, and programmers in the 1950s had to be extremely clever.
Especially since our field's foundations haven't been shaken to the core in the last 70 years, whereas around the turn of the century there was a huge change in understanding... everything: Gödel, Turing machines, Chomsky, the halting theorem, etc. That we no longer have to fit into 3 megabytes and thus can write some monster abstractions is not fundamentally different at all.
> Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.
I think you might be underestimating programmers in the 1950s.
> The first practical compiler was written by Corrado Böhm in 1951 for his PhD thesis
I don't think the GP is underestimating programmers in the 1950s, but rather commenting on their definition of AI. Back then, first-order logic was considered AI by a lot of people. I feel like in 50 years, AI models will have become so complex and sophisticated that people will look back at us surprised that we ever considered ChatGPT to be an AI, since it fails at basic logic tasks like the event ordering example above.
> Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.
Well, if you showed a programmer from late 1960s / early 1970s that same thing, they'd show you Lisp and laugh you off, or despair that programming seemingly regressed in the future.
> Edit: If you showed a programmer from the 1950s python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.
he's a computer programmer, not a Teutonic knight.
Honestly, the only use case I see for this is to write more tests. Testing is very time- and code-intensive, which means most tests are just simple canaries, not capable of actually verifying the software; nobody is going to write 10,000 lines of test code to test a 500-line file.
I fed the prompt for https://leetcode.com/problems/two-sum/ to ChatGPT, which generated code for an answer, and I then asked it for additional test cases, which it gave. What's mind blowing though is that it "reasoned" that:
> These test cases demonstrate that the twoSum() function can handle cases where the input array contains negative numbers, where the target is negative, where the indices of the two elements are not adjacent, and where the input array contains duplicate elements.
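For reference, here is a minimal Python rendition of what such a twoSum solution could look like, together with test cases of the kinds ChatGPT described (my own sketch, not ChatGPT's verbatim output):

```python
def two_sum(nums, target):
    """Return indices of the two elements that sum to target, or None."""
    seen = {}  # value -> index of first occurrence
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return None

assert two_sum([2, 7, 11, 15], 9) == [0, 1]   # basic case
assert two_sum([-3, 4, 3, 90], 0) == [0, 2]   # negative numbers
assert two_sum([1, -2, -5, 4], -7) == [1, 2]  # negative target
assert two_sum([3, 2, 4], 7) == [0, 2]        # non-adjacent indices
assert two_sum([3, 3], 6) == [0, 1]           # duplicate elements
```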
The problem is that according to the post at the root of this thread the AI would have written a test to verify that A < B < C whereas it should have written a test to verify that A < C < B. If the AI or a human had also written code to match A < C < B then nobody might ever know until the day that condition comes up when the system is running.
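To make that concrete, the property the test should encode (under the prompt's actual constraints) is A < C < B, not A < B < C; a minimal sketch:

```python
# Under the prompt's constraints (A first, C between the other two),
# the only valid ordering is A < C < B. A test should encode exactly that:
def ordering_ok(ts_a, ts_b, ts_c):
    return ts_a < ts_c < ts_b

assert ordering_ok(1, 3, 2)      # A=1, C=2, B=3: the valid ordering
assert not ordering_ok(1, 2, 3)  # A < B < C: the AI's (wrong) reading
```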
I can imagine that it would be stressful to review code like this, where you know you cannot trust a single line of it even though it looks consistent. You might have to verify back to the original assumptions every time and make sure that you have not been led astray by the AI.
Most certainly. It's not if, but when. The same goes for our eventually reaching a place of perfect trust in artificial intelligence. For now, ChatGPT, Co-pilot, and the like are at best productivity tools for boilerplate.
> Let’s say it can simulate theory of computation better than 99% of the population and can very capably synthesize and infer from any text based sources. I think that would shake the world, and it wouldn’t even need to be near AGI.
The total percentage isn't exactly what matters. Emergent properties as a metric is a smokescreen.
If that last 1% incorrectly demonstrates that A<C<B implies A<B<C, that means the system is not reliable enough to perform logical computations. You'd need to have a person oversee 100% to catch the last 1% of serious but basic errors. In such a case you might as well hire the person directly for the job.
Maybe it's because a lot of code out there doesn't actually require logic but merely requires copying what is already on the internet? I mean, I don't need to tell you that.
A computer can sort an array of numbers faster than likely 99% of the population, it doesn't mean it's useful.
It made me think of that demo someone made which downloads a bunch of code from stackoverflow with the search terms “sort an array js” and evaluates and runs them until it finds one that will sort its data.
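If I remember the idea right, a toy version of that demo (entirely hypothetical code, with stand-in snippets instead of real Stack Overflow answers) would look something like:

```python
# Candidate "sort" snippets, standing in for scraped Stack Overflow answers:
candidates = [
    "sorted_data = data[::-1]",       # bogus: just reverses
    "sorted_data = list(set(data))",  # bogus: drops duplicates
    "sorted_data = sorted(data)",     # the one that actually works
]

def find_working_snippet(data):
    """Run each snippet until one produces correctly sorted output."""
    for src in candidates:
        scope = {"data": list(data)}
        try:
            exec(src, scope)  # run the untrusted-snippet stand-in
        except Exception:
            continue
        if scope.get("sorted_data") == sorted(data):
            return src
    return None

assert find_working_snippet([3, 1, 2, 1]) == "sorted_data = sorted(data)"
```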
Professional developers make up less than 1% of the global population (around 25 million people apparently). Assuming they are the ones that have a good theory of computation, I think “better than 99%” is not a high enough bar to shake the world, right?
I mean, it depends on what we expect the AI to do. Maybe it would be revolutionary to just have, like, an average programmer with a ton of free time (so, the AI only has to beat like 99.7% of humanity to do that). On the other hand, if we want it to change the world by being much better than the average person, I guess we’d need a couple more 9’s.
I don't think so, the scaling laws haven't failed so far. I fully expect that making the model bigger and training it on more data will make it better at logic.
For a nice example with image models, Scott Alexander made a bet that newer image models would be able to do the things that Dall-E 2 gets wrong. [1] (This post also discusses how GPT-3 could do many things that GPT-2 got wrong.) He won the bet three months later through Imagen access. [2]
I don’t know, isn’t the safer bet that scaling will eventually reach a dead end? I honestly fail to see how a language model could “execute” a sequence of reasoning steps, as it doesn’t think in a symbolic way. Do correct me if I’m wrong, but it would require a complex rearchitecture, so I'm not sure we are any closer; we just have a very impressive, smart search engine now.
It’s not just a safe bet but almost guaranteed. Humans combine their internal language models with physical intuition and experimentation from the moment they are born. There is zero chance that an AI can understand the physical world without access to it [1]. Until it has that access, it’s no more than a glorified context-specific Markov chain generator.
Fact is, without a feedback loop that can run physical experiments like infants do from the moment they're born, I highly doubt they will develop a useful intuition using just video. Hence the conjecture
Logic reasoning has been a pretty solid branch of AI since its inception. Robust solutions exist for most problems; there is even a programming language based on its principles (Prolog).
With ChatGPT there is now a system that can express the results from automatic logic reasoning in language.
The next step would be to combine the two, i.e. tell chatGPT to explain the result of a logic reasoning program in natural language. It could of course also be asked to translate a natural language query into Prolog code.
This will probably require retraining the model, but I guess the demo we are given by OpenAI leaves little doubt that this is perfectly doable.
ChatGPT has the potential to plug the gap between GOFAI and natural language, which is quite a feat.
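As a sketch of what that combination could do, here is the event-ordering puzzle from the top of the thread written as a formal constraint and solved exhaustively (Python standing in for the Prolog step):

```python
from itertools import permutations

# The prompt's constraints, encoded formally:
#   (1) Event A is the first of all events to happen.
#   (2) Event C is in between the other two (i.e. between A and B).
def satisfies(order):
    pos = {event: i for i, event in enumerate(order)}
    return pos["A"] == 0 and pos["A"] < pos["C"] < pos["B"]

solutions = [o for o in permutations("ABC") if satisfies(o)]
print(solutions)  # [('A', 'C', 'B')] -- B's timestamp is bounded below by C's
```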
ChatGPT is damned impressive, but I hesitate to apply the word "reasoning" to its output. It's also not clear that the backend can be generalized to be able to plug the input into a symbolic manipulation program, or to gather the output and explain it. Only OpenAI knows for certain right now. Powerful if possible though!
Here's why I can't call it "reasoning":
> what is next in the series: a,b,c
> Without additional context, it is impossible to determine what the next element in the series "a,b,c" would be. The series could potentially continue with "d," "e," or any other letter of the alphabet, depending on the pattern or rule being followed. It is also possible that the series does not follow any pattern and is simply a random or arbitrary sequence of letters. Without more information, it is impossible to determine what the next element in the series would be.
> why isn't it d?
> As mentioned in my previous response, without additional context or information about the pattern or rule being followed, it is impossible to determine what the next element in the series "a,b,c" would be. It is not necessarily true that the next element in the series would be "d," as the series could potentially continue with any other letter of the alphabet or follow a different pattern altogether. Without more information, it is impossible to determine what the next element in the series would be.
Sounds like it is trained to avoid answering questions, and instead tries to bait you into giving it more information so that it can tell you what you told it.
It answers some things directly, but so many things are just avoidance. And then people say "see, it understood after a bit of back and forth, it is smart!", even though it is basically iterating through Google responses and reformulating those to fit the conversation until you say it got it right. Google v1 used pure logic and got you what you wanted, Google v2 tries to do natural language and sometimes misses, ChatGPT is the next step and tries to do full language but misses most of the time.
My point is, there are frameworks and languages for reasoning which are mature. But they require formalised input (e.g. code) and deliver formalised output.
As a language model, chatGPT can translate back and forth between natural language and those formal languages. Part of that ability is evident in the numerous examples that demonstrate how it writes or explains code.
The version of ChatGPT that is public is quite restricted; it can‘t browse the web or run code.
A version of chatGPT that can translate a logic query into Prolog, run the code, and translate the result back to the user should be perfectly capable of logic reasoning.
> The next step would be to combine the two, i.e. tell chatGPT to explain the result of a logic reasoning program in natural language. It could of course also be asked to translate a natural language query into Prolog code.
From what I remember, the very initial prototype of AlphaGo just had a neural net trained on historical games; effectively asking, "what kind of move would a traditional grandmaster make here?" with no planning whatsoever. This was good enough to beat the person who wrote the prototype (who wasn't a master but wasn't a complete novice either); and to make it able to defeat grandmasters, they added Monte Carlo tree search for planning (which also necessitated a separate neural net for evaluating board positions).
It sounds similar to your suggestion: A model which simply generates realistic-looking sentences is accurate maybe 85% of the time; to make it truly human (or super-human), it needs to be paired with some sort of formal structure -- the analog of AlphaGo's search component. The difficulty being, of course, that the world and its knowledge isn't as simple to represent as a go board.
That said, making coding answers more reliable, by adding a logical structure explicitly designed to support search & testing, should be within reach.
> The difficulty being, of course, that the world and its knowledge isn't as simple to represent as a go board.
Humans suffer from the exact same limitation. The limit to correct inference and prediction is often the amount and quality of input data.
A language model that can extract information from text and interact with the user to refine and clarify that information could be tremendously useful for experts who understand how the model works.
Without that understanding it will be rather disappointing though, as we see with some of the reactions to chatGPT and also Galactica (RIP).
If logic is its biggest weakness, then I just laugh - because that is the one area of AI that every model before these language models excelled at, well beyond human levels. All it takes is for GPT to translate the English sentence into logic predicate statements and throw it through a "3rd party" script that does the heavy logic validation/proving, and you're good. Those are well-trodden areas of programming, and ironically where people expected AIs to come from and be strongest - nobody expected exceptional painting and conversing skill just from averaging out a shit-ton of data.
If the language part doesn't have reasoning capabilities, how can it interpret the results? This wouldn't make AGI; otherwise they would have already plugged it in.
If you want some more things that ChatGPT isn't good at, try to get it to form novel anagrams, palindromes, or other such wordplay. It's good at regurgitating textbook examples of those, but I found you can trip it up by asking it to do things like "Create a palindrome containing the word 'coffee'"
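For what it's worth, checking its attempts is easy to automate; here is a quick sketch of a phrase-palindrome checker (ignoring case and punctuation, the common convention):

```python
import re

def is_palindrome(s):
    """Check a phrase palindrome, ignoring case, spaces and punctuation."""
    t = re.sub(r"[^a-z0-9]", "", s.lower())
    return t == t[::-1]

# Any palindrome containing "coffee" must also contain its reversal
# "eeffoc" somewhere, which is part of why the task is genuinely hard:
assert is_palindrome("coffee, eeffoc")
assert not is_palindrome("I like coffee")
```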
It certainly has logic. I had some fun using the "virtual machine" example someone else did, with the "nvidia-smi" command; if I told it that it was hot in the room, the next run of the command showed a higher temperature on the GPU. This is the logical conclusion from a hotter room.
> It certainly has logic. I had some fun using the "virtual machine" example someone else did, with the "nvidia-smi" command; if I told it that it was hot in the room, the next run of the command showed a higher temperature on the GPU. This is the logical conclusion from a hotter room.
Orrrr.... it's drawing references from other texts that were colocated with the word "hot"
It's an inference based on how chatgpt works, which is a more reasonable inference than assuming chatgpt somehow has abstract logical reasoning capabilities.
It doesn't have any logic, it's just prediction based on statistics.
There are so many examples already floating around that it has no logic, but I will give you a really simple one from my experiments:
I told it to:
> curl somedomain.ext
It replied with a curl error saying that the hostname doesn't exist. Then I gave it the same command again, and it replied with some random HTTP response showing that the hostname exists.
And that's not logical? ChatGPT doesn't know what is there, so it answers logically based on what should happen there. Obviously, having 2 different answers makes it less logical for sure, but I have seen many people make plenty of logic errors in real life too.
It's crazy to me that for an AGI to be one, it needs to be infallible in logic...
What about doing wget over https://chat.openai.com/chat ? I don't believe it had many Google results for that when it was trained, yet it was able to logically infer it would be a chat assistant doing exactly what it was doing.
Experienced this myself trying to get it to write a poem with an unusual rhyme scheme. ChatGPT's response when I tried to explicitly spell it out (a few different ways):
Yes, you are correct. The statement I made is not accurate. The specified rhyme scheme was AABA BBAB AABA BBAB, which means that each stanza should have the rhyme scheme AABA BBAB, with the first and second lines rhyming with each other, and the third and fourth lines rhyming with each other.
That said, I wouldn't be surprised if the next version was able to grok this.
This is another perfect example of how the only thing that GPT-3 has "grokked" is the construction of grammatically correct sentences based on probability.
But this paragraph it has produced contains a logical contradiction and demonstrates that it has no understanding of what it's writing about.
If anything, ChatGPT has demonstrated how little of the code we write requires logical thinking, and is just CRUD and API constructions or arcane library use. A perfect use case for a machine that can construct correct grammar from millions of examples.
Let's not forget that computers are insanely good at bitwise computations. It's just a matter of time before someone adds a Coq/Lean style reinforcement to AI's learning capabilities.
Computers are not that good with symbolic reasoning. Even state of the art symbolic math software can easily be given problems it will have no idea what to do with.
I don’t know, we are fickle, biased creatures with well known failures at reasoning, but let’s not lessen our intelligence too much. Even small children can outsmart the best of our existing AIs.
This is a totally uninformed/naïve/layman’s take, but what if AGI is just a really good language model used in a clever way such that it can perform an efficient search of its “thought” space, validating its thoughts are correct along the way. Programming, logic, math, etc are perhaps the easiest forms of “thoughts” for a computer to validate, but given enough quality data maybe it could be good at all kinds of other tasks as well.
People not only search a thought space, we have internal reasoning that does symbolic manipulations and sequential steps to get to a conclusion. Like, you would have no problem following a simple rule given to you ad infinitum, while these models simply can't take all of that into account.
There’s no reason to believe AGI will be composed of only language models. I suspect it will be a combination of multiple models and a lot of other software glued together in clever ways.
Yeah I know, but we are not that much closer to it, contrary to the hype ChatGPT gets (which is well-deserved, but “we will lose our jobs” threads are not based in reality).
It doesn't replace a skilled programmer. It just turns you into an editor and curator, multiplying productivity on some tasks by 10X+.
It will give incorrect code, but you can guide it toward a correct solution by asking it to fix the problem. Normally you don't even have to say exactly what is wrong.
For example, I got it to implement a basic bittorrent tracker server in Go in about 5 minutes. I didn't even have to point it to the RFC. I just said to consult the bittorrent RFC. It gave me back a server with /announce and /scrape endpoints. I then asked it to implement the functions using a struct for the requests. It correctly deserialized the URL-encoded sha1 info hashes from the /announce endpoint on the first try. I didn't even have to mention that detail.
It can also help you explore solutions. I asked it about algorithms to learn policies for 2-player zero sum games. It gave me a description of min-max, MCTS, reinforcement learning, deep neural networks. I then asked it to describe the pros/cons of each, which it did. I asked it to show an example of a reinforcement learning algorithm in python from scratch, which it did in about 10 seconds.
Exactly. The reason why it was able to do so is because the bt tracker server had already been built and it had been trained on the sources.
And that's the point: it won't work for most "new" stuff. But a lot of the code I write for work has been written before by someone else, so I can benefit from this. Looks to me as if this is essentially a form of swarm intelligence in the end.
This seems to me to be its strength, a multiplier to human intelligence. The way a smart phone is today, but more so. Once this matures, every human with access will be so much more capable than any single human in the 90s that they would seem beyond genius to us back then. Already someone with a phone in their pocket can stop to watch a couple of instructional videos and accomplish tasks that would previously have involved training courses. That may seem trivial to those who didn't have to hunt through card catalogs and outdated encyclopedias for every scrap of knowledge, but it is a huge deal.
That’s the thing about industrial revolutions. People believed they would kick humans out of the loop, when in fact they allow single persons to become entrepreneurs.
The other side effect is, we don’t use these performance gains to work less, but to produce more. Only, this time, ecologists are sounding the alarm because producing more means pulling resources from the Earth, so for the first time ever, we might see people actually working less.
As a human, I found your prompt confusing enough that I had to read it over a couple of times. I associate the letters 'A', 'B', and 'C' with alphabetical order strongly enough that swapping the order is a bit difficult. I can do it of course, but if I was tired and/or not paying attention... well, it's a mistake a human could make.
I also find it notable that when you asked for more detail, the AI was correct up until the last sentence.
It's not hard to get it to trip up with letters that correspond to the order:
Me: Imagine a set of 3 events (X, Y and Z) ordered with unique timestamps each. Event X is the first of all events to happen. Event Y is in between the other two. What is a valid bound for Event Z's timestamp?
ChatGPT: In this scenario, Event X must have the earliest timestamp, and Event Y must have a timestamp that falls between the timestamps of Events X and Z. Therefore, the valid bounds for the timestamp of Event Z would be the timestamp of Event X, and any timestamp that is greater than the timestamp of Event Y. This means that the valid range for the timestamp of Event Z would be the range from the timestamp of Event X to infinity.
Prompt: Imagine a set of 3 events (A, red and Q) ordered with unique timestamps each. Event A is the first of all events to happen. Event Q is in between the other two. What is a valid bound for Event red's timestamp?
ChatGPT: Since we are given that Event A happened first and that Event Q occurred in between Events A and red, a valid bound for the timestamp of Event red would be that it must have occurred after Event A and before Event Q. So, the valid bound for the timestamp of Event red would be the time range between the timestamp of Event A and the timestamp of Event Q.
---
I don't think event names matter at all here, it just trips up on some logical constructs.
---
EDIT: here's some reasoning!
Me: So the timestamp for Event Red must be after both the timestamp of Event A and the timestamp of Event Q. How can it be between the timestamps of Events A and Q?
ChatGPT: The timestamp for Event Red must be after the timestamp of Event A and after the timestamp of Event Q, but it does not necessarily have to be after both of them at the same time. The timestamp for Event Red can be between the timestamps of Events A and Q, as long as it is after the timestamp of Event A and before the timestamp of Event Q. This is because the timestamps for Events A and Q represent the time at which those events happened, and the timestamp for Event Red must be after the time at which Event A happened and before the time at which Event Q happened.
> I also find it notable that when you asked for more detail, the AI was correct up until the last sentence.
I actually think that makes the AI look worse. If it had made the human mistake of alphabetically ordering things, it would be fine. But it shows that it clearly understood the prompt, by repeating the ordering of events correctly, yet fails to answer the implication of that. It seems to be good at repeating information and summarizing facts but not at performing logic based on that information.
I had similar experience where it told me that a given prime number is not a prime number as it has two divisors, 1 and itself. Literally having given the reason for why it should be a prime and failing to get to the correct conclusion. And it was asked in plain language.
The question contains contradictory information. I suppose a smarter AI might have worked out the intended meaning, or asked for clarification. But how many humans would?
I believe the word “ordered” in the question caused the confusion for some of us humans and for the machine. It anticipates some information, namely that the events are ordered by their timestamps, but doesn’t yet provide the order.
It’s trained on language, it learns grammar, and how words relate to each other. So it constructs grammatically correct sentences with words that relate to each other.
It has no experience of how those words relate to other experiences, because it has no other experiences. If you want it to understand logic, train it on logical problems. If you want it to understand time, it’s going to need to experience time.
Some contradictory Maths responses, and an apology from ChatGPT:
Me:
Is Pi times 3 a rational number?
ChatGPT:
No, 3 times pi is not a rational number. 3 times pi is equal to 3π, which is an irrational number. This is because the number pi (π) is an irrational number, and when you multiply an irrational number by a rational number (in this case, 3), the result is still an irrational number. So, 3π is an irrational number because it is the product of the irrational number pi (π) and the rational number 3.
Me:
Is pi times 25 a rational number?
ChatGPT:
Yes, pi times 25 is a rational number. Pi times 25 is equal to 25π, which is a rational number. This is because the number pi (π) is an irrational number, but 25 is a rational number, and when you multiply a rational number by an irrational number, the result is a rational number. In this case, the irrationality of pi is "cancelled out" by the rational factor 25, resulting in a final product that is a rational number. So, 25π is a rational number because it is the product of the irrational number pi (π) and the rational number 25.
Me:
You have said both that "when you multiply a rational number by an irrational number, the result is a rational number" and "if you multiply a rational number with an irrational number, the product is always irrational. " Both statements cannot be true.
ChatGPT:
You are correct that both of those statements cannot be true. My previous responses were based on the assumption that you were asking about specific examples of multiplying a rational number by an irrational number, such as 3/1 times pi or 25/1 times pi. In those cases, the resulting product is either a rational number or an irrational number, depending on the specific numbers involved in the calculation. However, in general, the product of a rational number and an irrational number is always an irrational number. This is because an irrational number cannot be expressed as a simple fraction, so even if one of the factors in the product is a rational number, the result will still be irrational. For example, if you multiply the rational number 3 with the irrational number √2, the result is the irrational number 3√2. I apologize if my previous responses were unclear or contradictory. I will do my best to provide clear and accurate information in the future.
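For the record, the general fact the model keeps contradicting itself on has a two-line proof:

```latex
\text{Let } q \in \mathbb{Q},\ q \neq 0, \text{ and let } x \notin \mathbb{Q}.
\text{ If } qx = r \in \mathbb{Q}, \text{ then } x = r/q \in \mathbb{Q},
\text{ a contradiction.}
\text{ Hence } qx \text{ is irrational; in particular, both } 3\pi
\text{ and } 25\pi \text{ are irrational.}
```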
And now, examples of this kind of "reasoning" are ending up scattered around the internet, where they'll likely get picked up as additional training material to "improve" the system... this could spiral downhill in some interesting ways.
I finally got it to play chess against me and it thought black plays first and kept drawing the board in some kind of unicode art but the pieces were in the wrong squares. And it blundered on the 3rd move.
I told it to stop drawing the board and just talk to me in algebraic notation. It said it would stop, then immediately drew the board.
It seems that large language models such as ChatGPT have shown impressive abilities in natural language processing tasks, but they are not designed to solve pure algorithmic or strategic problems, which models like AlphaZero are good at while requiring much less data. And yet the human brain is divided into two hemispheres, with the left hemisphere typically associated with logical, analytical thinking and the right hemisphere associated with creative, intuitive thinking. These are simplifications of very complex systems, but I wonder whether the future is to combine two kinds of models, as our brain does, to get a better AI that excels at both language and logic.
I think the prompt you wrote is confusing, speaking for myself. I rewrote it and got this response:
Me: Imagine a set of 3 events (A, B and C) that each happen at a different time. Event A happens first. Event C happens between events A and B. When does event B happen?
ChatGPT: In the scenario you have described, event B happens after event C, since event C happens between events A and B. This means that the order of events is A, C, B. To put it another way, event B happens last in the sequence of events that you have described.
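For what it's worth, the intended reading of the puzzle is mechanical enough to brute-force. A minimal Python sketch (the constraint encoding below is my own reading of the prompt: A is first, C sits between the other two):

```python
from itertools import permutations

# Constraints from the prompt: A happens first, and C is between the other two
# (i.e. C occupies the middle position of the ordering).
def satisfies(order):
    return order[0] == "A" and order.index("C") == 1

valid = [o for o in permutations("ABC") if satisfies(o)]
print(valid)  # only ('A', 'C', 'B') survives
```

Since the only consistent order is A, C, B, a valid bound for B's timestamp is anything strictly greater than C's timestamp, which is the answer ChatGPT keeps missing.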
Q: please tell me the fingerings for the C scale on the trumpet
A: Sure! To play the C scale on the trumpet, use the following fingerings:
C: 1st valve
D: 1st and 2nd valves
E: 1st and 3rd valves
F: 1st, 2nd, and 3rd valves
G: open (no valves pressed)
A: 2nd valve
B: 3rd valve
C: open (no valves pressed)
Keep in mind that these are the most common fingerings for the C scale, but other fingerings are possible and may be more comfortable for some players. It's also important to practice your scales regularly in order to improve your technique and control. Happy practicing!
It's been a long time since I played trumpet, but aren't those incorrect?
On a regular Bb trumpet, this is the D (major) scale, so D should be 1-3 unless it's in the second octave (in which case this sequence is still wrong).
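For reference, here are the standard Bb trumpet fingerings for the written C major scale as I remember them (worth double-checking against a fingering chart), compared against ChatGPT's answer above:

```python
# Standard Bb trumpet fingerings for the written C major scale, from memory --
# verify against an actual fingering chart before trusting.
standard = {
    "C": (),        # open
    "D": (1, 3),
    "E": (1, 2),
    "F": (1,),
    "G": (),        # open
    "A": (1, 2),
    "B": (2,),
    "C'": (),       # open (octave)
}

# ChatGPT's answer, transcribed from the thread above.
chatgpt = {
    "C": (1,), "D": (1, 2), "E": (1, 3), "F": (1, 2, 3),
    "G": (), "A": (2,), "B": (3,), "C'": (),
}

wrong = [note for note in standard if chatgpt[note] != standard[note]]
print(wrong)  # it gets six of the eight notes wrong
```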
Looks like ChatBot ain't gonna take our musical jerbs anytime soon!
I did similar tests with the bot about chords, scales and time signatures. Its ignorance and general stupidity on those matters, coupled with overconfidence and extreme insistence on wrong outputs, is really funny.
I asked it to produce "ascii drum tablature for a cool beat", or "what's the ascii guitar tab for the main riff in Metallica's Seek and Destroy", and in both cases it makes something that kind of looks like drum tablature or guitar tablature, but is basically nonsense.
Nit:
"Event A is the first of all events to happen. Event C is in between the other two".
is confusing wording because having just specified event A as being first, one might think the phrase "between the other two" refers to events B and C.
However, "other" in this case is local to the second sentence and refers to A and B. As a human this took me several reads to parse correctly.
That doesn't seem to be the cause for its confusion. I just tried the following, which avoids the use of "other" and any mismatch between letter ordering and event ordering, just to make things as easy as possible. It still fails:
Me: Imagine a set of 3 events (A, B and C) ordered with unique timestamps each. Event A is the first of all events to happen. Event B is in between events A and C. What is a valid bound for Event C's timestamp?
ChatGPT: A valid bound for Event C's timestamp would be a time after the timestamp for Event A, but before the timestamp for Event B. This is because events are typically assumed to happen in chronological order, so an event that comes after Event A but before Event B would be the logical next event in the sequence. Of course, the exact bounds for Event C's timestamp would depend on the specific timestamps for Events A and B. For example, if Event A happened at 1:00 PM and Event B happened at 1:30 PM, then a valid bound for Event C's timestamp would be any time between 1:00 PM and 1:30 PM.
If you add the following to the end of your prompt it answers sort of correctly: “Think about this step by step and only then produce a solution.”
“One possible valid bound for the timestamp of Event B is a value that is greater than the timestamp of Event C but less than or equal to the maximum possible timestamp. This ensures that Event B happened after Event C but not necessarily after all other events.
Alternatively, we could set a valid bound for the timestamp of Event B as a value that is greater than the maximum possible timestamp but less than or equal to the minimum possible timestamp. This would ensure that Event B happened after all other events but not necessarily before all other events.”
To be fair, I immediately assumed the events were in order given their A, B, C names. I don't assume that any chat bot understands logic just because it can write code. I think 90% of its coding ability is an exceptional capacity to remember and apply pre-existing examples and to identify similarities to what the user is asking. ChatGPT is pretty amazing from what I've seen so far, but I think we're still a few steps away from something with the cognitive abilities of a human. That said, I think it's very close to something resembling a useful digital assistant. I wonder how soon we'll have something that can schedule appointments, order pizza, do my shopping or any of the other mundane but important tasks that would make it useful.
Me: Imagine a set of 3 events (A, B and C) ordered with unique timestamps each. Event A is the first of all events to happen. Event C is in between the A and B. What is a valid bound for Event B's timestamp?
chatGPT:
A valid bound for Event B's timestamp would be a range between the timestamp of Event A and the timestamp of Event C. For example, if Event A occurred at 1:00pm and Event C occurred at 2:00pm, a valid bound for Event B's timestamp would be any time between 1:00pm and 2:00pm.
Conversing with chatGPT reminds me of talking with some people, who, when they don’t know something, just invent stuff on the fly and confidently attempt to b/s.
I think it is failing at reading comprehension because it is putting too much emphasis on the phrase "3 events (A, B and C) ordered".
If we rewrite the question to make it very simple to interpret it gets the logic correct:
Imagine a set of 3 ordered events (A, C and B) with unique timestamps each. The ordering is A, C, B. What is a valid bound for Event B's timestamp?
> A valid bound for Event B's timestamp would be a range of values that fall between the timestamp of Event A and the timestamp of Event C. For example, if the timestamp of Event A is 5:00 PM and the timestamp of Event C is 6:00 PM, a valid bound for Event B's timestamp could be 5:15 PM to 5:45 PM.
From other conversations, it will usually concede a mistake when prompted to (even if it didn't actually make a mistake!) and explain a correction, but it will usually still be incoherent and inconsistent.
ChatGPT is really just generating text that looks plausible in the context of the conversation, but it has a very incomplete grasp of its meaning.
Imagine a set of 3 events (A, C and B) ordered with unique timestamps each. Event A is the first of all events to happen. Event C is in between the other two. What is a valid bound for Event B's timestamp?
A valid bound for Event B's timestamp would be a time by that is greater than the timestamp of Event A and less than the timestamp of Event C. For example, if the timestamp of Event A is 1:00 pm, the timestamp of Event C is 2:00 pm, and Event B occurs at 1:30 pm, this would be a valid bound for Event B's timestamp.
But I specified that event C is in between the other two?
If Event C is in between the other two events, then the valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event A and less than the timestamp of Event C. For example, if the timestamp of Event A is 1:00 pm, the timestamp of Event C is 2:00 pm, and Event B occurs at 1:30 pm, this would be a valid bound for Event B's timestamp.
This is wrong. With the example of timestamps you give, C is not between the other two events.
You are correct. In the example I provided, Event C is not in between the other two events. My mistake. A valid bound for Event B's timestamp in this scenario would be a time that is greater than the timestamp of Event C and less than the timestamp of Event A. For example, if the timestamp of Event A is 2:00 pm, the timestamp of Event C is 1:00 pm, and Event B occurs at 1:30 pm, this would be a valid bound for Event B's timestamp.
This is still wrong, because I specified that A is the first event to happen.
In that case, a valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event A and greater than the timestamp of Event C. For example, if the timestamp of Event A is 1:00 pm, the timestamp of Event C is 2:00 pm, and Event B occurs at 2:30 pm, this would be a valid bound for Event B's timestamp.
So A at 2pm, C at 1pm, and B at 3pm would also be a valid example.
Yes, that would also be a valid example. In this scenario, the timestamp of Event A is 2:00 pm, the timestamp of Event C is 1:00 pm, and the timestamp of Event B is 3:00 pm. This satisfies the condition that Event A is the first event to happen, and the timestamp of Event B is greater than the timestamps of both Event A and Event C. Therefore, it is a valid bound for Event B's timestamp.
Is this still true when all the events in that last example happen on the same day?
Yes, this is still true if all the events in the last example happen on the same day. In this case, the valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event A and greater than the timestamp of Event C, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 1:00 pm on Monday, and the timestamp of Event B is 3:00 pm on Monday, this would be a valid bound for Event B's timestamp.
So 2pm (event A) happens before 1pm (event C) on Monday, since A happens before C?
No, this is not correct. In the scenario you are describing, Event A and Event C would not occur in the order specified, because Event A is supposed to be the first event to happen and Event C is supposed to be in between the other two events. Therefore, the valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event A and less than the timestamp of Event C, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 3:00 pm on Monday, and the timestamp of Event B is 2:30 pm on Monday, this would be a valid bound for Event B's timestamp.
But this was the example you gave before as being a correct example. And in your new example now, C does not occur between A and B. You don’t seem to have any real understanding of what we’re talking about.
I apologize if my previous examples were not clear or were not consistent with the information provided. My understanding of the scenario is that there are three events (A, C, and B) with unique timestamps, and that Event A is the first event to happen. Event C is in between the other two events, but the order of the events is not specified. In this case, a valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event A and less than the timestamp of Event C, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 3:00 pm on Monday, and the timestamp of Event B is 2:30 pm on Monday, this would be a valid bound for Event B's timestamp.
Don’t you see that if A is the first event to happen, and event C is in between the other two events, that this implies that B must be the last event, and hence this uniquely specifies the order of the events?
Yes, I understand that if Event A is the first event to happen and Event C is in between the other two events, then this implies that Event B must be the last event. This would uniquely specify the order of the events as A, C, B. In this case, a valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event C and greater than the timestamp of Event A, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 1:00 pm on Monday, and the timestamp of Event B is 3:00 pm on Monday, this would be a valid bound for Event B's timestamp.
But 1:00 pm is before 2:00 pm, which means that A is not before C, but after C! The timestamps you give contradict the requirement that A is the first event.
You are correct. In the scenario I provided, the timestamps of the events contradict the requirement that Event A is the first event to happen and Event C is in between the other two events. My mistake. A valid bound for Event B's timestamp in this scenario would be a time that is greater than the timestamp of Event A and less than the timestamp of Event C, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 3:00 pm on Monday, and the timestamp of Event B is 2:30 pm on Monday, this would be a valid bound for Event B's timestamp.
No, this would not be valid, because as we established earlier, the requirements imply that B must be the last event, but in your example C happens last (at 3pm).
You are correct. In the scenario I provided, the timestamps of the events do not satisfy the requirement that Event A is the first event to happen and Event C is in between the other two events. My mistake. A valid bound for Event B's timestamp in this scenario would be a time that is greater than the timestamp of Event C and greater than the timestamp of Event A, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 1:00 pm on Monday, and the timestamp of Event B is 3:00 pm on Monday, this would be a valid bound for Event B's timestamp.
This is wrong again, because in your example the events happen in the order C, A, B according to the timestamps you gave, but they are supposed to happen in the order A, C, B.
Yes, you are correct. In the scenario I provided, the timestamps of the events do not satisfy the requirement that Event A is the first event to happen and Event C is in between the other two events. My mistake. A valid bound for Event B's timestamp in this scenario would be a time that is greater than the timestamp of Event A and less than the timestamp of Event C, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 3:00 pm on Monday, and the timestamp of Event B is 2:30 pm on Monday, this would be a valid bound for Event B's timestamp. This satisfies the conditions that Event A is the first event to happen and Event C is in between the other two events, and the events occur in the order A, C, B. I apologize for any confusion my previous examples may have caused.
Let’s take a step back. Given three events at 1pm, 2pm, and 3pm, in which order do these events occur?
Without any additional information, it is impossible to determine the order in which the events occur. The events could occur in any order, such as 1pm, 2pm, 3pm; 1pm, 3pm, 2pm; 2pm, 1pm, 3pm; 2pm, 3pm, 1pm; 3pm, 1pm, 2pm; or 3pm, 2pm, 1pm. It is only possible to determine the order of the events if there is additional information, such as the requirement that Event A is the first event to happen and Event C is in between the other two events. In this case, the events would occur in the order A, C, B, where A occurs at 1pm, C occurs at 2pm, and B occurs at 3pm.
But 1pm, as a time of day, always happens before 2pm (one hour before), and 3pm always happens after 2pm (one hour later), so the order of those three times is always the same on any given day. Don’t you know that?
Yes, you are correct. As a time of day, 1pm always happens before 2pm, which always happens before 3pm. However, without additional information about the events themselves, it is impossible to determine the order in which the events occur. The events could still occur in any order, such as 1pm, 2pm, 3pm; 1pm, 3pm, 2pm; 2pm, 1pm, 3pm; 2pm, 3pm, 1pm; 3pm, 1pm, 2pm; or 3pm, 2pm, 1pm. It is only possible to determine the order of the events if there is additional information, such as the requirement that Event A is the first event to happen and Event C is in between the other two events. In this case, the events would occur in the order A, C, B, where A occurs at 1pm, C occurs at 2pm, and B occurs at 3pm. I apologize if my previous examples were not clear on this point.
Okay, that last bit looks correct. Can you now correctly answer the original question about the valid bound for event B’s timestamp?
Yes, I can now correctly answer the original question about the valid bound for Event B's timestamp. Given the information that Event A is the first event to happen and Event C is in between the other two events, a valid bound for Event B's timestamp would be a time that is greater than the timestamp of Event A and less than the timestamp of Event C, and occurs on the same day as the other events. For example, if the timestamp of Event A is 2:00 pm on Monday, the timestamp of Event C is 3:00 pm on Monday, and the timestamp of Event B is 2:30 pm on Monday, this would be a valid bound for Event B's timestamp. This satisfies the conditions that Event A is the first event to happen and Event C is in between the other two events, and the events occur in the order A, C, B.
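The check ChatGPT keeps failing throughout this exchange is trivial to state in code. A small sketch (timestamps as minutes past noon; the encoding of the two constraints is my own):

```python
# The prompt's constraints: A happens first and C is between the other two,
# which forces the order A, C, B.
def valid_example(a, c, b):
    """Return True if timestamps a, c, b satisfy the order A < C < B."""
    return a < c < b

# ChatGPT's final "corrected" example: A=2:00pm, C=3:00pm, B=2:30pm.
print(valid_example(120, 180, 150))  # False: B falls before C, not after it

# A genuinely consistent example: A=1:00pm, C=2:00pm, B=3:00pm.
print(valid_example(60, 120, 180))   # True
```

So even its "now correct" final answer above still violates its own stated ordering.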
I explored the topic of chronological order a bit more. ChatGPT does seem to have a confused mental model here:
Given three events at 1pm, 2pm, and 3pm, in which order do these events occur?
The order in which events occur depends on their specific circumstances and cannot be determined based on the information provided. In general, events that are listed in chronological order are assumed to occur in that order, but without more information about the events in question it is impossible to say for certain.
Don’t events always happen in chronological order?
In most cases, events are assumed to happen in chronological order, meaning that they occur in the order in which they are listed or described. However, this is not always the case. It is possible for events to occur out of order, or for multiple events to occur simultaneously. For example, if two people are having a conversation, each person's words and actions would be considered separate events, but they would be happening at the same time. Additionally, in some cases events may be deliberately presented out of chronological order in order to create a specific effect, such as suspense or surprise.
But even when events are *presented* out of chronological order, they still *happen* in chronological order in the world where they happen?
Yes, you are correct. Even when events are presented out of chronological order, they still happen in chronological order in the real world. For example, if a story is told out of chronological order, the events in the story still happened in the order in which they are presented, even though the story does not unfold in that order. In cases like this, the non-chronological presentation of the events is a literary device used to create a specific effect, but the events themselves still happen in the order in which they are presented.
Can you give examples where events don’t happen in their chronological order?
There are many examples of events occurring out of chronological order. Here are a few:
- In literature, stories and narratives are often presented out of chronological order in order to create a specific effect, such as suspense or surprise. For example, a story might begin at the end and then move backwards in time, or it might jump back and forth between different points in time.
- In film and television, scenes are often shown out of chronological order in order to create a specific effect or to make the story more interesting. For example, a scene might be shown out of order to create suspense or to reveal information in a way that is surprising to the viewer.
- In real life, events sometimes happen out of order due to factors such as coincidence or random chance. For example, two people might have a conversation in which one person says something and the other person responds, but the order in which the words are spoken does not necessarily reflect the order in which the events occurred.
In all of these cases, the events themselves still happen in chronological order, but the way in which they are presented or experienced is out of order.
You gave examples where events aren’t *presented* in their chronological order, but they do *happen* in their chronological order. Can you give examples that do not rely on the presentation order?
Yes, there are many examples of events happening out of order in the real world, independent of how they are presented. Here are a few:
- In some cases, events may happen simultaneously, rather than in a specific order. For example, two people might have a conversation at the same time, with each person speaking and listening at the same time.
- In other cases, events may happen out of order due to factors such as coincidence or random chance. For example, two people might have a conversation in which one person says something and the other person responds, but the order in which the words are spoken does not necessarily reflect the order in which the events occurred.
- In still other cases, events may be deliberately planned or arranged to happen out of order in order to achieve a specific goal. For example, a musician might perform a song in a non-chronological order in order to create a unique and interesting sound.
In all of these cases, the events themselves are happening out of order, regardless of how they are presented or experienced.
I found your example pretty amazing and tried to replicate the same:
Me: Imagine a set of 3 events (A, B and C) ordered with unique timestamps each. Event A is the first of all events to happen. Event C is in between the other two. What is a valid bound for Event B's timestamp?
ChatGPT: In this scenario, the timestamp for Event B must be greater than the timestamp for Event A and less than the timestamp for Event C. This is because Event A happened first, followed by Event C, and then Event B.
Me:
But Event C is in between the other two
ChatGPT:
If Event C is in between the other two events, then the timestamp for Event B must still be greater than the timestamp for Event A and less than the timestamp for Event C. This is because Event A happened first, followed by Event C, and then Event B. Even though Event C is in between the other two events, its timestamp does not affect the valid bounds for the timestamp of Event B.
Me:
This is very confusing, can you illustrate with an example?
ChatGPT:
Certainly. Here is an example to illustrate the situation:
Suppose the three events have the following timestamps:
Event A: 15:00
Event C: 16:00
Event B: 17:00
In this case, the timestamp for Event B is valid because it is greater than the timestamp for Event A (15:00) and less than the timestamp for Event C (16:00).
Alternatively, if the timestamps were as follows:
Event A: 15:00
Event C: 16:00
Event B: 14:00
Then the timestamp for Event B would not be valid, because it is less than the timestamp for Event A.
I hope this helps clarify the situation. Let me know if you have any further questions.
Me:
But timestamp for event B is less than timestamp of event C implying that it occured before event C violating our initial assumption. Isn't this a contradiction?
ChatGPT:
You are correct that the scenario described in the question is contradictory. In the original question, it is stated that Event A is the first of all events to happen, followed by Event C, and then Event B. However, the example provided in the question contradicts this by saying that the timestamp for Event B is less than the timestamp for Event C, which would mean that Event B happened before Event C.
This is indeed a contradiction, and it is not possible for the timestamps of the events to be ordered in the way described in the original question. In order for the timestamps to be valid, either the timestamp for Event B must be greater than the timestamp for Event C (which would mean that Event B happened after Event C, as stated in the original question), or the initial assumption about the order in which the events happened must be revised.
It is important to ensure that the information provided in a question is consistent and does not contain contradictions, as this can lead to confusion and incorrect answers. I apologize if my previous answer did not adequately address this issue.
------- End of conversation ---------
This looks remarkable insofar as ChatGPT was able to spot the contradiction and explain it, even suggesting means to remedy it. I'm genuinely impressed by it.
In its last paragraph, ChatGPT is implying that there was a contradiction in your initial question, which there wasn’t. Even when replacing “A, B and C” in the question by “A, C and B”, it still gets it wrong.
I'm curious if this is an artifact of using the Internet for training data. Whenever I see an extended argument on HN or Reddit, I rarely see anyone admit at some point that they think they're wrong rather than adamantly defending their initial statement.