Every so often, I think about the meme of the 100 year language. The idea is that we need to start working towards the languages we'll use in 100 years. Instead, I wonder how many we already have.
Fortran: 59
C: 43
C++: 33
Python: 25
PHP: 21
JavaScript: 20
Java: 20
It's still a long time until any of those languages reach 100. But longevity seems to be the rule, not the exception. How many languages that were widespread actually died?[0] I don't pretend to know all the languages that have ever been popular, but not many that subsequently died come to mind. Cobol, maybe some lisp dialects, if you don't count Common Lisp as their successor? Is PL/1 dead? Was APL big enough to make the list?
Probably some of the languages I listed above will die before they're 100. Others might be niche, like Fortran already is. But I wouldn't guarantee it.
COBOL is 57 years old, has billions of lines running, and new code is constantly being developed. It should be on the list.
LISP is widespread in education as Scheme, has lasting commercial deployments and companies in LispWorks and Franz Allegro, and a new community with Clojure. However, one might treat them as separate languages, given that LISP 1.0 is very different from Common LISP, which in turn isn't Scheme or Clojure. So it's up for debate, but Scheme at 47 years old is probably justifiable thanks to academia. Racket is the practical version with a strong community. Chicken Scheme had one last I checked, too.
PL/1 (53 years old) is dead outside of legacy systems, including mainframes and eComStation (OS/2). Yet it might be worth adding, given that new code is written to extend old systems, as with COBOL. New compiler updates happen as well. It's hard to say a language is dead if it's still running companies, getting extended, and getting tooling updates. It's just not popular.
Also, Burroughs' MCP is still around under the Unisys banner. It was the first OS written in a high-level, safer language: Burroughs ALGOL (56 years old). That became ESPOL (48 years old) and then NEWP (???). Unisys has a 2015-dated doc on NEWP. I think that means they converted most or all of the OS to it.
BASIC (52 years old) was a language designed for beginners. Since it looked like pseudo-code, it was ridiculously easy to read and write. I got started on VB, incidentally. Many spinoffs, with wide deployments of commercial BASICs for business, gaming, and education, show it's far from dead. Not to mention the 4GLs that were often BASIC-like. It could actually be one of the most widely built-on languages ever made.
"However, one might treat them as separate languages given LISP 1.0 is very different from Common LISP which isn't Scheme or Clojure."
Well, on the one hand that's true. But on the other hand, a lot of the living languages have that too. BASIC is 52 years old, sure, but no 52-year-old BASIC program would even compile in a modern BASIC. And it's not even close: not a couple of keywords or some characters here or there; the entire structure of the language is fundamentally different now.
By that standard, C is still recognizably the same. I'd suggest C++ is a different language before and after templates, which cuts it down to 27 years (less, if you wait for the STL to be usable before you say the changes "took"), but I imagine a lot of early C++ would still compile, or nearly so. Python has evolved smoothly, and I can't point to any one feature that was a hard transition, but it is a fairly different language today than it was in the 1-1.5.2 era; still, a lot of 1.5.2 code would run OK. JavaScript has actually been relatively stable as a language, coming out of the gate with rather a lot of stuff already built in. Java has also been pretty stable as a language. I don't know enough about Fortran or COBOL.
Lisp is an amusing case where a lot of old Lisp is probably reasonably close to parseable but probably doesn't run at all beyond the basics.
I fully acknowledge there's some subjectivity to the judgments here, but I think part of the 100-year-language idea is that code in the 100 years language should still be usable 100 years later, directly. By that age measure we're shorter on viable languages, though we still have some.
If you didn't live through the era when that was C, I'm not sure you would recognize it as such. Certainly, I could claim it was B, C's precursor, and get away with it with quite a few people. It doesn't look that different from the example at https://en.m.wikipedia.org/wiki/B_(programming_language)#Exa...:
printn(n, b) {
    extrn putchar;
    auto a;

    if (a = n / b)
        printn(a, b); /* recursive */
    putchar(n % b + '0');
}
"a lot of old Lisp is probably reasonably close to parseable"
I think you would be hard-pressed to find any old Lisp code that a modern READ cannot turn into a list which could trivially be transformed into a valid s-expr in that dialect.
The reverse is much less true, since modern Lisp-like languages allow many more characters, are often case-independent, and often have standard "decorations" (e.g. &REST or #:optional).
Common Lisp is so kitchen-sinky that you'd be hard pressed to find old Lisp code that was then reasonably portable that will be outright hard to port to a modern CL implementation.
Most of it will probably run as-is, or with a thin macro package doing mechanical translations.
Unportable code is unportable code. If it was so tied to e.g. Allegro that it wouldn't run in any other CL environment, that's hardly the language's fault.
However, really early Lisp stressed the "S" part, and rarely did anything strongly system-specific.
There are some variants of Lisp which are not trivial to port to Common Lisp. Something like Standard Lisp is slightly difficult for various reasons. Something like Interlisp is also difficult, because many of its functions and libraries work quite differently from Maclisp/Common Lisp.
Java is interesting because it's always prioritized backwards compatibility, but the changes that have shown up recently or are in progress are pretty big: lambdas, value objects and type inference can all be argued to go against some important aspect of how the language was originally conceived.
You might argue it's changed the least, but only because all languages change a great deal.
I agree with all of this except maybe C. I don't have time to evaluate that in detail so no opinion. I especially like your definition of what constitutes 100 year old code or language. I think we can compromise between two definitions by differentiating between how old a language family/style is vs a specific variant or implementation of it. The LISP family and style is decades old but supported implementations are much younger.
FORTRAN isn't the same language either. Much like LISP, it evolved over time. Things like line numbers and GOTO statements that made people hate the language are not really mandatory -- it is basically as structured as C these days.
“I don't know what the language of the year 2000 will look like, but I know it will be called Fortran.” —Tony Hoare, winner of the 1980 Turing Award, in 1982.
Given how things turned out, he might have better said:
"I don't know what the language of the future will be called, but I know it will look like C."
Ha, good list. I never worked on it, but I've read that the Pick environment (OS + language + DB ...) is still used. It's quite old, ~50 years, according to:
Regarding LISP, perhaps the perspective that does count LISP and its descendants as one ongoing evolution of a similar language family is the same perspective that will allow that lineage to last 100 years?
I feel that the systems that change over time have the best chance of weathering it.
1960s Lisp code can still run on Common Lisp with only minimal changes -- e.g. http://elizagen.org/. A reboot like Racket or Clojure has more chance of making it alive to 2060, but I wouldn't quite rule out early Lisp yet.
It could. That might normally be considered cheating, but LISP is actually designed with a philosophy of customizing the language itself to the problem domain. The thing is, the implementations and even the language features of modern LISPs are nothing like McCarthy's simplistic language and interpreter. They couldn't have succeeded that way, due to performance and usability issues.
So, I think it's more fair to talk about LISP languages as an ongoing family but still a cheat to treat a modern one as original LISP. Probably should date it from creation of dialect or similar one.
Certainly, but the borders aren't so obvious to me. Is any language going to last 100 years without changing so much that it feels 'nothing like' its origin point?
Even if it hasn't changed in name, it may have changed in spirit. Some languages that promote themselves under very different names are substantially more similar than other cases where new language versions are dramatically different.
What 'essence' makes it the 'same language'? An interesting problem.
COBOL or Pascal if you don't use modern features lol. Idk. It's an interesting question about how we count the changes against the language's lineage. I don't have an answer to that one. We need a debate about it on StackOverflow that moderators close as "not constructive." Those are the threads that usually have great insights into issues like this haha.
Thanks for mentioning ALGOL and COBOL! It's easy to forget these languages since they are mostly used on mainframes.
ALGOL used to be the language you submitted things to the ACM in, I think, and it is the systems programming language of the MCP, which my dad still programs in to this day!
It is interesting how many languages had an algol style syntax until C stole the show. I am not sure the world is better for it.
"is the systems programming language of the MCP, which my dad still programs in to this day!"
That's pretty neat. Everyone thinks Burroughs died, but they just changed names. They make crazy money. The system is quite dated, though, in features and interface. No denying that. ;)
"It is interesting how many languages had an algol style syntax until C stole the show. I am not sure the world is better for it."
I'm not sure on syntax. Decisions by top languages in terms of safety or reliability show ALGOL and Pascal families were right about those. You might like to look at Modula-3 and Component Pascal to see where things might have gone in a parallel universe.
The more I think about it, the less I'm sure certain languages like Java will ever die. Java is a really good language for systems of mind-boggling size, and just as it's easy to write, it's also easy to manage and measure people writing it. Java strikes a good balance between extensibility (easy to implement design patterns; cross-platform JVM) and safety (automatic garbage collection; no direct memory access).
Even if technology did rapidly change (nanomachines!) there would still be a business case for a Java-style language. Given how used to C syntaxes we are, it might end up looking like Java. Or Java might just be here to stay, for a long, long time.
Side note: a lot of "dead" languages nowadays are ones that were written before memory and processing power were abundant so they had a lot of (what we in the glorious future might call) arbitrary character limits.
sp(a,b); might have been sufficient back in the slow-old days, but we don't really consider character limits anymore, and so we can actually learn what a given line of code does by reading it.
The problem with Java systems of mind-boggling size is that their mind-boggling size is usually caused by them being written in Java (and not the other way around).
As for design patterns, the patterns that are really extensively used have more to do with overcoming the limitations of Java (or C++, C#, or whatever similar language) than with expressing anything profound.
Oh it's worse. Not sure why but the high-assurance security field relied on high-security ORB's for connecting components in separation kernel systems. I was like, "Can't we use ZeroMQ or something? What are the odds that could be more fu... insecure than CORBA implementations?" Unreal.
On the other hand, the litigious environment around Java thanks to Oracle makes it a more risky ecosystem, for reasons that have nothing to do with language design. There is definitely room in this world for languages with more benign stewards.
Pascal isn't dead! Native binaries produced (much less garbage overhead), fast compilation, kick-ass open source IDE and standard library, cross platform, free (as in freedom, as well as beer).
A month ago I would have agreed with you. But then I surprised myself by choosing the Free Pascal Compiler and the Lazarus IDE for some personal projects that need to be native Windows applications.
These projects have been in my queue for years because I kept chasing the "best" way to create Windows applications. I learned (and forgot) a lot of crap, but never actually made anything. Now I'm making stuff.
Maybe a good call, maybe not. Embarcadero or whatever spelling haha keeps Delphi alive for businesses with old and new code written. Community mostly forked into Free Pascal with lots of libraries, example code, and active compiler development. Component Pascal was a C++ competitor with many companies and people in Europe and Russia using it. Mainly thanks to Blackbox Component Builder which was a Component Pascal app and BSD'd.
Did you miss the FP/Lazarus resurgence recently? The forums are active, they made a nice introductory video, it's cross-platform...since I think we're talking "really dead" here, I doubt it qualifies.
Calling Fortran 59 years old is odd, as not many people write in the original form of fortran. C11 is much closer to K&R C than Fortran 08 is to the original Fortran.
I'll throw two domain-specific language (families) into the mix that are pseudo-dead, although not really dead.
Although AMS-TeX and LaTeX are healthy (and TeX is nearly 40 years old), I would argue that very few indeed are writing in raw, bare, macro-free, package-free TeX. I'm sure it happens... rarely.
Another pseudo-dead language is PostScript. In the 80s there was a famous EE pushing the idea of sending your code and data to your PostScript laser printer as a batch frontend of sorts, letting the printer interpret the PostScript and do the calculations for your Smith chart, not merely print a bitmap containing one. That idea of PostScript as a general-purpose application language, although infinitely cool, never went much of anywhere.
PROLOG - such a great idea that never really caught on. Yeah, yeah, I know about miniKanren and Clojure's core.logic, but realize it's gone from "the Japanese are using it to leap ahead of American AI research, so we gotta catch up" in the late 80s to "what's that?" today.
To eliminate some battles over the definition of the word dead, how about "without too much effort you can get a paying job writing it". That works pretty well for spoken languages too.
That helps with assembly. As a fraction of the pie, it's never been lower. But as a device-driver, embedded, firmware, compiler-optimization (does that count?), and boot-loader technology, there have probably never in human history been more lines written per year. So it's both dead and thriving by some definitions, but by my definition above it's hardly dead.
I don't really get the appeal of basing a whole language on a backtracking algorithm. It always seemed to me that constraint solvers are better suited to a library implementation.
There are a lot of people still writing BASIC code. PureBasic, QB64, FreeBASIC, etc. Check out how active those communities are. Has everyone _really_ abandoned it? The bar for dead is just that: Dead.
They write as much as they can outside COBOL. Those 5 million lines are for extending old programs or integrating new things into them. Some new programs are presumably written as well since... well, it's COBOL programmers writing them. ;)
I believe the company I work for is writing just under 500K lines of COBOL code per year... now... how much of that is "new" vs extending vs maintaining is a good question. It is very difficult to measure those things in our environment. If you were to ask the 60 - 75 mainframe programmers we employ, I'm sure they would answer that COBOL is very much alive, and no matter how hard you try to kill it, 40 years of system code is just not going away any time soon (depending on your definition of "dead"). Especially as we hire a good amount of "new to us" people to maintain the system.
I think individual companies should define a language as "dead" based on the number of new people needed to maintain the systems. As the number approaches less than 5%(?) of your replacement hires, have you effectively "killed" the language? At the very least it is on life support, and a decision needs to be made about its future. (A grim analogy, I know).
Well, are we talking dead or just obsolete? It can't be dead if it's actively being developed with significant amounts of money and tooling improvements (eg MicroFocus). It can be obsolete, though, if it's taking up a tiny percentage of new code or hires as you said that keeps going down.
I agree one should try to phase out something on life support. Gradually at the least.
It's not a lot. Yet, it's extra code extending critical apps with billions of lines of code and running more transactions than Google does searches. That's a large impact in the grand scheme of things.
Also, how much C was being written in UNIX 1.0 days outside of ports of same, exact software? :P
I am about to start a new project with a company where COBOL is still very much a force. COBOL is far from dead and there are very serious applications using COBOL in mainstream use.
I also left out Perl, which may be losing mindshare, but is definitely not dead, Ruby, C#, and so on.
I chose these languages because they occurred to me as being relatively old and relatively common, but I won't argue that there aren't any other languages that could be on the list.
>but not many that subsequently died come to mind. Cobol,
I've read (not heard) it said that there may be more lines of COBOL in existence (probably meaning still running) than lines of any other language. Biz, insurance, gov't, IBM, etc. ...
FORTRAN and C are probably close seconds (relatively speaking).
A friend of mine mentioned that they hired a COBOL guy last month. They needed someone to work on the COBOL side of things because they are doing a big rewrite. (The legacy COBOL code outlived the stuff that was supposed to eventually replace it.)
C is the Latin of programming languages. The general syntax is easy to learn and communicates well across derivatives as diverse as Java and Ruby. Before I learned Lisp and Haskell, I didn't even think non-C-style syntax existed.
Forth and Prolog are diverse. Ruby and Java are not. They are both ALGOL (C) derived syntaxes. People only think they are much different because they lack exposure to languages from other (or no) heritages.
It doesn't matter how diverse the syntax is, just think of the vastly different worlds of people programming in C derived syntaxes:
The enterprise app developer writes Java for her day job, and spends her nights creating iPhone apps in Objective-C. Her apps connect with an API on the website she built using Ruby on Rails, and handles front-end interaction in JavaScript.
All four languages are syntactically similar, but each has vastly different application domains.
Do you know anything about Forth and Prolog and the languages you mention? Three are the same paradigm; the other two are very much not. This has nothing to do with syntax or "application domains".
I admire C for everything it taught me about programming, but nowadays I actually hate writing it.
For all my personal projects I switched over to Python years ago because I just don't want to waste a single more minute dealing with basic stuff like strings in C.
I love the batteries-included mentality of Python because it lets me concentrate on implementing actual solutions to my problems instead of fighting against the build system or reliably converting strings into integers.
If there were some kind of batteries-included version of C, I would certainly look into it again, but for now Python is just my preferred tool to get stuff done.
(Yes, C++ is better in this regard than C, but it is still too cumbersome for my use case of rapidly playing around with new ideas.)
Hmm... Rust might be in that ballpark. It is certainly more "batteries included" than C, and at about the same level of abstraction overall. It depends, of course, on how cumbersome you consider the ownership system.
C should be the de facto language for first-year CS students. It is not highly abstracted from the hardware, and it's much easier to grasp assembly knowing basic C. The syntax uptake is pretty quick, and would allow students to focus on the problem at hand (e.g. algorithms) rather than language nuance. Not to mention the advantages to the graduate starting their career in software development.
It's also easier to go up the stack to object oriented languages, particularly Java. The second chapter, first edition, of David Flanagan's Java in a Nutshell is still, in my opinion, the best intro to Java after having some experience with C.
> The syntax uptake is pretty quick, and would allow students to focus on the problem at hand (e.g. algorithms) rather than language nuance.
I don't agree. What's the benefit in having to painstakingly write trivial string operations using string.h and manually allocated buffers? What's the benefit in learning the ins and outs of undefined behavior?
That's the cool part, algos don't necessarily require strings.
Realize he's proposing first years where you have to LARP that they don't even know what an if-then-else construct is, or what is a function, or what is a loop, or what is recursion, or what is the concept of a memory map, or what is the concept of data or memory locations having a type (like float, long int, etc), or what is the concept of a variable. In 2016 most of them probably know, but schools feel the need to pretend. Also non-majors wanting a code experience will need an intro.
This will make "real C programmers" very annoyed because OS code requires "real C skills" not a training wheels subset of C, but oh well. You can't really expect first year students to write OS code anyway.
One hidden advantage is that it would take enormous effort for a noob to put a C program on the internet, so this protects them from themselves until they pick up, in later higher-level classes, things like buffer overflow protection, the GIGO concept, unit testing, and so on. If you teach them what a variable is using the framework-of-the-week, they're just going to get themselves or an employer pwned, thinking they know what they're doing after one class.
If you're skipping teaching strings because it's hard to work with them in the language you're using, you may want to consider whether you are actually doing students a favor, or whether you're trying to rationalize your decision to teach a language you happen to like.
The advantage of using C to teach algorithms and/or data structures is that it hides very little of what's actually going on, including allocations. "More code" correlates much more with "more work for the computer" than it does in higher-level languages.
If you know what you'd have to do in C to implement a certain feature, you can guess what a high-level language is probably doing under the hood when you use its canned features.
Except that undefined behaviour seems like a mostly C (and C-derivative) concept.
I teach C, and every year I have a number of students nearly in tears, because somewhere in the huge program they accidentally malloced sizeof(T*) instead of sizeof(T), but of course that causes a crash 10 minutes later in a totally different piece of the code base.
My hope (it's not quite there yet, but getting closer) is that things like clang's sanitize modes will reach a point where any undefined behaviour immediately causes an abort. Then students can still figure out what they did wrong, but have a chance of finding the source of their bug.
Speaking as a past student I'm really glad I started with C.
In the beginning, the fear of getting some random Segmentation fault out of nowhere actually taught me more than any textbook, school or best practices blog could ever do. It also forced me to learn how to use debuggers :)
That's what I tell my students, it's "character building", in the same way playing sports in the rain was as a child for me.
Also, many students previously did a course on Java, and clearly never really understood the basics, it's much easier in Java to play "keep tweaking and fixing the exceptions until it works, then don't touch it", particularly for introductory-level projects.
Show them Valgrind. If one teaches C to newcomers, it is an invaluable tool. If you compile with -g, you get nice reports of memory errors, uses of uninitialised values, memory leaks, and so on.
Really I wish I had access to it when I learned C.
My experience is that while valgrind is an amazing tool, it's actually not that great for new students. Sometimes it can be misleading, and it doesn't do stack tracking.
It's hard to teach students not to over-rely on it, and trust everything it ever states. It seems best to first make them do some horrible debugging by themselves, and then move them onto valgrind late.
Can you elaborate? I know that it might misbehave (i.e. report false positives), but I only encountered that with more complicated code that deals either with big-fat third party libraries or heavily optimized code where the developer e.g. did not initialize specific variables on purpose.
However, if you deal with "student" problems, I do not think this is the case. E.g. I taught a similar course, and the students appreciated that instead of just

    $ gcc -Wall -std=c99 -g foo.c
    $ ./a.out

they could run the program under valgrind and get output like

    $ valgrind ./a.out
    ...
    ==22907== Command: ./a.out
    ==22907==
    ==22907== Use of uninitialised value of size 8
    ==22907==    at 0x4004F5: bar (foo.c:42)
    ==22907==
    ==22907== Invalid write of size 1
    ==22907==    at 0x4004F5: bar (foo.c:42)
    ==22907==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
    ...

when e.g. writing to unallocated memory. I know for sure that there were often programs that students believed to be correct (and, worse, that often really worked, just not reliably) but that actually had memory errors, most of which valgrind reported.
I am not proposing to confront them with valgrind's internals (I do not even know them), but to explain that there is this tool, packaged in every popular Linux distro and ready to use, for finding these evil memory bugs (which can be hard to hunt down if you are new to the language).
In particular I am interested in a false positive reported by Valgrind for a student exercise.
> To force people to learn that making assumptions based on undefined behaviour is dangerous and that computers are not mind-readers.
You can do that in a lot less time with, say, Array.sort in JavaScript. Explaining that signed overflow is undefined, and why it's undefined, is a whole lot of digression for little gain.
> And you force people to write buffer manipulation code so they realize how good people using other languages have it.
Seems like a lot of time to spend to make a point that would have been a lot more relevant in 1990 than in 2016.
They get way better results learning Pascal or Oberon first, if we're talking imperative languages. They do common stuff like problem-solving, get in the habit of using typing, and don't worry as much about memory errors or undefined behavior. Then they can start working with pointers and UNSAFE modules to see how they can shoot themselves in the foot but do low-level or high-performance stuff. They can learn to avoid the pitfalls.
It was the first language taught at my university and I wholeheartedly agree. Java was actually more confusing because I couldn't comprehend what the classes/objects things were at first. I just knew functions.
Java was the first OOP language that I learned. We started with some unexplained boilerplate "class Blah, public static void main", and learned the basic syntax of the language by building functions.
OO concepts were introduced after that, and the boilerplate code was explained. I'd done Basic, VB5, and what I'd call "C with iostreams" before college, and it felt like a fairly gentle exposure to classes and objects. Recursion took me longer to get a handle on, actually.
C was my first language, but I hadn't used it (outside of C++) for a long time. In recent days it suddenly crashed my little "use new cool languages for the heck of it" party.
I was experimenting with Swift, trying to make it fit my rather performance/memory critical string processing needs. As it turns out, you can use C code from Swift about as easily as you can use Java code from another JVM language.
So I implemented a variant of the short string optimization in a few lines of C code. It's amazing how well C fits the bill as a lingua franca for code that does questionable things to bits and/or is meant to be used from other languages.
There's very little competition for C in that role.
I love working with C. The power it gives me allows me to write hyper-optimized applications in terms of memory usage. Using it we process terabyte files ridiculously fast using less than 1K of non-program memory.
However, when I don't need things hyper-optimised, it is not my first choice, since C is way behind other languages in terms of package management, and I'd rather not deal with pointers if I don't have to.
> [referring to Microsoft Word 1.1] It seems that this code is from a C project created recently in GitHub. No sign that’s a code from 25 years ago.
I wouldn't necessarily say that. At a glance, there are some questionable idioms: assignment inside a function argument (especially problematic since argument evaluation order is unspecified), old-style argument declaration, a custom Boolean type, less-than-descriptive variable naming, etc. I wouldn't let any of that pass code review today. :)
> The power of C is its stability over years, it remains basic, no advanced mechanism was introduced to the language, the code still simple to understand and maintain.
Not when you take the arcane undefined behavior rules into account. C semantics are anything but simple.
> Not when you take the arcane undefined behavior rules into account. C semantics are anything but simple.
Maybe, but note that simplicity should be viewed in a relative sense.
I am curious as to which language you think has simpler semantics than C. For example, I have not found Python, Julia, MATLAB, C++, Verilog, or shell script simpler than C. Same goes for my initial explorations of Rust.
Even if one includes the arcane corners, the spec is < 200 pages (excluding the stdlib).
If you don't count the unsafe sublanguage, Rust, SML, OCaml, Scheme, Java, and Lua all have simpler semantics than C, and that's just off the top of my head. It's debatable, but I would even argue that Haskell and Swift do. Undefined behavior is really subtle, and there are parts of the spec that compiler developers haven't even come to consensus regarding the meaning of.
Standard Haskell (ie Haskell 2010[1]) certainly is. Most of its features are just syntax sugar over a tiny core language with simple (although not 100% formalized) denotational semantics. The language itself isn't very big and it's pretty well-specified.
GHC Haskell with all extensions... euh, I don't know. Many extensions are either syntactic sugar or straightforward changes to the base language, sometimes even making it simpler. But I'm not confident about all of them, and I'm not sure how the semantics of some low-level libraries (ie for concurrency) work out.
I've written code in essentially just the standard subset and it's a pleasant language—there aren't any extensions that you absolutely must use in real projects. That said, people use lots of them anyhow because most make the language nicer without adding too much complexity.
Since the extensions aren't organized in a unified way and have to be split up (since they can generally be enabled or disabled independently), it's hard to figure out how complex they make the semantics of the language as a whole.
I have not implemented a compiler so I can't really judge on that aspect.
Isn't it also true that undefined behavior can stem from varying hardware/platform considerations? In particular, leaving some things judiciously undefined allows flexibility in implementations and thus can give good performance across a variety of platforms. Sure, one could force some precise semantics, but that might result in unnecessary "emulation" code on some platforms.
So at a programming language level, by having such emulation/fixed behavior, the semantics are simpler. But at a low level, the semantics are more complicated as the machine instructions may differ significantly, thus losing transparency. This is a concern in quite a few resource constrained applications.
Maybe Fortran, then, given it has less undefined behavior. It's still widely used in high-performance computing, since it's easier to optimize than C and has pre-optimized libraries.
It's worth studying a bit on static analysis tools for C. C's lack of adequate specification and undefined/unspecified cases make it a shambling mess to have truly correct C, particularly multiplatform or in unusual CPU environments.
I will go to my grave loving C, from that first day when I read about it in Byte magazine as the high speed language of every new exciting computer application, to the day when C++ took over and began to assert its highfalutin notions of what a language should be.
I can't help myself from rewriting your post, it's so close :)
I will go to my grave loving C, from that first day when I read() about it in unsigned char magazine as the high speed language of every malloc() exciting computer application, to the day when C++ took over and began to assert() its high falutin' notions of what a language should be.
C is nearly inaccessible to me. It really is like the Latin of programming languages, insofar as JavaScript is the new English of programming languages which is the world I have been living in for some time now.
It's like you can see the Latin roots of a lot of English, but reading it is like reading lorem ipsum. Same with C vs. JS for me.
Ironically it's the other way around for me. C is not just modern English but some 1000-word subset of modern English. JS feels like Esperanto or something: a synthetic language that was informed by English (and other languages) but is really its own thing, and was designed for a very specific purpose.
(Those are my subjective feelings about those languages, not actual analysis of any real properties of those languages :) )
Just out of curiosity, which elements make C inaccessible compared to other C-like languages? The base structure is identical to the later derivatives (if/else, for, while, operators and precedence, use of curly braces). Is it the manual memory management (malloc/free)? Or is it keeping aware of a variable's data type, knowing when and when not to use pointers and references? Or is it the limited number / scope of functions included in the standard libraries? (I'm mainly curious, as C seems to be the most comfortable language to me since I've known it for the last 25 or so years).
> Is it the manual memory management (malloc/free)? Or is it keeping aware of a variable's data type, knowing when and when not to use pointers and references?
Yes, both I think. These are ways of thinking that simply don't seem to apply in JS or any other dynamically typed higher level language.
Also, just sort of the conventions of naming things, like malloc or memcpy or in the example of the article, all those crazily shortened abbreviations that may just be saving bits or the developers style, but make it very unreadable for me.
I'm used to stuff like:
function getDataFromServer(){
    var data = response
    //blah blah etc
}
and it may just be a thing that you have to get used to with experience, but I can read and write Python, Ruby, JS, Clojure and other lisps, and have experience with C#, but something about C I just struggle with, and I wish that wasn't the case.
I'll agree that memory and type issues are more of a barrier, but the variable naming is just convention of the programmer. There is a lot of legacy code where brevity in naming was emphasized, but that was just the convention of the time, nothing inherent in the language. it doesn't need to be that way.
Your example looks like completely valid C code to me (except for the lack of semi-colon).
How? The only big difference is that in C the variables need type annotations and that they have proper lexical scope (so they are more like Javascript's "let" than "var")
If the person learned ES5 and was used to the tricky scoping and had got to C before ES6 became standard/popular, I could totally see the struggle there.
My manager (a long-time-ago former dev) once told me:
"I can write C in any language."
And it's true. He could write beautiful C code in Perl, Java, Ruby, etc - whichever language the current project was using - when he needed to hack something (usually gathering data for business metrics).
He'd just stick to the basics and write C-style code in the given language, he had no knowledge of any of the language idioms and didn't need them if he kept his code simple enough.
Basically only using if/for/while and function/return constructs.
It was strange to open a Perl script and see "C" code, with single-letter variables (declared at the top), etc.
Honestly, writing perl in "C" style is really a good idea. At least if not pushed to extremes. It avoids a lot of unreadability and confusion.
Other things I think make for good perl style:
* Never use unless, especially as the after-statement conditional and/or with a negation. (I once spent a whole day ripping my hair out to figure out that an unless with a negated clause that was itself ambiguously named did the exact opposite of what it seemed to. Think [...] unless not $unregulated;)
* Do use foreach wherever possible.
* Keep reference usage to where it's really needed.
* When in doubt, don't write new OOP code. (The benefit to the caller has to be much greater than the extra complexity in implementation.)
* Lists are the core data structure, and regexes are a first-class construct.
* Avoid the use of implicit $_/@_.
* Avoid the use of explicit $_.
* Use map / grep / sort and similar in pipelines but don't nest too deep on a given line.
Yes and no. While vectorised code will typically run faster than a loop (because it's a loop in C), i.e. rowSums is faster than a typical apply call, there actually isn't that much difference between a for loop over columns to sum and apply over columns to sum.
The real performance penalty is growing objects as you go. If you preallocate a list or vector of the appropriate length, then there typically isn't much difference.
That being said, you'll pry lapply(x, fun) from my cold dead hands (lapply(x, function(x) g(f(x))) is even better :) )
I have a hard time doing pointer-arithmetic, arbitrary memory writes upon malicious data, and unchecked macros in languages designed for safe programming. :P
One of my favorite jokes on HN was in a typical language wars thread where a guy wrote "real men write in C....without the standard library." That still makes me laugh.
The military and defense contractors did empirical studies of various programming languages and their defect rates to put that to the test, mainly in the 80's and 90's. They compared C, C++, Ada, and Fortran. C usually had double the defects of the rest, and with more severity, despite pros writing it. Ada usually outdid all of them, with one study showing Ada and C++ developers having the same defect rate.
So, when we use evidence instead of feelings, what you said is a myth that's been debunked repeatedly in many ways for decades. And, yet, people repeat it. No, C use usually results in problems that safer, systems languages before and after it had less or none of. Despite professionals using it. The solid code is always an outlier.
The most amusing thing is that Thompson and Ritchie, with help from Pike, later designed what they thought was the perfect language with the perfect set of features. The result, Go, was basically like the Algol subsets that preceded C mixed with Pascal, which was developed around the same time. But C compiled and ran fast on a PDP-11. So, it's the best. ;)
Yes I remember the studies (and have done my fair share of defence s/w).
And I am not claiming that C is the best tool for the job because of the language.... its because of the 'environmental' factors, human resources, community, momentum, codebases, portability, etc.
I whole-heartedly agree that there are better languages out there, but the fact remains that the flaws in C are easily managed.
It would also be interesting to produce a study of the use of C specifically in embedded systems where the style of code is very much different than large desktop systems (i.e. much less heap usage and dynamic memory allocation for example).
"And I am not claiming that C is the best tool for the job because of the language.... its because of the 'environmental' factors, human resources, community, momentum, codebases, portability, etc."
I agreed with you on that in another comment. Yep. It's also why NASA often uses it. The tooling covers dark corners pretty well these days.
"It would also be interesting to produce a study of the use of C specifically in embedded systems where the style of code is very much different than large desktop systems (i.e. much less heap usage and dynamic memory allocation for example)."
I'd love to replicate the old studies on modern languages with modern tooling for both systems programming and the embedded subsets like MISRA. We'd definitely learn some stuff from that. Further, I'd like to see specific metrics like in Ada/SPARK/CbyC example below on where defects were introduced or corrected with each technique and phase of lifecycle. Would tell us how low-level features, subset rules, and tooling interact with accurate assessment of how much problems they pose for real instead of in theory.
Modern embedded C programming has evolved significantly in the right direction since studies were done and so I would love to see someone spend the time looking at 'modern' embedded C codebases.
Static analysis is now widespread and not theoretical, coding standards now take security seriously (tho can be outdated). Drawbacks to certain techniques have been surfaced and recognised. Tooling is much better, portability is recognised as a good-thing, pointer usage is now minimised when possible and contained to areas where they are appropriate. Casting is frowned upon, macro-magic is frowned upon. 'Clever' code is frowned upon.
Basically, I would think that certainly in the embedded domain, standards are now such that the story would now be significantly different.
Have you tried Astree Analyzer? Papers I read indicated it was one of best but hard to find industry people to confirm or reject it. Want someone to clone it for FOSS. Meanwhile, I found these free ones for you two that have each found bugs in code with focus on minimal annotations. All are academic prototypes but Saturn was used on Linux kernel.
Most academics are instead making compilers that translate C code into something mostly or totally safe while pushing the performance hit downward. Here's two of the top for you to try on various codes (or improve):
If we can get tools like Rust which have better memory safety guarantees while offering much of the same performance, then hell, why not?
It's easy to see why just by looking at the hundreds of CVE's throughout the years, often caused by memory unsafe operations.
Not that I don't like C, but there's many better alternatives out there.
And this is coming from someone who stubbornly sticks to writing D in C-style, but knows what Rust brings to the table.
OK, I'm biased as I've got over 20 years in embedded/real-time systems primarily written in C (and C++) and am currently architect for an embedded system containing ~million lines of code and rolling out to tens of millions of units....
(But I'm educated enough to also use Python, Clojure, C# and a variety of other systems).
...and C is currently the best choice for systems like this because of reasons not primarily to do with the language.
Currently, if you suggest to use a language other than C you will be laughed out because the only available embedded guys are C based. Yes, momentum counts and C has massive momentum.
There are also issues such as toolchains and tooling, familiarity, community, and of course the massive existing set of libraries, codebases and knowledge.
Basically, C is 'good enough'. (Although Rust is interesting and on the horizon, it's a long way off yet.)
Although other languages are technically better, C's shortcomings are greatly exaggerated, for example memory management. In general this is not an issue in bare-metal embedded systems: you don't have a heap, and everything is statically allocated by design.
There are many flaws in the C language (as in every language), but in day-to-day use, they are very easy to manage.
I agree with you that C is a lot better if you don't have a heap. But dangling pointers, undefined behavior, etc. are still issues. We know how to fix these problems in 2016 with better language design, via techniques we didn't know in 1978.
I'll be the first to admit that compatibility with an ecosystem, even if flawed, is important. (I work on web browsers, after all!) We can't change overnight, or even in two or three years. But we'll never get to a better future if we don't take the steps to start now.
"There are also issues such as toolchains and tooling, familiarity, community, and of course the massive existing set of libraries, codebases and knowledge."
Those are the social and economic factors that we C opponents say are why C remains, and is often the pragmatic choice. They in no way show that the C language itself was well designed, superior, etc. They just show how going mainstream can make things more practical. They did a great job on that part.
Probably. I think you're a pragmatist rather than a true believer. ;)
Btw, what do you think of a reboot of Modula-2, Modula-3, or just Ada/SPARK? Your opinion on them would be interesting given your background. Closest thing to Modula's that's actively developed is Astrobe Oberon for embedded.
I started off with Turbo Pascal before graduating to Turbo C, so I do have a fondness for the Pascal family of languages. I also love type-safety....
TBH, I would prefer a hypothetical "fix" for the C language that improved type strictness/safety over "fixes" for resource/memory management issues which I consider should be bread and butter for any software engineer. So in that, I do not like large runtimes or garbage collection, more visibility to what the machine is really doing is needed. GC systems are also not predictable enough, deterministic runtime is absolutely essential to making reliable software.
I have poked my nose around Modula-2 in the past, but C took over for me personally. Ada I consider to be far too heavyweight for anything serious (and that's after being in a company that used it extensively in the defence world, though I never had to touch it myself).
Currently Rust is my best hope for the future.... but C has survived well IMO.
Also a pragmatist, as although I tend to bash on C on every occasion and would rather not use it, if a customer does require it, I will use it.
For me a kind of escape path from C, back when I was into Turbo Pascal, was to move to Turbo C++. I only got to use Turbo C around one year before getting to learn about this new cool language called C++.
It allowed me to keep some of the Turbo Pascal safety around.
What are your approaches when using plain C, coming from "so I do have a fondness for the Pascal family of languages. I also love type-safety" ?
My solutions were:
- Use translation units as TP units, only exposing functions and struct accessors (macros if function calls were too expensive)
- Always make use of debug functions to validate pointers in debug builds (e.g. _malloc_dbg, _CrtIsValidPointer on Windows)
- make use of const as much as possible
- write my own wrappers around strncpy and friends
- compile warnings as errors
- if given the option, just use C++ instead and prefer library types to inherited C ones (string, array, references, RAII).
> If you know what you're doing (and you should) then C is the best tool for the job.
Empirically, nobody knows what they're doing, if "knowing what you're doing" is defined as "writing large-scale C code without memory safety problems".
I hear a lot from C and C++ enthusiasts that there are lots of programmers out there who always write correct C and C++ code, and therefore we don't need new languages. But I've never found one of those programmers. Can you name one?
I hope to never own a microwave oven with a large scale codebase. Also my basement dehumidifier. And my waffle iron. And my clothes washer. And my digital dial caliper in my workroom.
There's a surprising amount of money in "you are now a timer" and "you are now a thermostat" and "you are now a thermometer" and very close analogies.
Washing machines actually have quite a bit of code. Car ECUs have even more. Elevator control systems are pretty big. Medical devices have huge codebases.
You'll definitely use or own something that has a somewhat large codebase.
Mine doesn't. When my washer broke, I called some locals that have a warehouse full of them and repair them. I asked whether it was their opinion, being experts, that my observation about the older ones being more reliable was true? The guy confirmed that older ones easily last a decade or so with new, computer-filled ones breaking all the time. Said they're good business for him. Sold me what he claimed was a good model of older ones he had for under $100. I was surprised when he told me the one that broke was around 20 years old.
That is both reliability and return on investment. :)
> Yes, there are well publicised failures in this regard, but consider the number of problems versus the actual number of lines of code written.
I think that's a pretty misleading metric when you take into account: (1) widely used codebases constantly increase in size; (2) it only takes one exploitable mistake for an attacker to achieve remote code execution. One exploitable vulnerability per 10,000 lines means 1,200 remote code execution flaws in a codebase the size of Chromium.
I've never seen it. All large-scale C and C++ codebases I've ever seen, from large companies to small ones, have had memory safety/undefined behavior problems [1]. If "just hire better programmers" were a workable solution, surely one company out of {Google, Apple, Microsoft, Facebook} would have succeeded at that strategy by now.
It's very easy to write C and C++ code that looks like it's free of undefined behavior, but in every case I've seen they end up falling when attackers actively try to look for problems.
[1]: Maybe qmail is the one exception, though even that had a famous debate related to overflow.
Have you ever worked with teams who use aggressive static analysis tools to detect and catch undefined behavior?
Because I have, and it works incredibly well.
Of course, the caveat is that once you turn up the static analysis aggressiveness (assuming you use a good static analysis tool), you will need to put in plenty of assert(index < len), ownership annotations, and other such items in your code to satisfy the checks.
But once you do, it's really hard (probably not impossible, but really hard) to trigger undefined behavior without the static analysis tool catching it.
Because if the tool decides it can't prove there's no undefined behavior, it is configured to complain, and you adjust the code until it stops.
I'm not saying many teams do this, but I am disputing that no teams do this.
> I'm not saying many teams do this, but I am disputing that no teams do this.
If you watch Herb Sutter's talk at CppCon 2015, at a given point he asks the audience how many know and use such tools.
It is one of the most important C++ conferences, usually attended by the most savvy C++ developers in the world, and the portion of the audience saying that they do use such tools was roughly 1%!
1% shows how much the majority of C and C++ developers, or their employers, care about writing proper safe code in those languages.
The only way out is moving to programming languages whose safety must be explicitly turned off, rather than explicitly turned on, because most won't bother taking the effort to turn it on.
No one is disputing that the industry moving to a safer language would reduce the level of nasty bugs in the world. That would be silly to argue against, because it's basically a tautology anyway.
What I'm disputing is that no teams write good C or C++ code.
In fact, the 1% who raised their hands proves my point. It isn't 0%.
That's all I'm saying. Don't smear our names with the "all C and C++ developers write code with exploitable undefined behavior bugs in it" when you mean "most C and C++ developers write code with exploitable undefined behavior bugs in it".
C++ is my go-to language every time I need to step outside the JVM and .NET ecosystems, and I do take all efforts to be part of that 1%.
Once upon a time I taught a C++ class to first-year students at the university where I took my degree. I've worked with C++ at some well-known companies and research institutions.
Still, I won't claim that I write C++ free of exploitable undefined behaviour, let alone C. I can't control the code pulled in by third-party libraries or written by team members, nor keep every UB case in my head.
If you use the right tools (static analysis and sanitizers and fuzzers) your own code (and any code you rely on which you have the source for) will be handled.
If you rely on binary dependencies which have quality problems, then I can't help you there. But that could be an attack vector in any language.
And if you are worried about the operating system itself, then there really is no easy way around it unless you want to run a unikernel for everything.
And even then you may hit a CPU microcode bug or a hardware bug...
> Have you ever worked with teams who use aggressive static analysis tools to detect and catch undefined behavior?
Coverity is actually used on browser engines, and it has not been able to stem the tide of exploitable security vulnerabilities. Sound static analysis is just too difficult on idiomatic C and C++: it's effectively impossible, as the language was just not designed for it.
Your tool is better than Astree Analyzer and Polyspace? I think you should consider GPLing or licensing it with commercial support. Judging from their prices & effectiveness, you'll make a killing doing better while undercutting their licensing.
The problem is usability and licensing costs. Our in-house tool is amazingly powerful and fits our needs but the usability would need improvement (no surprise) and we license some of the utility code we use (which is cheaper if we don't resell our tool).
You couldn't replace the utility code with something in FOSS? And that's why your company is holding out on a static analyzer better than anything out there? The case for doing something to open that tool is just getting stronger.
Btw, what utility tool are you licensing? What makes it irreplaceable? I know of only one in this field that I truly couldn't replace for compilers or static analysis. Even then, there's quite a few that handle the job well enough to not need it. So, I don't use it.
The utility tool has OSS equivalents but we've evaluated them all. None of them have the features we need, and adding them would be person-years of work we can't afford.
My hope is to some day open source our internal tool, but we have to wait until the OSS features we need catch up. Until then, we aren't willing to pay the high license fee to redistribute this utility freely.
I'm speaking a little cryptically because I'd rather not have this tied directly to any particular company (neither mine nor the one we license this code from). I'm not speaking on their behalves officially; only my own.
Hmm. I'd love to know what that tool is, but NDAs and policies are what they are. I have an email address if you want to send me a comparison of what advantages it has over similar tools. Then, on the odd chance I see an opportunity, I'll nudge someone in the direction of trying to bring OSS up to par. No promises it will happen, as the opportunity has to arise first.
Well, except that in kernel you often can't use even many CPU features either in practice.
At least if you use SSE2/FP/etc., you'd better ensure those FPU registers are saved and restored. But you probably don't want to do that in an IRQ handler! Save only the SSE2 registers on an AVX system (the higher 128 bits will be zeroed for the currently executing usermode thread!) and receive "interesting" bug reports from the end users.
If you refer to a vm page that's not present... well, bad things might happen.
"Availability" doesn't mean much. There are plenty of all kinds of libs available on all platforms (who wants to program WiFi access for the embedded sensor from scratch, for example?). Doesn't mean it makes sense to put any of them into the current project. The stdlib is one of many and not that often the one that is put in. The libs I use on my embedded system are much more likely to be for some specific piece of sensor or I/O device (like the mentioned WiFi module), I rarely have need for what's in the stdlib. Of course, embedded systems vary far more than the PC and server stuff so you can easily find people who will say the exact opposite.
A freestanding C implementation doesn't have to include anything but a handful of headers with some macro and type definitions. It's not uncommon to get a manufacturer-specific library whose features are specific to the particular controller architecture, not related to those defined in the full C library. And it's not uncommon for embedded software developers to ignore those libraries, which are frequently terrible, and write everything from scratch.
I learned C via this excellent book: http://www.amazon.com/dp/067230399X/?tag=stackoverfl08-20, which I picked up from a local bookstore on a whim. I had just started high school, and up until that point my programming experience had been limited to BASIC-type languages (starting with Apple ][ basic when I was 6, then AMOS on the Amiga 500, then QBasic on DOS... actually that last one was a bit of a downgrade from AMOS :) ).
C was my introduction to "real" programming -- it forced me to actually learn memory management, pointers, etc. Funny thing, I only took one C course while I was in college. By the time I started college, Java was the new hotness, and I was in the first incoming freshman class to get Java instead of Pascal in our introductory programming course. I took an immediate disliking to Java back then (and still don't like it), so naturally it's the language I have to use at work :O
C has kind of become the evident de facto standard. There isn't anything we can really do to change that. It just did what it was meant to do very well, with all of the flashy bells and whistles removed.
The underlying instructions generated by the compilers are simple to follow. Reasoning about what will be generated is an easy enough task (so long as optimization is turned off).
"Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions."
Seriously enough, Lua adds all those scripting high-level aspects that are missing from C. I know it's another language but its interplay with C gives strength to both of them. A good alternative of the C + Lua paradigm is of course Python.
My first serious introduction to programming was through Harvard's online CS50x course, which uses C. I think it was a fantastic way to introduce a newbie like me to what programming languages are doing under the hood with memory allocation, pointers, value vs ref, etc.
You can backpropagate "modern" CS techniques like modularity and encapsulation to almost any language, like C or LISP or FORTRAN. But if you have language and compiler support for these, it makes it easier and more reliable.
That's funny about capsules. It's more consistent when people discuss encapsulation, though. As far as OOP in C goes, it's definitely doable, with quite a few approaches and even books dedicated to it. Example:
Many think C++, with its benefits and problems, is relevant to OOP in C, but not entirely. That's because they think C++ was about adding OOP to C. It wasn't. Stroustrup had used safe, OOP languages like Simula and wanted more of that. He intended to leverage C's popularity and tooling but transform it into something with the benefits of those other languages. So the result is C-like, but not quite C, and complex.
Just doing OOP in C, on the other hand, is a much easier and cleaner problem to solve. ;)
[0] Where dead doesn't rule out someone being paid to maintain ancient code. By that standard, it's unclear whether anything will ever die (https://www.snellman.net/blog/archive/2015-09-01-the-most-ob...)