Every so often, I think about the meme of the 100 year language. The idea is that we need to start working towards the languages we'll use in 100 years. Instead, I wonder how many we already have.
Fortran: 59
C: 43
C++: 33
Python: 25
PHP: 21
JavaScript: 20
Java: 20
It's still a long time until any of those languages reach 100. But longevity seems to be the rule, not the exception. How many languages that were widespread actually died?[0] I don't pretend to know all the languages that have ever been popular, but not many that subsequently died come to mind. Cobol, maybe some lisp dialects, if you don't count Common Lisp as their successor? Is PL/1 dead? Was APL big enough to make the list?
Probably some of the languages I listed above will die before they're 100. Others might be niche, like Fortran already is. But I wouldn't guarantee it.
COBOL is 57 years old, has billions of lines running, and new code is constantly being developed. It should be on the list.
LISP is widespread in education as Scheme, has lasting commercial deployments and companies in LispWorks and Franz Allegro, and a new community with Clojure. However, one might treat them as separate languages, given that LISP 1.0 is very different from Common LISP, which in turn isn't Scheme or Clojure. So it's up for debate, but Scheme at 47 years old is probably justifiable thanks to academia. Racket is the practical version with a strong community. Chicken Scheme had one last I checked, too.
PL/1 (53 years old) is dead outside of legacy systems, including mainframes and eComStation (OS/2). Yet it might be worth adding, given that new code is written to extend old systems, as with COBOL. New compiler updates happen as well. It's hard to say a language is dead if it's still running companies, getting extended, and getting tooling updates. It's just not popular.
Also, Burroughs' MCP is still around under the Unisys banner. It was the first OS written in a high-level, safer language: Burroughs ALGOL (56 years old). That became ESPOL (48 years old) and then NEWP (???). Unisys has a 2015-dated doc on NEWP. I think that means they converted most or all of the OS to it.
BASIC (52 years old) was a language designed for beginners. Since it looked like pseudo-code, it was ridiculously easy to read and write. I got started on VB, incidentally. Many spinoffs, with wide deployments of commercial BASICs for business, gaming, and education, show it's far from dead. Not to mention the 4GLs that were often BASIC-like. It could actually be one of the most widely built-on languages ever made.
"However, one might treat them as separate languages given LISP 1.0 is very different from Common LISP which isn't Scheme or Clojure."
Well, on the one hand that's true. But on the other hand, a lot of the living languages have that too. BASIC is 52 years old, sure, but no 52-year-old BASIC program would even compile in a modern BASIC. And it's not even close: not a couple of keywords or some characters here or there; the entire structure of the language is fundamentally different now.
By that standard, C is still recognizably the same. I'd suggest C++ is a different language before and after templates, which cuts it down to 27 years (less, if you wait for the STL to be usable before you say the changes "took"), but I imagine a lot of early C++ would still compile, or nearly so. Python has evolved smoothly, and I can't point to any one feature that was a hard transition, but it is a fairly different language today than it was in the 1-1.5.2 era; still, a lot of 1.5.2 code would run OK. JavaScript has actually been relatively stable as a language, coming out of the gate with rather a lot of stuff already built in. Java has also been pretty stable as a language. I don't know enough about Fortran or COBOL.
Lisp is an amusing case where a lot of old Lisp is probably reasonably close to parseable but probably doesn't run at all beyond the basics.
I fully acknowledge there's some subjectivity to the judgments here, but I think part of the 100-year-language idea is that code in the 100 years language should still be usable 100 years later, directly. By that age measure we're shorter on viable languages, though we still have some.
If you didn't live through the era when that was C, I'm not sure you would recognize it as such. Certainly, I could claim it was B, C's precursor, and get away with it with quite a few people. It doesn't look that different from the example at https://en.m.wikipedia.org/wiki/B_(programming_language)#Exa...:
printn(n, b) {
    extrn putchar;
    auto a;

    if (a = n / b)
        printn(a, b); /* recursive */
    putchar(n % b + '0');
}
"a lot of old Lisp is probably reasonably close to parseable"
I think you would be hard-pressed to find any old Lisp code that a modern READ cannot turn into a list which could trivially be transformed into a valid s-expr in that dialect.
The reverse is much less true, since modern Lisp-like languages allow many more characters, are often case-independent, and often have standard "decorations" (e.g. &REST or #:optional).
Common Lisp is so kitchen-sinky that you'd be hard pressed to find old Lisp code that was then reasonably portable that will be outright hard to port to a modern CL implementation.
Most of it will probably run as-is, or with a thin macro package doing mechanical translations.
Unportable code is unportable code. If it was so tied to e.g. Allegro that it wouldn't run in any other CL environment, that's hardly the language's fault.
However, really early Lisp stressed the "S" part, and rarely did anything strongly system-specific.
There are some variants of Lisp which are not trivial to port to Common Lisp. Something like Standard Lisp is slightly difficult for various reasons. Something like Interlisp is also difficult, because many of its functions and libraries work quite differently from Maclisp/Common Lisp.
Java is interesting because it's always prioritized backwards compatibility, but the changes that have shown up recently or are in progress are pretty big: lambdas, value objects and type inference can all be argued to go against some important aspect of how the language was originally conceived.
You might argue it's changed the least, but only because all languages change a great deal.
I agree with all of this except maybe C. I don't have time to evaluate that in detail so no opinion. I especially like your definition of what constitutes 100 year old code or language. I think we can compromise between two definitions by differentiating between how old a language family/style is vs a specific variant or implementation of it. The LISP family and style is decades old but supported implementations are much younger.
FORTRAN isn't the same language either. Much like LISP, it evolved over time. Things like line numbers and GOTO statements that made people hate the language are not really mandatory -- it is basically as structured as C these days.
“I don't know what the language of the year 2000 will look like, but I know it will be called Fortran.” —Tony Hoare, winner of the 1980 Turing Award, in 1982.
Given how things turned out, he might have better said:
"I don't know what the language of the future will be called, but I know it will look like C."
Ha, good list. I never worked on it, but I've read that the Pick environment (OS + language + DB ...) is still used. It's quite old, ~50 years, according to:
Regarding LISP, perhaps the perspective that does count LISP and its descendants as one ongoing evolution of a similar language family is the same perspective that will allow that lineage to last 100 years?
I feel that the systems that change over time have the best chance of weathering it.
1960s Lisp code can still run on Common Lisp with only minimal changes -- e.g. http://elizagen.org/. A reboot like Racket or Clojure has more chance of making it alive to 2060, but I wouldn't quite rule out early Lisp yet.
It could. That might normally be considered cheating, but LISP is actually designed with a philosophy of customizing the language itself to the problem domain. The thing is, the implementations and even the language features of modern LISPs are nothing like McCarthy's simplistic language and interpreter. They couldn't have succeeded that way, due to performance and usability issues.
So, I think it's more fair to talk about LISP languages as an ongoing family but still a cheat to treat a modern one as original LISP. Probably should date it from creation of dialect or similar one.
Certainly, but the borders aren't so obvious to me. Is any language going to last 100 years without changing so much that it feels 'nothing like' its origin point?
Even if it hasn't changed in name, it may have changed in spirit. Some languages that promote themselves under very different names are substantially more similar than other cases where new language versions are dramatically different.
What 'essence' makes it the 'same language'? An interesting problem.
COBOL or Pascal if you don't use modern features lol. Idk. It's an interesting question about how we count the changes against the language's lineage. I don't have an answer to that one. We need a debate about it on StackOverflow that moderators close as "not constructive." Those are the threads that usually have great insights into issues like this haha.
Thanks for mentioning ALGOL and COBOL! It's easy to forget these languages since they are mostly used on mainframes.
ALGOL used to be the language you submitted things to the ACM in, I think, and it is the systems programming language of the MCP, which my dad still programs in to this day!
It is interesting how many languages had an algol style syntax until C stole the show. I am not sure the world is better for it.
"is the systems programming language of the MCP, which my dad still programs in to this day!"
That's pretty neat. Everyone thinks Burroughs died, but they just changed names. They make crazy money. The system is quite dated, though, in features and interface. No denying that. ;)
"It is interesting how many languages had an algol style syntax until C stole the show. I am not sure the world is better for it."
I'm not sure on syntax. Decisions by top languages in terms of safety or reliability show ALGOL and Pascal families were right about those. You might like to look at Modula-3 and Component Pascal to see where things might have gone in a parallel universe.
The more I think about it, the less I'm sure certain languages like Java will ever die. Java is a really good language for systems of mind-boggling size, and just as it's easy to write, it's also easy to manage and measure people writing it. Java strikes a good balance between extensibility (easy to implement design patterns; cross-platform JVM) and safety (automatic garbage collection; no direct memory access).
Even if technology did rapidly change (nanomachines!) there would still be a business case for a Java-style language. Given how used to C syntaxes we are, it might end up looking like Java. Or Java might just be here to stay, for a long, long time.
Side note: a lot of "dead" languages nowadays are ones that were written before memory and processing power were abundant so they had a lot of (what we in the glorious future might call) arbitrary character limits.
sp(a,b); might have been sufficient back in the slow-old days, but we don't really consider character limits anymore, and so we can actually learn what a given line of code does by reading it.
The problem with Java systems of mind-boggling size is that their mind-boggling size is usually caused by them being written in Java (and not the other way around).
As for design patterns, the patterns that are really extensively used have more to do with overcoming the limitations of Java (or C++, C#, or whatever similar language) than with expressing anything profound.
Oh it's worse. Not sure why but the high-assurance security field relied on high-security ORB's for connecting components in separation kernel systems. I was like, "Can't we use ZeroMQ or something? What are the odds that could be more fu... insecure than CORBA implementations?" Unreal.
On the other hand, the litigious environment around Java thanks to Oracle makes it a more risky ecosystem, for reasons that have nothing to do with language design. There is definitely room in this world for languages with more benign stewards.
Pascal isn't dead! Native binaries produced (much less garbage overhead), fast compilation, kick-ass open source IDE and standard library, cross platform, free (as in freedom, as well as beer).
A month ago I would have agreed with you. But then I surprised myself by choosing the Free Pascal Compiler and the Lazarus IDE for some personal projects that need to be native Windows applications.
These projects have been in my queue for years because I kept chasing the "best" way to create Windows applications. I learned (and forgot) a lot of crap, but never actually made anything. Now I'm making stuff.
Maybe a good call, maybe not. Embarcadero or whatever spelling haha keeps Delphi alive for businesses with old and new code written. Community mostly forked into Free Pascal with lots of libraries, example code, and active compiler development. Component Pascal was a C++ competitor with many companies and people in Europe and Russia using it. Mainly thanks to Blackbox Component Builder which was a Component Pascal app and BSD'd.
Did you miss the FP/Lazarus resurgence recently? The forums are active, they made a nice introductory video, it's cross-platform...since I think we're talking "really dead" here, I doubt it qualifies.
Calling Fortran 59 years old is odd, as not many people write in the original form of fortran. C11 is much closer to K&R C than Fortran 08 is to the original Fortran.
I'll throw two domain-specific language (families) into the mix that are pseudo-dead, although not really dead.
Although AMS-TeX and LaTeX are healthy (and TeX is nearly 40 years old), I would argue that very few indeed are writing in raw, bare, macro-free, package-free TeX. I'm sure it happens... rarely.
Another pseudo-dead language is PostScript. In the 80s there was a famous EE pushing the idea of sending your code and data to your PostScript laser printer as a batch frontend of sorts, letting the printer interpret the PostScript and do the calculations for your Smith chart, not merely print a bitmap containing one. That idea of PostScript as a general-purpose application language, although infinitely cool, never went much of anywhere.
PROLOG - such a great idea that never really caught on. Yeah, yeah, I know about miniKanren and Clojure's core.logic, but realize it's gone from "the Japanese are using it to leap ahead of American AI research, so we gotta catch up" in the late 80s to "what's that?" today.
To eliminate some battles over the definition of the word dead, how about "without too much effort you can get a paying job writing it". That works pretty well for spoken languages too.
That helps with assembly. As a fraction of the pie, it's never been lower. But as a device-driver, embedded, firmware, compiler-optimization (does that count?), and boot-loader technology, there have probably never in human history been more lines written per year. So it's both dead and thriving by some definitions, but by my definition above it's hardly dead.
I don't really get the appeal of basing a whole language on a backtracking algorithm. It always seemed to me that constraint solvers are better suited to a library implementation.
There are a lot of people still writing BASIC code. PureBasic, QB64, FreeBASIC, etc. Check out how active those communities are. Has everyone _really_ abandoned it? The bar for dead is just that: Dead.
They write as much as they can outside COBOL. Those 5 million lines are for extending old programs or integrating new things into them. Some new programs are presumably written as well since... well, it's COBOL programmers writing them. ;)
I believe the company I work for is writing just under 500K lines of COBOL code per year... now... how much of that is "new" vs extending vs maintaining is a good question. It is very difficult to measure those things in our environment. If you were to ask the 60 - 75 mainframe programmers we employ, I'm sure they would answer that COBOL is very much alive, and no matter how hard you try to kill it, 40 years of system code is just not going away any time soon (depending on your definition of "dead"). Especially as we hire a good amount of "new to us" people to maintain the system.
I think individual companies should define a language as "dead" based on the number of new people needed to maintain the systems. As the number approaches less than 5%(?) of your replacement hires, have you effectively "killed" the language? At the very least it is on life support, and a decision needs to be made about its future. (A grim analogy, I know).
Well, are we talking dead or just obsolete? It can't be dead if it's actively being developed with significant amounts of money and tooling improvements (eg MicroFocus). It can be obsolete, though, if it's taking up a tiny percentage of new code or hires as you said that keeps going down.
I agree one should try to phase out something on life support. Gradually at the least.
It's not a lot. Yet, it's extra code extending critical apps with billions of lines of code and running more transactions than Google does searches. That's a large impact in the grand scheme of things.
Also, how much C was being written in UNIX 1.0 days outside of ports of same, exact software? :P
I am about to start a new project with a company where COBOL is still very much a force. COBOL is far from dead and there are very serious applications using COBOL in mainstream use.
I also left out Perl, which may be losing mindshare, but is definitely not dead, Ruby, C#, and so on.
I chose these languages because they occurred to me as being relatively old and relatively common, but I won't argue that there aren't any other languages that could be on the list.
>but not many that subsequently died come to mind. Cobol,
I've read (not heard) it said that there may be more lines of COBOL in existence (probably meaning still running) than lines of any other language. Biz, insurance, gov't, IBM, etc. ...
FORTRAN and C are probably close seconds (relatively speaking).
A friend of mine mentioned that they hired a COBOL guy last month. They needed someone to work on the COBOL side of things because they are doing a big rewrite. (The legacy COBOL code outlived the stuff that was supposed to eventually replace it.)
C is the Latin of programming languages. The general syntax is easy to learn and communicates well across derivatives as diverse as Java and Ruby. Before I learned Lisp and Haskell, I didn't even think non-C-style syntax existed.
Forth and Prolog are diverse. Ruby and Java are not. They are both ALGOL (C) derived syntaxes. People only think they are much different because they lack exposure to languages from other (or no) heritages.
It doesn't matter how diverse the syntax is, just think of the vastly different worlds of people programming in C derived syntaxes:
The enterprise app developer writes Java for her day job, and spends her nights creating iPhone apps in Objective-C. Her apps connect with an API on the website she built using Ruby on Rails, and handles front-end interaction in JavaScript.
All four languages are syntactically similar, but each has vastly different application domains.
Do you know anything about Forth and Prolog and the languages you mention? Three are the same paradigm; the other two are very much not. This has nothing to do with syntax or "application domains".
I admire C for everything it taught me about programming, but nowadays I actually hate writing it.
For all my personal projects I switched over to Python years ago because I just don't want to waste a single more minute dealing with basic stuff like strings in C.
I love the batteries-included mentality of Python because it lets me concentrate on implementing actual solutions to my problems instead of fighting against the build system or reliably converting strings into integers.
If there were some kind of batteries-included version of C, I would certainly look into it again, but for now Python is just my preferred tool to get stuff done.
(Yes, C++ is better in this regard than C, but it is still too cumbersome for my use case of rapidly playing around with new ideas.)
Hmm... Rust might be in that ballpark. It is certainly more "batteries included" than C, and at about the same level of abstraction overall. It depends, of course, on how cumbersome you consider the ownership system.
C should be the de facto language for first-year CS students. It is not highly abstracted from the hardware, and it's much easier to grasp assembly knowing basic C. The syntax uptake is pretty quick, and would allow students to focus on the problem at hand (e.g. algorithms) rather than language nuance. Not to mention the advantages to the graduate starting their career in software development.
It's also easier to go up the stack to object oriented languages, particularly Java. The second chapter, first edition, of David Flanagan's Java in a Nutshell is still, in my opinion, the best intro to Java after having some experience with C.
> The syntax uptake is pretty quick, and would allow students to focus on the problem at hand (e.g. algorithms) rather than language nuance.
I don't agree. What's the benefit in having to painstakingly write trivial string operations using string.h and manually allocated buffers? What's the benefit in learning the ins and outs of undefined behavior?
That's the cool part, algos don't necessarily require strings.
Realize he's proposing first years where you have to LARP that they don't even know what an if-then-else construct is, or what is a function, or what is a loop, or what is recursion, or what is the concept of a memory map, or what is the concept of data or memory locations having a type (like float, long int, etc), or what is the concept of a variable. In 2016 most of them probably know, but schools feel the need to pretend. Also non-majors wanting a code experience will need an intro.
This will make "real C programmers" very annoyed because OS code requires "real C skills" not a training wheels subset of C, but oh well. You can't really expect first year students to write OS code anyway.
One hidden advantage is that it would take enormous effort for a noob to put a C program on the internet, so this protects them from themselves until they pick up, in later higher-level classes, things like buffer overflow protection, the GIGO concept, unit testing, and so on. If you teach them what a variable is using the framework-of-the-week, they're just going to get themselves or an employer pwned, thinking they know what they're doing after one class.
If you're skipping teaching strings because it's hard to work with them in the language you're using, you may want to consider whether you are actually doing students a favor, or whether you're trying to rationalize your decision to teach a language you happen to like.
The advantage of using C to teach algorithms and/or data structures is that it hides very little of what's actually going on, including allocations. "More code" correlates much more with "more work for the computer" than it does in higher-level languages.
If you know what you'd have to do in C to implement a certain feature, you can guess what a high-level language is probably doing under the hood when you use its canned features.
Except that undefined behaviour seems like a mostly C (and C-derivative) concept.
I teach C, and every year I have a number of students nearly in tears, because somewhere in the huge program they accidentally malloced sizeof(T*) instead of sizeof(T), but of course that causes a crash 10 minutes later in a totally different piece of the code base.
My hope (it's not quite there yet, but getting closer) is that things like clang's sanitize modes will reach a point where any undefined behaviour immediately causes an abort. Then students can still figure out what they did wrong, but have a chance of finding the source of their bug.
Speaking as a past student I'm really glad I started with C.
In the beginning, the fear of getting some random Segmentation fault out of nowhere actually taught me more than any textbook, school or best practices blog could ever do. It also forced me to learn how to use debuggers :)
That's what I tell my students, it's "character building", in the same way playing sports in the rain was as a child for me.
Also, many students previously did a course on Java, and clearly never really understood the basics, it's much easier in Java to play "keep tweaking and fixing the exceptions until it works, then don't touch it", particularly for introductory-level projects.
Show them Valgrind. If one teaches C to newcomers, it is an invaluable tool. If you compile with -g, you get nice reports of memory errors, uses of uninitialised values, memory leaks, and so on.
Really I wish I had access to it when I learned C.
My experience is that while valgrind is an amazing tool, it's actually not that great for new students. Sometimes it can be misleading, and it doesn't do stack tracking.
It's hard to teach students not to over-rely on it, and trust everything it ever states. It seems best to first make them do some horrible debugging by themselves, and then move them onto valgrind late.
Can you elaborate? I know that it might misbehave (i.e. report false positives), but I only encountered that with more complicated code that deals either with big-fat third party libraries or heavily optimized code where the developer e.g. did not initialize specific variables on purpose.
However, if you deal with "student" problems, I do not think this is the case. E.g. I taught a similar course, and the students appreciated that instead of just

    $ gcc -Wall -std=c99 -g foo.c
    $ ./a.out

they could run the program under valgrind and get output like

    $ valgrind ./a.out
    ...
    ==22907== Command: ./a.out
    ==22907==
    ==22907== Use of uninitialised value of size 8
    ==22907==    at 0x4004F5: bar (foo.c:42)
    ==22907==
    ==22907== Invalid write of size 1
    ==22907==    at 0x4004F5: bar (foo.c:42)
    ==22907==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
    ...

when e.g. writing to unallocated memory. I know for sure that there were often programs that students believed to be correct (and, worse, that often really worked, just not reliably) but that actually had memory errors, most of which valgrind reported.
I am not proposing to confront them with valgrind's internals (I do not even know them), but to explain that there is this tool, packaged in every popular Linux distro and ready to use, for finding these evil memory bugs (which can be hard to hunt down if you are new to the language).
In particular I am interested in a false positive reported by Valgrind for a student exercise.
> To force people to learn that making assumptions based on undefined behaviour is dangerous and that computers are not mind-readers.
You can do that in a lot less time with, say, Array.sort in JavaScript. Explaining that signed overflow is undefined, and why it's undefined, is a whole lot of digression for little gain.
> And you force people to write buffer manipulation code so they realize how good people using other languages have it.
Seems like a lot of time to spend to make a point that would have been a lot more relevant in 1990 than in 2016.
They get way better results learning Pascal or Oberon first, if we're talking imperative languages. They do common stuff like problem-solving, get in the habit of using typing, and don't worry as much about memory errors or undefined behavior. Then they can start working with pointers and UNSAFE modules to see how they can shoot themselves in the foot but do low-level or high-performance stuff. They can learn to avoid the pitfalls.
It was the first language taught at my university and I wholeheartedly agree. Java was actually more confusing because I couldn't comprehend what the classes/objects things were at first. I just knew functions.
Java was the first OOP language that I learned. We started with some unexplained boilerplate "class Blah, public static void main", and learned the basic syntax of the language by building functions.
OO concepts were introduced after that, and the boilerplate code was explained. I'd done Basic, VB5, and what I'd call "C with iostreams" before college, and it felt like a fairly gentle exposure to classes and objects. Recursion took me longer to get a handle on, actually.
C was my first language, but I hadn't used it (outside of C++) for a long time. In recent days it suddenly crashed my little "use new cool languages for the heck of it" party.
I was experimenting with Swift, trying to make it fit my rather performance/memory critical string processing needs. As it turns out, you can use C code from Swift about as easily as you can use Java code from another JVM language.
So I implemented a variant of the short string optimization in a few lines of C code. It's amazing how well C fits the bill as a lingua franca for code that does questionable things to bits and/or is meant to be used from other languages.
There's very little competition for C in that role.
I love working with C. The power it gives me allows me to write hyper-optimized applications in terms of memory usage. Using it we process terabyte files ridiculously fast using less than 1K of non-program memory.
However, when I don't need things hyper-optimised, it is not my first choice, since C is way behind other languages in terms of package management, and I'd rather not deal with pointers if I don't have to.
> [referring to Microsoft Word 1.1] It seems that this code is from a C project created recently in GitHub. No sign that’s a code from 25 years ago.
I wouldn't necessarily say that. At a glance, there are some questionable idioms: assignment inside a function argument (especially problematic since argument evaluation order is unspecified), old-style argument declaration, a custom Boolean type, less-than-descriptive variable naming, etc. I wouldn't let any of that pass code review today. :)
> The power of C is its stability over years, it remains basic, no advanced mechanism was introduced to the language, the code still simple to understand and maintain.
Not when you take the arcane undefined behavior rules into account. C semantics are anything but simple.
> Not when you take the arcane undefined behavior rules into account. C semantics are anything but simple.
Maybe, but note that simplicity should be viewed in a relative sense.
I am curious as to which language you think has simpler semantics than C. For example, I have not found Python, Julia, MATLAB, C++, Verilog, or shell script simpler than C. Same goes for my initial explorations of Rust.
Even if one includes the arcane corners, the spec is < 200 pages (excluding the stdlib).
If you don't count the unsafe sublanguage, Rust, SML, OCaml, Scheme, Java, and Lua all have simpler semantics than C, and that's just off the top of my head. It's debatable, but I would even argue that Haskell and Swift do. Undefined behavior is really subtle, and there are parts of the spec that compiler developers haven't even come to consensus regarding the meaning of.
Standard Haskell (ie Haskell 2010[1]) certainly is. Most of its features are just syntax sugar over a tiny core language with simple (although not 100% formalized) denotational semantics. The language itself isn't very big and it's pretty well-specified.
GHC Haskell with all extensions... euh, I don't know. Many extensions are either syntactic sugar or straightforward changes to the base language, sometimes even making it simpler. But I'm not confident about all of them, and I'm not sure how the semantics of some low-level libraries (ie for concurrency) work out.
I've written code in essentially just the standard subset and it's a pleasant language—there aren't any extensions that you absolutely must use in real projects. That said, people use lots of them anyhow because most make the language nicer without adding too much complexity.
Since the extensions aren't organized in a unified way and have to be split up (since they can generally be enabled or disabled independently), it's hard to figure out how complex they make the semantics of the language as a whole.
I have not implemented a compiler so I can't really judge on that aspect.
Isn't it also true that undefined behavior can stem from varying hardware/platform considerations? In particular, leaving some things judiciously undefined allows flexibility in implementations and thus can give good performance across a variety of platforms. Sure, one could force some precise semantics, but that might result in unnecessary "emulation" code on some platforms.
So at a programming language level, by having such emulation/fixed behavior, the semantics are simpler. But at a low level, the semantics are more complicated as the machine instructions may differ significantly, thus losing transparency. This is a concern in quite a few resource constrained applications.
Maybe Fortran, then, given it has less undefined behavior. It's still widely used in high-performance computing, since it's easier to optimize than C and has pre-optimized libraries.
It's worth studying a bit on static analysis tools for C. C's lack of adequate specification and undefined/unspecified cases make it a shambling mess to have truly correct C, particularly multiplatform or in unusual CPU environments.
I will go to my grave loving C, from that first day when I read about it in Byte magazine as the high speed language of every new exciting computer application, to the day when C++ took over and began to assert its highfalutin notions of what a language should be.
I can't help myself from rewriting your post, it's so close :)
I will go to my grave loving C, from that first day when I read() about it in unsigned char magazine as the high speed language of every malloc() exciting computer application, to the day when C++ took over and began to assert() its high falutin' notions of what a language should be.
C is nearly inaccessible to me. It really is like the Latin of programming languages, insofar as JavaScript is the new English of programming languages which is the world I have been living in for some time now.
It's like you can see the Latin roots of a lot of English, but reading it is like reading lorem ipsum. Same with C vs. JS for me.
Ironically it's the other way around for me. C is not just modern English but some 1000-word subset of modern English. JS feels like Esperanto or something: a synthetic language that was informed by English (and other languages) but is really its own thing, and was designed for a very specific purpose.
(Those are my subjective feelings about those languages, not actual analysis of any real properties of those languages :) )
Just out of curiosity, which elements make C inaccessible compared to other C-like languages? The base structure is identical to the later derivatives (if/else, for, while, operators and precedence, use of curly braces). Is it the manual memory management (malloc/free)? Or is it keeping aware of a variable's data type, knowing when and when not to use pointers and references? Or is it the limited number / scope of functions included in the standard libraries? (I'm mainly curious, as C seems to be the most comfortable language to me since I've known it for the last 25 or so years).
> Is it the manual memory management (malloc/free)? Or is it keeping aware of a variable's data type, knowing when and when not to use pointers and references?
Yes, both I think. These are ways of thinking that simply don't seem to apply in JS or any other dynamically typed higher level language.
Also, just sort of the conventions of naming things, like malloc or memcpy or in the example of the article, all those crazily shortened abbreviations that may just be saving bits or the developers style, but make it very unreadable for me.
I'm used to stuff like:
function getDataFromServer(){
    var data = response
    //blah blah etc
}
and it may just be a thing that you have to get used to with experience, but I can read and write Python, Ruby, JS, Clojure and other lisps, and have experience with C#, but something about C I just struggle with, and I wish that wasn't the case.
I'll agree that memory and type issues are more of a barrier, but the variable naming is just convention of the programmer. There is a lot of legacy code where brevity in naming was emphasized, but that was just the convention of the time, nothing inherent in the language. it doesn't need to be that way.
Your example looks like completely valid C code to me (except for the lack of semi-colon).
How? The only big difference is that in C the variables need type annotations and that they have proper lexical scope (so they are more like Javascript's "let" than "var")
If the person learned ES5 and was used to the tricky scoping and had got to C before ES6 became standard/popular, I could totally see the struggle there.
My manager (a long-time-ago former dev) once told me:
"I can write C in any language."
And it's true. He could write beautiful C code in Perl, Java, Ruby, etc - whichever language the current project was using - when he needed to hack something (usually gathering data for business metrics).
He'd just stick to the basics and write C-style code in the given language, he had no knowledge of any of the language idioms and didn't need them if he kept his code simple enough.
Basically only using if/for/while and function/return constructs.
It was strange to open a Perl script and see "C" code, with single-letter variables (declared at the top), etc.
Honestly, writing perl in "C" style is really a good idea. At least if not pushed to extremes. It avoids a lot of unreadability and confusion.
Other things I think make for good perl style:
* Never use unless, especially as the after-statement conditional and/or with a negation. (I once spent a whole day ripping my hair out to figure out that an unless with a negated clause that was itself ambiguously named did the exact opposite of what it seemed to. Think [...] unless not $unregulated;)
* Do use foreach wherever possible.
* Keep reference usage to where it's really needed.
* When in doubt, don't write new OOP code. (The benefit to the caller has to be much greater than the extra complexity in implementation.)
* Lists are the core data structure, and regexes are a first-class construct.
* Avoid the use of implicit $_/@_.
* Avoid the use of explicit $_.
* Use map / grep / sort and similar in pipelines but don't nest too deep on a given line.
Yes and no. While vectorised code will typically run faster than a loop (because it's a loop in C), i.e. rowSums is faster than a typical apply call, there actually isn't that much difference between a for loop over columns to sum and apply over columns to sum.
The real performance penalty is growing objects as you go. If you preallocate a list or vector of the appropriate length, then there typically isn't much difference.
That being said, you'll pry lapply(x, fun) from my cold dead hands (lapply(x, function(x) g(f(x))) is even better :) )
I have a hard time doing pointer-arithmetic, arbitrary memory writes upon malicious data, and unchecked macros in languages designed for safe programming. :P
One of my favorite jokes on HN was in a typical language wars thread where a guy wrote "real men write in C....without the standard library." That still makes me laugh.
The military and defense contractors did empirical studies of various programming languages and their defect rates to put that to the test, mainly in the 80's and 90's. They compared C, C++, Ada, and Fortran. C usually had double the defects of the rest, and with more severity, despite pros writing it. Ada usually outdid all of them, with one study showing Ada and C++ developers having the same defect rate.
So, when we use evidence instead of feelings, what you said is a myth that's been debunked repeatedly in many ways for decades. And, yet, people repeat it. No, C use usually results in problems that safer, systems languages before and after it had less or none of. Despite professionals using it. The solid code is always an outlier.
The most amusing thing is that Thompson and Ritchie, with help from Pike, later designed what they thought was the perfect language with the perfect set of features. The result, Go, was basically like the Algol subsets that preceded C mixed with Pascal, which was developed around the same time. But C compiled and ran fast on a PDP-11. So, it's the best. ;)
Yes I remember the studies (and have done my fair share of defence s/w).
And I am not claiming that C is the best tool for the job because of the language.... its because of the 'environmental' factors, human resources, community, momentum, codebases, portability, etc.
I whole-heartedly agree that there are better languages out there, but the fact remains that the flaws in C are easily managed.
It would also be interesting to produce a study of the use of C specifically in embedded systems where the style of code is very much different than large desktop systems (i.e. much less heap usage and dynamic memory allocation for example).
"And I am not claiming that C is the best tool for the job because of the language.... its because of the 'environmental' factors, human resources, community, momentum, codebases, portability, etc."
I agreed with you on that in another comment. Yep. It's also why NASA often uses it. The tooling covers dark corners pretty well these days.
"It would also be interesting to produce a study of the use of C specifically in embedded systems where the style of code is very much different than large desktop systems (i.e. much less heap usage and dynamic memory allocation for example)."
I'd love to replicate the old studies on modern languages with modern tooling for both systems programming and the embedded subsets like MISRA. We'd definitely learn some stuff from that. Further, I'd like to see specific metrics like in Ada/SPARK/CbyC example below on where defects were introduced or corrected with each technique and phase of lifecycle. Would tell us how low-level features, subset rules, and tooling interact with accurate assessment of how much problems they pose for real instead of in theory.
Modern embedded C programming has evolved significantly in the right direction since studies were done and so I would love to see someone spend the time looking at 'modern' embedded C codebases.
Static analysis is now widespread and not theoretical, coding standards now take security seriously (tho can be outdated). Drawbacks to certain techniques have been surfaced and recognised. Tooling is much better, portability is recognised as a good-thing, pointer usage is now minimised when possible and contained to areas where they are appropriate. Casting is frowned upon, macro-magic is frowned upon. 'Clever' code is frowned upon.
Basically, I would think that certainly in the embedded domain, standards are now such that the story would now be significantly different.
Have you tried Astree Analyzer? Papers I read indicated it was one of best but hard to find industry people to confirm or reject it. Want someone to clone it for FOSS. Meanwhile, I found these free ones for you two that have each found bugs in code with focus on minimal annotations. All are academic prototypes but Saturn was used on Linux kernel.
Most academics are instead making compilers that translate C code into something mostly or totally safe while pushing the performance hit downward. Here's two of the top for you to try on various codes (or improve):
If we can get tools like Rust which have better memory safety guarantees while offering much of the same performance, then hell, why not?
It's easy to see why just by looking at the hundreds of CVE's throughout the years, often caused by memory unsafe operations.
Not that I don't like C, but there's many better alternatives out there.
And this is coming from someone who stubbornly sticks to writing D in C-style, but knows what Rust brings to the table.
OK, I'm biased as I've got over 20 years in embedded/real-time systems primarily written in C (and C++) and am currently architect for an embedded system containing ~million lines of code and rolling out to tens of millions of units....
(But I'm educated enough to also use Python, Clojure, C# and a variety of other systems).
...and C is currently the best choice for systems like this because of reasons not primarily to do with the language.
Currently, if you suggest to use a language other than C you will be laughed out because the only available embedded guys are C based. Yes, momentum counts and C has massive momentum.
There are also issues such as toolchains and tooling, familiarity, community, and of course the massive existing set of libraries, codebases and knowledge.
Basically, C is 'good enough'. (Although Rust is interesting and on the horizon, it's a long way off yet.)
Although other languages are technically better, C's shortcomings are greatly exaggerated, for example memory management. In general this is not an issue in bare-metal embedded systems: you don't have a heap, and everything is statically allocated by design.
There are many flaws in the C language (as in every language), but in day-to-day use, they are very easy to manage.
I agree with you that C is a lot better if you don't have a heap. But dangling pointers, undefined behavior, etc. are still issues. We know how to fix these problems in 2016 with better language design, via techniques we didn't know in 1978.
I'll be the first to admit that compatibility with an ecosystem, even if flawed, is important. (I work on web browsers, after all!) We can't change overnight, or even in two or three years. But we'll never get to a better future if we don't take the steps to start now.
"There are also issues such as toolchains and tooling, familiarity, community, and of course the massive existing set of libraries, codebases and knowledge."
Those are the social and economic factors that we C opponents say are why C remains, and is often the pragmatic choice. They in no way show that the C language itself was well designed, superior, etc. They just show how going mainstream can make things more practical. They did a great job on that part.
Probably. I think you're a pragmatist rather than a true believer. ;)
Btw, what do you think of a reboot of Modula-2, Modula-3, or just Ada/SPARK? Your opinion on them would be interesting given your background. Closest thing to Modula's that's actively developed is Astrobe Oberon for embedded.
I started off with Turbo Pascal before graduating to Turbo C, so I do have a fondness for the Pascal family of languages. I also love type-safety....
TBH, I would prefer a hypothetical "fix" for the C language that improved type strictness/safety over "fixes" for resource/memory management issues which I consider should be bread and butter for any software engineer. So in that, I do not like large runtimes or garbage collection, more visibility to what the machine is really doing is needed. GC systems are also not predictable enough, deterministic runtime is absolutely essential to making reliable software.
I have poked my nose around Modula-2 in the past, but C took over for me personally. Ada I consider to be far too heavyweight for anything serious (and that's after being in a company that used it extensively in the defence world, though I never had to touch it myself).
Currently Rust is my best hope for the future.... but C has survived well IMO.
Also a pragmatist, as although I tend to bash on C on every occasion and would rather not use it, if a customer does require it, I will use it.
For me a kind of escape path from C, back when I was into Turbo Pascal, was to move to Turbo C++. I only got to use Turbo C around one year before getting to learn about this new cool language called C++.
It allowed me to keep some of the Turbo Pascal safety around.
What are your approaches when using plain C, coming from "so I do have a fondness for the Pascal family of languages. I also love type-safety" ?
My solutions were:
- Use translation units as TP units, only exposing functions and struct accessors (macros if function calls were too expensive)
- Always make use of debug functions to validate pointers in debug builds (e.g. _malloc_dbg, _CrtIsValidPointer on Windows)
- make use of const as much as possible
- write my own wrappers around strncpy and friends
- compile warnings as errors
- if given the option, just use C++ instead and prefer library types to inherited C ones (string, array, references, RAII).
> If you know what you're doing (and you should) then C is the best tool for the job.
Empirically, nobody knows what they're doing, if "knowing what you're doing" is defined as "writing large-scale C code without memory safety problems".
I hear a lot from C and C++ enthusiasts that there are lots of programmers out there who always write correct C and C++ code, and therefore we don't need new languages. But I've never found one of those programmers. Can you name one?
I hope to never own a microwave oven with a large scale codebase. Also my basement dehumidifier. And my waffle iron. And my clothes washer. And my digital dial caliper in my workroom.
There's a surprising amount of money in "you are now a timer" and "you are now a thermostat" and "you are now a thermometer" and very close analogies.
Washing machines actually have quite a bit of code. Car ECUs have even more. Elevator control systems are pretty big. Medical devices have huge codebases.
You'll definitely use or own something that has a somewhat large codebase.
Mine doesn't. When my washer broke, I called some locals that have a warehouse full of them and repair them. I asked whether it was their opinion, being experts, that my observation about the older ones being more reliable was true? The guy confirmed that older ones easily last a decade or so with new, computer-filled ones breaking all the time. Said they're good business for him. Sold me what he claimed was a good model of older ones he had for under $100. I was surprised when he told me the one that broke was around 20 years old.
That is both reliability and return on investment. :)
> Yes, there are well publicised failures in this regard, but consider the number of problems versus the actual number of lines of code written.
I think that's a pretty misleading metric when you take into account: (1) widely used codebases constantly increase in size; (2) it only takes one exploitable mistake for an attacker to achieve remote code execution. One exploitable vulnerability per 10,000 lines means 1,200 remote code execution flaws in a codebase the size of Chromium.
I've never seen it. All large-scale C and C++ codebases I've ever seen, from large companies to small ones, have had memory safety/undefined behavior problems [1]. If "just hire better programmers" were a workable solution, surely one company out of {Google, Apple, Microsoft, Facebook} would have succeeded at that strategy by now.
It's very easy to write C and C++ code that looks like it's free of undefined behavior, but in every case I've seen they end up falling when attackers actively try to look for problems.
[1]: Maybe qmail is the one exception, though even that had a famous debate related to overflow.
Have you ever worked with teams who use aggressive static analysis tools to detect and catch undefined behavior?
Because I have, and it works incredibly well.
Of course, the caveat is that once you turn up the static analysis aggressiveness (assuming you use a good static analysis tool), you will need to put in plenty of assert(index < len), ownership annotations, and other such items in your code to satisfy the checks.
But once you do, it's really hard (probably not impossible, but really hard) to trigger undefined behavior without the static analysis tool catching it.
Because if the tool decides it can't prove there's no undefined behavior, it is configured to complain, and you adjust the code until it stops.
I'm not saying many teams do this, but I am disputing that no teams do this.
> I'm not saying many teams do this, but I am disputing that no teams do this.
If you watch Herb Sutter's talk at CppCon 2015, at a given point he asks the audience how many know and use such tools.
It is one of the most important C++ conferences, usually attended by the most savvy C++ developers in the world, and the portion of the audience saying that they do use such tools was roughly 1%!
1% shows how much the majority of C and C++ developers, or their employers, care about writing proper safe code in those languages.
The only way out is moving to programming languages whose safety must be explicitly turned off, rather than explicitly turned on, because most won't bother taking the effort to turn it on.
No one is disputing that the industry moving to a safer language would reduce the level of nasty bugs in the world. That would be silly to argue against, because it's basically a tautology anyway.
What I'm disputing is that no teams write good C or C++ code.
In fact, the 1% who raised their hands proves my point. It isn't 0%.
That's all I'm saying. Don't smear our names with the "all C and C++ developers write code with exploitable undefined behavior bugs in it" when you mean "most C and C++ developers write code with exploitable undefined behavior bugs in it".
C++ is my go-to language every time I need to step outside the JVM and .NET ecosystems, and I do take all efforts to be part of that 1%.
Once upon a time I taught a C++ class to first-year students at the university where I took my degree. I've worked with C++ at some well-known companies and research institutions.
Still, I won't claim that I write C++ free of exploitable undefined behaviour, let alone C. I can't control the code pulled in by third-party libraries or written by team members, nor keep every UB case in my head.
If you use the right tools (static analysis and sanitizers and fuzzers) your own code (and any code you rely on which you have the source for) will be handled.
If you rely on binary dependencies which have quality problems, then I can't help you there. But that could be an attack vector in any language.
And if you are worried about the operating system itself, then there really is no easy way around it unless you want to run a unikernel for everything.
And even then you may hit a CPU microcode bug or a hardware bug...
> Have you ever worked with teams who use aggressive static analysis tools to detect and catch undefined behavior?
Coverity is actually used on browser engines, and it has not been able to stem the tide of exploitable security vulnerabilities. Sound static analysis is just too difficult on idiomatic C and C++: it's effectively impossible, as the language was just not designed for it.
Your tool is better than Astree Analyzer and Polyspace? I think you should consider GPLing or licensing it with commercial support. Judging from their prices & effectiveness, you'll make a killing doing better while undercutting their licensing.
The problem is usability and licensing costs. Our in-house tool is amazingly powerful and fits our needs but the usability would need improvement (no surprise) and we license some of the utility code we use (which is cheaper if we don't resell our tool).
You couldn't replace the utility code with something in FOSS? And that's why your company is holding out on a static analyzer better than anything out there? The case for doing something to open that tool is just getting stronger.
Btw, what utility tool are you licensing? What makes it irreplaceable? I know of only one in this field that I truly couldn't replace for compilers or static analysis. Even then, there's quite a few that handle the job well enough to not need it. So, I don't use it.
The utility tool has OSS equivalents but we've evaluated them all. None of them have the features we need, and adding them would be person-years of work we can't afford.
My hope is to some day open source our internal tool, but we have to wait until the OSS features we need catch up. Until then, we aren't willing to pay the high license fee to redistribute this utility freely.
I'm speaking a little cryptically because I'd rather not have this tied directly to any particular company (neither mine nor the one we license this code from). I'm not speaking on their behalves officially; only my own.
Hmm. I'd love to know what that tool is, but NDAs and policies are what they are. I have an email address if you want to send me a comparison of what advantages it has over similar tools. Then, on the odd chance I see an opportunity, I'll nudge someone in the direction of trying to bring OSS up to par. No promises it will happen, as the opportunity has to arise first.
Well, except that in kernel you often can't use even many CPU features either in practice.
At least if you use SSE2/FP/etc., you'd better ensure those FPU registers are saved and restored. But you probably don't want to do that in an IRQ handler! Save only the SSE2 registers on an AVX system (the higher 128 bits will be zeroed for the currently executing usermode thread!) and receive "interesting" bug reports from the end users.
If you refer to a vm page that's not present... well, bad things might happen.
"Availability" doesn't mean much. There are plenty of all kinds of libs available on all platforms (who wants to program WiFi access for the embedded sensor from scratch, for example?). Doesn't mean it makes sense to put any of them into the current project. The stdlib is one of many and not that often the one that is put in. The libs I use on my embedded system are much more likely to be for some specific piece of sensor or I/O device (like the mentioned WiFi module), I rarely have need for what's in the stdlib. Of course, embedded systems vary far more than the PC and server stuff so you can easily find people who will say the exact opposite.
A freestanding C implementation doesn't have to include anything but a handful of headers with some macro and type definitions. It's not uncommon to get a manufacturer-specific library whose features are specific to the particular controller architecture, not related to those defined in the full C library. And it's not uncommon for embedded software developers to ignore those libraries, which are frequently terrible, and write everything from scratch.
I learned C via this excellent book: http://www.amazon.com/dp/067230399X/?tag=stackoverfl08-20, which I picked up from a local bookstore on a whim. I had just started high school, and up until that point my programming experience had been limited to BASIC-type languages (starting with Apple ][ basic when I was 6, then AMOS on the Amiga 500, then QBasic on DOS... actually that last one was a bit of a downgrade from AMOS :) ).
C was my introduction to "real" programming -- it forced me to actually learn memory management, pointers, etc. Funny thing, I only took one C course while I was in college. By the time I started college, Java was the new hotness, and I was in the first incoming freshman class to get Java instead of Pascal in our introductory programming course. I took an immediate disliking to Java back then (and still don't like it), so naturally it's the language I have to use at work :O
C has kind of become the evident de facto standard. There isn't anything we can really do to change that. It just did what it was meant to do very well, with all of the flashy bells and whistles removed.
The underlying instructions generated by the compilers are simple to follow. Reasoning about what will be generated is an easy enough task (so long as optimization is turned off).
"Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions."
Seriously enough, Lua adds all those scripting high-level aspects that are missing from C. I know it's another language but its interplay with C gives strength to both of them. A good alternative of the C + Lua paradigm is of course Python.
My first serious introduction to programming was through Harvard's online CS50x course, which uses C. I think it was a fantastic way to introduce a newbie like me to what programming languages are doing under the hood with memory allocation, pointers, value vs ref, etc.
You can backpropagate "modern" CS techniques like modularity and encapsulation to almost any language, like C or LISP or FORTRAN. But if you have language and compiler support for these, it makes it easier and more reliable.
That's funny about capsules. It's more consistent when people discuss encapsulation, though. As far as OOP in C goes, it's definitely doable, with quite a few approaches and even books dedicated to it. Example:
Many think C++, with its benefits and problems, is relevant to OOP in C, but not entirely. That's because they think C++ was about adding OOP to C. It wasn't. Stroustrup had used safe, OOP languages like Simula and wanted more of that. He intended to leverage C's popularity and tooling but transform it into something with the benefits of those other languages. So the result is C-like, but not quite C, and complex.
Just doing OOP in C, on the other hand, is a much easier and cleaner problem to solve. ;)
[0] Where dead doesn't rule out someone being paid to maintain ancient code. By that standard, it's unclear whether anything will ever die (https://www.snellman.net/blog/archive/2015-09-01-the-most-ob...)