Don't Hate on C

I am sure that many of us have had this discussion with someone, at some point, somewhere. Why is C still around? Straight off the bat, I'm going to show my colors: C rocks. And so do other languages. It's true: there are other languages which are very readable, quite forgiving, and incorporate features which aid the developer (garbage collection, bounds checking, implicit references, dictionaries,...), and in many cases they perform very well. As a C guy, I'm the first who used to resort to performance as an argument to dismiss other languages as inferior. But the bottom line is that this isn't necessarily true. Don't get me wrong: good C code (and yay, we can also embed assembler, use a good compiler, etc.) will yield superior results 9 times out of 10, so the performance argument is still often valid. However, that doesn't mean all other languages are slow. Furthermore, not all applications require fine-tuning the code for supreme performance.

One language that has come a long way since its early days is Java, particularly with regard to performance. Java used to be slow. Very slow. It always had its good things though: C-style syntax, object orientation, garbage collection, supposed portability to any platform with a JVM... But that JVM the code ran on, and was JIT'd on? Oh, it was slow. It has come a long way though, and now in certain operations such as arithmetic it's pretty much just as fast as unoptimized C. In many cases the "Java is slow" argument is simply not valid anymore. The same goes for Python. Man, some people dismiss Python arguing that it's an interpreted language and therefore slow. First of all, implementations are compiled or interpreted; the language is just a language. Secondly, and most importantly, it's not strictly true: CPython compiles source code to bytecode and runs it on its virtual machine, and PyPy goes further with a JIT compiler (just read up on PyPy or CPython).

Other languages, functional languages for instance, help in defining very elegant solutions in certain contexts. I've heard people say Haskell's most widespread implementation, GHC, isn't particularly fast, but I've also read otherwise... My point is there's very stiff competition, but C hasn't gone anywhere. C is still around, and will continue to be around. Let's try to understand why.

First of all, and this is one of the key reasons, C has a very simple and strict grammar. Despite being simple (it can be parsed by an LR(1) parser, with a little help from the lexer to tell typedef names apart from other identifiers), it does not impose any major constraints on the developer. This creates a bunch of advantages for the language:
1. The simple syntax eases the development of compilers/implementations.
a. It's therefore "easy" to come up with a decent compiler for virtually any architecture. Good code should therefore be easy to disseminate across architectures.
b. Because creating a compiler is simpler, it also becomes easier to build compilers that optimize the output binary well.
c. Competition. The competition means compilers continue to improve: GCC, LLVM/Clang, Intel, etc.
2. It's easy to learn. Although, as you already know, the freedom it gives the developer also makes it hard to master.

One caveat when dealing with C code is undefined behavior. This is actually a big issue, because compilers will typically not warn you about undefined behavior. It's up to the developer to know what they're doing. Many people see this as a downside of C. I honestly feel otherwise. C requires the coder to try to understand and appreciate the underlying architecture. This makes it a little harder, but it also makes you think more, and more deeply, about what you're doing.

If we look at what C does for you, you may be quick to say: not much. In a way you'd be right. But if we look at the bigger picture you'll realize that's not necessarily a bad thing. It gives you power and control over your code, its behavior and its performance. All those niceties the JVM does for Java coders... that JVM is written in C and C++. That pretty much sums things up. Those nice accessories and aids you get are courtesy of good C!

When you dive a bit deeper into C it quickly becomes apparent how it (and its implementations) really was conceived to cut all unnecessary clutter and deliver execution speed. One great example is static vs automatic variable initialization. Static variables without an explicit initializer are initialized to 0, while automatic variables are simply not initialized at all. Why? Because initializing static variables is a one-time effort, while doing so for an automatic variable would require performing the initialization every time the function is called. Although the C standard does *not* specify any particular memory layout, it's very common for implementations to employ an execution stack, and automatic storage resides on this stack. That explains why initializing automatic variables would incur a cost every single time a function is called. Similarly, allocated storage (what you get from malloc) typically resides on the heap, and it is not initialized either. Again, this makes sense. Imagine you allocate a large chunk of memory that you will be writing to as a buffer. Initializing that buffer to, say, 0 would be absolutely useless, and on top of that you would incur a cost which is not negligible and which grows with the amount of memory allocated. Again: remove unnecessary clutter and let the coder decide when and how to initialize storage.

C Memory Layout

One of the characteristics of C that has allowed it to blossom across a vast array of architectures and environments is that the C standard defines expected behavior quite well (except when it doesn't, you know, UB ;-) but says very little about how that behavior should be implemented. This has made it easier to build solid implementations of C for a wide range of hardware.

Admittedly, some aspects of C make it a little less friendly. Several things remain undefined or unspecified (usually to leave room for optimization). Evaluation order is one of them. Normally when we write code, we expect expressions to be evaluated from left to right when precedence allows for it. In C, the order in which most operands (and function arguments) are evaluated is actually unspecified, and it can differ between platforms and compilers. In fact, the same compiler may choose different orders in different parts of your code. What does this mean to me as a coder? It means that when coding in C I have to be very careful about sequence points and sequencing rules: if an expression modifies a variable, I cannot assume anything about that variable's value until the next sequence point. When we're careless, this causes bugs which may be hard to spot despite being right in front of our eyes. So what are sequence points exactly? A definition I like describes them as points in the execution flow of a program where all previous side effects have already taken place and no subsequent side effects have taken place yet. Where do we have sequence points in C?
- At the end of a full expression.
- In a function call, there is a sequence point after the evaluation of the arguments but before the actual call to the function.
- The logical operators && and || guarantee left-to-right evaluation, and if the second operand is evaluated there is a sequence point between the evaluation of the first and the second.
- The rarely used comma operator also guarantees left-to-right evaluation and defines a sequence point between the left and right operands.
- For the conditional operator ( ? : ), the condition is evaluated and there is then a sequence point between its evaluation and the evaluation of the second/third operands.

If you don't bear some of these things in mind you will stumble over your own UB over and over again. So try to keep these things clear in your mind.

C may also confuse some people with respect to memory. A couple of things a lot of people tend to forget about are memory alignment and variable sizes. First of all, stop assuming too many things about the size of your types. I'm not just talking about structs here, I'm talking about types in general. Precisely because C and C++ do such a great job in heterogeneous hardware environments, there are many things you just can't assume. Like the size of a long int, or a double, or whatever. I personally always resort to stdint.h to abstract myself from some of these, and I find the fixed-width types (uint64_t, or int32_t, or int8_t, or whatever) very helpful and descriptive, which makes for more readable code. As for structs, many of you probably already know that with structures 2+2 need not be 4. To improve memory access, many C compilers will align structure members to suitable boundaries, and they do so by adding padding bytes within the actual struct. Surprisingly, just by reordering the structure members you may get a smaller and more efficiently accessed structure. You may also choose to pack your structures (using preprocessor directives or compiler flags). This will most probably make access to the members less efficient, but it will also produce the smallest possible size for the structure. C gives you the power to decide what is best for you in each particular case: maybe you want to avoid overhead when serializing structures, or maybe you access members constantly and so must optimize memory access. Not many languages will allow you to do this, so be grateful for C ;-)

So after this collection of thoughts on C, it all comes down to understanding what C is and what it's not. Don't hate it because it doesn't give you all the nice features other languages do. Don't. If you do, then you probably don't understand the strengths of C. You have to take C for what it is, and love it for what it is. C is the language that will bring you closest to the machine. C is a building block for so many other languages you do love, so stop hating it. Yes, some C code can be very buggy. In fact, some very notorious failures, like the Ariane 5 rocket or the Patriot missile mistiming, are often put down to C bugs (even though the Ariane 5 flight software, for one, was actually written in Ada). But the truth is that it wasn't the language's fault. Those missions were doomed by crappy code and carelessness. Which sums up just how easy it is to slip up and introduce bugs into this kind of code, because the people working on those projects were/are brilliant, definitely a lot more than I am. But then again, when it comes to mission-critical code, real-time applications, etc., there is hardly any other choice. C is the way to go. Because C is the higher-level language that will give you the sort of control you need over the resources, over what's under the hood.

Written on August 26, 2013