
C is the thin wrapper people use to talk to the machine. It's the primary interface between man and microprocessor. FFI is a pretty good convention for giving your latest redundant, unnecessary vanity language the capability to talk to the OS, which is almost universally written in C.


C is defined in terms of its own abstract machine (one that can't overflow a signed int), not the hardware that it runs on.

This creates the contentious issue of Undefined Behavior: constructs that C forbids, but which the actual hardware has no problem with.


It's because in practice this "abstract machine" means "whatever your CPU is" and "undefined behaviour" means "whatever your CPU does"...


That's a common misconception. Advanced compilers like GCC and LLVM have optimizers working primarily on a common intermediate representation of the program, which is largely backend-independent, and written for semantics of the C abstract machine.

UB has such surprising and hard-to-explain side effects because all the implicit assumptions in the optimization passes, and the symbolic execution used to simplify expressions, follow the semantics of the C spec, not the details of the specific target hardware.


Programmers have an ideal obvious translation of C programs to machine instructions in their head, but there's no spec for that.

It creates impossible expectations for the compilers. My recent favorite paradox is this: in clang's optimizer, comparing the addresses of two variables always gives false, because they're obviously separate entities (this hardcoded assumption allows optimizing out the comparison, and doesn't stop variables from being kept in registers).

But then clang is expected to avoid redundant memcpys and remove useless copies, so the same memory location can be reused instead of copying, and then two variables on the stack can end up having the same address, contradicting the previous hardcoded result. You get a different result for the same expression depending on the order of optimizations. Clang won't resolve this paradox, because there are programs and benchmarks that rely on both of these behaviors.


And yet, in practice the C I have written over the last 20+ years mostly ended up with "undefined behaviour" = "what the CPU does"...

That may not be true in theory, but it ends up that way in practice, because "undefined behaviour" tends to be implemented "to work", and the behaviour of common CPUs we're used to has fed into the expectation of what "to work" should mean...


That's not how compiler developers interpret undefined behaviour. Undefined behaviour is closer to a 'U', 'X' or "don't care" in VHDL. These things don't exist in real hardware, only in simulation, so the synthesis tool simply assumes they never happen and optimizes accordingly. However, C does not have a simulation environment and UB propagation is not a thing. It will simply do weird shit, like run an infinite loop, because you forgot to add a return statement when you changed a function from void to a non-void return type.


Well, again in practice the following:

"undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

EXAMPLE An example of undefined behavior is the behavior on integer overflow." (The C99 standard, section 3.4.3)

translates into "whatever your CPU does", because while no requirement is imposed, in general the compiler does make it work "in a manner characteristic of the environment".

I believe that memory accesses outside of array bounds, signed integer overflow, and null pointer dereference are all examples of "undefined behaviour", which in practice all boil down to what the CPU does in those cases. E.g. a memory access outside of array bounds commonly returns whatever is at that address, as long as the address is valid, because there are no checks and that's what the CPU does when asked to load from an address. Integer overflow? If it's the result of adding or subtracting, it commonly wraps around because that's how the CPU behaves, etc.

And I believe this is all on purpose. C is an abstraction over assembly, and I believe that people who were used to their CPU's behaviour wanted to keep it that way in C; also, compilers were simple back then.


For someone who's been writing "C for 20+ years" according to your other post, you come across as extremely ignorant of how modern optimizing C compilers work. I suggest you thoroughly read and understand Russ Cox's simple exposé [1] and the UB posts on John Regehr's blog [2].

The C security issues that plague us today are partly fueled by the attitude of guesswork and ignorance demonstrated in your posts.

[1] https://research.swtch.com/ub

[2] https://blog.regehr.org/


I always appreciate ad hominem attacks, insults even, thank you.


You can write programs without so much as using the concept of functions. There are single-program embedded devices that don't have an OS. How can you look at something like Ada, for example, and say it is a wrapper around C?

C does have rules that must be obeyed, and strong concepts of data types, or the lack thereof (like not having a string type). There are assembly language designs specific to C++, and until recently ARM had a Java-bytecode-specific feature set (Jazelle). C is dominant, but I'd hesitate to say it is a machine code wrapper.


A dynamically linked library need only have one image of itself in memory.

If you are running a process that, for example, forks 128 of itself, do you want every library it uses to have a separate copy of that library in memory?

That's probably the biggest benefit. But it also speeds up load time if your executable doesn't have to load a huge memory image when it starts up, but can instead link to an already-in-memory image of its libraries.

The only real downside is exporting your executable into another environment where the various dynamic library versions might cause a problem. For that we have Docker these days. Just ship the entire package.


> If you are running a process that, for example, forks 128 of itself, do you want every library it uses to have a separate copy of that library in memory?

Fork uses CoW, right?


> it also speeds up load time if your executable doesn't have to load a huge memory image when it starts up

I'm not sure about Windows and Mac, but Linux uses "demand paging" and only loads the used pages of the executable as needed. It doesn't load the entire executable on startup.


Wouldn't KSM help with that, if the security concerns aren't a factor?


> For that we have Docker these days

Or just a proper package manager, like nix.


Yes, inoreader, I even paid for it.

This thing is packed with features, yet simple to use. Not only is it the best RSS reader I've ever used, it's up there in the top 10 of all software, of any kind, I've ever used.


Anxiety Reduction: I embed realtime monitoring in my code (high performance, specialized HTTP servers) using StatsD and Graphite (Grafana). I have to monitor 132 servers around the world, all running the same software. Realtime stats and graphs completely remove a whole level of anxiety about "what is going on with my code". Actively measuring things like:

1) Transactions per second

2) Total time to process an HTTP request and return a response

3) Timing of specific sections of code

4) Exception counts

5) Specific function counts (get as detailed as you need here)

6) Response time as viewed from an external requester (with percentile breakdown)

7) Version numbers

8) Anything you feel is important to know about your software

I always deploy canaries and compare the stats of the canary box to the ones running version canary-1. Any difference in stats must be investigated and explained as valid before a new version can be deployed beyond canary.

The greatest tool I have ever created for regression testing I call the "A/B test". I record incoming network traffic right off the wire using "ngrep". I have a tool that plays back this traffic to multiple different destinations and compares the response from each. Any difference in response between the old version and the new version must be explained and not caused by bugs. This tool also reads log files and other forms of output and does the same type of comparison. Obviously some things are always different, like random numbers in responses, unique cookies, and timestamps that can sometimes differ by 1 second. Those things are removed before comparing.

This tool almost 100% guarantees no unintended change of behavior slips through a new release. The only way something could slip through here is if some odd request that wasn't tested causes it, or something in the data that we don't compare (due to it always being different like random number) causes something. I run 10 million requests through this A/B test and a "no unexplained difference" result is required before fully deploying.

