HN Reader

I still find it wild that Godbolt is his actual name and not some cool term used for the tool to see what compiler output looks like.

1 month agoby dhosek

Isn't this the instruction that apparently the NSA asked for?

1 month agoby bombcar

As impressive as this analysis is by the compiler, I shudder to think how much time the compiler spends doing these kinds of optimizations. In my opinion, a better architecture would be for the compiler to provide a separate analysis tool that suggests source level changes for these kinds of optimizations. It could alert you that the loop could be replaced with a popcount (and optionally make the textual replacement in the source code for you). Then you pay the cost of the optimization once and have the benefit of clarity about what your code _actually_ runs at runtime instead of the compiler transparently pulling the rug out from underneath you when run with optimizations.

Side note: many years ago I wrote the backend for a private global surveillance system that has almost surely tracked the physical location of anyone reading this. We could efficiently track how often a device had been seen at a location in the prior 64 (days|weeks|months) in just 192 bytes and use popcount to compute the value. I am not proud that I built this.

1 month agoby thefaux

Hey! Popcount used to be my favorite instruction. Now I think I prefer LOP3 though :)

1 month agoby majke

There's a fun approach where do the computation as a tree in parallel. You do a little masking and shifting to add all the even-numbered bits to all the odd-numbered bits, and come up with a set of (assuming we're working on a 64-bit value) 32 partial sums of 2 bits each. Then you add pairs of those to get 16 partial sums of 4 bits each, and so forth until you get to the top. This requires six sums, plus shifts and masks for each one.

I don't know if compilers are able to detect this and compile it down to a single instruction, though.

1 month agoby wat10000

I <3 [AT]BM and BMI[12], and functional-equivalency matching with cost minimization optimization compiler passes.

I wish GCC and LLVM had compiler passes to semi/automagically "vectorize" hot sections using SIMD, i.e., magic transformation of UTF-8 conversion, regex matching, and string functions.

1 month agoby burnt-resistor

1 month agoby taeric