HN Reader

Revisiting Knuth's “Premature Optimization” Paper

190

139

I think the problem with the quote is that everyone forgets the line that comes after it.

  We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

  vvvvvvvvvv
  Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.
  ^^^^^^^^^^

This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code"

@OP, I think you should lead with this. I think it gets lost by the time you actually reference it. If I can suggest, place the second paragraph after

  > People always use this quote wrong, and to get a feeling for that we just have to look at the original paper, and the context in which it was written.

The optimization part gets lost in the middle and this, I think, could help provide a better hook to those who aren't going to read the whole thing. Which I think how you wrote works good for that but the point (IMO) will be missed by more inattentive readers. The post is good also, so this is just a minor critique because I want to see it do better.

https://dl.acm.org/doi/10.1145/356635.356640 (alt) https://sci-hub.se/10.1145/356635.356640

2 days agoby godelski

I like this article. It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today. Premature optimisation of the kind of code discussed by the paper (counting instructions for some small loop) does indeed seem like a bad place to put optimisation efforts without a good reason, but I often see this premature optimisation quote used to:

- argue against thinking about any kind of design choice for performance reasons, eg the data structure decisions suggested in this article

- argue for a ‘fix it later’ approach to systems design. I think for lots of systems you have some ideas for how you would like them to perform, and you could, if you thought about it, often tell that some designs would never meet them, but instead you go ahead with some simple idea that handles the semantics without the performance only to discover that it is very hard to ‘optimise’ later.

2 days agoby dan-robertson

In practice repetition of this quote more often than not leads to a lazy attitude of "don't think about performance, it's too complicated, measure first and then let's talk".

In my experience all great programmers think about performance from the moment they start thinking about a problem.

Like, you're writing an O(n^2) nested loop over an array that's not going to be tiny? Get out of here! That just shouldn't happen and talking about "premature optimization" points in exactly the wrong direction.

1 day agoby svara

> Yet we should not pass up our opportunities in that critical 3 %

The funny thing is, we forgot how to write programs that spend most of their time in 3% of the code.

Profile a modern application and you see 50 level deep stacks and tiny slices of 0.3% of CPU time spent here and there. And yet these slices amount to 100%.

1 day agoby alexey-salmin

If you have benchmarked something then optimizations are not premature. People often used to 'optimize' code that was rarely run, often making the code harder to read for no gain.

beware too of premature pessimization. Don't write bubble sort just because you haven't benchmarked your code to show it is a bottleneck - which is what some will do and then incorrectly cite premature optimization when you tell them they should do better. Note that any compitenet languare has sort in the standard library that is better than bubble sort.

2 days agoby bluGill

This is my all-time favorite paper. It's so easy to read, and there's so much to think about, so much that still applies to everyday programming and language dedign.

Also there's Knuth admitting he avoids GO TO because he is afraid of being scolded by Edsger Dijkstra.

https://pic.plover.com/knuth-GOTO.pdf

2 days agoby mjd

> Usually people say “premature optimization is the root of all evil” to say “small optimizations are not worth it” but […] Instead what matters is whether you benchmarked your code and whether you determined that this optimization actually makes a difference to the runtime of the program.

In my experience the latter is actually often expressed. What else would “premature” mean, other than you don’t know yet whether the optimization is worth it?

The disagreement is usually more about small inefficiencies that may compound in the large but whose combined effects are difficult to assess, compiler/platform/environment-dependent optimizations that may be pessimizations elsewhere, reasoning about asymptotic runtime (which shouldn’t require benchmarking — but with cache locality effects sometimes it does), the validity of microbenchmarks, and so on.

2 days agoby layer8

I think the best way to go about optimization is to maintain a constant baseline level of mechanical sympathy with the computer's architecture.

If you burn into your soul the table of NUMA latencies and really accept the inevitable conclusions regarding practical computation, a lot of the optimization discussion simply boils down to keeping the closest cache as hot as possible as often as possible. Put differently, if you can get the working set to ~fit in L1, then you have nothing to worry about. The CPU is faster than most people can reason about now. You can talk to L1 1000x before you can talk to the GPU 1x. This stuff makes a ridiculous difference when applied correctly.

Context switching and communication between threads is where most developers begin to lose the rabbit. Very often a problem can be solved on one physical core much faster than if you force it to be solved across many physical cores.

1 day agoby bob1029

It is generally better just to focus on algorithmic complexity - O(xN^k). The first version of the code should bring code to the lowest possible k (unless N is very small then who cares). Worry about x later. Don't even think about parallelizing until k is minimized. Vectorize before parallelizing.

For parallel code, you basically have to know in advance that it is needed. You can't normally just take a big stateful / mutable codebase and throw some cores at it.

2 days agoby osigurdson

After 20 years of programming, I've came to the following realization:

On a long enough timeline, the probability that someone will call your code in a for-loop approaches 1.

1 day agoby meindnoch

It's also good practice to chose an appropriate language for the problem. It's not premature optimisation to use a compiled language instead of Python when you already know the code will run thousands of times in a loop because of the application and use a lot of electrical energy in the process.

10 hours agoby lo0dot0

- Knuth puts sentinels at the end of an array to avoid having to bounds check in the search. - Knuth uses the register keyword. - Knuth custom writes each data structure for each application.

2 days agoby monkeyelite

The real root of all evil is reasoning by unexamined phrases.

2 days agoby osigurdson

Love this paper and read it several times, most recently around 10 years ago when thinking about whether there were looping constructs missing from popular programming languages.

I have made the same point several times online and in person that the famous quote is misunderstood and often suggest people take the time to go back to the source and read it since it’s a wonderful read.

2 days agoby subharmonicon

I understand the Knuth's _Premature Optimization_ saying as: (1) find the most hot code (best way to have full heat chart), (2) optimize there. That's all of it, and it really works. E.g. If you have some code that prepares data for a loop that is n^2, and you work hard on this data preparation, profile it etc., this will not give you any visible improvement, because this is one-time code. What's inside the n^2 loop is more important and optimize there, even work together and prepare the data better to allow the loop work more efficient.

1 day agoby p0w3n3d

I happen to have also reread the paper last week. The last time I read it was 30 years ago, closer to when it was written than to the present, and I had forgotten a lot of it. One of the few bits that stuck with me was the Shanley Design Criterion. Another was "n and a half loops".

The bit about sequential search and optimization, the topic of this whole blog post, is kind of a minor detail in the paper, despite being so eloquently phrased that it's the part everyone quotes—sometimes without even knowing what “optimization” is. There's a table of contents on its second page, which is 33 lines long, of which "A Searching Example" and "Efficiency" are lines 4 and 5. They are on pages 266–269, 3 pages of a 41-page paper. (But efficiency is an issue he considers throughout.)

Mostly the paper is about control structures, and in particular how we can structure our (imperative!) programs to permit formal proofs of correctness. C either didn't exist yet or was only known to less than five people, so its break/continue control structures were not yet the default. Knuth talks about a number of other options that didn't end up being popular.

It was a really interesting reminder of how different things looked 51 years ago. Profilers had just been invented (by Dan Ingalls, apparently?) and were still not widely available. Compilers usually didn't do register allocation. Dynamically typed languages and functional programming existed, in the form of Lisp and APL, but were far outside the mainstream because they were so inefficient. You could reliably estimate a program's speed by counting machine instructions. People were sincerely advocating using loops that didn't allow "break", and subroutines without early "return", in the interest of building up their control flow algebraically. Knuth considered a recursive search solution to the N-queens problem to be interesting enough to mention it in CACM; similarly, he explains the tail-recursion optimization as if it's not novel but at least a bit recondite, requiring careful explanation.

He mentions COBOL, BCPL, BLISS, Algol W, Algol 60, Algol 68, other Algols, PL/I (in fact including some example code), Fortran, macro assemblers, "structured assemblers", "Wirth's Pascal language [97]", Lisp, a PDP-10 Algol compiler called SAIL (?), META-II, MIXAL, PL360, and something called XPL, but not Smalltalk, CLU, APL, FORTH, or BASIC.

He points out that it would be great for languages to bundle together the set of subroutines for operating on a particular data type, as Smalltalk and CLU did, but he doesn't mention CLU; it had only been introduced in the previous year. But he's clearly thinking along those lines (p.295):

> (...) it turns out that a given level of abstraction often involves several related routines and data definitions; for example, when we decide to represent a table in a certain way, we also want to specify the routines for storing and fetching data from that table. The next generation of languages will probably take into account such related routines.

Often when I read old papers, or old software documentation like the TENEX EXEC manual, it's obvious why the paths not taken were not taken; what we ended up doing is just obviously better. This paper is not like that. Most of the alternatives mentioned seem like they might have turned out just as well as what we ended up with.

1 day agoby kragen

I mean basically what he's saying is check the impact of your optimization. Every time there's an optimization, theres a complexity and brittleness cost. However sometimes unoptimized code is actually more difficult to read than the optimized version. I mean its quite logical tbf.

1 day agoby another_twist

Wait... people understood "premature optimisation" to mean "small optimisations are not worth it"? I've always understood it to mean exactly what it's supposed to mean, namely don't optimise something until you've shown that it's actually needed. I honestly don't know how it could be interpreted any other way.

1 day agoby globular-toast

Ironically, all pdfs of the famous paper have atrocious typesetting and are a pain to read.

2 days agoby yubblegum

(meta) I am probably wasting my time commenting on linked article here. Nobody does that /s.

I don't think the measurements support conclusion that well.

What I want to have when I see those measurements:

I want language abstract machine and compiler to not get in the way I want code on certain platforms to perform. This is currently not what I get at all. The language is actively working against me. For example, I can not read cache line from an address because my object may not span long enough. The compiler has its own direction at best. This means there is no way to specify things, and all I can do is test compiler output after each minor update. In a multi-year project such updates can happen dozens of times! The ergonomics of trying to specify things actually getting worse. The example with assembly is similar to my other experiences: the compiler ignores even intrinsics now. If it wants to optimize, it does.

I can't run to someone else for magic library solutions every time I need to write code. I need to be able to get things done in a reasonable amount of time. It is my organization development process that should decide if the solution I used should be part of some library or not. It usually means that efforts that cover only some platforms and only some libraries are not that universally applicable to my platforms and my libraries as folks at language conferences think /s.

Disclaimer. I work in gamedev.

1 day agoby SleepyMyroslav

the reason were still arguing over this is because knuth cant say things concisely; he didnt communicate this idea succintly enough

1 day agoby skeezyboy

I understood it to mean that optimising for speed at the expense of size is a bad idea unless there are extremely obvious performance improvements in doing so. By default you should always optimise for size.

2 days agoby userbinator

The famous "premature optimization" quote isn't from a dedicated paper on optimization, but from Knuth's 1974 "Structured Programming with go to Statements" paper where he was discussing broader programming methodology.

2 days agoby ethan_smith

It's no longer relevant. It was written when people were writing IBM operating systems in assembly language.

Things have changed.

Remember this: "speed is a feature"?

If you need fast softwqare to make it appealing then make it fast.

2 days agoby wewewedxfgdf