▲The Pentium FDIV bug, reverse-engineeredoldbytes.space

213 points by croes 210 days ago | 51 comments

jghn 208 days ago [-]

An anecdote regarding this bug that always cracks me up. My college roommate showed up with a shiny new pentium machine that year, and kept bragging about how awesome it was. We used some math software called Maple that was pretty intensive for PCs at the time, and he thought he was cool because he could do his homework on his PC instead of on one of the unix machines in the lab.

Except that he kept getting wrong answers on his homework.

And then he realized that when he did it on one of the unix machines, he got correct answers.

And then a few months later he realized why ....

Timwi 208 days ago [-]

“some math software called Maple”. I still use a version of Maple that I bought in 1998. I found subsequent versions of it much harder to use and I've never found an open-source software that could do what it can do. I don't need anything fancy, just occasionally solve an equation or a system of equations, or maybe plot a simple function. That old copy of Maple continues to serve me extremely well.

zozbot234 208 days ago [-]

I assume that you're familiar with Maxima? It's perhaps the most commonly used open-source CAS - there are some emerging alternatives like Sympy.

That aside, yes it's interesting that an old program from 1998 can still serve us quite well.

vrighter 206 days ago [-]

I avoid anything with the "-py" suffix like the plague. It's pretty much a guarantee that anything I write today, will not run next year. I'd stick with maxima. With wxMaxima, it's actually quite nice.

hulitu 207 days ago [-]

> Maxima

Never heard about it. BLAS, Octave, Scilab

xmcqdpt2 207 days ago [-]

Those are not CAS though, they are numerical software. Maxima main features are based on symbolic transformations.

mleo 208 days ago [-]

The mention of Maple brings back vivid memories of freshmen year of college when the math department decided to use the software as part of instruction and no one understood how to use it. There was a near revolt by the students.

kens 208 days ago [-]

Coincidentally, I was a developer on Maple for a summer, working on definite integration and other things.

rikthevik 207 days ago [-]

I've never used Maple for work, only for survival. We had a Calculus for Engineers textbook that was so riddled with errors I had to use a combination of the non-Engineering textbook and Maple to figure out wtf was going on. Bad textbook, bad prof, bad TAs and six other classes. What a mess. Those poor undergrads...

bee_rider 208 days ago [-]

And you know a week before the semester, the TA’s were given given a copy and told to figure it out.

WalterBright 208 days ago [-]

> were given given a copy and told to figure it out

Welcome to engineering. That's what we do. It's all we do.

Moru 208 days ago [-]

But you get paid to do it, teachers are expected to do this on their off time. :-)

"You know so much about computers, it's easy for you to figure it out!"

WalterBright 207 days ago [-]

I.e. the teachers are expected to learn the tools they want to teach the students how to use? The horror!

bee_rider 207 days ago [-]

Because of the way you did your quote, we’ve switched from talking about the TA’s to the teachers themselves. This is sort of different, the teaching assistants don’t really have teaching goals and they usually aren’t part of the decision making process for picking what gets taught, at least where I’ve been.

Anyway, as far as “learn the tools they want to teach the students how to use” goes, I dunno, hard to say. I wouldn’t be that surprised to hear that some department head got the wrong-headed idea that students needed to learn more practical tools and skills, shit rolled downhill, and some professor got stuck with the job.

Usually professors aren’t expert tool users after all, they are there for theory stuff. Hopefully nobody is expecting to become a vscode expert by listening to some professor who uses eMacs or vim like a proper grey-beard.

WalterBright 207 days ago [-]

A son and his father I know, both engineers, went to an aviation meet where on display was a radial aircraft engine, with cutaways. They spent a happy hour going through every bit of it to figure out what it did and how it worked. They paid to go there.

Reminds me of when I was able to climb aboard an old steam locomotive and spent a wonderful hour tracing all the pipes and tubes and controls and figuring out what they did.

A good engineering college doesn't actually teach engineering. They teach how to learn engineering. Then you teach yourself.

molticrystal 208 days ago [-]

The story was posted a couple days ago and ken left a couple comments there: https://news.ycombinator.com/item?id=42388455

I look forward to the promised proper write up that should be out soon.

pests 208 days ago [-]

There is a certain theme of posts on HN where I am just certain the author is gonna be Ken and again not disappointed.

mega_dingus 208 days ago [-]

Oh to remember mid-90s humor

How many Intel engineers does it take to change a light bulb? 0.99999999

zoky 208 days ago [-]

Why didn’t Intel call the Pentium the 586? Because they added 486+100 on the first one they made and got 585.999999987.

Dalewyn 208 days ago [-]

Amusing joke, but it actually is effectively called the 586 because the internal name is P5 and Penta from which Pentium is derived is 5.[1]

Incidentally, Pentium M to Intel Core through 16th gen Lunarrow Lake all identify as P6 ("Family 6") for 686 because they are all based off of the Pentium 3.

[1]: https://en.wikipedia.org/wiki/Pentium_(original)

xattt 208 days ago [-]

You’re missing the fact that Intel wanted to differentiate itself from the growing IA-32 clone chips from AMD and Cyrix. 586 couldn’t be trademarked, but Pentium could.

zusammen 208 days ago [-]

The main reason is that it’s impossible to trademark a model number.

Also, Weird Al would not have been able to make a song, “It’s All About the 586es.” It just doesn’t scan.

nayuki 208 days ago [-]

Also, as per the page:

> Intel used the Pentium name instead of 586, because in 1991, it had lost a trademark dispute over the "386" trademark, when a judge ruled that the number was generic.

perdomon 208 days ago [-]

This dude pulled out a microscope and said "there's your problem." Super impressive work. Really great micro-read.

layer8 208 days ago [-]

To be fair, he knew what the problem was (errors in the lookup table) beforehand.

pests 208 days ago [-]

It’s not like the microscope is going to show “a lookup table” though. You need to know how it was implemented in silicon, how transistors are integrated into the silicon, etc to even start identifying the actual physical mistake.

sgerenser 208 days ago [-]

I wonder what node generation doing this type of thing becomes impossible (due to features being too small to be visible with an optical microscope)? I would have guessed sometime before the first Pentium, but obviously not.

poizan42 207 days ago [-]

I think the limit for discernible features when using an optical microscope is somewhere around 200nm. This would put the limit somewhere around the 250nm node size, which was used around 1993-1998.

ornornor 206 days ago [-]

These things completely fly above my head. I wish I could understand them because it’s pretty cool, but I got lost at the very succinct SRT explanation and it went downhill from there. Still, die shots are always pretty to look at.

Cumpiler69 208 days ago [-]

This is probably one of the reasons Intel went to a microcode architecture after.

I wonder how many yet to be discover silicone bugs are out there on modern chips?

Lammy 208 days ago [-]

Older Intel CPUs were already using microcode. Intel went after NEC with a copyright case over 8086 microcode, and after AMD with a copyright case over 287/386/486 microcode:

- https://thechipletter.substack.com/p/intel-vs-nec-the-case-o...

- https://www.upi.com/Archives/1994/03/10/Jury-backs-AMD-in-di...

I would totally believe the FDIV bug is why Intel went to a patchable microcode architecture however. See “Intel P6 Microcode Can Be Patched — Intel Discloses Details of Download Mechanism for Fixing CPU Bugs (1997)” https://news.ycombinator.com/item?id=35934367

kens 208 days ago [-]

Intel used microcode starting with the 8086. However, patchable microcode wasn't introduced until the Pentium Pro. The original purpose was for testing, being able to run special test microcode routines. But after the Pentium, Intel realized that being able to patch microcode was also good for fixing bugs in the field.

peterfirefly 208 days ago [-]

Being able to patch the microcode only solves part of the possible problems a CPU can have.

My guess -- and I hope you can confirm it at some point in the future -- is that more modern CPUs can patch other data structures as well. Perhaps the TLB walker state machine, perhaps some tables involved in computation (like the FDIV table), almost certainly some of the decoder machinery.

How does one make a patchable parallel multi-stage decoder is what I'd really like to know!

Tuna-Fish 207 days ago [-]

Mostly, you can turn off parts of the CPU (so called chicken bits). They are invaluable for validation, but they have also been frequently used for for fixing broken CPUs. Most recently AMD just recently turned off their loop buffer in Zen4: https://chipsandcheese.com/p/amd-disables-zen-4s-loop-buffer

208 days ago [-]

wmf 208 days ago [-]

They always used microcode: https://www.righto.com/2022/11/how-8086-processors-microcode...

I'm not sure when Intel started supporting microcode updates but I think it was much later.

LukeShu 208 days ago [-]

Microcode updates came the very next generation with the Pentium Pro.

userbinator 208 days ago [-]

Look at how long the public errata lists are, and use that as a lower bound.

KingLancelot 208 days ago [-]

Silicone is plastic, Silicon is the element.

jghn 208 days ago [-]

Not according to General Beringer of NORAD! :) [1]

[1] https://youtu.be/iRsycWRQrc8?t=82

Thaxll 208 days ago [-]

So nowdays this table could have been fixed with a microcode update right?

phire 208 days ago [-]

The table couldn't be fixed. But it can be bypassed.

The microcode update would need to disable the entire FDIV instruction and re-implement it without using any floating point hardware at all, at least for the problematic devisors. It would be as slow as the software workarounds for the FDIV bug (average penalty for random divisors was apparently 50 cycles).

The main advantage of a microcode update is that all FDIVs are automatically intercepted system-wide, while the software workarounds needed to somehow find and replace all FDIVs in the target software. Some did it by recompiling, others scanned for FDIV instructions in machine code and replaced them; Both approaches were problematic and self-modifying code would be hard to catch.

A microcode update "might" have allowed Intel to argue their way out of an extensive recall. But 50 cycles on average is a massive performance hit, FDIV takes 19 cycles for single-precision. BTW, this microcode update would have killed performance in quake, which famously depended on floating point instructions (especially the expensive FDIV) running in parallel with integer instructions.

hakfoo 208 days ago [-]

It's interesting that there's no "trap instruction/sequence" feature built into the CPU architecture. That would presumably be valuable for auditing and debugging, or to create a custom instruction by trapping an otherwise unused bit sequence.

Tuna-Fish 207 days ago [-]

There is today, for all the reasons you state. Transistor budgets were tighter back then.

j16sdiz 208 days ago [-]

> create a custom instruction by trapping an otherwise unused bit sequence. ..

... until a new CPU support an instruction extension that use the same bit sequence.

immibis 207 days ago [-]

That's why architectures - including x86! - have opcodes they promise will always be undefined.

colejohnson66 206 days ago [-]

Those undefined opcodes are more for testing your fault handler or for forcing a trap in an “unreachable” part of your code, not to implement custom instructions.

immibis 206 days ago [-]

They can be used for anything you want. The CPU treats them as described in the document.

They're not fast, though. You probably don't want to run trapped fake instructions in tight loops, but you don't want that no matter how they're implemented.

phire 206 days ago [-]

Yeah...

Part of the reason why existing functionality is limited and nobody uses them to implement custom instructions, is that trapping is actually quite expensive.

The switch to kernel space and back is super expensive, but even if a CPU did implemente fast userspace trap handlers, it wouldn't be fast.

jeffbee 208 days ago [-]

With a microcode update that ruins FDIV performance, sure. Even at that time there were CPUs still using microcoded division, like the AMD K5.

Netch 207 days ago [-]

This division, using SRT loop with 2 bit output per iteration, perhaps would have already been microcoded - but using the lookup table as an accelerator. An alternative could use a simpler approach (e.g. 1-bit-iteration "non-restoring" division). Longer but still fitting into normal range.

But if they had understood possible aftermath of non-tested block they would have implemented two blocks, and switch to older one if misworking was detected.

tliltocatl 207 days ago [-]

That was before dark silicon and real estate was still somewhat expensive, so probably having two blocks wasn't really an option.

Loading comments...

jghn 208 days ago [-]

Except that he kept getting wrong answers on his homework.

And then he realized that when he did it on one of the unix machines, he got correct answers.

And then a few months later he realized why ....

Timwi 208 days ago [-]

zozbot234 208 days ago [-]

I assume that you're familiar with Maxima? It's perhaps the most commonly used open-source CAS - there are some emerging alternatives like Sympy.

That aside, yes it's interesting that an old program from 1998 can still serve us quite well.

vrighter 206 days ago [-]

hulitu 207 days ago [-]

> Maxima

Never heard about it. BLAS, Octave, Scilab

xmcqdpt2 207 days ago [-]

Those are not CAS though, they are numerical software. Maxima main features are based on symbolic transformations.

mleo 208 days ago [-]

kens 208 days ago [-]

Coincidentally, I was a developer on Maple for a summer, working on definite integration and other things.

rikthevik 207 days ago [-]

bee_rider 208 days ago [-]

And you know a week before the semester, the TA’s were given given a copy and told to figure it out.

WalterBright 208 days ago [-]

> were given given a copy and told to figure it out

Welcome to engineering. That's what we do. It's all we do.

Moru 208 days ago [-]

But you get paid to do it, teachers are expected to do this on their off time. :-)

"You know so much about computers, it's easy for you to figure it out!"

WalterBright 207 days ago [-]

I.e. the teachers are expected to learn the tools they want to teach the students how to use? The horror!

bee_rider 207 days ago [-]

WalterBright 207 days ago [-]

Reminds me of when I was able to climb aboard an old steam locomotive and spent a wonderful hour tracing all the pipes and tubes and controls and figuring out what they did.

A good engineering college doesn't actually teach engineering. They teach how to learn engineering. Then you teach yourself.

molticrystal 208 days ago [-]

The story was posted a couple days ago and ken left a couple comments there: https://news.ycombinator.com/item?id=42388455

I look forward to the promised proper write up that should be out soon.

pests 208 days ago [-]

There is a certain theme of posts on HN where I am just certain the author is gonna be Ken and again not disappointed.

mega_dingus 208 days ago [-]

Oh to remember mid-90s humor

How many Intel engineers does it take to change a light bulb? 0.99999999

zoky 208 days ago [-]

Why didn’t Intel call the Pentium the 586? Because they added 486+100 on the first one they made and got 585.999999987.

Dalewyn 208 days ago [-]

Amusing joke, but it actually is effectively called the 586 because the internal name is P5 and Penta from which Pentium is derived is 5.[1]

Incidentally, Pentium M to Intel Core through 16th gen Lunarrow Lake all identify as P6 ("Family 6") for 686 because they are all based off of the Pentium 3.

[1]: https://en.wikipedia.org/wiki/Pentium_(original)

xattt 208 days ago [-]

You’re missing the fact that Intel wanted to differentiate itself from the growing IA-32 clone chips from AMD and Cyrix. 586 couldn’t be trademarked, but Pentium could.

zusammen 208 days ago [-]

The main reason is that it’s impossible to trademark a model number.

Also, Weird Al would not have been able to make a song, “It’s All About the 586es.” It just doesn’t scan.

nayuki 208 days ago [-]

Also, as per the page:

> Intel used the Pentium name instead of 586, because in 1991, it had lost a trademark dispute over the "386" trademark, when a judge ruled that the number was generic.

perdomon 208 days ago [-]

This dude pulled out a microscope and said "there's your problem." Super impressive work. Really great micro-read.

layer8 208 days ago [-]

To be fair, he knew what the problem was (errors in the lookup table) beforehand.

pests 208 days ago [-]

sgerenser 208 days ago [-]

poizan42 207 days ago [-]

I think the limit for discernible features when using an optical microscope is somewhere around 200nm. This would put the limit somewhere around the 250nm node size, which was used around 1993-1998.

ornornor 206 days ago [-]

Cumpiler69 208 days ago [-]

This is probably one of the reasons Intel went to a microcode architecture after.

I wonder how many yet to be discover silicone bugs are out there on modern chips?

Lammy 208 days ago [-]

Older Intel CPUs were already using microcode. Intel went after NEC with a copyright case over 8086 microcode, and after AMD with a copyright case over 287/386/486 microcode:

- https://thechipletter.substack.com/p/intel-vs-nec-the-case-o...

- https://www.upi.com/Archives/1994/03/10/Jury-backs-AMD-in-di...

kens 208 days ago [-]

peterfirefly 208 days ago [-]

Being able to patch the microcode only solves part of the possible problems a CPU can have.

How does one make a patchable parallel multi-stage decoder is what I'd really like to know!

Tuna-Fish 207 days ago [-]

208 days ago [-]

wmf 208 days ago [-]

They always used microcode: https://www.righto.com/2022/11/how-8086-processors-microcode...

I'm not sure when Intel started supporting microcode updates but I think it was much later.

LukeShu 208 days ago [-]

Microcode updates came the very next generation with the Pentium Pro.

userbinator 208 days ago [-]

Look at how long the public errata lists are, and use that as a lower bound.

KingLancelot 208 days ago [-]

Silicone is plastic, Silicon is the element.

jghn 208 days ago [-]

Not according to General Beringer of NORAD! :) [1]

[1] https://youtu.be/iRsycWRQrc8?t=82

Thaxll 208 days ago [-]

So nowdays this table could have been fixed with a microcode update right?

phire 208 days ago [-]

The table couldn't be fixed. But it can be bypassed.

hakfoo 208 days ago [-]

Tuna-Fish 207 days ago [-]

There is today, for all the reasons you state. Transistor budgets were tighter back then.

j16sdiz 208 days ago [-]

> create a custom instruction by trapping an otherwise unused bit sequence. ..

... until a new CPU support an instruction extension that use the same bit sequence.

immibis 207 days ago [-]

That's why architectures - including x86! - have opcodes they promise will always be undefined.

colejohnson66 206 days ago [-]

Those undefined opcodes are more for testing your fault handler or for forcing a trap in an “unreachable” part of your code, not to implement custom instructions.

immibis 206 days ago [-]

They can be used for anything you want. The CPU treats them as described in the document.

They're not fast, though. You probably don't want to run trapped fake instructions in tight loops, but you don't want that no matter how they're implemented.

phire 206 days ago [-]

Yeah...

Part of the reason why existing functionality is limited and nobody uses them to implement custom instructions, is that trapping is actually quite expensive.

The switch to kernel space and back is super expensive, but even if a CPU did implemente fast userspace trap handlers, it wouldn't be fast.

jeffbee 208 days ago [-]

With a microcode update that ruins FDIV performance, sure. Even at that time there were CPUs still using microcoded division, like the AMD K5.

Netch 207 days ago [-]

But if they had understood possible aftermath of non-tested block they would have implemented two blocks, and switch to older one if misworking was detected.

tliltocatl 207 days ago [-]

That was before dark silicon and real estate was still somewhat expensive, so probably having two blocks wasn't really an option.