Thursday, August 17, 2017

Statistical Impossibilities


In Ethereum, data is stored by its hash (256 bit Keccak, to be exact).  Blocks point to their parents via hash, contracts are conjured up via hash, even that measly amount of Ether you purchased in a panic because the price was going up quickly back in April is stored in a data structure in a Patricia Merkle Trie on top of a LevelDB that you call up by....  256 bit Keccak hash.

I've trained a bunch of people on the guts of Ethereum over the last couple years, and every once and a while I get the hash collision question:

  -Hashes aren't unique.  Eventually, isn't someone gonna obliterate my Ether??

Yeah, hashes aren't unique.  This is kinda obvious by the fact that the number of hashable things (aka- *all-possible-data-sets*) is larger than the number of 256-bit hashes (ie- infinity > a finite value), so eventually you're gonna start having to reuse those hashes.  So it is technically *possible* that I will wake up one day and my super-secure-banking-smart-contract will have been replaced by a prototype for a soup sharing app....  But it isn't gonna happen.  It is a statistical impossibility.

Banks know this.  They are so confident that you aren't going to stumble upon an arbitrary large number, that they base their cryptography on it.

But, even if unlikely, as a developer, isn't it best practice to account for all possibilities?  Shouldn't you always write some code to (at least) warn that a catastrophe is occurring?

No.     No you should not.

For comparison, here are 10 things that are more likely than a single collision between two 256 bit numbers (a 1 in 1077 likelihood).

1. You will be run over and killed by a car *today*: 1 in 109

2. You will be killed by a lightning bolt: 1 in 107

3. A coconut will fall on your head, and you will die: 1 in 108

4. An asteroid will destroy humanity *this week*: 1 in 108

5. In the *same minute*, a television will fall on you and crush you, a tornado sweeps in from the sky and violently tosses you, then a shark eats you, all killing you.  Also, in that same minute, a coconut falls on your head, killing you again:
(1 in 1016) * (1 in 1012) * (1 in 1016) * (1 in 1016) = 1 in 1060

6. You will be the next president of the USA: 1 in 108
(assuming, of course, you are a citizen of the USA....  The chances go down a wee bit if not, but definitely not all the way down to 1 in 1077)

7. You will win Powerball 4 times in a row, and then you will win Mega Millions 4 times in a row: 1 in 1066

8. One day you will wake up with an uncontrollable Swedish accent, that will forever make it look like you are making fun of Swedish people: 1 in 1010
(it is only 1 in 108 that you will contract foreign accent syndrome, and extra two orders of magnitude that it be Swedish)

9. You have a plant growing inside you right now: (I don't know the probability, but it has happened at least once in our lifetime, so presumably higher than 1 in 7 billion)

10. The 10 top Ethereum developers all die of a heart attack tomorrow, leaving the Ethereum foundation in chaos: 1 in 1053



And remember, 1053 may sound like it's kinda *almost* as big as 1077, but that ain't how exponents work.  You would have to live in 1024 universes where the top 10 Ethereum developers die of a heart attack tomorrow (that is a billion billion million universes) before a single Keccak cracked.



But, you say, on the blockchain we have more than 2 hashes that can collide.  Certainly the chances of a single collision go up if you have, not 2, but many hashes floating around.  This is a very well understood problem, and it turns out that even with a billion hashes the probability of just one single collision just drops to about 1060.

OK, what about a deliberate hack that searches for a collision?  Could a shadowy figure engineer the downfall of the financial system with sufficient computing power?

Well, *maybe* Keccak will be mathematically cracked someday (or already, in a musky underground NSA lab), perhaps using a quantum computer, but unless this happens, I am not going to lose any sleep.  For instance,

1. Consider the brute force search, on a single super fast computer that can perform 1 billion Keccak hashes per second (real numbers are a few orders of magnitude slower).  Crack time would be 3*1060 years = 1050 "the-length-of-the-universe"s

Maybe you could string together a bunch of GPUs and speed this up using parallelization?

2. Consider the brute force search, on a massively parallel computer where we have converted all 1050 atoms on the planet earth (including all that lava) each to an individual, fully functional computing CPU core that can each perform 1 billion Keccaks in a second.  Crack time would be 1010 years = 1 "the-length-of-the-universe"s

or

3. Consider the brute force search, on a massively parallel computer where we have converted all 1068 atoms in the galaxy, each to an individual, fully functional computing CPU core that can each perform 1 billion Keccaks in a second: Crack time would be 0.01 seconds

Hey, 100 per second!  Now were are getting somewhere! (of course you will have to wait on average another 105 years for this information to propagate to you through the now-galaxy-computer at the speed of light).

If these laughable scenarios haven't convinced you of the impossibility of a collision, in case you still feel that it *just-can't-hurt* to add that code to deal with a hash collision, let me leave you with one final statistic:

1. The chances that the (difficult to test and never triggered) code you write to deal with a hash collision will add destabilizing bugs to the code: pretty close to 100%

No comments:

Post a Comment