Is there really no use for MD5 anymore?















21












$begingroup$


I read an article about password schemes that makes two seemingly conflicting claims:




MD5 is broken; it’s too slow to use as a general purpose hash; etc



The problem is that MD5 is fast




I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimaging attacks and MD5's low computation time.



However, I was under the impression that MD5 can still be used as a non-cryptographic hash function:



  1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.

  2. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5. (In a way, this would be creating a rainbow table where all the files are the dictionary)

There are checksums of course, but from my experience, the likelihood of finding two different files with the same MD5 hash is very low as long as we can rule out foul play.



When the password scheme article states that "MD5 is fast", it clearly refers to the problem that MD5 is too cheap to compute when an attacker hashes a large number of candidate passwords to reverse a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes for comparing files that still have a reasonably low chance of collision?



















$endgroup$







  • 5




    $begingroup$
    The collision attack is the problem, however, MD5 still has pre-image resistance.
    $endgroup$
    – kelalaka
    2 days ago






  • 3




    $begingroup$
    "Using SHA512 would make the process slower ..." - On my system, openssl speed reports 747MB/s for MD5, and 738MB/s for SHA-512, so that's barely a difference ;)
    $endgroup$
    – marcelm
    yesterday






  • 2




    $begingroup$
    I believe MD5 can still be used as a PRF.
    $endgroup$
    – jww
    yesterday







  • 1




    $begingroup$
    For longer files on amd64 processors with optimized implementations, I think SHA-512 is one of the faster hashing algorithms (faster than SHA-256 as the type sizes are easier to manipulate).
    $endgroup$
    – Nick T
    yesterday






  • 1




    $begingroup$
    Fun fact: There's still a use for MD4 on millions of devices. The popular ext4 filesystem in Linux uses half-MD4 internally to generate a hash tree. I'm sure there are plenty of other uses too, even for MD5.
    $endgroup$
    – forest
    14 hours ago
















hash md5
















asked 2 days ago









jornane







7 Answers


















25












$begingroup$


I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimaging attacks and MD5's low computation time.




There is no published preimage attack on MD5 that is cheaper than a generic attack on any 128-bit hash function. But you shouldn't rely on that alone when making security decisions, because cryptography is tricky and adversaries are clever and resourceful and can find ways around it!




  1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.



The question of whether to publish known-good vs. known-bad hashes after a compromise is addressed elsewhere—in brief, there's not much that publishing known-bad hashes accomplishes, and according to the citation, Linux Mint published known-good, not known-bad, hashes. So what security do you get from known-good MD5 hashes?



There are two issues here:




  1. If you got the MD5 hash from the same source as the ISO image, there's nothing that would prevent an adversary from replacing both the MD5 hash and the ISO image.



    To prevent this, you and the Linux Mint curators need two channels: one for the hashes which can't be compromised (but need only have very low bandwidth), and another for the ISO image (which needs high bandwidth) on which you can then use the MD5 hash in an attempt to detect compromise.



    There's another way to prevent this: Instead of using the uncompromised channel for the hash of every ISO image over and over again as time goes on—which means more and more opportunities for an attacker to subvert it—use it once initially for a public key, which is then used to sign the ISO images; then there's only one opportunity for an attacker to subvert the public key channel.




  2. Collision attacks may still be a vector in cases like this. Consider the following scenario:



    • I am an evil developer. I write two software packages, whose distributions collide under MD5. One of the packages is benign and will survive review and audit. The other one will surreptitiously replace your family photo album by erotic photographs of sushi.

    • The Linux Mint curators carefully scrutinize and audit everything they publish in their package repository and publish the MD5 hashes of what they have audited in a public place that I can't compromise.

    • The Linux Mint curators cavalierly administer the package distributions in their package repository, under the false impression that the published MD5 hashes will protect users.

    In this scenario, I can replace the benign package by the erotic sushi package, pass the MD5 verification with flying colors, and give you a nasty—and luscious—surprise when you try to look up photos of that old hiking trip you took your kids on.
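The two-channel check described above can be sketched as follows. This is a minimal illustration, not what Linux Mint actually does: the function names are hypothetical, SHA-256 is chosen here as an assumption (a hash with no published collision attack), and the expected digest is assumed to have arrived over a separate channel the attacker cannot modify.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so a large ISO need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_known_good(path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest equals a digest from the trusted channel.

    expected_hex is assumed to come from a source other than the download
    server; otherwise an attacker can replace both the file and the digest.
    """
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Swapping in a stronger hash does nothing for the first issue (same-source hash and image); it only addresses the collision scenario in the second.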




  2. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5.



When I put my benign software package and my erotic sushi package, which collide under MD5, in your directory, your duplicate-detection script will initially think they are duplicates. In this case, you absolutely must compare the files in full. But there are much better ways to do this!



  • If you use SHA-512, you can safely skip the comparison step. Same if you use BLAKE2b, which can be even faster than MD5.


  • You could even use MD5 safely for this if you use it as HMAC-MD5 under a uniform random key, and safely skip the comparison step. HMAC-MD5 does not seem to be broken as a pseudorandom function family, so it's probably fine for security up to the birthday bound, but there are better, faster PRFs like keyed BLAKE2 that won't raise any auditors' eyebrows.


  • Even better, you can choose a random key and hash the files with a universal hash under the key, like Poly1305. This is many times faster than MD5 or BLAKE2b, and the probability of a collision between any two files is less than $1/2^{100}$, so the probability of collision among $n$ files is less than $\binom{n}{2} 2^{-100}$ and thus you can still safely skip the comparison step until you have quadrillions of files.


  • You could also just use a cheap checksum like a CRC with a fixed polynomial. This will be the fastest of the options—far and away faster than MD5—but unlike the previous options you still absolutely must compare the files in full.
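As a rough sketch of the keyed-hash option from the list above, here is duplicate grouping with keyed BLAKE2b from Python's standard `hashlib`. The function names are made up for illustration, and the 32-byte key and digest sizes are assumptions, not recommendations from the answer. The key is drawn fresh at random, so someone who prepared files in advance cannot aim collisions at it.

```python
import hashlib
import secrets
from collections import defaultdict


def keyed_digest(path, key, chunk_size=1 << 20):
    """Keyed BLAKE2b over the file contents, read in chunks."""
    h = hashlib.blake2b(key=key, digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def group_by_keyed_digest(paths, key=None):
    """Return lists of paths whose keyed digests are equal."""
    # A fresh uniform random key per run (an assumption of this sketch).
    key = key if key is not None else secrets.token_bytes(32)
    groups = defaultdict(list)
    for path in paths:
        groups[keyed_digest(path, key)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

The same structure would work with `hmac.new(key, ..., hashlib.md5)` in place of `hashlib.blake2b`, which is the HMAC-MD5 variant mentioned in the list.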


So, is MD5 safe for finding candidate duplicates to verify, if you subsequently compare the files bit by bit in full? Yes. So is the constant zero function.
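To make the "constant zero function" remark concrete, here is a small sketch in which the candidate function is a parameter and the final decision always comes from a full comparison. The names `crc32_of` and `confirmed_duplicates` are hypothetical; `zlib.crc32` and `filecmp.cmp` are from the Python standard library. The candidate function affects only how much comparing is done, never the result.

```python
import filecmp
import zlib
from collections import defaultdict


def crc32_of(path, chunk_size=1 << 20):
    """Cheap, non-cryptographic checksum of a file's contents."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def confirmed_duplicates(paths, candidate_key=crc32_of):
    """Bucket files by candidate_key, then confirm each pair byte by byte."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[candidate_key(path)].append(path)

    pairs = []
    for bucket in buckets.values():
        for i, first in enumerate(bucket):
            for second in bucket[i + 1:]:
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(first, second, shallow=False):
                    pairs.append((first, second))
    return pairs
```

Passing `candidate_key=lambda path: 0` puts every file in one bucket and still returns the correct pairs, at the cost of comparing everything with everything.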




(In a way, this would be creating a rainbow table where all the files are the dictionary)




This is not a rainbow table. A rainbow table is a specific technique for precomputing a random walk over a space of, say, passwords, via, say, MD5 hashes, in a way that saves effort trying to find MD5 preimages for hashes that aren't necessarily in your table in the first place, or doing it in parallel to speed up a multi-target search. It is not simply a list of precomputed hashes on a dictionary of inputs.



(The blog post by tptacek that you cited, and the blog post by Jeff Atwood that it was a response to, are both confused about what rainbow tables are.)




When the password scheme article states that "MD5 is fast", it clearly refers to the problem that hashing MD5 is too cheap when it comes to hashing a large amount of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




I don't know what tptacek meant—you could email and ask—but if I had to guess, I would guess this meant it's awfully slow for things like hash tables, where you would truncate MD5 to a few bits to determine an index into an array of buckets or an open-addressing array.
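To illustrate that guess (and only the guess; this is not from the cited article), here is what using MD5 for hash-table bucketing might look like next to Python's built-in `hash`. Both function names are hypothetical. The MD5 version computes a full 128-bit digest and then discards most of it, which is the wasted work the guess refers to.

```python
import hashlib


def md5_bucket(key: bytes, n_buckets: int) -> int:
    """Bucket index from a truncated MD5 digest (computes 128 bits, keeps 64)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def builtin_bucket(key: bytes, n_buckets: int) -> int:
    """Bucket index from Python's built-in hash, designed for in-memory tables."""
    return hash(key) % n_buckets
```

Timings are not claimed here; they depend on the machine and the key sizes, so measure before drawing conclusions.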

















$endgroup$












  • $begingroup$
    Why does selecting a different hash algorithm eliminate the risk of collisions (sushi bullet)?
    $endgroup$
    – chrylis
    2 days ago











  • $begingroup$
    @chrylis Nobody has ever published any way to find SHA-512 or BLAKE2b collisions, nor even, say, SHA-256 collisions.
    $endgroup$
    – Squeamish Ossifrage
    yesterday







  • 1




    $begingroup$
    @Alexander ‘Hash function’ means many things, and is usually some approximation to a uniform random choice of function in some context (random oracle model, pseudorandom function family, pseudorandom permutation family, etc.). A ‘checksum’ is used for some error-detecting capability; e.g., a well-designed 32-bit CRC is guaranteed to detect any 1-bit errors and usually guarantees detecting some larger number of bit errors in certain data word sizes, while a 32-bit truncation of SHA-256 might fail to detect some 1-bit errors. The terms are generally used quite loosely, however.
    $endgroup$
    – Squeamish Ossifrage
    yesterday






  • 1




    $begingroup$
    @SqueamishOssifrage "CRC is guaranteed to detect any 1-bit errors and usually guarantees detecting some larger number of bit errors in certain data word sizes, while a 32-bit truncation of SHA-256 might fail to detect some 1-bit errors. " woah. That's really powerful, and really cool. I didn't know that. I'll read more into it!
    $endgroup$
    – Alexander
    yesterday






  • 2




    $begingroup$
    I think you misunderstood the Linux Mint case. They published the MD5 hash of the infected ISO after they recovered from the hack, so that people can check whether the ISO they installed from was infected. Your sushi story seems to imply that you thought MD5 was used to prove the integrity of the original uninfected ISO file. This was not the case.
    $endgroup$
    – jornane
    yesterday


















8












$begingroup$


But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




BLAKE2 is faster than MD5 and is currently believed to provide 64-bit collision resistance when truncated to the same size as MD5 (compare roughly 30 bits for MD5).















$endgroup$




















    6












    $begingroup$

    There's not a compelling reason to use MD5; however, there are some embedded systems with an MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.



    The reason that MD5 started to fall out of favor with hardware people was that the word reordering of the MD5 message expansion seems simple but actually requires a lot of circuitry for demultiplexing and interconnect, so the hardware efficiency is greatly degraded compared to BLAKE. In contrast, the message expansion blocks for the BLAKE algorithms can be efficiently implemented as simple feedback shift registers.



    The BLAKE team did a nice job of making it work well on silicon and in instructions.



    edit: SHA-1, SHA-2, etc also look pretty nice in circuits.

















    $endgroup$












    • $begingroup$
      I did not know about BLAKE, this seems interesting. I assumed from the post that there would be some hashing system that gives up cryptographic security in order to be faster than MD5, but it seems BLAKE has managed to be the best of both worlds. I’m considering this answer as the accepted one, but I’ll wait a few days while there is activity around this question.
      $endgroup$
      – jornane
      yesterday










    • $begingroup$
      @jornane Squeamish Ossifrage has a better answer. I just wanted to mention the hardware side of things.
      $endgroup$
      – b degnan
      yesterday


















    3












    $begingroup$

    MD5 is currently used throughout the world both at home and in the enterprise. It's the file change mechanism within *nix's rsync if you opt for something other than changed timestamp detection. It's used for backup, archiving and file transfer between in-house systems. Even between enterprises over VPNs.



    Your comment that it "should not be used for integrity checking of documents" is interesting, as that's kinda what is done when transferring files (aka documents). A hacked file/document is philosophically a changed file/document. If on a source system an attacker changes a document in a smart way to produce the same MD5 hash, it will not propagate onward to the target system as the document has not changed in rsync's eyes. As colliding hashes can be found quickly now, a carefully made change can go unnoticed by rsync, and (niche) attacks can occur.



    So if you ask "Is there really no use for MD5 anymore?", an answer is that it's in current and widespread use at home and in the enterprise.



    In rsync's case, swapping out MD5 to something faster would only produce marginal overall speed improvement given storage and networking overheads. It would certainly be less than the simple ratio of hash rates suggests.

















    $endgroup$








    • 1




      $begingroup$
      I think librsync actually uses BLAKE2 now.
      $endgroup$
      – forest
      yesterday










    • $begingroup$
      BLAKE2b is 512, BLAKE2s is 256. It can be truncated though, of course.
      $endgroup$
      – forest
      yesterday










    • $begingroup$
      @forest Well you sound convincing, though man pages say MD5 and the hash is 32 hex characters. What would be the reason for truncation?
      $endgroup$
      – Paul Uszak
      yesterday










    • $begingroup$
      Truncation can be done to retain compatibility with the protocol. If the protocol is designed for a 128-bit hash, then it's simpler to truncate a larger hash than to change the protocol (possibly adding more overhead to something designed to minimize overhead). I'm not sure if it uses BLAKE2 the same way it used MD5, but I do know that it was "replacing MD5 with BLAKE2". The code has been added to librsync.
      $endgroup$
      – forest
      yesterday







    • 1




      $begingroup$
      I was linking to librsync, which provides the backend.
      $endgroup$
      – forest
      yesterday


















    1












    $begingroup$

    A case where the use of the MD5-hash would still make sense (and low risk of deleting duplicated files):



    If you want to find duplicate files you can just use CRC32.



    As soon as two files return the same CRC32 value, you hash both files again with MD5. If the MD5 hashes are also identical, you can treat the files as duplicates.




    When deleting the wrong file would be high-risk:



    You want the process to be fast: For the second pass, use a hash function with no known collision attacks, e.g. SHA-2 or SHA-3, instead of MD5. It's extremely unlikely that two different files would produce an identical hash.



    Speed is no concern: Compare the files byte per byte.
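The two-pass idea above could look roughly like this in Python. It is a sketch under assumptions: the helper names are invented, SHA3-256 stands in for "SHA2 or SHA3", and the second pass runs only on files that already share a CRC32 value.

```python
import hashlib
import zlib
from collections import defaultdict


def _read_chunks(path, chunk_size=1 << 20):
    """Yield the file's contents in fixed-size chunks."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk


def crc32_file(path):
    crc = 0
    for chunk in _read_chunks(path):
        crc = zlib.crc32(chunk, crc)
    return crc


def sha3_file(path):
    h = hashlib.sha3_256()
    for chunk in _read_chunks(path):
        h.update(chunk)
    return h.digest()


def _group(paths, key_func):
    """Group paths by key_func and keep only groups with more than one member."""
    groups = defaultdict(list)
    for path in paths:
        groups[key_func(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]


def two_stage_duplicates(paths):
    """First pass: CRC32 on every file. Second pass: SHA3-256 on CRC32 matches only."""
    result = []
    for candidates in _group(paths, crc32_file):
        result.extend(_group(candidates, sha3_file))
    return result
```

As the comments under this answer point out, the second pass rereads the candidate files anyway, so a direct byte-by-byte comparison may be just as fast when both files are local.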

















    $endgroup$








    • 7




      $begingroup$
      Why use a second step after CRC32 at all? Compare the files byte-by-byte if you're going to read them again completely anyhow!
      $endgroup$
      – Ruben De Smet
      2 days ago






    • 3




      $begingroup$
      @RubenDeSmet I think it's because to compare them byte-by-byte you'd have to buffer both files to a certain limit (because of memory constraints) and compare those. This will slow down sequential read speeds because you need to jump between the files. If this actually makes any real world difference provided a large enough buffer size is beyond my knowledge.
      $endgroup$
      – JensV
      2 days ago






    • 1




      $begingroup$
      @JensV I am pretty sure that the speed difference between a byte-by-byte comparison and a SHA3 comparison (with reasonable buffer sizes) will be trivial. It might even favour the byte-by-byte comparison.
      $endgroup$
      – Martin Bonner
      2 days ago






    • 5




      $begingroup$
      Comparing the files byte-by-byte requires communication. Computing a hash can be done locally. If the connection is slow compared to the hard drive speed, computing another hash after CRC32 might still be a reasonable option before comparing byte-by-byte.
      $endgroup$
      – JiK
      2 days ago






    • 1




      $begingroup$
      I have to agree with Ruben de Smet on the basic logic. In virtually all circumstances, it only makes sense to do two passes. In pass one, calculate one or more hashes. In pass 2, compare all bytes. If you're going to calculate SHA3, you might as well do so on the first pass and compare it immediately. The general problem domain is limited by 3 constraints: read speed, hash speed, and the speed of comparing hashes or full file contents. Splitting off the SHA3 hash only makes sense when that is the only slow step, and that's just unlikely.
      $endgroup$
      – MSalters
      2 days ago



















    1












    $begingroup$


    I know that MD5 should not be used for password hashing




    Indeed. However, that applies to using MD5 directly on a password, or on just a password and a salt. In that case MD5 is less secure than a dedicated password hash with a work factor, at least for common passwords and passphrases.



    However, the use of MD5 within a PRF (HMAC) and within a password hash is still OK as it relies on pre-image resistance for security, rather than collision resistance.



    I'd rather not bet that MD5 stays secure for pre-image resistance though. Attacks only get better and although I don't see any progress on breaking MD5's pre-image resistance, I would not rule it out either.
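To illustrate what a work factor adds, here is a minimal sketch using PBKDF2 from Python's standard library. The iteration count and the helper names `hash_password` and `verify_password` are assumptions for illustration, and the sketch uses HMAC-SHA-256 rather than HMAC-MD5.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune it to your own hardware


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a slow, salted hash; returns (salt, iterations, digest)."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected):
    """Recompute the hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

The point of the iteration count is that every password guess costs the attacker the same repeated work, which a single MD5 call over password and salt does not provide.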




    Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.




    MD5 is still secure for checking hashes from another server as long as attackers cannot alter the input of the MD5 hash. However, for something like a full ISO, I'd say that they would have plenty of opportunity to bring in binary files whose contents seem innocent, while those files alter the intermediate state of MD5 in a way that is vulnerable to collision attacks.



    That was not the attack that you referred to; in that case the MD5 hash on the official server was different from the one calculated over the ISO image.



    But attacks on file distribution do rely on collision resistance, and this kind of use case can definitely be attacked. It would probably not be all that easy (correctly lining up the binary data required for the attack at the start of the ISO and such), but it is a vector of attack nonetheless.



    The same goes for SHA-1 in Git by the way. Not easy to breach, but far from impossible, whatever Linus says.




    Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5. (In a way, this would be creating a rainbow table where all the files are the dictionary)




    Sure, if there is no possibility of attack or if you compare the files fully anyway then MD5 is fine.



    However, if there is a vector of attack then you need to perform a full binary compare even after the hash has matched. Otherwise an attacker could make you retrieve the wrong deduplicated file. If you use a cryptographically strong hash then you do not have to perform the full file comparison at all. As others have noted, a 256-512 bit hash is a lot easier to handle than performing a full file compare when storing. Passing over the file twice is not very fast either; all the speed advantages of MD5 are very likely nullified by the I/O required.



    Besides that, if you would reference it using the hash then there is no comparison to be made; you would only have a single file (this is about deduplication after all).
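A minimal sketch of referencing files by hash, assuming Python; the names `sha256_of` and `store` and the flat directory layout are hypothetical. Each file is stored under its SHA-256 digest, so identical content lands on the same path and no comparison step is needed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def store(path, store_dir):
    """Copy path into store_dir under its SHA-256 digest; return the stored path."""
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    target = store_dir / sha256_of(path)
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(path, target)
    return target
```

This only skips the comparison safely because no collisions are known for SHA-256; with MD5 in its place, an attacker who controls file contents could make two different files map to the same stored entry.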




    "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




    Others have already mentioned keyed hashes (Message Authentication Codes) and non-crypto hashes, and one or two really fast crypto hashes that are more secure and usually as fast as MD5. But yes, as cryptographic hashes go, MD5 is certainly rather fast. That's mainly because it is uncomplicated and because it has a small state / output size.



    As we found out, MD5 is so uncomplicated that it could be broken. Other algorithms such as SHA-256 and -512 largely rely on the same principles but are still deemed secure. Note that newer Intel and AMD processors have SHA-256 acceleration, so it is likely that they would perform similarly to MD5 if the hardware acceleration is indeed used.




    As you can see, MD5 is almost never a good idea, yet many (smart) people still believe MD5 or SHA-1 to be secure under "their specific circumstances". They can often be proven wrong, and they leave the door open for (future) attacks on the system. I'd try to avoid it under any circumstances, especially if it is not used within HMAC.



    What I also see is that it is defended because a system cannot be upgraded. MD5 has been under attack for years and years. If you still cannot migrate away from MD5 then there is something seriously wrong with the security of your system that transcends the use of MD5. If you're designing, programming or maintaining systems without an upgrade path then you're the main security hazard, not the hash algorithm.

















    $endgroup$












    • $begingroup$
      There's no modern password hash that uses MD5. One could instantiate PBKDF2 with HMAC-MD5, but I'm not sure I've ever seen anyone do that, and if one is making a custom choice to begin with, one might as well choose a modern password hash!
      $endgroup$
      – Squeamish Ossifrage
      19 hours ago






    • 1




      $begingroup$
      I'm very prolific on SO, I've seen it plenty of times. People know MD5 and for some reason like to use it.
      $endgroup$
      – Maarten Bodewes
      18 hours ago






    • 1




      $begingroup$
      "MD5 is still secure to check hashes from another server as long as hackers cannot alter the input of the MD5 hash." This requires very careful qualification. The issue about how much control an attacker might have over the input is extremely subtle and not obvious if you're not very familiar with cryptography. See the story at crypto.stackexchange.com/a/70057 where the attacker doesn't change anything at the time the initial compromise is detected but still compromises everyone in the end even if they verify the good MD5 hashes.
      $endgroup$
      – Squeamish Ossifrage
      18 hours ago











    • $begingroup$
      Well, that was kind of the point of my answer, you think you're secure, but you're not. What you posted is an interesting attack vector, but it still relies on the adversary (or one of his companions, same thing in a theoretic sense) to control the input of the hash.
      $endgroup$
      – Maarten Bodewes
      18 hours ago






    • 1




      $begingroup$
      By the way, you posted a fine answer, but I started off with the password hashing part, and that was too long for a comment already. I'm not looking forward to cleaning up the comment mess, by the way :P
      $endgroup$
      – Maarten Bodewes
      18 hours ago



















    -1












    $begingroup$

    One of the things not mentioned here is that the attraction of hashing algorithms (like programming languages etc.) is ubiquity. The 'reason' for using MD5 is:



    • everyone knows about it

    • it's implemented for pretty much every combination of architectures, OSs, programming languages etc.

    • its limitations are well understood.

    This is important from both a practicality and a security point of view.
    It is reasonable to expect someone to be able to handle it (practicality), and it helps devs audit the process and not skip steps because they are 'tricky' (security).



    All that said, SHA is catching up and I think in time MD5 will die out, and when it does, it will happen quite quickly, because its popularity is only a self-fulfilling prophecy in the first place, as there are normally better choices.



    However, in the intervening time, the fact that it's widely adopted may well be a good reason in its own right to use MD5 (within its limits).












    $endgroup$












    • $begingroup$
      Welcome to crypto.stackexchange - I think there are some additions that would improve this answer: It seems like the bullet points are equally applicable to SHA1/SHA2 (considering that they're standardized algorithms). It is clear why those points would promote the use of an algorithm, but it's not clear to me why they would promote the use of MD5 over SHA1/SHA2. The last point seems to say that "everyone else is doing it" is a good enough reason to use MD5 (which is seldom a good reason to do anything). It also mentions limits, but does not elaborate on what those limits should be.
      $endgroup$
      – Ella Rose
      yesterday






    • 4




      $begingroup$
      Use of MD5 has been questionable for a quarter of a century since Hans Dobbertin published collisions in the compression function in 1996, and MD5 has been completely broken for a decade and a half since Xiaoyun Wang's team demonstrated collisions in 2004. Collisions in MD5 were exploited in practice by the United States and Israel to sabotage Iran's nuclear program. SHA-2 has been available since 2002, seventeen years. Note that SHA-0 and SHA-1 are broken too. The Caesar cipher meets all your criteria too.
      $endgroup$
      – Squeamish Ossifrage
      yesterday







    • 1




      $begingroup$
      Just to be clear: I don't think MD5 is better than SHA-2 (or good). The question was not 'which is better' but 'is there any reason to use MD5'. Sure, in the 17 years that SHA-2 has had to take over, it has become widely enough used that it's a good choice, especially if you're in the internet-connected Windows/Linux space. If that's where you are and there are no special cases, then don't use MD5; there are better alternatives. But that was not the question. It takes a while to become trusted, and some tech stacks move a lot less rapidly. Sometimes 17 years isn't that long.
      $endgroup$
      – ANone
      yesterday






    • 1




      $begingroup$
      If someone is (a) actually designing a protocol (b) constrained to a specific, real environment that is (c) limited to MD5 in that environment for specific technical reasons that can be articulated, then they can ask a question in which we give useful guidance for security. That's not the case here: the original poster is asking about convenience, about an ad hoc vulnerability disclosure on highly capable computers that can easily use SHA-2, about cheap ways to keep collision probabilities low, etc.
      $endgroup$
      – Squeamish Ossifrage
      yesterday






    • 1




      $begingroup$
      @SqueamishOssifrage Sure, but that wasn't the question. Also, if you're looking for hashing advice and your research is: read one crypto.stackexchange Q/A titled "Is there really no use for MD5 anymore?", scroll past most of the answers that say 'don't do it', get to the one that says "well, for completeness's sake: maybe", and take that to mean MD5 for the win... On that note, there'd be fewer of those hackernews posts if there weren't a legit argument that people had been overly negative. I think honesty about its weakness is better than scaring people away.
      $endgroup$
      – ANone
      yesterday



















    25












    $begingroup$


    I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimaging attacks and MD5s low computation time.




    There is no published preimage attack on MD5 that is cheaper than a generic attack on any 128-bit hash function. But you shouldn't rely on that alone when making security decisions, because cryptography is tricky and adversaries are clever and resourceful and can find ways around it!




    1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.



    The question of whether to publish known-good vs. known-bad hashes after a compromise is addressed elsewhere—in brief, there's not much that publishing known-bad hashes accomplishes, and according to the citation, Linux Mint published known-good, not known-bad, hashes. So what security do you get from known-good MD5 hashes?



    There are two issues here:




    1. If you got the MD5 hash from the same source as the ISO image, there's nothing that would prevent an adversary from replacing both the MD5 hash and the ISO image.



      To prevent this, you and the Linux Mint curators need two channels: one for the hashes which can't be compromised (but need only have very low bandwidth), and another for the ISO image (which needs high bandwidth) on which you can then use the MD5 hash in an attempt to detect compromise.



      There's another way to prevent this: Instead of using the uncompromised channel for the hash of every ISO image over and over again as time goes on—which means more and more opportunities for an attacker to subvert it—use it once initially for a public key, which is then used to sign the ISO images; then there's only one opportunity for an attacker to subvert the public key channel.




    2. Collision attacks may still be a vector in cases like this. Consider the following scenario:



      • I am an evil developer. I write two software packages, whose distributions collide under MD5. One of the packages is benign and will survive review and audit. The other one will surreptitiously replace your family photo album by erotic photographs of sushi.

      • The Linux Mint curators carefully scrutinize and audit everything they publish in their package repository and publish the MD5 hashes of what they have audited in a public place that I can't compromise.

      • The Linux Mint curators cavalierly administer the package distributions in their package repository, under the false impression that the published MD5 hashes will protect users.

      In this scenario, I can replace the benign package by the erotic sushi package, pass the MD5 verification with flying colors, and give you a nasty—and luscious—surprise when you try to look up photos of that old hiking trip you took your kids on.
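The two-channel check from the first issue can be sketched as follows, assuming Python; the helper names `file_digest_hex` and `matches_published_digest` are made up for illustration. The sketch uses SHA-256 instead of MD5 because of the collision scenario in the second issue, and it only helps if the published digest arrived over a channel the attacker cannot alter.

```python
import hashlib
import hmac


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash the downloaded file in chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare the file's digest with one obtained over a separate channel."""
    return hmac.compare_digest(file_digest_hex(path), published_hex.strip().lower())
```

A matching digest says nothing if the digest was fetched from the same compromised server as the file, which is why the separate channel (or a signature under a key obtained once) matters more than the choice of function.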




    1. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5.



    When I put my benign software package and my erotic sushi package, which collide under MD5, in your directory, your duplicate-detection script will initially think they are duplicates. In this case, you absolutely must compare the files in full. But there are much better ways to do this!



    • If you use SHA-512, you can safely skip the comparison step. Same if you use BLAKE2b, which can be even faster than MD5.


    • You could even use MD5 safely for this if you use it as HMAC-MD5 under a uniform random key, and safely skip the comparison step. HMAC-MD5 does not seem to be broken as a pseudorandom function family, so it's probably fine for security, up to the birthday bound, but there are better, faster PRFs like keyed BLAKE2 that won't raise any auditors' eyebrows.


    • Even better, you can choose a random key and hash the files with a universal hash under the key, like Poly1305. This is many times faster than MD5 or BLAKE2b, and the probability of a collision between any two files is less than $1/2^{100}$, so the probability of collision among $n$ files is less than $\binom{n}{2} 2^{-100}$ and thus you can still safely skip the comparison step until you have quadrillions of files.


    • You could also just use a cheap checksum like a CRC with a fixed polynomial. This will be the fastest of the options—far and away faster than MD5—but unlike the previous options you still absolutely must compare the files in full.
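As a rough sketch of the keyed options in the list above, assuming Python: the code groups files by keyed BLAKE2b under a fresh random key and also evaluates the union bound quoted for a universal hash. The helper names are hypothetical, and the sketch uses keyed BLAKE2b (available in `hashlib`) rather than Poly1305 or HMAC-MD5.

```python
import hashlib
import os
from collections import defaultdict
from fractions import Fraction
from math import comb
from pathlib import Path


def keyed_digest(path, key, chunk_size=1 << 20):
    """Keyed BLAKE2b over the file contents, streamed in chunks."""
    h = hashlib.blake2b(key=key, digest_size=32)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.digest()


def group_by_keyed_digest(root, key=None):
    """Return groups of files that share a keyed digest."""
    key = os.urandom(32) if key is None else key  # fresh secret key per run
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[keyed_digest(path, key)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def collision_bound(n, bits=100):
    """Union bound C(n, 2) * 2**-bits on any collision among n files."""
    return Fraction(comb(n, 2), 2**bits)
```

For example, `collision_bound(2**40)` is below $2^{-21}$, since $\binom{2^{40}}{2} < 2^{79}$. The key has to stay secret and be chosen independently of the files for the bound to mean anything.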


    So, is MD5 safe for finding candidate duplicates to verify, if you subsequently compare the files bit by bit in full? Yes. So is the constant zero function.




    (In a way, this would be creating a rainbow table where all the files are the dictionary)




    This is not a rainbow table. A rainbow table is a specific technique for precomputing a random walk over a space of, say, passwords, via, say, MD5 hashes, in a way that saves effort trying to find MD5 preimages for hashes that aren't necessarily in your table in the first place, or doing it in parallel to speed up a multi-target search. It is not simply a list of precomputed hashes on a dictionary of inputs.



    (The blog post by tptacek that you cited, and the blog post by Jeff Atwood that it was a response to, are both confused about what rainbow tables are.)
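To make the distinction concrete, here is a toy sketch of the chain idea, assuming Python, a password space of four-digit PINs, SHA-256 as the hash, and made-up names and parameters (`CHAIN_LENGTH`, `reduce_to_pin`, and so on). Only the start and end of each chain are stored; the hashes in between are recomputed during lookup.

```python
import hashlib

CHAIN_LENGTH = 20
SPACE = 10_000  # toy password space: four-digit PINs


def H(pin):
    return hashlib.sha256(pin.encode()).digest()


def reduce_to_pin(digest, position):
    """Map a digest back into the PIN space; varies with the chain position."""
    return f"{(int.from_bytes(digest[:8], 'big') + position) % SPACE:04d}"


def build_table(starts):
    """Walk each chain, keeping only its endpoint and its start."""
    table = {}
    for start in starts:
        pin = start
        for position in range(CHAIN_LENGTH):
            pin = reduce_to_pin(H(pin), position)
        table.setdefault(pin, []).append(start)
    return table


def lookup(table, target):
    """Return a PIN whose hash is target, or None if no chain covers it."""
    for position in range(CHAIN_LENGTH - 1, -1, -1):
        # Assume target sits at `position` and finish the walk to the endpoint.
        pin = reduce_to_pin(target, position)
        for later in range(position + 1, CHAIN_LENGTH):
            pin = reduce_to_pin(H(pin), later)
        for start in table.get(pin, []):
            # Replay the chain from its start to recover the preimage.
            candidate = start
            for step in range(CHAIN_LENGTH):
                if H(candidate) == target:
                    return candidate
                candidate = reduce_to_pin(H(candidate), step)
    return None
```

With 200 start points the table holds at most 200 entries yet can invert hashes of up to 4,000 PINs, at the cost of extra hashing per lookup. A plain dictionary of precomputed hashes would instead store one entry per input and could never invert a hash it had not stored.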




    When the password scheme article states that "MD5 is fast", it clearly refers to the problem that hashing MD5 is too cheap when it comes to hashing a large amount of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




    I don't know what tptacek meant—you could email and ask—but if I had to guess, I would guess this meant it's awfully slow for things like hash tables, where you would truncate MD5 to a few bits to determine an index into an array of buckets or an open-addressing array.

















    $endgroup$












    • $begingroup$
      Why does selecting a different hash algorithm eliminate the risk of collisions (sushi bullet)?
      $endgroup$
      – chrylis
      2 days ago











    • $begingroup$
      @chrylis Nobody has ever published any way to find SHA-512 or BLAKE2b collisions, nor even, say, SHA-256 collisions.
      $endgroup$
      – Squeamish Ossifrage
      yesterday







    • 1




      $begingroup$
      @Alexander ‘Hash function’ means many things, and is usually some approximation to a uniform random choice of function in some context (random oracle model, pseudorandom function family, pseudorandom permutation family, etc.). A ‘checksum’ is used for some error-detecting capability; e.g., a well-designed 32-bit CRC is guaranteed to detect any 1-bit errors and usually guarantees detecting some larger number of bit errors in certain data word sizes, while a 32-bit truncation of SHA-256 might fail to detect some 1-bit errors. The terms are generally used quite loosely, however.
      $endgroup$
      – Squeamish Ossifrage
      yesterday






    • 1




      $begingroup$
      @SqueamishOssifrage "CRC is guaranteed to detect any 1-bit errors and usually guarantees detecting some larger number of bit errors in certain data word sizes, while a 32-bit truncation of SHA-256 might fail to detect some 1-bit errors. " woah. That's really powerful, and really cool. I didn't know that. I'll read more into it!
      $endgroup$
      – Alexander
      yesterday






    • 2




      $begingroup$
      I think you misunderstood the Linux Mint case. They published the MD5 hash of the infected ISO after they recovered from the hack, so that people can check whether the ISO they installed from was infected. Your sushi story seems to imply that you thought MD5 was used to prove the integrity of the original uninfected ISO file. This was not the case.
      $endgroup$
      – jornane
      yesterday




















    $begingroup$


    I know that MD5 should not be used for password hashing, and that it also should not be used for integrity checking of documents. There are way too many sources citing MD5 preimaging attacks and MD5s low computation time.




    There is no published preimage attack on MD5 that is cheaper than a generic attack on any 128-bit hash function. But you shouldn't rely on that alone when making security decisions, because cryptography is tricky and adversaries are clever and resourceful and can find ways around it!




    1. Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.



    The question of whether to publish known-good vs. known-bad hashes after a compromise is addressed elsewhere—in brief, there's not much that publishing known-bad hashes accomplishes, and according to the citation, Linux Mint published known-good, not known-bad, hashes. So what security do you get from known-good MD5 hashes?



    There are two issues here:




    1. If you got the MD5 hash from the same source as the ISO image, there's nothing that would prevent an adversary from replacing both the MD5 hash and the ISO image.



      To prevent this, you and the Linux Mint curators need two channels: one for the hashes which can't be compromised (but need only have very low bandwidth), and another for the ISO image (which needs high bandwidth) on which you can then use the MD5 hash in an attempt to detect compromise.



      There's another way to prevent this: Instead of using the uncompromised channel for the hash of every ISO image over and over again as time goes on—which means more and more opportunities for an attacker to subvert it—use it once initially for a public key, which is then used to sign the ISO images; then there's only one opportunity for an attacker to subvert the public key channel.




    2. Collision attacks may still be a vector in cases like this. Consider the following scenario:



      • I am an evil developer. I write two software packages, whose distributions collide under MD5. One of the packages is benign and will survive review and audit. The other one will surreptitiously replace your family photo album by erotic photographs of sushi.

      • The Linux Mint curators carefully scrutinize and audit everything they publish in their package repository and publish the MD5 hashes of what they have audited in a public place that I can't compromise.

      • The Linux Mint curators cavalierly administer the package distributions in their package repository, under the false impression that the published MD5 hashes will protect users.

      In this scenario, I can replace the benign package by the erotic sushi package, pass the MD5 verification with flying colors, and give you a nasty—and luscious—surprise when you try to look up photos of that old hiking trip you took your kids on.
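
    The verification step in the two-channel setup above can be sketched as follows. This is a minimal illustration, not anything Linux Mint ships: the helper names `file_digest_hex` and `matches_published_digest` are made up here, the expected digest is assumed to have arrived over the separate trusted channel, and SHA-256 is swapped in for MD5 so that the collision scenario does not apply.

```python
import hashlib
import hmac


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path, expected_hex, algorithm="sha256"):
    """Compare the file's digest with one obtained over a trusted channel.

    `expected_hex` is assumed to come from somewhere the attacker cannot
    modify; fetching it from the same server as the file proves nothing.
    """
    actual = file_digest_hex(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

    A check like this only says the file equals whatever was hashed; it says nothing about whether the thing that was hashed was trustworthy in the first place.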




    1. Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5.



    When I put my benign software package and my erotic sushi package, which collide under MD5, in your directory, your duplicate-detection script will initially think they are duplicates. In this case, you absolutely must compare the files in full. But there are much better ways to do this!



    • If you use SHA-512, you can safely skip the comparison step. Same if you use BLAKE2b, which can be even faster than MD5.


    • You could even use MD5 safely for this if you use it as HMAC-MD5 under a uniform random key, and safely skip the comparison step. HMAC-MD5 does not seem to be broken as a pseudorandom function family, so it's probably fine for security up to the birthday bound, but there are better, faster PRFs, like keyed BLAKE2, that won't raise any auditors' eyebrows.


    • Even better, you can choose a random key and hash the files with a universal hash under the key, like Poly1305. This is many times faster than MD5 or BLAKE2b, and the probability of a collision between any two files is less than $1/2^{100}$, so the probability of collision among $n$ files is less than $\binom{n}{2} 2^{-100}$ and thus you can still safely skip the comparison step until you have quadrillions of files.


    • You could also just use a cheap checksum like a CRC with a fixed polynomial. This will be the fastest of the options—far and away faster than MD5—but unlike the previous options you still absolutely must compare the files in full.
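
    To see where "quadrillions" comes from in the universal-hash bullet, here is the bound worked out for two illustrative file counts (the values of $n$ are chosen only as examples):

    $$\Pr[\text{some collision among } n \text{ files}] < \binom{n}{2}\, 2^{-100} = \frac{n(n-1)}{2}\, 2^{-100} < \frac{n^2}{2^{101}}$$

    $$n = 2^{40} \ (\text{about a trillion files}): \quad \frac{2^{80}}{2^{101}} = 2^{-21} \approx 4.8 \times 10^{-7}$$

    $$n = 2^{50} \ (\text{about a quadrillion files}): \quad \frac{2^{100}}{2^{101}} = \frac{1}{2}$$

    So the guarantee is comfortable at a trillion files and stops saying anything useful at around a quadrillion.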


    So, is MD5 safe for finding candidate duplicates to verify, if you subsequently compare the files bit by bit in full? Yes. So is the constant zero function.
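
    As a rough sketch of the keyed-hash option from the list above, the following groups files by keyed BLAKE2b under a fresh random key. The function names `keyed_digest` and `find_duplicates` and the 32-byte key and digest sizes are choices made for this illustration, not anything from the question.

```python
import hashlib
import os
import secrets
from collections import defaultdict


def keyed_digest(path, key, chunk_size=1 << 20):
    """Keyed BLAKE2b of a file's contents, read in chunks."""
    h = hashlib.blake2b(key=key, digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents hash identically."""
    # A fresh uniform random key per run: files prepared in advance by
    # someone who does not know the key cannot be made to collide on purpose.
    key = secrets.token_bytes(32)
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                groups[keyed_digest(path, key)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

    Because the key changes on every run, the digests are useful only within one run; they are not meant to be stored or published.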




    (In a way, this would be creating a rainbow table where all the files are the dictionary)




    This is not a rainbow table. A rainbow table is a specific technique for precomputing a random walk over a space of, say, passwords, via, say, MD5 hashes, in a way that saves effort trying to find MD5 preimages for hashes that aren't necessarily in your table in the first place, or doing it in parallel to speed up a multi-target search. It is not simply a list of precomputed hashes on a dictionary of inputs.
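
    To make the distinction concrete, here is a toy sketch of the chain idea over 4-digit PINs. Everything in it is an assumption made for illustration: the PIN space, the chain length, and in particular the reduction function `reduce_to_pin`, which is invented here. Real rainbow tables use much larger spaces and more careful parameters.

```python
import hashlib

SPACE = 10_000  # toy password space: the 4-digit PINs "0000".."9999"


def md5(pin: str) -> bytes:
    return hashlib.md5(pin.encode()).digest()


def reduce_to_pin(digest: bytes, step: int) -> str:
    """Hypothetical reduction: map a digest back into the PIN space.

    Mixing in `step` gives every chain position its own reduction function.
    """
    return f"{(int.from_bytes(digest[:8], 'big') + step) % SPACE:04d}"


def build_table(starts, chain_len):
    """Walk a chain from each start and keep only endpoint -> starts."""
    table = {}
    for start in starts:
        pin = start
        for step in range(chain_len):
            pin = reduce_to_pin(md5(pin), step)
        table.setdefault(pin, []).append(start)
    return table


def lookup(table, target, chain_len):
    """Return a PIN whose MD5 is `target`, or None if no chain covers it."""
    for pos in range(chain_len - 1, -1, -1):
        # Suppose `target` is the hash at chain position `pos`;
        # continue the walk from there to the chain's endpoint.
        pin = reduce_to_pin(target, pos)
        for step in range(pos + 1, chain_len):
            pin = reduce_to_pin(md5(pin), step)
        # Replay every stored chain ending there; false alarms just fail.
        for start in table.get(pin, []):
            cand = start
            for step in range(chain_len):
                if md5(cand) == target:
                    return cand
                cand = reduce_to_pin(md5(cand), step)
    return None
```

    The table stores one entry per chain, yet `lookup` can invert the hash of any PIN that lies somewhere along a chain, which is exactly what a plain list of precomputed hashes does not do.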



    (The blog post by tptacek that you cited, and the blog post by Jeff Atwood that it was a response to, are both confused about what rainbow tables are.)




    When the password scheme article states that "MD5 is fast", it clearly refers to the problem that hashing MD5 is too cheap when it comes to hashing a large amount of passwords to find the reverse of a hash. But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




    I don't know what tptacek meant—you could email and ask—but if I had to guess, I would guess this meant it's awfully slow for things like hash tables, where you would truncate MD5 to a few bits to determine an index into an array of buckets or an open-addressing array.

















    $endgroup$












    edited yesterday

























    answered 2 days ago









    Squeamish Ossifrage











    • $begingroup$
      Why does selecting a different hash algorithm eliminate the risk of collisions (sushi bullet)?
      $endgroup$
      – chrylis
      2 days ago











    • $begingroup$
      @chrylis Nobody has ever published any way to find SHA-512 or BLAKE2b collisions, nor even, say, SHA-256 collisions.
      $endgroup$
      – Squeamish Ossifrage
      yesterday







    • 1




      $begingroup$
      @Alexander ‘Hash function’ means many things, and is usually some approximation to a uniform random choice of function in some context (random oracle model, pseudorandom function family, pseudorandom permutation family, etc.). A ‘checksum’ is used for some error-detecting capability; e.g., a well-designed 32-bit CRC is guaranteed to detect any 1-bit errors and usually guarantees detecting some larger number of bit errors in certain data word sizes, while a 32-bit truncation of SHA-256 might fail to detect some 1-bit errors. The terms are generally used quite loosely, however.
      $endgroup$
      – Squeamish Ossifrage
      yesterday






    • 1




      $begingroup$
      @SqueamishOssifrage "CRC is guaranteed to detect any 1-bit errors and usually guarantees detecting some larger number of bit errors in certain data word sizes, while a 32-bit truncation of SHA-256 might fail to detect some 1-bit errors. " woah. That's really powerful, and really cool. I didn't know that. I'll read more into it!
      $endgroup$
      – Alexander
      yesterday






    • 2




      $begingroup$
      I think you misunderstood the Linux Mint case. They published the MD5 hash of the infected ISO after they recovered from the hack, so that people can check whether the ISO they installed from was infected. Your sushi story seems to imply that you thought MD5 was used to prove the integrity of the original uninfected ISO file. This was not the case.
      $endgroup$
      – jornane
      yesterday



























    8












    $begingroup$


    But what does it mean when it says that "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




    BLAKE2 is faster than MD5 and is currently believed to provide 64-bit collision resistance when truncated to the same 128-bit size as MD5 (compare roughly 30 bits for MD5).
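
    A minimal sketch of getting an MD5-sized BLAKE2 digest in Python, assuming the standard library's `hashlib`. The helper name `blake2b_128` is made up here. Note that it uses BLAKE2's built-in `digest_size` parameter, which yields a different value than literally cutting off a 64-byte digest, because the output length is part of BLAKE2's parameters.

```python
import hashlib


def blake2b_128(data: bytes) -> bytes:
    """16-byte (128-bit) BLAKE2b digest, the same length as an MD5 digest."""
    return hashlib.blake2b(data, digest_size=16).digest()
```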















    $endgroup$



























        answered 2 days ago









        DannyNiu





















            6












            $begingroup$

            There's not a compelling reason to use MD5; however, there are some embedded systems with an MD5 core that was used as a stream verifier. In those systems, MD5 is still used. They are moving to BLAKE2 because it's smaller in silicon, and it has the benefit of being faster than MD5 in general.



            The reason that MD5 started to fall out of favor with hardware people was that the word reordering of the MD5 message expansion looks simple but actually requires a lot of circuitry for demultiplexing and interconnect, and the hardware efficiency is greatly degraded compared to BLAKE. In contrast, the message expansion blocks for BLAKE algorithms can be efficiently implemented as simple feedback shift registers.



            The BLAKE team did a nice job of making it work well on silicon and in instructions.



            edit: SHA-1, SHA-2, etc also look pretty nice in circuits.

















            $endgroup$












            • $begingroup$
              I did not know about BLAKE; this seems interesting. I assumed from the post that there would be some hashing system that would not be cryptographically secure in order to be faster than MD5, but it seems BLAKE has managed to be the best of both worlds. I’m considering this answer as the accepted one, but I’ll wait a few days while there is activity around this question.
              $endgroup$
              – jornane
              yesterday










            • $begingroup$
              @jornane Squeamish Ossifrage has a better answer. I just wanted to mention the hardware side of things.
              $endgroup$
              – b degnan
              yesterday























            edited yesterday

























            answered 2 days ago









            b degnan






















            3












            $begingroup$

            MD5 is currently used throughout the world both at home and in the enterprise. It's the file change mechanism within *nix's rsync if you opt for something other than changed timestamp detection. It's used for backup, archiving and file transfer between in-house systems. Even between enterprises over VPNs.



            Your comment that it "should not be used for integrity checking of documents" is interesting, as that's kinda what is done when transferring files (aka documents). A hacked file/document is philosophically a changed file/document. If on a source system an attacker changes a document in a smart way to produce the same MD5 hash, it will not propagate onward to the target system as the document has not changed in rsync's eyes. As colliding hashes can be found quickly now, a carefully made change can go unnoticed by rsync, and (niche) attacks can occur.



            So if you ask "Is there really no use for MD5 anymore?", an answer is that it's in current and widespread use at home and in the enterprise.



            In rsync's case, swapping out MD5 to something faster would only produce marginal overall speed improvement given storage and networking overheads. It would certainly be less than the simple ratio of hash rates suggests.
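
            The last point is the usual Amdahl's-law arithmetic. As a sketch, suppose hashing accounts for a fraction $f$ of the total transfer time and the replacement hash is $k$ times faster; both numbers below are illustrative assumptions, not measurements of rsync.

            $$S = \frac{1}{(1 - f) + f/k}$$

            $$f = 0.2,\ k = 2: \quad S = \frac{1}{0.8 + 0.1} = \frac{1}{0.9} \approx 1.11$$

            In this made-up case a hash that is twice as fast makes the whole job only about 11% faster, because the other 80% of the time (storage and network) is unchanged.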

















            $endgroup$































            edited 17 hours ago

























            answered 2 days ago









            Paul Uszak







            • 1




              $begingroup$
              I think librsync actually uses BLAKE2 now.
              $endgroup$
              – forest
              yesterday










            • $begingroup$
              BLAKE2b produces a 512-bit digest by default and BLAKE2s a 256-bit one. Either can be truncated though, of course.
              $endgroup$
              – forest
              yesterday










            • $begingroup$
              @forest Well, you sound convincing, though the man pages say MD5 and the hash is 32 hex characters. What would be the reason for truncation?
              $endgroup$
              – Paul Uszak
              yesterday










            • $begingroup$
              Truncation can be done to retain compatibility with the protocol. If the protocol is designed for a 128-bit hash, then it's simpler to truncate a larger hash than to change the protocol (possibly adding more overhead to something designed to minimize overhead). I'm not sure if it uses BLAKE2 the same way it used MD5, but I do know that it was "replacing MD5 with BLAKE2". The code has been added to librsync.
              $endgroup$
              – forest
              yesterday
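As a rough illustration of the truncation idea in the comment above, here is a minimal Python sketch. It is not taken from librsync and makes no claim about how librsync actually derives its digests; the function names and the 16-byte length are assumptions chosen to match a 128-bit (32 hex character) field.

```python
import hashlib


def blake2b_truncated(data: bytes, nbytes: int = 16) -> bytes:
    """Compute the full 64-byte BLAKE2b digest and keep its first nbytes."""
    return hashlib.blake2b(data).digest()[:nbytes]


def blake2b_short(data: bytes, nbytes: int = 16) -> bytes:
    """Ask BLAKE2b directly for an nbytes-long digest."""
    return hashlib.blake2b(data, digest_size=nbytes).digest()


block = b"example block"            # hypothetical input
print(blake2b_truncated(block).hex())  # 32 hex characters, the same width as MD5
print(blake2b_short(block).hex())      # also 32 hex characters, but a different value
```

The two results differ because BLAKE2 mixes the requested digest length into its parameters, so both ends of a protocol would have to agree on which of the two approaches is meant.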







            • 1




              $begingroup$
              I was linking to librsync, which provides the backend.
              $endgroup$
              – forest
              yesterday























            1












            $begingroup$

            A case where MD5 still makes sense (when the risk from wrongly deleting a duplicate is low):

            If you want to find duplicate files, you can start with just CRC32.

            As soon as two files return the same CRC32 value, hash both files again with MD5. If the MD5 hashes are also identical, you can treat the files as duplicates.

            When wrongly deleting a file would be costly:

            If you want the process to be fast: use a hash function with no known collision attacks, such as SHA-2 or SHA-3, for the second hash of the files. It is extremely unlikely that two different files would produce the same hash.

            If speed is no concern: compare the files byte by byte.
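The two-stage idea above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names, the 1 MiB chunk size and the `algorithm` parameter are invented here, and the second-stage hash defaults to MD5 only to mirror the low-risk case.

```python
import hashlib
import zlib
from collections import defaultdict
from pathlib import Path

CHUNK = 1 << 20  # read files in 1 MiB pieces (arbitrary choice)


def crc32_of(path: Path) -> int:
    """Cheap first-stage checksum of a file."""
    crc = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def digest_of(path: Path, algorithm: str = "md5") -> str:
    """Second-stage hash; pass "sha256" or "sha3_256" for the high-risk case."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, algorithm: str = "md5"):
    """Return lists of paths whose CRC32 and second-stage hash both match."""
    by_crc = defaultdict(list)
    for p in paths:
        by_crc[crc32_of(p)].append(p)

    groups = defaultdict(list)
    for crc, candidates in by_crc.items():
        if len(candidates) < 2:
            continue  # unique CRC32: no need to hash this file again
        for p in candidates:
            groups[(crc, digest_of(p, algorithm))].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

For the "speed is no concern" case, each reported group could additionally be confirmed with `filecmp.cmp(a, b, shallow=False)` from the standard library, which compares file contents byte by byte.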

















            $endgroup$
































            edited 2 days ago

























            answered 2 days ago









            AleksanderRas







            • 7




              $begingroup$
              Why use a second step after CRC32 at all? Compare the files byte-by-byte if you're going to read them again completely anyhow!
              $endgroup$
              – Ruben De Smet
              2 days ago






            • 3




              $begingroup$
              @RubenDeSmet I think it's because to compare them byte-by-byte you'd have to buffer both files up to a certain limit (because of memory constraints) and compare the buffers. This slows down sequential reads because you need to jump between the files. Whether this makes any real-world difference given a large enough buffer size is beyond my knowledge.
              $endgroup$
              – JensV
              2 days ago






            • 1




              $begingroup$
              @JensV I am pretty sure that the speed difference between a byte-by-byte comparison and a SHA3 comparison (with reasonable buffer sizes) will be trivial. It might even favour the byte-by-byte comparison.
              $endgroup$
              – Martin Bonner
              2 days ago






            • 5




              $begingroup$
              Comparing the files byte-by-byte requires communication. Computing a hash can be done locally. If the connection is slow compared to the hard drive speed, computing another hash after CRC32 might still be a reasonable option before comparing byte-by-byte.
              $endgroup$
              – JiK
              2 days ago






            • 1




              $begingroup$
              I have to agree with Ruben de Smet on the basic logic. In virtually all circumstances, it only makes sense to do two passes. In pass one, calculate one or more hashes. In pass 2, compare all bytes. If you're going to calculate SHA3, you might as well do so on the first pass and compare it immediately. The general problem domain is limited by 3 constraints: read speed, hash speed, and the speed of comparing hashes or full file contents. Splitting off the SHA3 hash only makes sense when that is the only slow step, and that's just unlikely.
              $endgroup$
              – MSalters
              2 days ago

























            1












            $begingroup$


            I know that MD5 should not be used for password hashing




            Indeed. However, that concerns applying MD5 directly to a password, or to just a password and a salt. Used that way, MD5 is less secure than a dedicated password hash with a work factor, at least for common passwords and passphrases.
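To make the work-factor point concrete, here is a minimal sketch that uses PBKDF2 from Python's standard library as a stand-in for a dedicated password hash. The iteration count, the 16-byte salt and all function names are assumptions for illustration only, not recommendations from this answer.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune it to your own hardware


def weak_salted_md5(password: str, salt: bytes) -> bytes:
    """The pattern to avoid: one cheap MD5 call per password guess."""
    return hashlib.md5(salt + password.encode("utf-8")).digest()


def hash_password(password: str, salt=None, iterations: int = ITERATIONS):
    """Derive a key with PBKDF2-HMAC-SHA256; each guess costs `iterations` rounds."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the key and compare it in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected)
```

The only structural difference between the two functions is the iteration count, which is what makes guessing common passwords expensive.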



            However, the use of MD5 within a PRF (HMAC) and within a password hash is still OK as it relies on pre-image resistance for security, rather than collision resistance.
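For reference, this is roughly what using MD5 inside HMAC looks like with Python's standard `hmac` module. The helper names are made up for this sketch, and a stronger digest such as `"sha256"` can be passed in the same way.

```python
import hmac


def make_tag(key: bytes, message: bytes, algorithm: str = "md5") -> bytes:
    """Compute an HMAC tag over message with the given secret key."""
    return hmac.new(key, message, algorithm).digest()


def verify_tag(key: bytes, message: bytes, received: bytes,
               algorithm: str = "md5") -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message, algorithm), received)
```

Without the secret key an attacker cannot compute tags at all, which is why collision attacks on plain MD5 do not carry over directly.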



            I'd rather not bet that MD5 stays secure for pre-image resistance though. Attacks only get better and although I don't see any progress on breaking MD5's pre-image resistance, I would not rule it out either.




            Identifying malicious files, such as when Linux Mint's download servers were compromised and an ISO file was replaced by a malicious one; in this case you want to be sure that your file doesn't match; collision attacks aren't a vector here.




            MD5 is still secure for checking hashes from another server as long as attackers cannot alter the input to the MD5 hash. However, for something like a full ISO, I'd say they would have plenty of opportunity to bring in binary files whose contents seem innocent but that manipulate MD5's intermediate state, which is vulnerable to collision attacks.



            That was not the attack that you referred to; in that case the MD5 hash on the official server was different from the one calculated over the ISO image.



            But file distribution does rely on collision resistance, and this kind of use case can definitely be attacked. It would probably not be all that easy (correctly lining up the binary data required for the attack at the start of the ISO and such), but it is an attack vector nonetheless.
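A download check of the kind discussed here might look like the following sketch, which defaults to SHA-256 rather than MD5. The function names and the chunk size are assumptions, and the published digest is assumed to arrive over a channel the attacker does not control.

```python
import hashlib
import hmac


def file_digest(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that a large ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the local digest with the hex digest published by the distributor."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_digest(path, algorithm), expected)
```

This only shows that the file matches the published digest. It says nothing about whether the publisher's build inputs were themselves crafted, which is the collision scenario described above.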



            The same goes for SHA-1 in Git by the way. Not easy to breach, but far from impossible, whatever Linus says.




            Finding duplicate files. By MD5-summing all files in a directory structure it's easy to find identical hashes. The seemingly identical files can then be compared in full to check if they are really identical. Using SHA512 would make the process slower, and since we compare files in full anyway there is no risk in a potential false positive from MD5. (In a way, this would be creating a rainbow table where all the files are the dictionary)




            Sure, if there is no possibility of attack or if you compare the files fully anyway then MD5 is fine.



            However, if there is a vector of attack then you need to perform a full binary compare even after the hash has matched. Otherwise an attacker could make you retrieve the wrong deduplicated file. If you used a cryptographically strong hash, you would not have to perform the full file comparison at all. As others have noted, a 256- to 512-bit hash is a lot easier to handle than a full file compare when storing. Passing over the file twice is not very fast either; any speed advantage of MD5 is very likely nullified by the I/O required.



            Besides that, if you would reference it using the hash then there is no comparison to be made; you would only have a single file (this is about deduplication after all).
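As a rough sketch of the deduplication idea with a strong hash, the snippet below groups files by their SHA-256 digest in a single pass, without a second full comparison. The helper names `sha256_of` and `find_duplicates` are hypothetical, and skipping symlinks is an assumption of this sketch.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map each digest to the sorted paths sharing it (groups of 2+ only)."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only hash regular files; symlinks are skipped in this sketch.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[sha256_of(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

With MD5 in place of SHA-256, every reported group would still need a byte-for-byte comparison whenever an attacker can supply files.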




            "[MD5 is] too slow to use as a general purpose hash"? Are there faster standardized hashes to compare files, that still have a reasonably low chance of collision?




            Others have already mentioned keyed hashes (Message Authentication Codes) and non-crypto hashes, and one or two really fast crypto hashes that are more secure and usually as fast as MD5. But yes, as cryptographic hashes go, MD5 is certainly rather fast. That's mainly because it is uncomplicated and because it has a small state / output size.



            As we found out, MD5 is so uncomplicated that it could be broken. Other algorithms such as SHA-256 and -512 largely rely on the same principles but are still deemed secure. Note that newer Intel and AMD processors have SHA-256 acceleration, so it is likely that they would perform similarly to MD5 if the hardware acceleration is indeed used.
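Relative speed depends heavily on the CPU and on how the hash library was built, so it is probably best measured rather than assumed. Here is a small, unscientific sketch using Python's `hashlib`; the function name `throughput_mb_per_s` is made up, and `md5` may be unavailable on FIPS-restricted builds.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm, data, repeats=5):
    """Best-of-`repeats` hashing throughput for `algorithm`, in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes
    for name in ("md5", "sha256", "sha512", "blake2b"):
        print(f"{name:8s} {throughput_mb_per_s(name, payload):8.0f} MB/s")
```

The numbers only say something about the machine and build they were measured on; whether SHA-256 hardware acceleration is actually used depends on the underlying crypto library.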




            As you can see, MD5 is almost never a good idea, yet many (smart) people still believe MD5 or SHA-1 to be secure under "their specific circumstances". They can often be proven wrong, and they leave the door open for (future) attacks on the system. I'd try to avoid it under any circumstances, especially if it is not used within HMAC.



            What I also see is that MD5 is defended because a system cannot be upgraded. MD5 has been under attack for years and years. If you still cannot migrate away from MD5, then there is something seriously wrong with the security of your system that transcends the use of MD5. If you're designing, programming or maintaining systems without an upgrade path, then you're the main security hazard, not the hash algorithm.

















            $endgroup$












            • $begingroup$
              There's no modern password hash that uses MD5. One could instantiate PBKDF2 with HMAC-MD5, but I'm not sure I've ever seen anyone do that, and if one is making a custom choice to begin with, one might as well choose a modern password hash!
              $endgroup$
              – Squeamish Ossifrage
              19 hours ago






            • 1




              $begingroup$
              I'm very prolific on SO, I've seen it plenty of times. People know MD5 and for some reason like to use it.
              $endgroup$
              – Maarten Bodewes
              18 hours ago






            • 1




              $begingroup$
              "MD5 is still secure to check hashes from another server as long as hackers cannot alter the input of the MD5 hash." This requires very careful qualification. The issue about how much control an attacker might have over the input is extremely subtle and not obvious if you're not very familiar with cryptography. See the story at crypto.stackexchange.com/a/70057 where the attacker doesn't change anything at the time the initial compromise is detected but still compromises everyone in the end even if they verify the good MD5 hashes.
              $endgroup$
              – Squeamish Ossifrage
              18 hours ago











            • $begingroup$
              Well, that was kind of the point of my answer, you think you're secure, but you're not. What you posted is an interesting attack vector, but it still relies on the adversary (or one of his companions, same thing in a theoretic sense) to control the input of the hash.
              $endgroup$
              – Maarten Bodewes
              18 hours ago






            • 1




              $begingroup$
              By the way, you posted a fine answer, but I started off with the password hashing part, and that was too long for a comment already. I'm not looking forward to cleaning up the comment mess, by the way :P
              $endgroup$
              – Maarten Bodewes
              18 hours ago










































            edited 18 hours ago

























            answered 19 hours ago









            Maarten Bodewes

            56.2k679197



























            -1












            $begingroup$

            One thing not mentioned here is that the attraction of hashing algorithms (like programming languages etc.) is ubiquity. The 'reason' for using MD5 is:



            • everyone knows about it

            • it's implemented for pretty much every combination of architectures, OSs, programming languages etc

            • its limitations are well understood.

            This is important from both a practicality and a security point of view.
            It is reasonable to expect someone to be able to handle it (practicality), and it helps devs audit the process and not skip steps because they are 'tricky' (security).



            All that said, SHA is catching up, and I think in time MD5 will die out; when it does, it will happen quite quickly, because its popularity is only a self-fulfilling prophecy in the first place, as there are normally better choices.



            However, in the intervening time, the fact that it's widely adopted may well be a good reason in its own right to use MD5 (within its limits).












            $endgroup$












            • Welcome to crypto.stackexchange. I think there are some additions that would improve this answer: It seems like the bullet points are equally applicable to SHA1/SHA2 (considering that they're standardized algorithms). It is clear why those points would promote the use of an algorithm, but it's not clear to me why they would promote the use of MD5 over SHA1/SHA2. The last point seems to say that "everyone else is doing it" is a good enough reason to use MD5 (which is seldom a good reason to do anything). It also mentions limits, but does not elaborate on what those limits should be.
              – Ella Rose
              yesterday






            • Use of MD5 has been questionable for nearly a quarter of a century, since Hans Dobbertin published collisions in the compression function in 1996, and MD5 has been completely broken for a decade and a half, since Xiaoyun Wang's team demonstrated collisions in 2004. Collisions in MD5 were exploited in practice by the United States and Israel to sabotage Iran's nuclear program. SHA-2 has been available since 2002, seventeen years. Note SHA-0 and SHA-1 are broken too; timeline.
              The Caesar cipher meets all your criteria too.
              – Squeamish Ossifrage
              yesterday







            • Just to be clear: I don't think MD5 is better than SHA-2 (or good). The question was not 'which is better' but whether there is any reason to use MD5. Sure, SHA-2 has had 17 years to take over and is widely enough used that it's a good choice, especially if you're in the internet-connected Windows/Linux space. If that's where you are and there are no special cases, then don't use MD5; there are better alternatives. But that was not the question. It takes a while for an algorithm to become trusted, and some tech stacks move a lot less rapidly. Sometimes 17 years isn't that long.
              – ANone
              yesterday






            • If someone is (a) actually designing a protocol (b) constrained to a specific, real environment that is (c) limited to MD5 in that environment for specific technical reasons that can be articulated, then they can ask a question in which we give useful guidance for security. That's not the case here: the original poster is asking about convenience, about an ad hoc vulnerability disclosure on highly capable computers that can easily use SHA-2, about cheap ways to keep collision probabilities low, etc.
              – Squeamish Ossifrage
              yesterday






            • @SqueamishOssifrage Sure, but that wasn't the question. Also, if you're looking for hashing advice and your research is: read one crypto.stackexchange Q/A titled "Is there really no use for MD5 anymore?", scroll past most of the answers that say 'don't do it', get to the one that says "well, for completeness's sake: maybe", and take that to mean MD5 for the win... On that note, there'd be fewer of those Hacker News posts if there weren't a legitimate argument that people had been overly negative. I think honesty about its weaknesses is better than scaring people away.
              – ANone
              yesterday















            answered yesterday by ANone
