3c22fce14c
- Only use Visual C++ pragma with appropriate compilers. - Working link for Thomas Wang's hash function. - Updated unordered rationale. - Fix `unnecessary_copy_tests` for Visual C++ 12. - Some extra insert tests. [SVN r86728]
51 lines
2.3 KiB
Plaintext
51 lines
2.3 KiB
Plaintext
|
|
[/ Copyright 2011 Daniel James.
|
|
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]
|
|
|
|
[section:rationale Rationale]
|
|
|
|
The rationale can be found in the original design
|
|
[footnote issue 6.18 of the __issues__ (page 63)].
|
|
|
|
[heading Quality of the hash function]
|
|
|
|
Many hash functions strive to have little correlation between the input
|
|
and output values. They attempt to uniformally distribute the output
|
|
values for very similar inputs. This hash function makes no such
|
|
attempt. In fact, for integers, the result of the hash function is often
|
|
just the input value. So similar but different input values will often
|
|
result in similar but different output values.
|
|
This means that it is not appropriate as a general hash function. For
|
|
example, a hash table may discard bits from the hash function resulting
|
|
in likely collisions, or might have poor collision resolution when hash
|
|
values are clustered together. In such cases this hash function will
|
|
preform poorly.
|
|
|
|
But the standard has no such requirement for the hash function,
|
|
it just requires that the hashes of two different values are unlikely
|
|
to collide. Containers or algorithms
|
|
designed to work with the standard hash function will have to be
|
|
implemented to work well when the hash function's output is correlated
|
|
to its input. Since they are paying that cost a higher quality hash function
|
|
would be wasteful.
|
|
|
|
For other use cases, if you do need a higher quality hash function,
|
|
then neither the standard hash function or `boost::hash` are appropriate.
|
|
There are several options
|
|
available. One is to use a second hash on the output of this hash
|
|
function, such as [@http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm
|
|
Thomas Wang's hash function]. This this may not work as
|
|
well as a hash algorithm tailored for the input.
|
|
|
|
For strings there are several fast, high quality hash functions
|
|
available (for example [@http://code.google.com/p/smhasher/ MurmurHash3]
|
|
and [@http://code.google.com/p/cityhash/ Google's CityHash]),
|
|
although they tend to be more machine specific.
|
|
These may also be appropriate for hashing a binary representation of
|
|
your data - providing that all equal values have an equal
|
|
representation, which is not always the case (e.g. for floating point
|
|
values).
|
|
|
|
[endsect]
|