Package squidpony.squidmath
Class CrossHash.Water
java.lang.Object
squidpony.squidmath.CrossHash.Water
- Enclosing class:
- CrossHash
public static final class CrossHash.Water extends Object
A fairly fast hashing algorithm in general, Water performs especially well on large arrays, and passes SMHasher's
newest and most stringent version of tests. The int-hashing
These hash functions are so fast because they operate in bulk on 4 items at a time, such as 4 ints (which is the optimal case), 4 bytes, or 4 longs (which uses a different algorithm). This bulk operation usually entails 3 multiplications and some other, cheaper operations per 4 items hashed. For long arrays, it requires many more multiplications, but modern CPUs can pipeline the operations on unrelated longs to run in parallel on one core. If any items are left over after the bulk segment, Water uses the least effort possible to hash the remaining 1, 2, or 3 items left. Most of these operations use the method
hash(int[])
method is almost twice as fast
as CrossHash.Hive.hash(int[])
and faster than Arrays.hashCode(int[])
; on longer arrays
CrossHash.Curlup.hash(int[])
is faster. Based on wyhash,
specifically the waterhash variant. This version passes
SMHasher for both the 32-bit output hash() methods and the 64-bit output hash64() methods (which use the slightly
tweaked wheathash variant in the waterhash Git repo, or woothash for hashing long arrays). While an earlier
version passed rurban/smhasher, it failed demerphq/smhasher (Yves' more stringent fork), so some minor tweaks
allowed the latest code to pass Yves' test. Uses 64-bit math, so it won't be as fast on GWT. Currently, the
methods that hash types other than int arrays aren't as fast as the int array hash, but they are usually faster
than the former default Hive implementation, and unlike Hive, these pass SMHasher. If you want to have a seed the
hash, so hashing the same data with a different seed produces different output, you can use CrossHash.Yolk
or
CrossHash.Curlup
, preferring Curlup unless all of your data is in small arrays (under 20 length, give or take).
These hash functions are so fast because they operate in bulk on 4 items at a time, such as 4 ints (which is the optimal case), 4 bytes, or 4 longs (which uses a different algorithm). This bulk operation usually entails 3 multiplications and some other, cheaper operations per 4 items hashed. For long arrays, it requires many more multiplications, but modern CPUs can pipeline the operations on unrelated longs to run in parallel on one core. If any items are left over after the bulk segment, Water uses the least effort possible to hash the remaining 1, 2, or 3 items left. Most of these operations use the method
mum(long, long)
, which helps take two inputs
and multiply them once, getting a more-random result after another small step. The long array code uses
wow(long, long)
(similar to mum upside-down), which mixes up its arguments with each other before
multplying. It finishes with either code similar to mum() for 32-bit output hash() methods, or a somewhat more
rigorous method for 64-bit output hash64() methods (still similar to mum).-
Field Summary
-
Constructor Summary
Constructors Constructor Description Water()
-
Method Summary
Modifier and Type Method Description static int
hash(boolean[] data)
static int
hash(byte[] data)
static int
hash(char[] data)
static int
hash(char[][] data)
static int
hash(char[] data, int start, int end)
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static int
hash(double[] data)
static int
hash(float[] data)
static int
hash(int[] data)
static int
hash(int[][] data)
static int
hash(int[] data, int length)
static int
hash(long[] data)
static int
hash(long[][] data)
static int
hash(short[] data)
static int
hash(CharSequence data)
static int
hash(CharSequence[] data)
static int
hash(CharSequence[]... data)
static int
hash(CharSequence data, int start, int end)
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static int
hash(Iterable<? extends CharSequence> data)
static int
hash(Object data)
static int
hash(Object[] data)
static int
hash(List<? extends CharSequence> data)
static long
hash64(boolean[] data)
static long
hash64(byte[] data)
static long
hash64(char[] data)
static long
hash64(char[][] data)
static long
hash64(char[] data, int start, int end)
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static long
hash64(double[] data)
static long
hash64(float[] data)
static long
hash64(int[] data)
static long
hash64(int[][] data)
static long
hash64(int[] data, int length)
static long
hash64(long[] data)
static long
hash64(long[][] data)
static long
hash64(short[] data)
static long
hash64(CharSequence data)
static long
hash64(CharSequence[] data)
static long
hash64(CharSequence[]... data)
static long
hash64(CharSequence data, int start, int end)
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static long
hash64(Iterable<? extends CharSequence> data)
static long
hash64(Object data)
static long
hash64(Object[] data)
static long
hash64(List<? extends CharSequence> data)
static long
mum(long a, long b)
Takes two arguments that are technically longs, and should be very different, and uses them to get a result that is technically a long and mixes the bits of the inputs.static long
wow(long a, long b)
A slower but higher-quality variant onmum(long, long)
that can take two arbitrary longs (with any of their 64 bits containing relevant data) instead of mum's 32-bit sections of its inputs, and outputs a 64-bit result that can have any of its bits used.
-
Field Details
-
b0
Big constant 0.- See Also:
- Constant Field Values
-
b1
Big constant 1.- See Also:
- Constant Field Values
-
b2
Big constant 2.- See Also:
- Constant Field Values
-
b3
Big constant 3.- See Also:
- Constant Field Values
-
b4
Big constant 4.- See Also:
- Constant Field Values
-
b5
Big constant 5.- See Also:
- Constant Field Values
-
-
Constructor Details
-
Method Details
-
mum
Takes two arguments that are technically longs, and should be very different, and uses them to get a result that is technically a long and mixes the bits of the inputs. The arguments and result are only technically longs because their lower 32 bits matter much more than their upper 32, and giving just any long won't work.
This is very similar to wyhash's mum function, but doesn't use 128-bit math because it expects that its arguments are only relevant in their lower 32 bits (allowing their product to fit in 64 bits).- Parameters:
a
- a long that should probably only hold an int's worth of datab
- a long that should probably only hold an int's worth of data- Returns:
- a sort-of randomized output dependent on both inputs
-
wow
A slower but higher-quality variant onmum(long, long)
that can take two arbitrary longs (with any of their 64 bits containing relevant data) instead of mum's 32-bit sections of its inputs, and outputs a 64-bit result that can have any of its bits used.
This was changed so it distributes bits from both inputs a little better on July 6, 2019.- Parameters:
a
- any longb
- any long- Returns:
- a sort-of randomized output dependent on both inputs
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data
- the char array to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)- Returns:
- a 64-bit hash code for the requested section of data
-
hash64
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data
- the String or other CharSequence to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)- Returns:
- a 64-bit hash code for the requested section of data
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data
- the char array to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)- Returns:
- a 32-bit hash code for the requested section of data
-
hash
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data
- the String or other CharSequence to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)- Returns:
- a 32-bit hash code for the requested section of data
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-