Class CrossHash.Water
java.lang.Object
com.github.yellowstonegames.old.v300.CrossHash.Water
- Enclosing class:
CrossHash
A fairly fast hashing algorithm in general, Water performs especially well on large arrays, and passes SMHasher's
newest and most stringent version of tests. The int-hashing
These hash functions are so fast because they operate in bulk on 4 items at a time, such as 4 ints (which is the optimal case), 4 bytes, or 4 longs (which uses a different algorithm). This bulk operation usually entails 3 multiplications and some other, cheaper operations per 4 items hashed. For long arrays, it requires many more multiplications, but modern CPUs can pipeline the operations on unrelated longs to run in parallel on one core. If any items are left over after the bulk segment, Water uses the least effort possible to hash the remaining 1, 2, or 3 items left. Most of these operations use the method
hash(int[]) method is almost twice as fast
as CrossHash.Hive.hash(int[]) and faster than Arrays.hashCode(int[]); on longer arrays
CrossHash.Curlup.hash(int[]) is faster. Based on wyhash,
specifically the waterhash variant. This version passes
SMHasher for both the 32-bit output hash() methods and the 64-bit output hash64() methods (which use the slightly
tweaked wheathash variant in the waterhash Git repo, or woothash for hashing long arrays). While an earlier
version passed rurban/smhasher, it failed demerphq/smhasher (Yves' more stringent fork), so some minor tweaks
allowed the latest code to pass Yves' test. Uses 64-bit math, so it won't be as fast on GWT. Currently, the
methods that hash types other than int arrays aren't as fast as the int array hash, but they are usually faster
than the former default Hive implementation, and unlike Hive, these pass SMHasher. If you want to have a seed the
hash, so hashing the same data with a different seed produces different output, you can use CrossHash.Yolk or
CrossHash.Curlup, preferring Curlup unless all of your data is in small arrays (under 20 length, give or take).
These hash functions are so fast because they operate in bulk on 4 items at a time, such as 4 ints (which is the optimal case), 4 bytes, or 4 longs (which uses a different algorithm). This bulk operation usually entails 3 multiplications and some other, cheaper operations per 4 items hashed. For long arrays, it requires many more multiplications, but modern CPUs can pipeline the operations on unrelated longs to run in parallel on one core. If any items are left over after the bulk segment, Water uses the least effort possible to hash the remaining 1, 2, or 3 items left. Most of these operations use the method
mum(long, long), which helps take two inputs
and multiply them once, getting a more-random result after another small step. The long array code uses
wow(long, long) (similar to mum upside-down), which mixes up its arguments with each other before
multplying. It finishes with either code similar to mum() for 32-bit output hash() methods, or a somewhat more
rigorous method for 64-bit output hash64() methods (still similar to mum).-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic inthash(boolean[] data) static inthash(byte[] data) static inthash(char[] data) static inthash(char[][] data) static inthash(char[] data, int start, int end) Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static inthash(double[] data) static inthash(float[] data) static inthash(int[] data) static inthash(int[][] data) static inthash(int[] data, int length) static inthash(long[] data) static inthash(long[][] data) static inthash(short[] data) static inthash(CharSequence data) static inthash(CharSequence[] data) static inthash(CharSequence[]... data) static inthash(CharSequence data, int start, int end) Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static inthash(Iterable<? extends CharSequence> data) static intstatic intstatic inthash(List<? extends CharSequence> data) static longhash64(boolean[] data) static longhash64(byte[] data) static longhash64(char[] data) static longhash64(char[][] data) static longhash64(char[] data, int start, int end) Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static longhash64(double[] data) static longhash64(float[] data) static longhash64(int[] data) static longhash64(int[][] data) static longhash64(int[] data, int length) static longhash64(long[] data) static longhash64(long[][] data) static longhash64(short[] data) static longhash64(CharSequence data) static longhash64(CharSequence[] data) static longhash64(CharSequence[]... data) static longhash64(CharSequence data, int start, int end) Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static longhash64(Iterable<? extends CharSequence> data) static longstatic longstatic longhash64(List<? extends CharSequence> data) static longmum(long a, long b) Takes two arguments that are technically longs, and should be very different, and uses them to get a result that is technically a long and mixes the bits of the inputs.static longwow(long a, long b) A slower but higher-quality variant onmum(long, long)that can take two arbitrary longs (with any of their 64 bits containing relevant data) instead of mum's 32-bit sections of its inputs, and outputs a 64-bit result that can have any of its bits used.
-
Field Details
-
b0
public static final long b0Big constant 0.- See Also:
-
b1
public static final long b1Big constant 1.- See Also:
-
b2
public static final long b2Big constant 2.- See Also:
-
b3
public static final long b3Big constant 3.- See Also:
-
b4
public static final long b4Big constant 4.- See Also:
-
b5
public static final long b5Big constant 5.- See Also:
-
-
Constructor Details
-
Water
public Water()
-
-
Method Details
-
mum
public static long mum(long a, long b) Takes two arguments that are technically longs, and should be very different, and uses them to get a result that is technically a long and mixes the bits of the inputs. The arguments and result are only technically longs because their lower 32 bits matter much more than their upper 32, and giving just any long won't work.
This is very similar to wyhash's mum function, but doesn't use 128-bit math because it expects that its arguments are only relevant in their lower 32 bits (allowing their product to fit in 64 bits).- Parameters:
a- a long that should probably only hold an int's worth of datab- a long that should probably only hold an int's worth of data- Returns:
- a sort-of randomized output dependent on both inputs
-
wow
public static long wow(long a, long b) A slower but higher-quality variant onmum(long, long)that can take two arbitrary longs (with any of their 64 bits containing relevant data) instead of mum's 32-bit sections of its inputs, and outputs a 64-bit result that can have any of its bits used.
This was changed so it distributes bits from both inputs a little better on July 6, 2019.- Parameters:
a- any longb- any long- Returns:
- a sort-of randomized output dependent on both inputs
-
hash64
public static long hash64(boolean[] data) -
hash64
public static long hash64(byte[] data) -
hash64
public static long hash64(short[] data) -
hash64
public static long hash64(char[] data) -
hash64
-
hash64
public static long hash64(int[] data) -
hash64
public static long hash64(int[] data, int length) -
hash64
public static long hash64(long[] data) -
hash64
public static long hash64(float[] data) -
hash64
public static long hash64(double[] data) -
hash64
public static long hash64(char[] data, int start, int end) Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data- the char array to hashstart- the start of the section to hash (inclusive)end- the end of the section to hash (exclusive)- Returns:
- a 64-bit hash code for the requested section of data
-
hash64
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data- the String or other CharSequence to hashstart- the start of the section to hash (inclusive)end- the end of the section to hash (exclusive)- Returns:
- a 64-bit hash code for the requested section of data
-
hash64
public static long hash64(char[][] data) -
hash64
public static long hash64(int[][] data) -
hash64
public static long hash64(long[][] data) -
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash
public static int hash(boolean[] data) -
hash
public static int hash(byte[] data) -
hash
public static int hash(short[] data) -
hash
public static int hash(char[] data) -
hash
-
hash
public static int hash(int[] data) -
hash
public static int hash(int[] data, int length) -
hash
public static int hash(long[] data) -
hash
public static int hash(float[] data) -
hash
public static int hash(double[] data) -
hash
public static int hash(char[] data, int start, int end) Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data- the char array to hashstart- the start of the section to hash (inclusive)end- the end of the section to hash (exclusive)- Returns:
- a 32-bit hash code for the requested section of data
-
hash
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data- the String or other CharSequence to hashstart- the start of the section to hash (inclusive)end- the end of the section to hash (exclusive)- Returns:
- a 32-bit hash code for the requested section of data
-
hash
public static int hash(char[][] data) -
hash
public static int hash(int[][] data) -
hash
public static int hash(long[][] data) -
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-