Class CrossHash.Water

java.lang.Object
squidpony.squidmath.CrossHash.Water
Enclosing class:
CrossHash

public static final class CrossHash.Water
extends Object
A fairly fast hashing algorithm in general, Water performs especially well on large arrays, and passes SMHasher's newest and most stringent version of tests. The int-hashing hash(int[]) method is almost twice as fast as CrossHash.Hive.hash(int[]) and faster than Arrays.hashCode(int[]); on longer arrays CrossHash.Curlup.hash(int[]) is faster. Based on wyhash, specifically the waterhash variant. This version passes SMHasher for both the 32-bit output hash() methods and the 64-bit output hash64() methods (which use the slightly tweaked wheathash variant in the waterhash Git repo, or woothash for hashing long arrays). While an earlier version passed rurban/smhasher, it failed demerphq/smhasher (Yves' more stringent fork), so some minor tweaks allowed the latest code to pass Yves' test. Uses 64-bit math, so it won't be as fast on GWT. Currently, the methods that hash types other than int arrays aren't as fast as the int array hash, but they are usually faster than the former default Hive implementation, and unlike Hive, these pass SMHasher. If you want to have a seed the hash, so hashing the same data with a different seed produces different output, you can use CrossHash.Yolk or CrossHash.Curlup, preferring Curlup unless all of your data is in small arrays (under 20 length, give or take).
These hash functions are so fast because they operate in bulk on 4 items at a time, such as 4 ints (which is the optimal case), 4 bytes, or 4 longs (which uses a different algorithm). This bulk operation usually entails 3 multiplications and some other, cheaper operations per 4 items hashed. For long arrays, it requires many more multiplications, but modern CPUs can pipeline the operations on unrelated longs to run in parallel on one core. If any items are left over after the bulk segment, Water uses the least effort possible to hash the remaining 1, 2, or 3 items left. Most of these operations use the method mum(long, long), which helps take two inputs and multiply them once, getting a more-random result after another small step. The long array code uses wow(long, long) (similar to mum upside-down), which mixes up its arguments with each other before multplying. It finishes with either code similar to mum() for 32-bit output hash() methods, or a somewhat more rigorous method for 64-bit output hash64() methods (still similar to mum).
  • Field Summary

    Fields 
    Modifier and Type Field Description
    static long b0
    Big constant 0.
    static long b1
    Big constant 1.
    static long b2
    Big constant 2.
    static long b3
    Big constant 3.
    static long b4
    Big constant 4.
    static long b5
    Big constant 5.
  • Constructor Summary

    Constructors 
    Constructor Description
    Water()  
  • Method Summary

    Modifier and Type Method Description
    static int hash​(boolean[] data)  
    static int hash​(byte[] data)  
    static int hash​(char[] data)  
    static int hash​(char[][] data)  
    static int hash​(char[] data, int start, int end)
    Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
    static int hash​(double[] data)  
    static int hash​(float[] data)  
    static int hash​(int[] data)  
    static int hash​(int[][] data)  
    static int hash​(int[] data, int length)  
    static int hash​(long[] data)  
    static int hash​(long[][] data)  
    static int hash​(short[] data)  
    static int hash​(CharSequence data)  
    static int hash​(CharSequence[] data)  
    static int hash​(CharSequence[]... data)  
    static int hash​(CharSequence data, int start, int end)
    Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
    static int hash​(Iterable<? extends CharSequence> data)  
    static int hash​(Object data)  
    static int hash​(Object[] data)  
    static int hash​(List<? extends CharSequence> data)  
    static long hash64​(boolean[] data)  
    static long hash64​(byte[] data)  
    static long hash64​(char[] data)  
    static long hash64​(char[][] data)  
    static long hash64​(char[] data, int start, int end)
    Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
    static long hash64​(double[] data)  
    static long hash64​(float[] data)  
    static long hash64​(int[] data)  
    static long hash64​(int[][] data)  
    static long hash64​(int[] data, int length)  
    static long hash64​(long[] data)  
    static long hash64​(long[][] data)  
    static long hash64​(short[] data)  
    static long hash64​(CharSequence data)  
    static long hash64​(CharSequence[] data)  
    static long hash64​(CharSequence[]... data)  
    static long hash64​(CharSequence data, int start, int end)
    Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
    static long hash64​(Iterable<? extends CharSequence> data)  
    static long hash64​(Object data)  
    static long hash64​(Object[] data)  
    static long hash64​(List<? extends CharSequence> data)  
    static long mum​(long a, long b)
    Takes two arguments that are technically longs, and should be very different, and uses them to get a result that is technically a long and mixes the bits of the inputs.
    static long wow​(long a, long b)
    A slower but higher-quality variant on mum(long, long) that can take two arbitrary longs (with any of their 64 bits containing relevant data) instead of mum's 32-bit sections of its inputs, and outputs a 64-bit result that can have any of its bits used.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

  • Method Details

    • mum

      public static long mum​(long a, long b)
      Takes two arguments that are technically longs, and should be very different, and uses them to get a result that is technically a long and mixes the bits of the inputs. The arguments and result are only technically longs because their lower 32 bits matter much more than their upper 32, and giving just any long won't work.
      This is very similar to wyhash's mum function, but doesn't use 128-bit math because it expects that its arguments are only relevant in their lower 32 bits (allowing their product to fit in 64 bits).
      Parameters:
      a - a long that should probably only hold an int's worth of data
      b - a long that should probably only hold an int's worth of data
      Returns:
      a sort-of randomized output dependent on both inputs
    • wow

      public static long wow​(long a, long b)
      A slower but higher-quality variant on mum(long, long) that can take two arbitrary longs (with any of their 64 bits containing relevant data) instead of mum's 32-bit sections of its inputs, and outputs a 64-bit result that can have any of its bits used.
      This was changed so it distributes bits from both inputs a little better on July 6, 2019.
      Parameters:
      a - any long
      b - any long
      Returns:
      a sort-of randomized output dependent on both inputs
    • hash64

      public static long hash64​(boolean[] data)
    • hash64

      public static long hash64​(byte[] data)
    • hash64

      public static long hash64​(short[] data)
    • hash64

      public static long hash64​(char[] data)
    • hash64

      public static long hash64​(CharSequence data)
    • hash64

      public static long hash64​(int[] data)
    • hash64

      public static long hash64​(int[] data, int length)
    • hash64

      public static long hash64​(long[] data)
    • hash64

      public static long hash64​(float[] data)
    • hash64

      public static long hash64​(double[] data)
    • hash64

      public static long hash64​(char[] data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 64-bit hash code for the requested section of data
    • hash64

      public static long hash64​(CharSequence data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the String or other CharSequence to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 64-bit hash code for the requested section of data
    • hash64

      public static long hash64​(char[][] data)
    • hash64

      public static long hash64​(int[][] data)
    • hash64

      public static long hash64​(long[][] data)
    • hash64

      public static long hash64​(CharSequence[] data)
    • hash64

      public static long hash64​(CharSequence[]... data)
    • hash64

      public static long hash64​(Iterable<? extends CharSequence> data)
    • hash64

      public static long hash64​(List<? extends CharSequence> data)
    • hash64

      public static long hash64​(Object[] data)
    • hash64

      public static long hash64​(Object data)
    • hash

      public static int hash​(boolean[] data)
    • hash

      public static int hash​(byte[] data)
    • hash

      public static int hash​(short[] data)
    • hash

      public static int hash​(char[] data)
    • hash

      public static int hash​(CharSequence data)
    • hash

      public static int hash​(int[] data)
    • hash

      public static int hash​(int[] data, int length)
    • hash

      public static int hash​(long[] data)
    • hash

      public static int hash​(float[] data)
    • hash

      public static int hash​(double[] data)
    • hash

      public static int hash​(char[] data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 32-bit hash code for the requested section of data
    • hash

      public static int hash​(CharSequence data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the String or other CharSequence to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 32-bit hash code for the requested section of data
    • hash

      public static int hash​(char[][] data)
    • hash

      public static int hash​(int[][] data)
    • hash

      public static int hash​(long[][] data)
    • hash

      public static int hash​(CharSequence[] data)
    • hash

      public static int hash​(CharSequence[]... data)
    • hash

      public static int hash​(Iterable<? extends CharSequence> data)
    • hash

      public static int hash​(List<? extends CharSequence> data)
    • hash

      public static int hash​(Object[] data)
    • hash

      public static int hash​(Object data)