Class CrossHash

java.lang.Object
com.github.yellowstonegames.old.v300.CrossHash

public class CrossHash extends Object
64-bit and 32-bit hashing functions whose results stay the same across platforms. Several algorithms are present here, each with different tradeoffs in performance, quality, and extra features. Each algorithm was designed for speed and general-purpose usability, not for cryptographic security.
NOTE: most of the documentation here is probably very out-of-date! In particular, the SMHasher test battery has been updated and now finds several failures for Curlup. It had already found failures in Wisp; Hive and Mist weren't tested, but can reasonably be assumed to have some failures. Water and Yolk fail fewer tests than the others; Yolk is used in digital's Hasher class, and it needed only small changes to pass SMHasher.
The hashes this returns are always 0 when given null to hash. Arrays with identical elements of identical types will hash identically. Arrays with identical numerical values but different types will sometimes hash differently. This class always provides 64-bit hashes via hash64() and 32-bit hashes via hash(), and Wisp provides a hash32() method that matches older behavior and uses only 32-bit math. The hash64() and hash() methods, except in Hive, use 64-bit math even when producing 32-bit hashes, for GWT reasons. GWT compiles to JavaScript, where an int is represented as a double-precision number, so int arithmetic that overflows does not always wrap the way it does in desktop and Android applications. GWT emulates longs with a more complex technique behind the scenes, and that emulation behaves the same on the web as it does on desktop or on a phone. Since CrossHash is supposed to be stable cross-platform, using mainly longs is the way we need to go, despite it being slightly slower.
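The contract described above can be sketched in a few lines. This is not Water or any other algorithm in this class; the class name LongMathHashSketch, the constants, and the mixing steps are assumptions chosen only for illustration. The sketch returns 0 for null and does all of its arithmetic on longs, casting to int only at the very end.

```java
final class LongMathHashSketch {
    // Hypothetical 64-bit hash of an int array; NOT the algorithm CrossHash uses.
    static long hash64(int[] data) {
        if (data == null) return 0L; // null always hashes to 0
        long h = 0x9E3779B97F4A7C15L + data.length;
        for (int item : data) {
            // Every operation here is on a long, so no int overflow is involved.
            h = (h ^ item) * 0xC6BC279692B5C323L;
            h ^= h >>> 29;
        }
        return h;
    }

    // 32-bit result derived from the 64-bit one; the narrowing cast is the last step.
    static int hash(int[] data) {
        long h = hash64(data);
        return (int) (h ^ h >>> 32);
    }
}
```

In this sketch, two int arrays with the same elements in the same order always produce the same result, and both methods return 0 for null.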
The static methods in CrossHash, like hash64(int[]), delegate to the CrossHash.Water algorithm. This is a fairly fast and heavily-tested hash that developed from something like Wang Yi's wyhash algorithm, though only the constants and the general concept of a mum() function are shared with wyhash. There are several static inner classes in CrossHash: CrossHash.Water (already mentioned), CrossHash.Yolk (which is very close to Water but allows a 64-bit salt or seed), CrossHash.Curlup (which is the fastest hash here for larger inputs, and also allows a 64-bit seed), CrossHash.Mist (which allows a 128-bit salt, but has mediocre quality), CrossHash.Hive (which is mostly here for compatibility, but has OK quality and good collision rates), and CrossHash.Wisp (which is fast for small inputs but has bad collision rates). There's also the inner IHasher interface, and the classes that implement it. Water, Yolk, and Curlup all passed the rigorous SMHasher test battery as it stood when this was written; see the note above about newer SMHasher versions. The others don't pass it in full, or sometimes at all.
IHasher values are provided as static fields, and use Water to hash a specific type or fall back to Object.hashCode if given an object with the wrong type. IHasher values are optional parts of OrderedMap, OrderedSet, Arrangement, and the various classes that use Arrangement like K2 and K2V1, and allow arrays to be used as keys in those collections while keeping hashing by value instead of the normal hashing by reference for arrays. You probably won't ever need to make a class that implements IHasher yourself; for some cases you may want to look at the Hashers class for additional functions.
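To illustrate why a by-value hashing strategy matters for array keys, here is a minimal, self-contained sketch. The ValueHasher interface is a hypothetical stand-in for IHasher, not the real interface, and Arrays.hashCode() stands in for Water only so the example runs without this library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ArrayKeySketch {
    // Hypothetical stand-in for an IHasher-style strategy; not the real interface.
    interface ValueHasher {
        int hash(Object data);
        boolean areEqual(Object left, Object right);
    }

    // Treats char[] by content; other types fall back to hashCode() and equals().
    static final ValueHasher CHAR_ARRAY = new ValueHasher() {
        @Override public int hash(Object data) {
            if (data == null) return 0;
            return (data instanceof char[]) ? Arrays.hashCode((char[]) data) : data.hashCode();
        }
        @Override public boolean areEqual(Object left, Object right) {
            if (left == right) return true;
            return (left instanceof char[] && right instanceof char[])
                    ? Arrays.equals((char[]) left, (char[]) right)
                    : left != null && left.equals(right);
        }
    };

    // A plain HashMap compares array keys by reference, so an equal-content probe misses.
    static boolean plainMapFinds(char[] stored, char[] probe) {
        Map<char[], Integer> map = new HashMap<>();
        map.put(stored, 1);
        return map.containsKey(probe);
    }
}
```

With two separate char arrays holding the same characters, plainMapFinds returns false, while CHAR_ARRAY.areEqual returns true and CHAR_ARRAY.hash gives both the same value. A collection that consults such a strategy can therefore find the key by its contents.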
Note: This class was formerly called StableHash, but since that refers to a specific category of hashing algorithm that this is not, and since the goal is to be cross-platform, the name was changed to CrossHash.
Note 2: FNV-1a was removed from SquidLib on July 25, 2017, and replaced as default with Wisp; Wisp was later replaced as default by Hive, and in June 2019 Hive was replaced by Water. Wisp was used because at the time SquidLib preferred 64-bit math when math needed to be the same across platforms; math on longs behaves the same on GWT as on desktop, despite being slower. Hive passed an older version of SMHasher, a testing suite for hashes, which Wisp does not pass (it fails just like Arrays.hashCode() does). Hive uses a cross-platform subset of the possible 32-bit math operations when producing 32-bit hashes of data that doesn't involve longs or doubles, and this should speed up the CrossHash.Hive.hash() methods a lot on GWT, but it slows down 32-bit output on desktop-class JVMs. Water became the default when newer versions of SMHasher showed that Hive wasn't as high-quality as it had appeared, and the recently-debuted wyhash by Wang Yi, a variation on a hash called MUM, opened some possibilities for structures that are simple but also very fast. Water is modeled after wyhash and uses the same constants in its hash64() methods, but avoids the 128-bit multiplication that wyhash uses. Because both wyhash and Water operate on 4 items at a time, they tend to be very fast on desktop platforms, but Water probably won't be amazing at GWT performance. Similarly, the recently-added Curlup performs very well due to SIMD optimizations that HotSpot performs, and probably won't do as well on GWT or Android.
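The difference between a 128-bit multiply-and-fold step and a 64-bit-only substitute can be sketched as follows. Both methods are hypothetical illustrations: fold128 shows the general idea of combining the high and low halves of a full product, and fold64 shows one way to stay within a single long multiplication. Neither is claimed to be the exact code in wyhash or in Water. Math.multiplyHigh needs Java 9 or later and treats its arguments as signed.

```java
final class MumSketch {
    // Full-width idea: compute both halves of the 128-bit product and fold them together.
    static long fold128(long a, long b) {
        long high = Math.multiplyHigh(a, b); // upper 64 bits (signed multiply, Java 9+)
        long low = a * b;                    // lower 64 bits
        return high ^ low;
    }

    // 64-bit-only idea (an assumption for illustration): keep just the low half
    // of the product and mix its own upper bits downward.
    static long fold64(long a, long b) {
        long n = a * b;
        return n ^ (n >>> 32);
    }
}
```

The 64-bit version discards whatever overflows past bit 63, which is the price of avoiding the wider multiplication. For example, multiplying 1L << 32 by itself leaves nothing in the low half, so fold64 returns 0 while fold128 still sees the carried bit.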
Created by Tommy Ettinger on 1/16/2016.
  • Field Details

  • Constructor Details

    • CrossHash

      public CrossHash()
  • Method Details

    • hash64

      public static long hash64(CharSequence data)
    • hash64

      public static long hash64(boolean[] data)
    • hash64

      public static long hash64(byte[] data)
    • hash64

      public static long hash64(short[] data)
    • hash64

      public static long hash64(int[] data)
    • hash64

      public static long hash64(long[] data)
    • hash64

      public static long hash64(char[] data)
    • hash64

      public static long hash64(float[] data)
    • hash64

      public static long hash64(double[] data)
    • hash64

      public static long hash64(char[] data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 64-bit hash code for the requested section of data
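The half-open range convention above (start inclusive, end exclusive) can be sketched with a small self-contained example. The mixing steps are hypothetical and are not the algorithm CrossHash uses; returning 0 for an empty range and clamping end to the array length are also assumptions of this sketch.

```java
final class RangeHashSketch {
    // Hypothetical hash over data[start] .. data[end - 1]; NOT the CrossHash algorithm.
    static long hash64(char[] data, int start, int end) {
        if (data == null || start >= end) return 0L; // assumed: empty range hashes to 0
        int stop = Math.min(end, data.length);       // assumed: end is clamped to the array
        long h = 0x9E3779B97F4A7C15L + (stop - start);
        for (int i = start; i < stop; i++) {
            h = (h ^ data[i]) * 0xC6BC279692B5C323L;
            h ^= h >>> 29;
        }
        return h;
    }
}
```

In this sketch, hashing the range 6 to 11 of "hello world" gives the same result as hashing all of "world", because only the characters inside the range and the range length contribute.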
    • hash64

      public static long hash64(CharSequence data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the String or other CharSequence to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 64-bit hash code for the requested section of data
    • hash64

      public static long hash64(char[][] data)
    • hash64

      public static long hash64(int[][] data)
    • hash64

      public static long hash64(long[][] data)
    • hash64

      public static long hash64(CharSequence[] data)
    • hash64

      public static long hash64(CharSequence[]... data)
    • hash64

      public static long hash64(Iterable<? extends CharSequence> data)
    • hash64

      public static long hash64(List<? extends CharSequence> data)
    • hash64

      public static long hash64(Object[] data)
    • hash64

      public static long hash64(Object data)
    • hash

      public static int hash(CharSequence data)
    • hash

      public static int hash(boolean[] data)
    • hash

      public static int hash(byte[] data)
    • hash

      public static int hash(short[] data)
    • hash

      public static int hash(int[] data)
    • hash

      public static int hash(long[] data)
    • hash

      public static int hash(char[] data)
    • hash

      public static int hash(float[] data)
    • hash

      public static int hash(double[] data)
    • hash

      public static int hash(char[] data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 32-bit hash code for the requested section of data
    • hash

      public static int hash(CharSequence data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the String or other CharSequence to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 32-bit hash code for the requested section of data
    • hash

      public static int hash(char[][] data)
    • hash

      public static int hash(int[][] data)
    • hash

      public static int hash(long[][] data)
    • hash

      public static int hash(CharSequence[] data)
    • hash

      public static int hash(CharSequence[]... data)
    • hash

      public static int hash(Iterable<? extends CharSequence> data)
    • hash

      public static int hash(List<? extends CharSequence> data)
    • hash

      public static int hash(Object[] data)
    • hash

      public static int hash(Object data)
    • equalityHelper

      public static boolean equalityHelper(Object[] left, Object[] right, CrossHash.IHasher inner)
      Not a general-purpose method; meant to ease implementation of CrossHash.IHasher.areEqual(Object, Object) methods when the type being compared is a multi-dimensional array. Comparing such arrays normally requires the heavyweight method Arrays.deepEquals(Object[], Object[]) or doing more work yourself; this reduces the work needed to implement fixed-depth equality. As mentioned in the docs for CrossHash.IHasher.areEqual(Object, Object), an areEqual implementation for an IHasher of 2D char arrays can be written using an IHasher for 1D char arrays called charHasher:
      return left == right || ((left instanceof char[][] && right instanceof char[][]) ? equalityHelper((char[][]) left, (char[][]) right, charHasher) : Objects.equals(left, right));
      Parameters:
      left - an array of some kind of Object, usually arrays themselves, whose items the given IHasher can compare
      right - an array of some kind of Object, usually arrays themselves, whose items the given IHasher can compare
      inner - an IHasher to compare items in left with items in right
      Returns:
      true if the contents of left and right are equal by the given IHasher, otherwise false
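A minimal sketch of fixed-depth equality in this style follows. The ItemEquality interface is a hypothetical stand-in for the areEqual half of IHasher, and the body of equalityHelper here is an assumption about how such a helper could work, not a copy of this class's implementation.

```java
import java.util.Arrays;

final class FixedDepthEqualitySketch {
    // Hypothetical stand-in for the areEqual half of an IHasher.
    interface ItemEquality {
        boolean areEqual(Object left, Object right);
    }

    // Walks the outer arrays in step and delegates each pair of items to inner.
    static boolean equalityHelper(Object[] left, Object[] right, ItemEquality inner) {
        if (left == right) return true;
        if (left == null || right == null || left.length != right.length) return false;
        for (int i = 0; i < left.length; i++) {
            if (!inner.areEqual(left[i], right[i])) return false;
        }
        return true;
    }

    // Equality for 1D char arrays, playing the role of charHasher.
    static final ItemEquality CHAR_ROWS = (l, r) ->
            l == r || (l instanceof char[] && r instanceof char[]
                    && Arrays.equals((char[]) l, (char[]) r));

    // Equality for 2D char arrays: exactly one level of delegation, no reflection.
    static boolean grids2DEqual(char[][] a, char[][] b) {
        return equalityHelper(a, b, CHAR_ROWS);
    }
}
```

Because the depth is fixed at compile time, each level is an ordinary loop with a known element type, which is the saving over Arrays.deepEquals that the description mentions.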