Class CrossHash.Lightning

java.lang.Object
squidpony.squidmath.CrossHash.Lightning
Enclosing class:
CrossHash

public static final class CrossHash.Lightning
extends Object
A quick, simple hashing function that seems to have good results. Like LightRNG, it stores a state that it updates independently of the output, and this starts at a large prime. At each step, it takes the current item in the array being hashed, adds a large non-prime used in LightRNG's generation function (it's 2 to the 64, times the golden ratio phi, and truncated to a signed long), multiplies by a prime called the "state multiplier", adds the result to the state and stores it, multiplies the value of the state by another prime called the "output multiplier", then XORs the current result with that value before moving onto the next item in the array. A finalization step XORs the result with a complex value made by adding the state (left over from the previous step) to what was the output multiplier, adding the last known value for result to the phi-related constant from LightRNG, multiplying that pair, adding the initial state (which turns out to be unusually good for this, despite no particularly special numeric qualities other than being a probable prime) and then bitwise-rotating it left by a seemingly-random number drawn from the highest 6 bits of the state.
This all can be done very quickly; a million hashes of a million different 16-element long arrays can be computed in under 18-20 ms (in the benchmark, some amount of that is overhead from generating a new array with LongPeriodRNG, since the benchmark uses that RNG's state for data, and the default Arrays.hashCode implementation is only somewhat faster at under 16 ms). After several tries and tweaks to the constants this uses, it also gets remarkably few hash collisions. On the same 0x100000, or 1048576, RNG states for data, Lightning gets 110 collisions, the JDK Arrays.hashCode method gets 129 collisions, Sip (implementing SipHash) gets 145 collisions, and CrossHash (using the FNV-1a algorithm) gets 135 collisions. Dispersion is not perfect, but at many bit sizes Lightning continues to have less collisions (it disperses better than the other hashes with several quantities of bits, at least on this test data). Lightning also does relatively well, though it isn't clearly ahead of the rest all the time, when hashing Strings, especially ones that use a larger set of letters, it seems (FakeLanguageGen was used to make test data, and languages that used more characters in their alphabet seemed to hash better with this than competing hashes for some reason).
There is certainly room for improvement with the specific numbers chosen; earlier versions used the state multiplier "Neely's number", which is a probable prime made by taking the radix-29 number "HARGNALLINSCLOPIOPEEPIO" (a reference to a very unusual TV show), truncating to 64 bits, and rotating right by 42 bits. This version uses "Neely's number" for an initial state and during finalization, and uses a different probable prime as the state multiplier, made with a similar process; it starts with the radix-36 number "EDSGERWDIJKSTRA", then does the same process but rotates right by 37 bits to obtain a different prime. This tweak seems to help with hash collisions. Extensive trial and error was used to find the current output multiplier, which has no real relationship to anything else but has exactly 32 of 64 bits set to 1, has 1 in the least and most significant bit indices (meaning it is negative and odd), and other than that seems to have better results on most inputs for mystifying reasons. Earlier versions applied a Gray code step to alter the output instead of a multiplier that heavily overflows to obfuscate state, but that had a predictable pattern for most of the inputs tried, which seemed less-than-ideal for a hash. Vitally, Lightning avoids predictable collisions that Arrays.hashCode has, like Arrays.hashCode(new long[]{0})==Arrays.hashCode(new long[]{-1}).
The output multiplier is 0xC6BC279692B5CC83L, the state multiplier is 0xD0E89D2D311E289FL, the number added to the state (from LightRNG and code derived from FastUtil, but obtained from the golden ratio phi) is 0x9E3779B97F4A7C15L, and the starting state ("Neely's Number") is 0x632BE59BD9B4E019L.
To help find patterns in hash output in a visual way, you can hash an x,y point, take the bottom 24 bits, and use that as an RGB color for the pixel at that x,y point. On a 512x512 grid of points, the patterns in Arrays.hashCode and the default CrossHash algorithm (FNV-1a) are evident, and Sip (implementing SipHash) does approximately as well as Lightning, with no clear patterns visible (Sip has been removed from SquidLib because it needs a lot of code and is slower than Mist). The idea is from a technical report on visual uses for hashing, http://www.clockandflame.com/media/Goulburn06.pdf .
  • Constructor Summary

    Constructors 
    Constructor Description
    Lightning()  
  • Method Summary

    Modifier and Type Method Description
    static int hash​(boolean[] data)  
    static int hash​(byte[] data)  
    static int hash​(char[] data)  
    static int hash​(char[][] data)  
    static int hash​(char[] data, int start, int end)
    Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
    static int hash​(char[] data, int start, int end, int step)
    Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).
    static int hash​(double[] data)  
    static int hash​(float[] data)  
    static int hash​(int[] data)  
    static int hash​(long[] data)  
    static int hash​(long[][] data)  
    static int hash​(short[] data)  
    static int hash​(CharSequence data)  
    static int hash​(CharSequence[] data)  
    static int hash​(CharSequence[]... data)  
    static int hash​(Iterable<? extends CharSequence> data)  
    static int hash​(Object[] data)  
    static long hash64​(boolean[] data)  
    static long hash64​(byte[] data)  
    static long hash64​(char[] data)  
    static long hash64​(char[][] data)  
    static long hash64​(char[] data, int start, int end)
    Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
    static long hash64​(char[] data, int start, int end, int step)
    Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).
    static long hash64​(double[] data)  
    static long hash64​(float[] data)  
    static long hash64​(int[] data)  
    static long hash64​(long[] data)  
    static long hash64​(long[][] data)  
    static long hash64​(short[] data)  
    static long hash64​(CharSequence data)  
    static long hash64​(CharSequence[] data)  
    static long hash64​(CharSequence[]... data)  
    static long hash64​(Iterable<? extends CharSequence> data)  
    static long hash64​(Object[] data)  

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

  • Method Details

    • hash64

      public static long hash64​(boolean[] data)
    • hash64

      public static long hash64​(byte[] data)
    • hash64

      public static long hash64​(short[] data)
    • hash64

      public static long hash64​(char[] data)
    • hash64

      public static long hash64​(int[] data)
    • hash64

      public static long hash64​(long[] data)
    • hash64

      public static long hash64​(float[] data)
    • hash64

      public static long hash64​(double[] data)
    • hash64

      public static long hash64​(char[] data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 64-bit hash code for the requested section of data
    • hash64

      public static long hash64​(char[] data, int start, int end, int step)
      Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      step - how many elements to advance after using one element from data; must be greater than 0
      Returns:
      a 64-bit hash code for the requested section of data
    • hash64

      public static long hash64​(CharSequence data)
    • hash64

      public static long hash64​(char[][] data)
    • hash64

      public static long hash64​(long[][] data)
    • hash64

      public static long hash64​(CharSequence[] data)
    • hash64

      public static long hash64​(Iterable<? extends CharSequence> data)
    • hash64

      public static long hash64​(CharSequence[]... data)
    • hash64

      public static long hash64​(Object[] data)
    • hash

      public static int hash​(boolean[] data)
    • hash

      public static int hash​(byte[] data)
    • hash

      public static int hash​(short[] data)
    • hash

      public static int hash​(char[] data)
    • hash

      public static int hash​(int[] data)
    • hash

      public static int hash​(long[] data)
    • hash

      public static int hash​(float[] data)
    • hash

      public static int hash​(double[] data)
    • hash

      public static int hash​(char[] data, int start, int end)
      Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      Returns:
      a 32-bit hash code for the requested section of data
    • hash

      public static int hash​(char[] data, int start, int end, int step)
      Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).
      Parameters:
      data - the char array to hash
      start - the start of the section to hash (inclusive)
      end - the end of the section to hash (exclusive)
      step - how many elements to advance after using one element from data; must be greater than 0
      Returns:
      a 32-bit hash code for the requested section of data
    • hash

      public static int hash​(CharSequence data)
    • hash

      public static int hash​(char[][] data)
    • hash

      public static int hash​(long[][] data)
    • hash

      public static int hash​(CharSequence[] data)
    • hash

      public static int hash​(Iterable<? extends CharSequence> data)
    • hash

      public static int hash​(CharSequence[]... data)
    • hash

      public static int hash​(Object[] data)