Package squidpony.squidmath
Class CrossHash.Lightning
java.lang.Object
squidpony.squidmath.CrossHash.Lightning
- Enclosing class:
- CrossHash
public static final class CrossHash.Lightning extends Object
A quick, simple hashing function that seems to have good results. Like LightRNG, it stores a state that
it updates independently of the output, and this starts at a large prime. At each step, it takes the
current item in the array being hashed, adds a large non-prime used in LightRNG's generation function
(it's 2 to the 64, times the golden ratio phi, and truncated to a signed long), multiplies by a prime
called the "state multiplier", adds the result to the state and stores it, multiplies the value of the
state by another prime called the "output multiplier", then XORs the current result with that value
before moving onto the next item in the array. A finalization step XORs the result with a complex value
made by adding the state (left over from the previous step) to what was the output multiplier, adding
the last known value for result to the phi-related constant from LightRNG, multiplying that pair, adding
the initial state (which turns out to be unusually good for this, despite no particularly special numeric
qualities other than being a probable prime) and then bitwise-rotating it left by a seemingly-random
number drawn from the highest 6 bits of the state.
This all can be done very quickly; a million hashes of a million different 16-element long arrays can be computed in under 18-20 ms (in the benchmark, some amount of that is overhead from generating a new array with LongPeriodRNG, since the benchmark uses that RNG's state for data, and the default Arrays.hashCode implementation is only somewhat faster at under 16 ms). After several tries and tweaks to the constants this uses, it also gets remarkably few hash collisions. On the same 0x100000, or 1048576, RNG states for data, Lightning gets 110 collisions, the JDK Arrays.hashCode method gets 129 collisions, Sip (implementing SipHash) gets 145 collisions, and CrossHash (using the FNV-1a algorithm) gets 135 collisions. Dispersion is not perfect, but at many bit sizes Lightning continues to have less collisions (it disperses better than the other hashes with several quantities of bits, at least on this test data). Lightning also does relatively well, though it isn't clearly ahead of the rest all the time, when hashing Strings, especially ones that use a larger set of letters, it seems (FakeLanguageGen was used to make test data, and languages that used more characters in their alphabet seemed to hash better with this than competing hashes for some reason).
There is certainly room for improvement with the specific numbers chosen; earlier versions used the state multiplier "Neely's number", which is a probable prime made by taking the radix-29 number "HARGNALLINSCLOPIOPEEPIO" (a reference to a very unusual TV show), truncating to 64 bits, and rotating right by 42 bits. This version uses "Neely's number" for an initial state and during finalization, and uses a different probable prime as the state multiplier, made with a similar process; it starts with the radix-36 number "EDSGERWDIJKSTRA", then does the same process but rotates right by 37 bits to obtain a different prime. This tweak seems to help with hash collisions. Extensive trial and error was used to find the current output multiplier, which has no real relationship to anything else but has exactly 32 of 64 bits set to 1, has 1 in the least and most significant bit indices (meaning it is negative and odd), and other than that seems to have better results on most inputs for mystifying reasons. Earlier versions applied a Gray code step to alter the output instead of a multiplier that heavily overflows to obfuscate state, but that had a predictable pattern for most of the inputs tried, which seemed less-than-ideal for a hash. Vitally, Lightning avoids predictable collisions that Arrays.hashCode has, like
The output multiplier is 0xC6BC279692B5CC83L, the state multiplier is 0xD0E89D2D311E289FL, the number added to the state (from LightRNG and code derived from FastUtil, but obtained from the golden ratio phi) is 0x9E3779B97F4A7C15L, and the starting state ("Neely's Number") is 0x632BE59BD9B4E019L.
To help find patterns in hash output in a visual way, you can hash an x,y point, take the bottom 24 bits, and use that as an RGB color for the pixel at that x,y point. On a 512x512 grid of points, the patterns in Arrays.hashCode and the default CrossHash algorithm (FNV-1a) are evident, and Sip (implementing SipHash) does approximately as well as Lightning, with no clear patterns visible (Sip has been removed from SquidLib because it needs a lot of code and is slower than Mist). The idea is from a technical report on visual uses for hashing, http://www.clockandflame.com/media/Goulburn06.pdf .
This all can be done very quickly; a million hashes of a million different 16-element long arrays can be computed in under 18-20 ms (in the benchmark, some amount of that is overhead from generating a new array with LongPeriodRNG, since the benchmark uses that RNG's state for data, and the default Arrays.hashCode implementation is only somewhat faster at under 16 ms). After several tries and tweaks to the constants this uses, it also gets remarkably few hash collisions. On the same 0x100000, or 1048576, RNG states for data, Lightning gets 110 collisions, the JDK Arrays.hashCode method gets 129 collisions, Sip (implementing SipHash) gets 145 collisions, and CrossHash (using the FNV-1a algorithm) gets 135 collisions. Dispersion is not perfect, but at many bit sizes Lightning continues to have less collisions (it disperses better than the other hashes with several quantities of bits, at least on this test data). Lightning also does relatively well, though it isn't clearly ahead of the rest all the time, when hashing Strings, especially ones that use a larger set of letters, it seems (FakeLanguageGen was used to make test data, and languages that used more characters in their alphabet seemed to hash better with this than competing hashes for some reason).
There is certainly room for improvement with the specific numbers chosen; earlier versions used the state multiplier "Neely's number", which is a probable prime made by taking the radix-29 number "HARGNALLINSCLOPIOPEEPIO" (a reference to a very unusual TV show), truncating to 64 bits, and rotating right by 42 bits. This version uses "Neely's number" for an initial state and during finalization, and uses a different probable prime as the state multiplier, made with a similar process; it starts with the radix-36 number "EDSGERWDIJKSTRA", then does the same process but rotates right by 37 bits to obtain a different prime. This tweak seems to help with hash collisions. Extensive trial and error was used to find the current output multiplier, which has no real relationship to anything else but has exactly 32 of 64 bits set to 1, has 1 in the least and most significant bit indices (meaning it is negative and odd), and other than that seems to have better results on most inputs for mystifying reasons. Earlier versions applied a Gray code step to alter the output instead of a multiplier that heavily overflows to obfuscate state, but that had a predictable pattern for most of the inputs tried, which seemed less-than-ideal for a hash. Vitally, Lightning avoids predictable collisions that Arrays.hashCode has, like
Arrays.hashCode(new long[]{0})==Arrays.hashCode(new long[]{-1})
.
The output multiplier is 0xC6BC279692B5CC83L, the state multiplier is 0xD0E89D2D311E289FL, the number added to the state (from LightRNG and code derived from FastUtil, but obtained from the golden ratio phi) is 0x9E3779B97F4A7C15L, and the starting state ("Neely's Number") is 0x632BE59BD9B4E019L.
To help find patterns in hash output in a visual way, you can hash an x,y point, take the bottom 24 bits, and use that as an RGB color for the pixel at that x,y point. On a 512x512 grid of points, the patterns in Arrays.hashCode and the default CrossHash algorithm (FNV-1a) are evident, and Sip (implementing SipHash) does approximately as well as Lightning, with no clear patterns visible (Sip has been removed from SquidLib because it needs a lot of code and is slower than Mist). The idea is from a technical report on visual uses for hashing, http://www.clockandflame.com/media/Goulburn06.pdf .
Arrays.hashCode(int[])
: http://i.imgur.com/S4Gh1sX.pngCrossHash.hash(int[])
: http://i.imgur.com/x8SDqvL.png- (Former) CrossHash.Sip.hash(int[]): http://i.imgur.com/keSpIwm.png
hash(int[])
: http://i.imgur.com/afGJ9cA.png
-
Constructor Summary
Constructors Constructor Description Lightning()
-
Method Summary
Modifier and Type Method Description static int
hash(boolean[] data)
static int
hash(byte[] data)
static int
hash(char[] data)
static int
hash(char[][] data)
static int
hash(char[] data, int start, int end)
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static int
hash(char[] data, int start, int end, int step)
Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).static int
hash(double[] data)
static int
hash(float[] data)
static int
hash(int[] data)
static int
hash(long[] data)
static int
hash(long[][] data)
static int
hash(short[] data)
static int
hash(CharSequence data)
static int
hash(CharSequence[] data)
static int
hash(CharSequence[]... data)
static int
hash(Iterable<? extends CharSequence> data)
static int
hash(Object[] data)
static long
hash64(boolean[] data)
static long
hash64(byte[] data)
static long
hash64(char[] data)
static long
hash64(char[][] data)
static long
hash64(char[] data, int start, int end)
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).static long
hash64(char[] data, int start, int end, int step)
Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).static long
hash64(double[] data)
static long
hash64(float[] data)
static long
hash64(int[] data)
static long
hash64(long[] data)
static long
hash64(long[][] data)
static long
hash64(short[] data)
static long
hash64(CharSequence data)
static long
hash64(CharSequence[] data)
static long
hash64(CharSequence[]... data)
static long
hash64(Iterable<? extends CharSequence> data)
static long
hash64(Object[] data)
-
Constructor Details
-
Method Details
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data
- the char array to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)- Returns:
- a 64-bit hash code for the requested section of data
-
hash64
Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).- Parameters:
data
- the char array to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)step
- how many elements to advance after using one element from data; must be greater than 0- Returns:
- a 64-bit hash code for the requested section of data
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash64
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
Hashes only a subsection of the given data, starting at start (inclusive) and ending before end (exclusive).- Parameters:
data
- the char array to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)- Returns:
- a 32-bit hash code for the requested section of data
-
hash
Hashes only a subsection of the given data, starting at start (inclusive), ending before end (exclusive), and moving between chars in increments of step (which is always greater than 0).- Parameters:
data
- the char array to hashstart
- the start of the section to hash (inclusive)end
- the end of the section to hash (exclusive)step
- how many elements to advance after using one element from data; must be greater than 0- Returns:
- a 32-bit hash code for the requested section of data
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-
hash
-