squidpony.NaturalLanguageCipher

All Implemented Interfaces: Serializable

public class NaturalLanguageCipher
extends Object
implements Serializable

Builds a dictionary that maps words in an English-language source text to words generated by a FakeLanguageGen, and can translate a source text into a similarly punctuated, similarly capitalized fake text. When it encounters conjugations of a root word, or that root word with common English prefixes or suffixes, it tries to use variants of the root word's translation. It performs basic stemming to separate a root word from prefixes, suffixes, and conjugation changes, then uses a phonetic hash of each separated section to determine the RNG seed that FakeLanguageGen will use, so the translation is not random (similar-sounding root words of similar length will tend to be similar in the results as well). It can cipher an English text into a text generated with FakeLanguageGen, and can also decipher such a generated text with a fully complete, partially complete, or partially incorrect vocabulary.
By default, this caches source-language words to their generated-language translations in the field table, and caches the reverse translation in the field reverse. To reduce memory usage for large vocabularies, change this with setCacheLevel(): the level starts at 2 (writing to table and reverse), and can be lowered to 1 (writing to table only) if you don't need reverse to decipher a language easily, or to 0 (writing to neither) if you expect memory to be at a premium and don't mind re-generating the same word each time it occurs in a source text. If cacheLevel is 1 or less, this will not check for overlap between previously generated words (it has no easy way to look them up), so the ciphered text may be impossible to decipher accurately. As an example, one test of level 1 generated "he" as the translation for both "a" and "at", so every time "a" had been ciphered and then deciphered, the reproduced version said "at" instead.
This won't happen by default, but the default instead relies on words being entered as inputs to cipher() or lookup() in the same order. If words are entered in different orders in different runs of the program, they may have different generated results when cacheLevel is 2. One way to handle this is to use cacheLevel 2 and cipher the whole game script, or just the unique words in it (maybe just a large word list, such as 12dicts), then serialize the NaturalLanguageCipher for later usage.
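
The cache levels and the "a"/"at" collision can be illustrated with a small toy model. This is only a sketch under stated assumptions, not SquidLib's implementation: the class `CacheLevelSketch` and its deliberately collision-prone `generate` method are hypothetical stand-ins, and the retry-on-collision loop is one plausible way an overlap check could work.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of the three cache levels; NOT SquidLib's implementation.
class CacheLevelSketch {
    final Map<String, String> table = new HashMap<>();   // source -> generated
    final Map<String, String> reverse = new HashMap<>(); // generated -> source
    int cacheLevel = 2;

    // Hypothetical stand-in for word generation. It is deliberately
    // collision-prone: every word starting with 'a' first yields "he".
    static String generate(String source, int attempt) {
        String base = source.startsWith("a") ? "he" : "zo" + source.length();
        return attempt == 0 ? base : base + attempt;
    }

    String lookup(String source) {
        String key = source.toLowerCase(Locale.ROOT);
        if (cacheLevel >= 1 && table.containsKey(key)) {
            return table.get(key);
        }
        int attempt = 0;
        String word = generate(key, attempt);
        // Only level 2 has the reverse map, so only level 2 can notice
        // that a generated word is already taken (assumed retry strategy).
        while (cacheLevel >= 2 && reverse.containsKey(word)) {
            word = generate(key, ++attempt);
        }
        if (cacheLevel >= 1) table.put(key, word);
        if (cacheLevel >= 2) reverse.put(word, key);
        return word;
    }

    public static void main(String[] args) {
        CacheLevelSketch full = new CacheLevelSketch();
        System.out.println(full.lookup("a") + " " + full.lookup("at")); // he he1

        CacheLevelSketch tableOnly = new CacheLevelSketch();
        tableOnly.cacheLevel = 1;
        System.out.println(tableOnly.lookup("a") + " " + tableOnly.lookup("at")); // he he
    }
}
```

In this toy model, looking up "at" before "a" at level 2 would swap which word receives "he", which mirrors the order dependence described above.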

Author: Tommy Ettinger; created on 5/1/2016.
See Also: Serialized Form
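
Because the class implements Serializable, standard Java object serialization is one way to follow the suggestion of ciphering a word list once and saving the result. The sketch below round-trips a plain HashMap as a stand-in for a populated NaturalLanguageCipher; the class `SerializeSketch` and the helpers `toBytes` and `fromBytes` are hypothetical names, and a real game would more likely write to a file than to a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

class SerializeSketch {
    // Writes any Serializable object to an in-memory byte array.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back an object previously written by toBytes().
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a cipher whose table has already been filled.
        HashMap<String, String> vocab = new HashMap<>();
        vocab.put("hello", "grug");
        Object restored = fromBytes(toBytes(vocab));
        System.out.println(restored.equals(vocab)); // true
    }
}
```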

Field Summary

Fields
Modifier and Type	Field	Description
`int`	`cacheLevel`	The degree of vocabulary to cache to speed up future searches at the expense of memory usage.
`FakeLanguageGen`	`language`	The FakeLanguageGen this will use to construct words; normally one of the static fields in FakeLanguageGen, a FakeLanguageGen produced by using the `FakeLanguageGen.mixAll(Object...)` method of two or more of them, or a random FakeLanguageGen produced by `FakeLanguageGen.randomLanguage(long)`.
`protected regexodus.Matcher`	`markupMatcher`
`HashMap<String,String>`	`reverse`	The mapping of lower-case word keys to lower-case word values, where keys are generated by language and values are in the source language (the reverse of table).
`long`	`shift`
`HashMap<String,String>`	`table`	The mapping of lower-case word keys to lower-case word values, where keys are in the source language and values are generated by language.

Constructor Summary

Constructors
Constructor	Description
`NaturalLanguageCipher()`	Constructs a NaturalLanguageCipher that will generate simplified English-like text by default (this uses `FakeLanguageGen.SIMPLISH`).
`NaturalLanguageCipher(FakeLanguageGen language)`	Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text.
`NaturalLanguageCipher(FakeLanguageGen language, long shift)`	Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text, using the specified `shift` as a long to modify the generated words from the language's normal results.
`NaturalLanguageCipher(NaturalLanguageCipher other)`	Copies another NaturalLanguageCipher and constructs this one with the information in the other.

Method Summary

Modifier and Type	Method	Description
`String`	`cipher(String text)`	Given a String that should contain words in the source language, this translates each word to the fake language, using existing translations if previous calls to cipher() or lookup() had translated that word.
`String`	`cipherMarkup(CharSequence text)`	Given a String, StringBuilder, or other CharSequence that should contain words in the source language (almost always English, since this only knows English prefixes and suffixes), this finds sections of the text that start and end with `[?]` and `[?]`, translates each word between those start/end markers to the fake language, using existing translations if previous calls to cipher() or lookup() had translated that word, and removes the `[?]` markup afterwards.
`String`	`decipher(CharSequence text, Map<String,String> vocabulary)`	Deciphers words in an already-ciphered text with a given String-to-String Map for a vocabulary.
`int`	`getCacheLevel()`
`NaturalLanguageCipher`	`initialize(FakeLanguageGen language, long shift)`	Changes the language this can cipher, clearing its known translation (if any) and using the given FakeLanguageGen and shift as if given to `NaturalLanguageCipher(FakeLanguageGen, long)`.
`NaturalLanguageCipher`	`learnTranslation(Map<String,String> vocabulary, String sourceWord)`	Adds a translation pair to vocabulary so it can be used in decipher, giving a correct translation for sourceWord.
`NaturalLanguageCipher`	`learnTranslations(Map<String,String> vocabulary, Iterable<String> sourceWords)`	Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords.
`NaturalLanguageCipher`	`learnTranslations(Map<String,String> vocabulary, String... sourceWords)`	Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords.
`String`	`lookup(String source)`	Given a word in the source language (usually English), looks up an existing translation for that word, or if none exists, generates a new word based on the phonetic hash of the source word, any of its stemming information such as prefixes or suffixes, and this NaturalLanguageCipher's FakeLanguageGen.
`NaturalLanguageCipher`	`mismatchTranslation(Map<String,String> vocabulary, String correctWord, String mismatchWord)`	Adds a translation pair to vocabulary so it can be used in decipher, giving a typically-incorrect translation for correctWord where it provides mismatchWord instead when the ciphered version of correctWord appears.
`static long`	`phoneticHash64(char[] data, int start, int end)`	Gets a phonetic hash of a section of `data` between `start` inclusive and `end` exclusive; this 64-bit hash should be similar for similar words, instead of very different if they are different at all.
`void`	`setCacheLevel(int cacheLevel)`

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- language
  
  public FakeLanguageGen language
  
  The FakeLanguageGen this will use to construct words; normally one of the static fields in FakeLanguageGen, a FakeLanguageGen produced by using the FakeLanguageGen.mixAll(Object...) method of two or more of them, or a random FakeLanguageGen produced by FakeLanguageGen.randomLanguage(long). Manually constructing FakeLanguageGen objects isn't easy, and if you decide to do that it's recommended you look at SquidLib's source to see how the existing calls to constructors work.
- table
  
  public HashMap<String,String> table
  
  The mapping of lower-case word keys to lower-case word values, where keys are in the source language and values are generated by language.
- reverse
  
  public HashMap<String,String> reverse
  
  The mapping of lower-case word keys to lower-case word values, where keys are generated by language and values are in the source language (the reverse of table).
- cacheLevel
  
  public int cacheLevel
  
  The degree of vocabulary to cache to speed up future searches at the expense of memory usage.
  
  2 will cache source words to generated words in table, and generated to source in reverse.
  
  1 will cache source words to generated words in table, and won't write to reverse.
  
  0 won't write to table or reverse.
  
  Defaults to 2, writing to both table and reverse.
- shift
  
  public long shift
- markupMatcher
  
  protected regexodus.Matcher markupMatcher
Constructor Details
- NaturalLanguageCipher
  
  public NaturalLanguageCipher()
  
  Constructs a NaturalLanguageCipher that will generate simplified English-like text by default (this uses FakeLanguageGen.SIMPLISH).
- NaturalLanguageCipher
  
  public NaturalLanguageCipher(FakeLanguageGen language)
  
  Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text.
  
  Parameters:
  
  language - a FakeLanguageGen, typically one of the static constants in that class or a mix of them.
- NaturalLanguageCipher
  
  public NaturalLanguageCipher(FakeLanguageGen language, long shift)
  
  Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text, using the specified shift as a long to modify the generated words from the language's normal results.
  
  Parameters:
  
  language - a FakeLanguageGen, typically one of the static constants in that class or a mix of them.
  
  shift - any long; this will be used to alter the specific words generated unless it is 0
- NaturalLanguageCipher
  
  public NaturalLanguageCipher(NaturalLanguageCipher other)
  
  Copies another NaturalLanguageCipher and constructs this one with the information in the other. Copies the dictionary of known words/prefixes/suffixes/conjugations, as well as the FakeLanguageGen style and everything else.
  
  Parameters:
  
  other - a previously-constructed NaturalLanguageCipher.
Method Details
- initialize
  
  public NaturalLanguageCipher initialize(FakeLanguageGen language, long shift)
  
  Changes the language this can cipher, clearing its known translation (if any) and using the given FakeLanguageGen and shift as if given to NaturalLanguageCipher(FakeLanguageGen, long).
  
  Parameters:
  
  language - the FakeLanguageGen to change to
  
  shift - any long; this will be used to alter the specific words generated unless it is 0
  
  Returns:
  
  this for chaining
- phoneticHash64
  
  public static long phoneticHash64(char[] data, int start, int end)
  
  Gets a phonetic hash of a section of data between start inclusive and end exclusive; this 64-bit hash should be similar for similar words, rather than changing drastically whenever the words differ at all. The algorithm is conceptually related to a locality-sensitive hash, and is inspired by Eudex; like Eudex, the Hamming distance between the hashes of two similar words should be low, even if the values are very different on a number line. The input should consist of lower-case ASCII letters, since that is all this knows how to read; characters not between 'a' and 'z' are ignored. In NaturalLanguageCipher, the hashes this produces are given as seeds to an intentionally low-quality RandomnessSource that produces similar results for similar input states, which makes it likely to generate output words that are similar to each other when the input words are similar to each other.
  
  Parameters:
  
  data - a char array that should contain letters from 'a' to 'z' this can hash
  
  start - the starting position in data to read, inclusive
  
  end - the end position in data to stop reading at, exclusive
  
  Returns:
  
  a 64-bit long hash that should have a low Hamming distance to phonetic hashes of similar words
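
The Hamming distance mentioned above is the number of bit positions in which two values differ. A minimal sketch of measuring it for two 64-bit hashes follows; the class `HammingSketch` is a hypothetical name and the two hash values are made up for illustration, not real output of phoneticHash64.

```java
class HammingSketch {
    // Number of bit positions in which two 64-bit values differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long h1 = 0x00FF00FF00FF00FFL; // made-up value, not a real phonetic hash
        long h2 = 0x40FF00FF00FF00FFL; // same as h1 except for bit 62
        System.out.println(hammingDistance(h1, h2)); // 1
        System.out.println(h2 - h1); // 4611686018427387904 (2^62): far apart numerically
    }
}
```

Two such values are close in Hamming distance yet far apart on a number line, which is the property the description relies on.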
- lookup
  
  public String lookup(String source)
  
  Given a word in the source language (usually English), looks up an existing translation for that word, or if none exists, generates a new word based on the phonetic hash of the source word, any of its stemming information such as prefixes or suffixes, and this NaturalLanguageCipher's FakeLanguageGen.
  
  Parameters:
  
  source - a word in the source language
  
  Returns:
  
  a word in the fake language
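
The idea of deriving a word from a hash-based seed, so that the same source word always produces the same result, can be sketched as follows. Everything here is a stand-in: `SeededWordSketch`, its syllable list, and the use of `String.hashCode()` with `java.util.Random` are assumptions for illustration. Unlike the phonetic hash and low-quality generator described for this class, these stand-ins do not keep similar inputs similar, and combining the seed with shift by XOR is only one way a shift of 0 could leave results unchanged.

```java
import java.util.Random;

class SeededWordSketch {
    // Hypothetical syllable inventory; a real FakeLanguageGen is far richer.
    static final String[] SYLLABLES = {"gru", "nak", "zo", "bli", "tha", "mor"};

    // Builds a word of one to three syllables from a seed derived from the source word.
    static String generate(String source, long shift) {
        // Stand-in seed: String.hashCode() XOR shift (a shift of 0 changes nothing).
        Random rng = new Random(source.hashCode() ^ shift);
        int count = 1 + rng.nextInt(3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(SYLLABLES[rng.nextInt(SYLLABLES.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same word and shift always give the same generated word.
        System.out.println(generate("hello", 0L).equals(generate("hello", 0L))); // true
        System.out.println(generate("hello", 0L));
        System.out.println(generate("hello", 42L));
    }
}
```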
- cipher
  
  public String cipher(String text)
  
  Given a String that should contain words in the source language, this translates each word to the fake language, using existing translations if previous calls to cipher() or lookup() had translated that word.
  
  Parameters:
  
  text - a String that contains words in the source language
  
  Returns:
  
  a String of the translated text.
- decipher
  
  public String decipher(CharSequence text, Map<String,String> vocabulary)
  
  Deciphers words in an already-ciphered text with a given String-to-String Map for a vocabulary. This Map could be the reverse field of this NaturalLanguageCipher, which would give a complete translation, or it could be a partially-complete or partially-correct vocabulary of words the player has learned. The vocabulary should typically have entries added using the quick and accurate learnTranslations(Map, String...) method, unless you want to add translations one word at a time (then use learnTranslation(Map, String)) or you want incorrect or biased translations added (then use mismatchTranslation(Map, String, String)). You don't need to use one of these methods if you just pass the whole of the reverse field as a vocabulary, which will translate every word. If making your own vocabulary without the learn methods, the keys need to be lower-case because while regex Patterns can be case-insensitive, the Maps used here are not.
  
  Parameters:
  
  text - a text in the fake language, as a CharSequence such as a String or StringBuilder
  
  vocabulary - a Map of Strings in the fake language to Strings in the source language
  
  Returns:
  
  a String of deciphered text that has any words as keys in vocabulary translated to the source language
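
Deciphering with a partial vocabulary amounts to replacing only the words that appear as keys and leaving the rest ciphered. The following is a simplified sketch of that idea, not the library's code: `DecipherSketch` is a hypothetical class, it uses `java.util.regex` rather than the regexodus matcher this class holds, it only restores a leading capital, and its word pattern would split generated words that contain apostrophes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecipherSketch {
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static String decipher(CharSequence text, Map<String, String> vocabulary) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            // Keys are lower-case, so lower-case the word before looking it up.
            String known = vocabulary.get(word.toLowerCase(Locale.ROOT));
            String result = word; // unknown words stay in the fake language
            if (known != null && !known.isEmpty()) {
                result = Character.isUpperCase(word.charAt(0))
                        ? Character.toUpperCase(known.charAt(0)) + known.substring(1)
                        : known;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(result));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vocabulary = new HashMap<>();
        vocabulary.put("grug", "hello"); // fake-language key, source-language value
        System.out.println(decipher("Grug nak!", vocabulary)); // Hello nak!
    }
}
```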
- learnTranslation
  
  public NaturalLanguageCipher learnTranslation(Map<String,String> vocabulary, String sourceWord)
  
  Adds a translation pair to vocabulary so it can be used in decipher, giving a correct translation for sourceWord. Modifies vocabulary in-place and returns this NaturalLanguageCipher for chaining. Can be used to correct a mismatched translation added to vocabulary with mismatchTranslation().
  
  Parameters:
  
  vocabulary - a Map of String keys to String values that will be modified in-place
  
  sourceWord - a word in the source language, typically English; the meaning will be "learned" for decipher
  
  Returns:
  
  this, for chaining
- learnTranslations
  
  public NaturalLanguageCipher learnTranslations(Map<String,String> vocabulary, String... sourceWords)
  
  Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords. Modifies vocabulary in-place and returns this NaturalLanguageCipher for chaining. Can be used to correct mismatched translations added to vocabulary with mismatchTranslation().
  
  Parameters:
  
  vocabulary - a Map of String keys to String values that will be modified in-place
  
  sourceWords - an array or vararg of words in the source language, typically English; their meanings will be "learned" for decipher
  
  Returns:
  
  this, for chaining
- learnTranslations
  
  public NaturalLanguageCipher learnTranslations(Map<String,String> vocabulary, Iterable<String> sourceWords)
  
  Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords. Modifies vocabulary in-place and returns this NaturalLanguageCipher for chaining. Can be used to correct mismatched translations added to vocabulary with mismatchTranslation().
  
  Parameters:
  
  vocabulary - a Map of String keys to String values that will be modified in-place
  
  sourceWords - an Iterable of words in the source language, typically English; their meanings will be "learned" for decipher
  
  Returns:
  
  this, for chaining
- mismatchTranslation
  
  public NaturalLanguageCipher mismatchTranslation(Map<String,String> vocabulary, String correctWord, String mismatchWord)
  
  Adds a translation pair to vocabulary so it can be used in decipher, giving a typically-incorrect translation for correctWord where it provides mismatchWord instead when the ciphered version of correctWord appears. Modifies vocabulary in-place and returns this NaturalLanguageCipher for chaining. You can use learnTranslation() to correct a mismatched vocabulary word, or mismatchTranslation() again to change the mismatched word.
  
  Parameters:
  
  vocabulary - a Map of String keys to String values that will be modified in-place
  
  correctWord - a word in the source language, typically English; where the ciphered version of this appears and the text is deciphered, mismatchWord will be used instead
  
  mismatchWord - a String that will be used for deciphering in place of the translation of correctWord.
  
  Returns:
  
  this, for chaining
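
Since the vocabulary maps fake-language words to source-language words, learning and mismatching a word can both be pictured as writing to the same key, with the later call winning. The sketch below models that with a hypothetical stand-in for lookup() that simply reverses letters; `VocabularySketch`, `learn`, and `mismatch` are illustrative names, not the library's methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class VocabularySketch {
    // Hypothetical stand-in for lookup(): reverses the letters of a word.
    static final UnaryOperator<String> LOOKUP =
            w -> new StringBuilder(w).reverse().toString();

    // Stores the correct meaning under the ciphered form of sourceWord.
    static void learn(Map<String, String> vocabulary, String sourceWord) {
        vocabulary.put(LOOKUP.apply(sourceWord), sourceWord);
    }

    // Stores mismatchWord under the ciphered form of correctWord.
    static void mismatch(Map<String, String> vocabulary, String correctWord, String mismatchWord) {
        vocabulary.put(LOOKUP.apply(correctWord), mismatchWord);
    }

    public static void main(String[] args) {
        Map<String, String> vocabulary = new HashMap<>();
        mismatch(vocabulary, "friend", "enemy");
        System.out.println(vocabulary.get("dneirf")); // enemy
        learn(vocabulary, "friend"); // corrects the earlier mismatch
        System.out.println(vocabulary.get("dneirf")); // friend
    }
}
```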
- getCacheLevel
  
  public int getCacheLevel()
- setCacheLevel
  
  public void setCacheLevel(int cacheLevel)
- cipherMarkup
  
  public String cipherMarkup(CharSequence text)
  
  Given a String, StringBuilder, or other CharSequence that should contain words in the source language (almost always English, since this only knows English prefixes and suffixes), this finds sections of the text that start and end with [?] and [?], translates each word between those start/end markers to the fake language, using existing translations if previous calls to cipher() or lookup() had translated that word, and removes the [?] markup afterwards. This is meant for cases where only some words should be translated, such as (for example) translating "What the [?]heck?[?]" to "What the grug?" or something like it if the language is FakeLanguageGen.GOBLIN, or "What the xu'oz?" if the language is FakeLanguageGen.DEMONIC.
  
  Parameters:
  
  text - a CharSequence, such as a String, that contains words in the source language and [?] markup
  
  Returns:
  
  a String of the translated text with markup-surrounded sections translated and markup removed
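
The markup handling can be pictured as two passes: find each `[?]`-delimited section, then translate only the words inside it. This is a rough sketch under stated assumptions rather than the library's code: `MarkupSketch` is a hypothetical class, it uses `java.util.regex` instead of regexodus, and `translateWord` is a stand-in that reverses letters instead of calling a FakeLanguageGen.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkupSketch {
    // A section that starts with [?] and ends at the next [?].
    private static final Pattern MARKED =
            Pattern.compile("\\[\\?\\](.*?)\\[\\?\\]", Pattern.DOTALL);
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Hypothetical stand-in translator: reverses the letters of a word.
    static String translateWord(String word) {
        return new StringBuilder(word).reverse().toString();
    }

    // Translates every word in a section, leaving punctuation in place.
    static String translateWords(String section) {
        Matcher m = WORD.matcher(section);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(translateWord(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Translates only the marked sections and drops the markup itself.
    static String cipherMarkup(CharSequence text) {
        Matcher m = MARKED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(translateWords(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cipherMarkup("What the [?]heck?[?]")); // What the kceh?
    }
}
```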
