public class NaturalLanguageCipher
extends java.lang.Object
implements java.io.Serializable
Class that builds up a dictionary of words in a source text to words generated by a FakeLanguageGen,
and can translate a source text to a similarly-punctuated, similarly-capitalized fake text;
it will try to use variants on the translation of the same root word when it encounters conjugations of that root
word or that root word with common English prefixes/suffixes. Performs basic stemming to separate a root word from
prefixes, suffixes, and conjugation changes, then uses a phonetic hash of each such separate section to determine the
RNG seed that FakeLanguageGen will use, so the translation is not random (similar-sounding root words with similar
length will tend to be similar in the results as well). Can cipher an English text and generate a text with
FakeLanguageGen, but also decipher such a generated text with a fully-complete, partially-complete, or
partially-incorrect vocabulary.
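The deciphering idea described above can be sketched with only the standard library. This is a minimal illustration, not SquidLib's decipher() implementation: the class name DecipherSketch, the word pattern, the capitalization rule, and the sample fake words are all assumptions made for the example. Words missing from the vocabulary are left as they are, which is one plausible reading of deciphering with a partially-complete vocabulary.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in only; not SquidLib's decipher() implementation.
final class DecipherSketch {
    // Assumption for this sketch: a "word" is a run of ASCII letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[A-Za-z']+");

    // Replaces every fake-language word found in the vocabulary with its entry,
    // leaving unknown words and all punctuation untouched.
    static String decipher(CharSequence text, Map<String, String> vocabulary) {
        Matcher m = WORD.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            // Vocabulary keys are lower-case, so look up the lower-cased word.
            String known = vocabulary.get(word.toLowerCase(Locale.ROOT));
            String out = known == null ? word : matchCase(word, known);
            m.appendReplacement(sb, Matcher.quoteReplacement(out));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Copies a leading capital letter from the ciphered word to its replacement.
    private static String matchCase(String original, String replacement) {
        if (!replacement.isEmpty() && Character.isUpperCase(original.charAt(0))) {
            return Character.toUpperCase(replacement.charAt(0)) + replacement.substring(1);
        }
        return replacement;
    }

    public static void main(String[] args) {
        Map<String, String> vocabulary = new HashMap<>();
        vocabulary.put("grug", "heck");   // made-up fake word with its intended meaning
        vocabulary.put("zam", "friend");  // made-up entry; could equally be a deliberate mistranslation
        // "Olo" is not in the vocabulary, so it stays ciphered.
        System.out.println(decipher("Grug, zam! Olo?", vocabulary));
    }
}
```

A partially-incorrect vocabulary needs no special handling in this sketch: a wrong value in the Map is simply substituted like any other entry.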
Caching is controlled with setCacheLevel(), where the level starts at 2 (writing to table and reverse), and can be lowered to 1 (writing to
table only) if you don't need reverse to decipher a language easily, or to 0 (writing to neither) if you expect that
memory will be at a premium and don't mind re-generating the same word each time it occurs in a source text. If
cacheLevel is 1 or less, then this will not check for overlap between previously-generated words (it won't have an
easy way to look up previously-generated ones), so the resulting text may be impossible to decipher accurately. As an example, one
test of level 1 generated "he" as the translation for both "a" and "at", so every time "a" had been ciphered and then
deciphered, the reproduced version said "at" instead. This won't happen by default, but the default instead relies on
words being entered as inputs to cipher() or lookup() in the same order. If words are entered in two different orders
to different runs of the program, they may have different generated results if cacheLevel is 2. One way to handle
this is to use cacheLevel 2 and cipher the whole game script, or just the unique words in it (maybe just a large word
list, such as 12dicts), then serialize the NaturalLanguageCipher
for later usage.

Modifier and Type | Field and Description |
---|---|
int | cacheLevel: The degree of vocabulary to cache to speed up future searches at the expense of memory usage. |
FakeLanguageGen | language: The FakeLanguageGen this will use to construct words; normally one of the static fields in FakeLanguageGen, a FakeLanguageGen produced by using the FakeLanguageGen.mixAll(Object...) method of two or more of them, or a random FakeLanguageGen produced by FakeLanguageGen.randomLanguage(long). |
protected regexodus.Matcher | markupMatcher |
java.util.HashMap<java.lang.String,java.lang.String> | reverse: The mapping of lower-case word keys to lower-case word values, where keys are generated by language and values are in the source language. |
long | shift |
java.util.HashMap<java.lang.String,java.lang.String> | table: The mapping of lower-case word keys to lower-case word values, where keys are in the source language and values are generated by language. |
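
How cacheLevel, table, and reverse interact can be sketched with a toy model. Everything here is hypothetical and is not SquidLib's code: the class CacheSketch, the deliberately colliding generator, the use of String.hashCode() in place of a phonetic hash, and the "bump the seed" collision policy are assumptions chosen only to reproduce the "a"/"at" clash described in the class overview.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Toy model of the table/reverse caches; not SquidLib's implementation.
final class CacheSketch {
    // Hypothetical stand-in for a word generator: every odd seed yields "he", forcing collisions.
    static final LongFunction<String> COLLIDING = seed -> (seed & 1L) == 1L ? "he" : "w" + seed;

    final Map<String, String> table = new HashMap<>();   // source word -> generated word
    final Map<String, String> reverse = new HashMap<>(); // generated word -> source word
    final int cacheLevel;
    private final LongFunction<String> generator;

    CacheSketch(int cacheLevel, LongFunction<String> generator) {
        this.cacheLevel = cacheLevel;
        this.generator = generator;
    }

    String lookup(String source) {
        String cached = table.get(source);
        if (cached != null) return cached;
        long seed = source.hashCode(); // assumption: the real class derives its seed from a phonetic hash
        String word = generator.apply(seed);
        if (cacheLevel >= 2) {
            // Assumed collision policy: change the seed until the generated word is unused.
            while (reverse.containsKey(word)) word = generator.apply(++seed);
            reverse.put(word, source);
        }
        if (cacheLevel >= 1) table.put(source, word);
        return word;
    }

    public static void main(String[] args) {
        CacheSketch full = new CacheSketch(2, COLLIDING);
        System.out.println(full.lookup("a") + " / " + full.lookup("at"));
        CacheSketch tableOnly = new CacheSketch(1, COLLIDING);
        System.out.println(tableOnly.lookup("a") + " / " + tableOnly.lookup("at"));
    }
}
```

At level 2 the second word is regenerated because reverse already holds "he", while at level 1 both words end up as "he" and cannot be told apart when deciphering. The sketch also shows the order dependence: whichever word is looked up first keeps the contested translation.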
Constructor and Description |
---|
NaturalLanguageCipher(): Constructs a NaturalLanguageCipher that will generate simplified English-like text by default (this uses FakeLanguageGen.SIMPLISH). |
NaturalLanguageCipher(FakeLanguageGen language): Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text. |
NaturalLanguageCipher(FakeLanguageGen language, long shift): Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text, using the specified shift as a long to modify the generated words from the language's normal results. |
NaturalLanguageCipher(NaturalLanguageCipher other): Copies another NaturalLanguageCipher and constructs this one with the information in the other. |
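
The effect of the shift argument can be pictured as a change to the seed used for each word. The following is a sketch under stated assumptions, not how FakeLanguageGen or NaturalLanguageCipher actually combine the two values: the class ShiftSketch, the syllable list, the XOR of the seed with shift, and the use of java.util.Random and String.hashCode() are all invented for illustration. The one property taken from the documentation is that a shift of 0 leaves the normal result unchanged.

```java
import java.util.Random;

// Illustrative only; the real generator and seeding scheme live in SquidLib.
final class ShiftSketch {
    // Hypothetical syllable inventory standing in for a FakeLanguageGen.
    private static final String[] SYLLABLES = {"ka", "zu", "mo", "ri", "te", "lo", "sha", "ne"};

    // Generates a word deterministically from the source word and the shift.
    static String generate(String sourceWord, long shift) {
        // Assumption: the shift is mixed into the seed with XOR, so shift == 0 changes nothing.
        long seed = sourceWord.hashCode() ^ shift;
        Random rng = new Random(seed);
        int syllableCount = 1 + rng.nextInt(3); // between 1 and 3 syllables
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < syllableCount; i++) {
            sb.append(SYLLABLES[rng.nextInt(SYLLABLES.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate("dragon", 0L));     // the "normal" word for this toy generator
        System.out.println(generate("dragon", 12345L)); // usually a different word
        System.out.println(generate("dragon", 12345L)); // same inputs, same word
    }
}
```

Because the word depends only on the source word and the shift, two ciphers built with the same toy generator and shift agree on every translation, and changing the shift gives a different vocabulary in the same style.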
Modifier and Type | Method and Description |
---|---|
java.lang.String | cipher(java.lang.String text): Given a String that should contain words in the source language, this translates each word to the fake language, using existing translations if previous calls to cipher() or lookup() had translated that word. |
java.lang.String | cipherMarkup(java.lang.CharSequence text): Given a String, StringBuilder, or other CharSequence that should contain words in the source language (almost always English, since this only knows English prefixes and suffixes), this finds sections of the text that start and end with [?] and [?], translates each word between those start/end markers to the fake language, using existing translations if previous calls to cipher() or lookup() had translated that word, and removes the [?] markup afterwards. |
java.lang.String | decipher(java.lang.CharSequence text, java.util.Map<java.lang.String,java.lang.String> vocabulary): Deciphers words in an already-ciphered text with a given String-to-String Map for a vocabulary. |
int | getCacheLevel() |
NaturalLanguageCipher | initialize(FakeLanguageGen language, long shift): Changes the language this can cipher, clearing its known translation (if any) and using the given FakeLanguageGen and shift as if given to NaturalLanguageCipher(FakeLanguageGen, long). |
NaturalLanguageCipher | learnTranslation(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.String sourceWord): Adds a translation pair to vocabulary so it can be used in decipher, giving a correct translation for sourceWord. |
NaturalLanguageCipher | learnTranslations(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.Iterable<java.lang.String> sourceWords): Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords. |
NaturalLanguageCipher | learnTranslations(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.String... sourceWords): Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords. |
java.lang.String | lookup(java.lang.String source): Given a word in the source language (usually English), looks up an existing translation for that word, or if none exists, generates a new word based on the phonetic hash of the source word, any of its stemming information such as prefixes or suffixes, and this NaturalLanguageCipher's FakeLanguageGen. |
NaturalLanguageCipher | mismatchTranslation(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.String correctWord, java.lang.String mismatchWord): Adds a translation pair to vocabulary so it can be used in decipher, giving a typically-incorrect translation for correctWord where it provides mismatchWord instead when the ciphered version of correctWord appears. |
static long | phoneticHash64(char[] data, int start, int end): Gets a phonetic hash of a section of data between start inclusive and end exclusive; this 64-bit hash should be similar for similar words, instead of very different if they are different at all. |
void | setCacheLevel(int cacheLevel) |
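
The phoneticHash64 entry above promises hashes that are similar for similar words. One way to measure "similar" for 64-bit values is the Hamming distance, the number of differing bits, which the standard library computes with Long.bitCount on the XOR of two values. The sketch below pairs that measurement with a toy hash that sets one bit per distinct letter. The toy hash is not Eudex and is not SquidLib's algorithm, and the names HammingSketch and letterMask are hypothetical. It only mirrors the documented signature shape (char array, inclusive start, exclusive end) and the rule that characters outside 'a' to 'z' are ignored.

```java
// Toy similarity-preserving hash plus Hamming distance; not SquidLib's phoneticHash64.
final class HammingSketch {
    // Sets bit (c - 'a') for each lower-case ASCII letter in data[start, end).
    // Characters outside 'a'..'z' are skipped.
    static long letterMask(char[] data, int start, int end) {
        long mask = 0L;
        for (int i = start; i < end; i++) {
            char c = data[i];
            if (c >= 'a' && c <= 'z') {
                mask |= 1L << (c - 'a');
            }
        }
        return mask;
    }

    // Number of bit positions where the two hashes differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        char[] cat = "cat".toCharArray();
        char[] cats = "cats".toCharArray();
        char[] dog = "dog".toCharArray();
        long hCat = letterMask(cat, 0, cat.length);
        System.out.println(hammingDistance(hCat, letterMask(cats, 0, cats.length)));
        System.out.println(hammingDistance(hCat, letterMask(dog, 0, dog.length)));
    }
}
```

With this toy hash, "cat" and "cats" differ in a single bit while "cat" and "dog" differ in six, even though the numeric values of the masks are not close on a number line.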
public FakeLanguageGen language
The FakeLanguageGen this will use to construct words; normally one of the static fields in FakeLanguageGen, a
FakeLanguageGen produced by using the FakeLanguageGen.mixAll(Object...) method of two or more of them, or
a random FakeLanguageGen produced by FakeLanguageGen.randomLanguage(long). Manually constructing
FakeLanguageGen objects isn't easy, and if you decide to do that it's recommended you look at SquidLib's source
to see how the existing calls to constructors work.

public java.util.HashMap<java.lang.String,java.lang.String> table
public java.util.HashMap<java.lang.String,java.lang.String> reverse
public int cacheLevel
public long shift
protected regexodus.Matcher markupMatcher
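
The markupMatcher field is a regexodus Matcher, presumably the one used to find [?] sections for cipherMarkup. The same kind of matching can be sketched with java.util.regex instead of regexodus. This is an assumption-laden illustration rather than SquidLib's code: the class MarkupSketch, both patterns, and the wordCipher parameter (a stand-in for whatever translates a single word) are hypothetical.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only; SquidLib uses regexodus and its own word handling.
final class MarkupSketch {
    // Matches the shortest text between a pair of literal "[?]" markers.
    private static final Pattern MARKED = Pattern.compile("\\[\\?\\](.*?)\\[\\?\\]", Pattern.DOTALL);
    // Assumption for this sketch: a "word" is a run of ASCII letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[A-Za-z']+");

    // Translates only the words inside [?]...[?] sections and drops the markers.
    static String cipherMarkup(CharSequence text, UnaryOperator<String> wordCipher) {
        Matcher m = MARKED.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String inner = translateWords(m.group(1), wordCipher);
            m.appendReplacement(sb, Matcher.quoteReplacement(inner));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Applies wordCipher to each word, keeping punctuation and spacing as they were.
    private static String translateWords(String section, UnaryOperator<String> wordCipher) {
        Matcher m = WORD.matcher(section);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(wordCipher.apply(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical one-word "language": only "heck" has a translation.
        UnaryOperator<String> toy = w -> w.equals("heck") ? "grug" : w;
        System.out.println(cipherMarkup("What the [?]heck?[?]", toy));
    }
}
```

Text outside the markers is copied through unchanged, which is the point of the markup: only the marked words are replaced.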
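public NaturalLanguageCipher()
Constructs a NaturalLanguageCipher that will generate simplified English-like text by default (this uses
FakeLanguageGen.SIMPLISH).

public NaturalLanguageCipher(FakeLanguageGen language)
Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text.
language - a FakeLanguageGen, typically one of the static constants in that class or a mix of them.

public NaturalLanguageCipher(FakeLanguageGen language, long shift)
Constructs a NaturalLanguageCipher that will use the given style of language generator to produce its text, using
the specified shift as a long to modify the generated words from the language's normal results.
language - a FakeLanguageGen, typically one of the static constants in that class or a mix of them.
shift - any long; this will be used to alter the specific words generated unless it is 0

public NaturalLanguageCipher(NaturalLanguageCipher other)
Copies another NaturalLanguageCipher and constructs this one with the information in the other.
other - a previously-constructed NaturalLanguageCipher.

public NaturalLanguageCipher initialize(FakeLanguageGen language, long shift)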
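Changes the language this can cipher, clearing its known translation (if any) and using the given FakeLanguageGen
and shift as if given to NaturalLanguageCipher(FakeLanguageGen, long).
language - the FakeLanguageGen to change to
shift - any long; this will be used to alter the specific words generated unless it is 0

public static long phoneticHash64(char[] data, int start, int end)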
Gets a phonetic hash of a section of data between start inclusive and end exclusive; this
64-bit hash should be similar for similar words, instead of very different if they are different at all. The
algorithm is conceptually related to a locality-sensitive hash, and is inspired by
Eudex; like Eudex, the Hamming distance between the hashes of two
similar words should be low, even if the values are very different on a number line. The input to this must
contain lower-case ASCII letters, since that is all this knows how to read (characters not between 'a' and 'z'
are ignored). In NaturalLanguageCipher, the hashes this produces are given as seeds to an
intentionally-low-quality RandomnessSource that produces similar results for similar input states, which makes
it likely to generate output words that are similar to each other when the input words are similar to each other.
data - a char array that should contain letters from 'a' to 'z' for this to hash
start - the starting position in data to read, inclusive
end - the end position in data to stop reading at, exclusive

public java.lang.String lookup(java.lang.String source)
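Given a word in the source language (usually English), looks up an existing translation for that word, or if none
exists, generates a new word based on the phonetic hash of the source word, any of its stemming information such
as prefixes or suffixes, and this NaturalLanguageCipher's FakeLanguageGen.
source - a word in the source language

public java.lang.String cipher(java.lang.String text)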
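Given a String that should contain words in the source language, this translates each word to the fake language,
using existing translations if previous calls to cipher() or lookup() had translated that word.
text - a String that contains words in the source language

public java.lang.String decipher(java.lang.CharSequence text, java.util.Map<java.lang.String,java.lang.String> vocabulary)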
Deciphers words in an already-ciphered text with a given String-to-String Map for a vocabulary. The vocabulary
is normally built with the learnTranslations(Map, String...) method,
unless you want to add translations one word at a time (then use learnTranslation(Map, String)) or you
want incorrect or biased translations added (then use mismatchTranslation(Map, String, String)). You
don't need to use one of these methods if you just pass the whole of the reverse field as a vocabulary, which
will translate every word. If making your own vocabulary without the learn methods, the keys need to be
lower-case because while regex Patterns can be case-insensitive, the Maps used here are not.
text - a text in the fake language, as a CharSequence such as a String or StringBuilder
vocabulary - a Map of Strings in the fake language to Strings in the source language

public NaturalLanguageCipher learnTranslation(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.String sourceWord)
Adds a translation pair to vocabulary so it can be used in decipher, giving a correct translation for sourceWord.
vocabulary - a Map of String keys to String values that will be modified in-place
sourceWord - a word in the source language, typically English; the meaning will be "learned" for decipher

public NaturalLanguageCipher learnTranslations(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.String... sourceWords)
Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords.
vocabulary - a Map of String keys to String values that will be modified in-place
sourceWords - an array or vararg of words in the source language, typically English; their meanings will
be "learned" for decipher

public NaturalLanguageCipher learnTranslations(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.Iterable<java.lang.String> sourceWords)
Adds translation pairs to vocabulary so it can be used in decipher, giving a correct translation for sourceWords.
vocabulary - a Map of String keys to String values that will be modified in-place
sourceWords - an Iterable of words in the source language, typically English; their meanings will be
"learned" for decipher

public NaturalLanguageCipher mismatchTranslation(java.util.Map<java.lang.String,java.lang.String> vocabulary, java.lang.String correctWord, java.lang.String mismatchWord)
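Adds a translation pair to vocabulary so it can be used in decipher, giving a typically-incorrect translation for
correctWord where it provides mismatchWord instead when the ciphered version of correctWord appears.
vocabulary - a Map of String keys to String values that will be modified in-place
correctWord - a word in the source language, typically English; where the ciphered version of this
appears and the text is deciphered, mismatchWord will be used instead
mismatchWord - a String that will be used for deciphering in place of the translation of correctWord.

public int getCacheLevel()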
public void setCacheLevel(int cacheLevel)
public java.lang.String cipherMarkup(java.lang.CharSequence text)
Given a String, StringBuilder, or other CharSequence that should contain words in the source language (almost
always English, since this only knows English prefixes and suffixes), this finds sections of the text that
start and end with [?] and [?], translates each word between those start/end markers to the fake
language, using existing translations if previous calls to cipher() or lookup() had translated that word, and
removes the [?] markup afterwards. This is meant for cases where only some words should be translated,
such as (for example) translating "What the [?]heck?[?]" to "What the grug?" or something like it if the language
is FakeLanguageGen.GOBLIN, or "What the xu'oz?" if the language is FakeLanguageGen.DEMONIC.
text - a CharSequence, such as a String, that contains words in the source language and [?] markup

Copyright © Eben Howard 2012–2022. All rights reserved.