Package squidpony
Class FakeLanguageGen
java.lang.Object
squidpony.FakeLanguageGen
- All Implemented Interfaces:
Serializable
public class FakeLanguageGen extends Object implements Serializable
A text generator for producing sentences and/or words in nonsense languages that fit a theme. This does not use an
existing word list as a basis for its output, so it may or may not produce existing words occasionally, but you can
safely assume it won't generate a meaningful sentence except in the absolute unlikeliest of cases.
This supports a lot of language styles in predefined constants. There's a registry of these constants in registered and their names in registeredNames; the languages that would make sense for real-world cultures to use (all of which use the Latin alphabet, so they can be swapped around) are in romanizedHumanLanguages. You can make a new language with a constructor, but it's pretty time-consuming; the recommended ways are generating a random language with randomLanguage(long) (when you don't care too much about exactly how it should sound), or blending two or more languages with mixAll(Object...) or mix(double, FakeLanguageGen, double, Object...) (when you have a sound in mind that isn't quite met by an existing language).
Created by Tommy Ettinger on 11/29/2015.
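The idea of blending languages by weight can be sketched without the library. The block below is a minimal illustration, not SquidLib's implementation: the class MixSketch, the method blend, and the sample syllables are all hypothetical, and it only assumes that an influence of 0.5 should favor both sources evenly, as the description of mix(FakeLanguageGen, double) states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only; not SquidLib code. Blends two pools of syllables by an influence value. */
final class MixSketch {
    /**
     * Builds a pool of the requested size. Each entry is drawn from "other" with
     * probability otherInfluence and from "mine" otherwise (hypothetical helper).
     */
    static List<String> blend(String[] mine, String[] other, double otherInfluence,
                              int size, Random rng) {
        List<String> pool = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            // nextDouble() is in [0, 1), so 0.0 never picks "other" and 1.0 always does.
            String[] source = rng.nextDouble() < otherInfluence ? other : mine;
            pool.add(source[rng.nextInt(source.length)]);
        }
        return pool;
    }

    public static void main(String[] args) {
        String[] harsh = {"kh", "gr", "zv"};
        String[] soft = {"l", "th", "ny"};
        // An influence of 0.5 should draw from both arrays about equally often.
        System.out.println(blend(harsh, soft, 0.5, 10, new Random(42L)));
    }
}
```

The real mix methods also mingle suffixes and other traits; this sketch covers only the weighting of one kind of sound.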
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
static class
FakeLanguageGen.Alteration
static class
FakeLanguageGen.Modifier
static class
FakeLanguageGen.SentenceForm
A simple way to bundle a FakeLanguageGen with the arguments that would be passed to it when calling sentence(IRNG, int, int, String[], String[], double, int) or one of its overloads.
-
Field Summary
Fields
Modifier and Type Field Description
static FakeLanguageGen
ALIEN_A
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species.
static FakeLanguageGen
ALIEN_E
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species.
static FakeLanguageGen
ALIEN_I
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species.
static FakeLanguageGen
ALIEN_O
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species.
static FakeLanguageGen
ALIEN_U
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species.
static FakeLanguageGen
ANCIENT_EGYPTIAN
A (necessarily) very rough anglicization of Old Egyptian, a language that has no precisely known pronunciation rules and was written with hieroglyphics.
static String
anyConsonant
A pattern String that will match any consonant FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
static String
anyConsonantCluster
A pattern String that will match one or more of any consonants FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
static String
anyVowel
A pattern String that will match any vowel FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
static String
anyVowelCluster
A pattern String that will match one or more of any vowels FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
static FakeLanguageGen
ARABIC_ROMANIZED
Imitation Arabic, using mostly the Latin alphabet but with some Greek letters for tough transliteration topics.
static FakeLanguageGen
CELESTIAL
Fantasy language that is meant to sound like it could be spoken by divine or (magical) otherworldly beings.
static FakeLanguageGen
CHEROKEE_ROMANIZED
A rough imitation of the Cherokee language, using an attempt at romanizing the syllabary the language is often written with, using only the parts of the language that are usually written down.
static FakeLanguageGen
CHINESE_ROMANIZED
An approximation of Hanyu Pinyin, a Romanization technique used for Mandarin Chinese that has been in common use since the 1980s.
boolean
clean
String[]
closingConsonants
String[]
closingSyllables
protected static regexodus.Pattern
consonantClusters
static FakeLanguageGen
CROW
A rough imitation of the Crow language of the American Midwest, using some tone marks.
static FakeLanguageGen
DEEP_SPEECH
Fantasy/sci-fi language that would potentially be fitting for a trade language spoken by various very-different groups, such as creatures with tentacled faces who need to communicate with spider-elves and living crystals.
static FakeLanguageGen
DEMONIC
Fantasy language that might be suitable for a language spoken by demons, aggressive warriors, or people who seek to emulate or worship similar groups.
static FakeLanguageGen
DRAGON
Fantasy language that tries to sound like the speech of a powerful and pompous dragon, using long, complex words and a mix of hard consonants like "t" and "k", "liquid" consonants like "l" and "r", and sometimes vowel groups like "ie" and "aa".
static FakeLanguageGen
ELF
Fantasy language that tries to imitate the various languages spoken by elves in J.R.R. Tolkien's works, using accented vowels occasionally and aiming for long, flowing, vowel-heavy words.
static FakeLanguageGen
ENGLISH
Imitation English; may seem closer to Dutch in some generated text, and is not exactly the best imitation.
static FakeLanguageGen
FANCY_FANTASY_NAME
A mix of four different languages with some accented characters added onto an ASCII base, that can be good for generating single words for creature or place names in fantasy settings that should have a "fancy" feeling from having unnecessary accents added primarily for visual reasons.
static FakeLanguageGen
FANTASY_NAME
A mix of four different languages, using only ASCII characters, that is meant for generating single words for creature or place names in fantasy settings.
static FakeLanguageGen
FRENCH
Imitation modern French, using the (many) accented vowels that are present in the language.
static FakeLanguageGen
GOBLIN
Fantasy language that might be suitable for stealthy humanoids, such as goblins, or as a secret language used by humans who want to avoid notice.
static FakeLanguageGen
GREEK_AUTHENTIC
Imitation ancient Greek, using the original Greek alphabet.
static FakeLanguageGen
GREEK_ROMANIZED
Imitation ancient Greek, romanized to use the Latin alphabet.
static FakeLanguageGen
HINDI_ROMANIZED
Imitation Hindi, romanized to use the Latin alphabet using accented glyphs similar to the IAST standard.
static FakeLanguageGen
HLETKIP
A fictional language that could ostensibly be spoken by some group of humans, but that isn't closely based on any one real-world language.
static FakeLanguageGen
IMP
A fantasy language meant for obnoxious, screeching, annoying enemies more so than for intelligent friends or foes.
static FakeLanguageGen
INFERNAL
Fantasy language that might be suitable for a language spoken by fiends, users of witchcraft, or people who seek to emulate or worship similar groups.
static FakeLanguageGen
INSECT
Fantasy/sci-fi language that would typically be fitting for an insect-like species without a close equivalent to human lips.
static FakeLanguageGen
INUKTITUT
Imitation text from an approximation of one of the Inuktitut languages spoken by various people of the Arctic and nearby areas.
static FakeLanguageGen
JAPANESE_ROMANIZED
Imitation Japanese, romanized to use the Latin alphabet.
static FakeLanguageGen
KOBOLD
Fantasy language based closely on DRAGON, but with much shorter words normally and closing syllables that may sound "rushed" or "crude", though it has the same general frequency of most consonants and vowels.
static FakeLanguageGen
KOREAN_ROMANIZED
Imitation text from an approximation of Korean, using the Revised Romanization method that is official in South Korea today and is easier to type.
static FakeLanguageGen
LOVECRAFT
Ia! Ia! Cthulhu Rl'yeh ftaghn! Useful for generating cultist ramblings or unreadable occult texts.
static FakeLanguageGen
MALAY
An approximation of the Malay language or any of its close relatives, such as Indonesian.
static FakeLanguageGen
MAORI
Imitation text from an approximation of the Maori language, spoken in New Zealand both today and historically, and closely related to some other Polynesian languages.
String[]
midConsonants
String[]
midVowels
ArrayList<FakeLanguageGen.Modifier>
modifiers
static FakeLanguageGen
MONGOLIAN
Imitation text from an approximation of one of the languages spoken in the 13th-century Mongol Empire.
static FakeLanguageGen
NAHUATL
Imitation text from an approximation of the language spoken by the Aztec people and also over a million contemporary people in parts of Mexico.
protected String
name
static FakeLanguageGen
NORSE
Somewhat close to Old Norse, which is itself very close to Icelandic, so this uses Icelandic spelling rules.
static FakeLanguageGen
NORSE_SIMPLIFIED
Somewhat close to Old Norse, which is itself very close to Icelandic, but changed to avoid letters not on a US-ASCII keyboard.
String[]
openingConsonants
String[]
openingVowels
static FakeLanguageGen[]
registered
An array that stores all the hand-made FakeLanguageGen constants; it does not store randomly-generated languages nor does it store modifications or mixes of languages.
static String[]
registeredNames
protected static regexodus.Pattern
repeats
static FakeLanguageGen[]
romanizedHumanLanguages
FakeLanguageGen constants that are meant to sound like specific real-world languages, and that all use the Latin script (like English) with maybe some accents.
static FakeLanguageGen
RUSSIAN_AUTHENTIC
Imitation modern Russian, using the authentic Cyrillic alphabet used in Russia and other countries.
static FakeLanguageGen
RUSSIAN_ROMANIZED
Imitation modern Russian, romanized to use the Latin alphabet.
regexodus.Pattern[]
sanityChecks
static FakeLanguageGen
SIMPLISH
English-like language that omits complex spelling and doesn't include any of the uncommon word endings of English like "ought" or "ation." A good choice when you want something that doesn't use any non-US-keyboard letters, looks somewhat similar to English, and tries to be pronounceable without too much effort.
static FakeLanguageGen
SOMALI
Imitation Somali, using the Latin alphabet.
static FakeLanguageGen
SPANISH
Imitation text from an approximation of Spanish (not using the variations spoken in Spain, but closer to Latin American forms of Spanish).
static GWTRNG
srng
protected String
summary
static FakeLanguageGen
SWAHILI
Swahili is one of the more commonly-spoken languages in sub-Saharan Africa, and serves mainly as a shared language that is often learned after becoming fluent in one of many other (vaguely-similar) languages of the area.
double
syllableEndFrequency
double[]
syllableFrequencies
protected double
totalSyllableFrequency
static FakeLanguageGen
VIETNAMESE
A very rough imitation of the Vietnamese language, without using the accurate characters Vietnamese really uses but that are rare in fonts.
protected static regexodus.Pattern
vowelClusters
double
vowelEndFrequency
double
vowelSplitFrequency
String[]
vowelSplitters
double
vowelStartFrequency
-
Constructor Summary
Constructors
Constructor Description
FakeLanguageGen()
Zero-arg constructor for a FakeLanguageGen; produces a FakeLanguageGen equivalent to FakeLanguageGen.ENGLISH.
FakeLanguageGen(String[] openingVowels, String[] midVowels, String[] openingConsonants, String[] midConsonants, String[] closingConsonants, String[] closingSyllables, String[] vowelSplitters, int[] syllableLengths, double[] syllableFrequencies, double vowelStartFrequency, double vowelEndFrequency, double vowelSplitFrequency, double syllableEndFrequency)
This is a very complicated constructor! Maybe look at the calls to this to initialize static members of this class, LOVECRAFT and GREEK_ROMANIZED.
FakeLanguageGen(String[] openingVowels, String[] midVowels, String[] openingConsonants, String[] midConsonants, String[] closingConsonants, String[] closingSyllables, String[] vowelSplitters, int[] syllableLengths, double[] syllableFrequencies, double vowelStartFrequency, double vowelEndFrequency, double vowelSplitFrequency, double syllableEndFrequency, regexodus.Pattern[] sane, boolean clean)
This is a very complicated constructor! Maybe look at the calls to this to initialize static members of this class, LOVECRAFT and GREEK_ROMANIZED.
-
Method Summary
Modifier and Type Method Description
protected String[]
accentBoth(IRNG rng, String[] me, double vowelInfluence, double consonantInfluence)
protected String[]
accentConsonants(IRNG rng, String[] me, double influence)
protected String[]
accentVowels(IRNG rng, String[] me, double influence)
FakeLanguageGen
addAccents(double vowelInfluence, double consonantInfluence)
Produces a new FakeLanguageGen like this one but with extra vowels and/or consonants possible, adding from a wide selection of accented vowels (if vowelInfluence is above 0.0) and/or consonants (if consonantInfluence is above 0.0).
FakeLanguageGen
addModifiers(Collection<FakeLanguageGen.Modifier> mods)
Adds the specified Modifier objects from a Collection to a copy of this FakeLanguageGen and returns it.
FakeLanguageGen
addModifiers(FakeLanguageGen.Modifier... mods)
Adds the specified Modifier objects to a copy of this FakeLanguageGen and returns it.
protected static boolean
checkAll(CharSequence testing, regexodus.Pattern[] checks)
static boolean
checkVulgarity(CharSequence testing)
Checks a CharSequence, such as a String, against an overzealous vulgarity filter, returning true if the text could contain vulgar elements or words that could seem vulgar or juvenile.
FakeLanguageGen
copy()
static FakeLanguageGen
deserializeFromString(String data)
boolean
equals(Object o)
static FakeLanguageGen
get(String name)
If a FakeLanguageGen is known and is in registered, this allows you to look up that FakeLanguageGen by name (using a name from registeredNames).
static FakeLanguageGen
getAt(int index)
If a FakeLanguageGen is known and is in registered, this allows you to look up that FakeLanguageGen by index, from 0 to FakeLanguageGen.registered.length - 1.
String
getName()
Returns the name of this FakeLanguageGen, such as "English" or "Deep Speech", if one was registered for this.
long
hash64()
int
hashCode()
protected String[]
merge1000(IRNG rng, String[] me, String[] other, double otherInfluence)
FakeLanguageGen
mix(double myWeight, FakeLanguageGen other1, double weight1, Object... pairs)
Produces a FakeLanguageGen by mixing this FakeLanguageGen with one or more other FakeLanguageGen objects.
FakeLanguageGen
mix(FakeLanguageGen other, double otherInfluence)
Makes a new FakeLanguageGen that mixes this object with other, mingling the consonants and vowels they use as well as any word suffixes or other traits, and favoring the qualities in other by otherInfluence, which will value both languages evenly if it is 0.5.
static FakeLanguageGen
mixAll(Object... pairs)
Produces a FakeLanguageGen from a group of FakeLanguageGen parameters and the weights for those parameters.
static FakeLanguageGen.Modifier
modifier(String pattern, String replacement)
Convenience method that just calls Modifier(String, String).
static FakeLanguageGen.Modifier
modifier(String pattern, String replacement, double chance)
Convenience method that just calls Modifier(String, String, double).
static String
nameAt(int index)
If a FakeLanguageGen is known and is in registered, this allows you to look up that FakeLanguageGen's name by index, from 0 to FakeLanguageGen.registeredNames.length - 1.
static FakeLanguageGen
randomLanguage(long seed)
static FakeLanguageGen
randomLanguage(IRNG rng)
FakeLanguageGen
removeAccents()
Useful for cases with limited fonts, this produces a new FakeLanguageGen like this one but with all accented characters removed (including almost all non-ASCII Latin-alphabet characters, but only some Greek and Cyrillic characters).
static CharSequence
removeAccents(CharSequence str)
Removes accented Latin-script characters from a string; if the "base" characters are non-English anyway then the result won't be an ASCII string, but otherwise it probably will be.
FakeLanguageGen
removeModifiers()
Creates a copy of this FakeLanguageGen with no modifiers.
String
sentence(int minWords, int maxWords)
Generate a sentence from this FakeLanguageGen, using and changing the current seed, with the length in words between minWords and maxWords, both inclusive.
String
sentence(int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency)
Generate a sentence from this FakeLanguageGen, using and changing the current seed.
String
sentence(int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency, int maxChars)
Generate a sentence from this FakeLanguageGen that fits in the given length limit.
String
sentence(long seed, int minWords, int maxWords)
Generate a sentence from this FakeLanguageGen, using the given seed as a long, with the length in words between minWords and maxWords, both inclusive.
String
sentence(long seed, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency)
Generate a sentence from this FakeLanguageGen, using the given seed as a long.
String
sentence(long seed, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency, int maxChars)
Generate a sentence from this FakeLanguageGen that fits in the given length limit, using the given seed as a long.
String
sentence(IRNG rng, int minWords, int maxWords)
Generate a sentence from this FakeLanguageGen, using the given RNG, with the length in words between minWords and maxWords, both inclusive.
String
sentence(IRNG rng, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency)
Generate a sentence from this FakeLanguageGen using the specific RNG.
String
sentence(IRNG rng, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency, int maxChars)
Generate a sentence from this FakeLanguageGen using the given RNG that fits in the given length limit.
String
serializeToString()
String
toString()
String
word(boolean capitalize)
Generate a word from this FakeLanguageGen, using and changing the current seed.
String
word(long seed, boolean capitalize)
Generate a word from this FakeLanguageGen using the specified long seed to use for a shared StatefulRNG.
String
word(long seed, boolean capitalize, int approxSyllables)
Generate a word from this FakeLanguageGen with an approximate number of syllables using the specified long seed to use for a shared StatefulRNG.
String
word(long seed, boolean capitalize, int approxSyllables, regexodus.Pattern[] additionalChecks)
Generate a word from this FakeLanguageGen with an approximate number of syllables using the specified long seed to use for a shared StatefulRNG.
String
word(IRNG rng, boolean capitalize)
Generate a word from this FakeLanguageGen using the specified RNG.
String
word(IRNG rng, boolean capitalize, int approxSyllables)
Generate a word from this FakeLanguageGen using the specified RNG with an approximate number of syllables.
String
word(IRNG rng, boolean capitalize, int approxSyllables, regexodus.Pattern[] additionalChecks)
Generate a word from this FakeLanguageGen using the specified RNG with an approximate number of syllables.
String
word(IStatefulRNG rng, boolean capitalize, int approxSyllables, long... reseeds)
Generate a word from this FakeLanguageGen using the specified StatefulRNG with an approximate number of syllables, potentially setting the state of rng mid-way through the word to another seed from reseeds more than once if the word is long enough.
-
Field Details
-
openingVowels
-
midVowels
-
openingConsonants
-
midConsonants
-
closingConsonants
-
vowelSplitters
-
closingSyllables
-
clean
-
syllableFrequencies
-
totalSyllableFrequency
-
vowelStartFrequency
-
vowelEndFrequency
-
vowelSplitFrequency
-
syllableEndFrequency
-
sanityChecks
-
modifiers
-
srng
-
summary
-
name
-
anyVowel
A pattern String that will match any vowel FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
- See Also:
- Constant Field Values
-
anyVowelCluster
A pattern String that will match one or more of any vowels FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
- See Also:
- Constant Field Values
-
anyConsonant
A pattern String that will match any consonant FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
- See Also:
- Constant Field Values
-
anyConsonantCluster
A pattern String that will match one or more of any consonants FakeLanguageGen can produce out-of-the-box, including Latin, Greek, and Cyrillic; for use when a String will be interpreted as a regex (as in FakeLanguageGen.Alteration).
- See Also:
- Constant Field Values
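As a rough illustration of how a character-class String like these can be dropped into a regex, here is a small sketch using java.util.regex instead of the regexodus library this class uses. The constants SOME_VOWEL and SOME_VOWEL_CLUSTER are hypothetical, much smaller stand-ins for anyVowel and anyVowelCluster, and building the cluster by appending "+" is an assumption about how such a pattern could be formed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only; uses java.util.regex and a tiny stand-in vowel set. */
final class VowelPatternSketch {
    // Hypothetical stand-in for anyVowel: ASCII vowels plus three accented ones, as escapes.
    static final String SOME_VOWEL = "[aeiouy\u00E2\u00EA\u00F4]";
    // One or more vowels in a row (assumed construction for a cluster pattern).
    static final String SOME_VOWEL_CLUSTER = SOME_VOWEL + "+";

    /** Counts runs of adjacent vowels, a crude proxy for counting syllables. */
    static int countVowelClusters(CharSequence text) {
        Matcher m = Pattern.compile(SOME_VOWEL_CLUSTER).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowelClusters("theasor"));
        System.out.println(countVowelClusters("khu'olk"));
    }
}
```

Because the constants are plain Strings, they can be concatenated into larger patterns, which is what makes them convenient wherever a String is interpreted as a regex.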
-
repeats
-
vowelClusters
-
consonantClusters
-
LOVECRAFT
Ia! Ia! Cthulhu Rl'yeh ftaghn! Useful for generating cultist ramblings or unreadable occult texts. You may want to consider mixing this with multiple other languages using mixAll(Object...); using some very different languages in low amounts relative to the amount used for this, like NAHUATL, INUKTITUT, SOMALI, DEEP_SPEECH, and INSECT, can alter the aesthetic of the generated text in ways that may help distinguish magic styles.
Zvrugg pialuk, ya'as irlemrugle'eith iposh hmo-es nyeighi, glikreirk shaivro'ei! -
ENGLISH
Imitation English; may seem closer to Dutch in some generated text, and is not exactly the best imitation. Should seem pretty fake to many readers; does not filter out dictionary words but does perform basic vulgarity filtering. If you want to avoid generating other words, you can subclass FakeLanguageGen and modify word().
Mont tiste frot; mousation hauddes? Lily wrely stiebes; flarrousseal gapestist. -
GREEK_ROMANIZED
Imitation ancient Greek, romanized to use the Latin alphabet. Likely to seem pretty fake to many readers.
Psuilas alor; aipeomarta le liaspa... -
GREEK_AUTHENTIC
Imitation ancient Greek, using the original Greek alphabet. People may try to translate it and get gibberish. Make sure the font you use to render this supports the Greek alphabet! In the GDX display module, most fonts support all the Greek you need for this.
Ψυιλασ αλορ; αιπεομαρτα λε λιασπα... -
FRENCH
Imitation modern French, using the (many) accented vowels that are present in the language. Translating it will produce gibberish if it produces anything at all. In the GDX display module, most fonts support all the accented characters you need for this.
Bœurter; ubi plaqua se saigui ef brafeur? -
RUSSIAN_ROMANIZED
Imitation modern Russian, romanized to use the Latin alphabet. Likely to seem pretty fake to many readers.
Zhydotuf ruts pitsas, gogutiar shyskuchebab - gichapofeglor giunuz ieskaziuzhin. -
RUSSIAN_AUTHENTIC
Imitation modern Russian, using the authentic Cyrillic alphabet used in Russia and other countries. Make sure the font you use to render this supports the Cyrillic alphabet! In the GDX display module, the "smooth" fonts support all the Cyrillic alphabet you need for this.
Жыдотуф руц пйцас, гогутяр шыскучэбаб - гйчапофёглор гюнуз ъсказюжин. -
JAPANESE_ROMANIZED
Imitation Japanese, romanized to use the Latin alphabet. Likely to seem pretty fake to many readers.
Narurehyounan nikase keho... -
SWAHILI
Swahili is one of the more commonly-spoken languages in sub-Saharan Africa, and serves mainly as a shared language that is often learned after becoming fluent in one of many other (vaguely-similar) languages of the area. An example sentence in Swahili, that this might try to imitate aesthetically, is "Mtoto mdogo amekisoma," meaning "The small child reads it" (where it is a book). A notable language feature used here is the redoubling of words, which is used in Swahili to emphasize or alter the meaning of the doubled word; here, it always repeats exactly and can't make minor changes like a real language might. This generates things like "gata-gata", "hapi-hapi", and "mimamzu-mimamzu", always separating with a hyphen here.
As an aside, please try to avoid the ugly stereotypes that fantasy media often assigns to speakers of African-like languages when using this or any of the generators. Many fantasy tropes come from older literature written with major cultural biases, and real-world cultural elements can be much more interesting to players than yet another depiction of a "jungle savage" with stereotypical traits. Consider drawing from existing lists of real-world technological discoveries, like https://en.wikipedia.org/wiki/History_of_science_and_technology_in_Africa , for inspiration when world-building; though some groups may not have developed agriculture by early medieval times, their neighbors may be working iron and studying astronomy just a short distance away.
Kondueyu; ma mpiyamdabota mise-mise nizakwaja alamsa amja, homa nkajupomba. -
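The exact, hyphen-separated doubling described for SWAHILI can be sketched in a few lines. This is not the library's code: the class ReduplicationSketch and both methods are hypothetical, and applying the doubling per word with a fixed chance is an assumption made only for illustration.

```java
import java.util.Random;

/** Illustrative only; not SquidLib code. Doubles words exactly, joined by a hyphen. */
final class ReduplicationSketch {
    /** Repeats a word exactly, so "gata" becomes "gata-gata". */
    static String reduplicate(String word) {
        return word + "-" + word;
    }

    /** Doubles each space-separated word with the given chance (hypothetical helper). */
    static String maybeReduplicate(String phrase, double chance, Random rng) {
        String[] words = phrase.split(" ");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(rng.nextDouble() < chance ? reduplicate(words[i]) : words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reduplicate("gata"));
        System.out.println(maybeReduplicate("hapi mise alamsa", 0.3, new Random(7L)));
    }
}
```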
SOMALI
Imitation Somali, using the Latin alphabet. Due to uncommon word structure, unusual allowed combinations of letters, and no common word roots with most familiar languages, this may seem like an unidentifiable or "alien" language to most readers. However, it's based on the Latin writing system for the Somali language (probably closest to the northern dialect), and the previously mentioned properties make it especially good for mixing with other languages to produce letter combinations that look strange. It is unlikely that this particular generated language style will be familiar to readers, so it probably won't have existing stereotypes associated with the text. One early comment this received was, "it looks like a bunch of letters semi-randomly thrown together", which is probably a typical response (the comment was made by someone fluent in German and English, and most Western European languages are about as far as you can get from Somali).
Libor cat naqoxekh dhuugad gisiqir? -
HINDI_ROMANIZED
Imitation Hindi, romanized to use the Latin alphabet using accented glyphs similar to the IAST standard. Most fonts do not support the glyphs that IAST-standard romanization of Hindi needs, so this uses alternate glyphs from at most Latin Extended-A. Relative to the IAST standard, the glyphs "ṛṝḷḹḍṭṅṇṣṃḥ" become "ŗŕļĺđţńņşĕĭ", with the nth glyph in the first string being substituted with the nth glyph in the second string. You may want to get a variant on this language with removeAccents() if you can't display the less-commonly-supported glyphs āīūĕĭáíúóŗŕļţĺđńñņśş. For some time SquidLib had a separate version of imitation Hindi that was accurate to the IAST standard, but this version is more usable because font support is much better for the glyphs it uses, so the IAST kind was removed (it added quite a bit of code for something that was mostly unusable).
Darvāga yar; ghađhinopŕauka āĕrdur, conśaigaijo śabhodhaĕđū jiviđaudu. -
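The position-by-position glyph substitution described for HINDI_ROMANIZED can be written as a small lookup. The sketch below is not taken from SquidLib: the class IastFallbackSketch and its method are hypothetical, and the two strings are the ones quoted above, written as Unicode escapes so the source stays ASCII-only.

```java
/** Illustrative only; not SquidLib code. Maps IAST glyphs to Latin Extended-A stand-ins. */
final class IastFallbackSketch {
    // The eleven IAST glyphs quoted in the HINDI_ROMANIZED description, as escapes.
    static final String IAST =
            "\u1E5B\u1E5D\u1E37\u1E39\u1E0D\u1E6D\u1E45\u1E47\u1E63\u1E43\u1E25";
    // The eleven replacement glyphs, in the same order.
    static final String FALLBACK =
            "\u0157\u0155\u013C\u013A\u0111\u0163\u0144\u0146\u015F\u0115\u012D";

    /** Replaces the nth glyph of IAST with the nth glyph of FALLBACK; other chars pass through. */
    static String toFallback(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int index = IAST.indexOf(c);
            sb.append(index >= 0 ? FALLBACK.charAt(index) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the input with its three dotted letters swapped for their stand-ins.
        System.out.println(toFallback("k\u1E5B\u1E63\u1E47a"));
    }
}
```

A paired-string lookup like this keeps the mapping easy to audit, since both strings can be compared glyph by glyph.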
ARABIC_ROMANIZED
Imitation Arabic, using mostly the Latin alphabet but with some Greek letters for tough transliteration topics. It's hard to think of a more different (widely-spoken) language to romanize than Arabic. Written Arabic does not ordinarily use vowels (the writing system is called an abjad, in contrast to an alphabet), and it has more than a few sounds that are very different from those in English. This version, because of limited support in fonts and the need for separate words to be distinguishable with regular expressions, uses somewhat-accurate digraphs or trigraphs instead of the many accented glyphs (not necessarily supported by most fonts) in most romanizations of Arabic, and this scheme uses no characters from outside ASCII.
Please try to be culturally-sensitive about how you use this generator. Classical Arabic (the variant that normally marks vowels explicitly and is used to write the Qur'an) has deep religious significance in Islam, and if you machine-generate text that (probably) isn't valid Arabic, but claim that it is real, or that it has meaning when it actually doesn't, that would be an improper usage of what this generator is meant to do. In a fantasy setting, you can easily confirm that the language is fictional and any overlap is coincidental; an example of imitation Arabic in use is the Dungeons and Dragons setting, Al-Qadim, which according to one account sounds similar to a word in real Arabic (that does not mean anything like what the designer was aiming for). In a historical setting, FakeLanguageGen is probably "too fake" to make a viable imitation for any language, and may just sound insulting if portrayed as realistic. You may want to mix ARABIC_ROMANIZED with a very different kind of language, like GREEK_ROMANIZED or RUSSIAN_AUTHENTIC, to emphasize that this is not a real-world language.
Hiijakki al-aafusiib rihit, ibn-ullukh aj shwisari! -
INUKTITUT
Imitation text from an approximation of one of the Inuktitut languages spoken by various people of the Arctic and nearby areas. This is likely to be hard to pronounce. Inuktitut is the name accepted in Canada for one language family of that area, but other parts of the Arctic circle speak languages with varying levels of difference from this style of generated text. The term "Inuit language" may be acceptable, but "Eskimo language" is probably not, and when that term is not considered outright offensive it refers to a different language group anyway (more properly called Yupik or Yup'ik, and primarily spoken in Siberia instead of Canada and Alaska).
Ugkangungait ninaaq ipkutuilluuq um aitqiinnaitunniak tillingaat. -
NORSE
Somewhat close to Old Norse, which is itself very close to Icelandic, so this uses Icelandic spelling rules. Not to be confused with the language(s) of Norway, where the Norwegian languages are called norsk, and are further distinguished into Bokmål and Nynorsk. This should not be likely to seem like any form of Norwegian, since it doesn't have the a-with-ring letter 'å' and has the letters eth ('Ðð') and thorn ('Þþ'). If you want to remove any letters not present on a US-ASCII keyboard, you can use FakeLanguageGen.Modifier.SIMPLIFY_NORSE on this language or some mix of this with other languages; it also changes some of the usage of "j" where it means the English "y" sound, making "fjord" into "fyord", which is closer to familiar uses from East Asia like "Tokyo" and "Pyongyang". You can also now use NORSE_SIMPLIFIED directly, which is probably easiest.
Leyrk tjör stomri kna snó æd ðrépdápá, prygso? -
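To make the kind of simplification described for NORSE more concrete, here is a minimal sketch that does not use SquidLib at all. It is not what SIMPLIFY_NORSE actually does: the class NorseSimplifySketch is hypothetical, the rule of rewriting "j" only after a consonant is an assumption, and so are the ASCII stand-ins chosen for eth, thorn, and ash. Only the "fjord" to "fyord" change comes from the text above.

```java
import java.text.Normalizer;

/** Illustrative only; not the behavior of SIMPLIFY_NORSE. Approximates an ASCII-only form. */
final class NorseSimplifySketch {
    static String simplify(String word) {
        // Assumed rule: a "j" right after a consonant stands for the English "y" sound.
        String s = word.replaceAll("(?<=[bdfghklmnprstv])j", "y");
        // Decompose accented letters, then drop the combining marks that carry the accents.
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        // Eth, thorn, and ash have no decomposition; these stand-ins are assumptions.
        return s.replace("\u00F0", "dh").replace("\u00D0", "Dh")
                .replace("\u00FE", "th").replace("\u00DE", "Th")
                .replace("\u00E6", "ae").replace("\u00C6", "Ae");
    }

    public static void main(String[] args) {
        System.out.println(simplify("fjord"));
        System.out.println(simplify("tj\u00F6r"));
    }
}
```

Normalizing to NFD and stripping combining marks handles any accented vowel without listing each one, but letters that are not base-plus-accent combinations still need explicit replacements.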
NAHUATL
Imitation text from an approximation of the language spoken by the Aztec people and also over a million contemporary people in parts of Mexico. This may be hard to pronounce, since it uses "tl" as a normal consonant (it can start or end words), but is mostly a fairly recognizable style of language.
Olcoletl latl palitz ach; xatatli tzotloca amtitl, xatloatzoatl tealitozaztitli otamtax? -
MONGOLIAN
Imitation text from an approximation of one of the languages spoken in the 13th-century Mongol Empire. Can be hard to pronounce. This is closest to Middle Mongolian, and is probably not the best way to approximate modern Mongolian, which was written for many years in the Cyrillic alphabet (same alphabet as Russian) and has changed a lot in other ways.
Ghamgarg zilijuub lirgh arghar zunghichuh naboogh. -
FANTASY_NAME
A mix of four different languages, using only ASCII characters, that is meant for generating single words for creature or place names in fantasy settings.
Adeni, Sainane, Caneros, Sune, Alade, Tidifi, Muni, Gito, Lixoi, Bovi... -
FANCY_FANTASY_NAME
A mix of four different languages with some accented characters added onto an ASCII base, that can be good for generating single words for creature or place names in fantasy settings that should have a "fancy" feeling from having unnecessary accents added primarily for visual reasons.
Askieno, Blarcīnũn, Mēmida, Zizhounkô, Blęrinaf, Zemĭ, Mónazôr, Renerstă, Uskus, Toufounôr... -
GOBLIN
Fantasy language that might be suitable for stealthy humanoids, such as goblins, or as a secret language used by humans who want to avoid notice. Uses no "hard" sounds like "t" and "k", but also tries to avoid the flowing aesthetic of fantasy languages associated with elves. Tends toward clusters of consonants like "bl", "gm", "dg", and "rd".
Gwabdip dwupdagorg moglab yurufrub. -
ELF
Fantasy language that tries to imitate the various languages spoken by elves in J.R.R. Tolkien's works, using accented vowels occasionally and aiming for long, flowing, vowel-heavy words. It's called ELF because there isn't a consistent usage across fantasy and mythological sources of either "elvish", "elfish", "elven", "elfin", or any one adjective for "relating to an elf." In the GDX display module, the "smooth" and "unicode" fonts, among others, support all the accented characters you need for this.
Il ilthiê arel enya; meâlelail theasor arôreisa. -
DEMONIC
Fantasy language that might be suitable for a language spoken by demons, aggressive warriors, or people who seek to emulate or worship similar groups. The tendency here is for DEMONIC to be the language used by creatures that are considered evil because of their violence, while INFERNAL would be the language used by creatures that are considered evil because of their manipulation and deceit (DEMONIC being "chaotic evil" and INFERNAL being "lawful evil"). This uses lots of sounds that don't show up in natural languages very often, mixing harsh or guttural sounds like "kh" and "ghr" with rare sounds like "vr", "zv", and "tl". It uses vowel-splitting in a way that is similar to LOVECRAFT, sometimes producing sounds like "tsa'urz" or "khu'olk".
Vrirvoks xatughat ogz; olds xu'oz xorgogh! -
INFERNAL
Fantasy language that might be suitable for a language spoken by fiends, users of witchcraft, or people who seek to emulate or worship similar groups. The tendency here is for DEMONIC to be the language used by creatures that are considered evil because of their violence, while INFERNAL is the language used by creatures that are considered evil because of their manipulation and deceit (DEMONIC being "chaotic evil" and INFERNAL being "lawful evil"). The name INFERNAL refers to Dante's Inferno and the various naming conventions used for residents of Hell in the more-modern Christian traditions (as well as some of the stylistic conventions of Old Testament figures described as false idols, such as Moloch and Mammon). In an effort to make this distinct from the general style of names used in ancient Hebrew (since this is specifically meant for the names of villains as opposed to normal humans), we add in vowel splits as used in LOVECRAFT and DEMONIC, then add quite a few accented vowels. These traits make the language especially well-suited for "deal with the Devil" written bargains, where a single accent placed incorrectly could change the meaning of a contract and provide a way for a fiend to gain leverage.
Zézîzûth eke'iez áhìphon; úhiah îbbëphéh haîtemheû esmez... -
SIMPLISH
English-like language that omits complex spelling and doesn't include any of the uncommon word endings of English like "ought" or "ation." A good choice when you want something that doesn't use any non-US-keyboard letters, looks somewhat similar to English, and tries to be pronounceable without too much effort. This doesn't have any doubled or silent letters, nor does it require special rules for pronouncing vowels like "road" vs. "rod", though someone could make up any rules they want.
Fledan pranam, simig bag chaimer, drefar, woshash is sasik. -
ALIEN_A
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species. This alien language emphasizes unusual consonant groups and prefers the vowels 'a' and 'i', sometimes with two different vowels in one syllable, like with 'ea', but never two of the same vowel, like 'ee'. Many consonant groups may border on unpronounceable unless a different sound is meant by some letters, such as 'c', 'h', 'q', 'x', 'w', and 'y'. In particular, 'x' and 'q' may need to sound like different breathy, guttural, or click noises for this to be pronounced by humans effectively.
Jlerno iypeyae; miojqaexli qraisojlea epefsaihj xlae... -
KOREAN_ROMANIZED
Imitation text from an approximation of Korean, using the Revised Romanization method that is official in South Korea today and is easier to type. The text this makes may be hard to pronounce. Korean is interesting as a language to imitate for a number of reasons; many of the sounds in it are rarely found elsewhere, it can cluster consonants rather tightly (most languages don't; English does to a similar degree but Japanese hardly has any groups of consonants), and there are many more vowel sounds without using tones (here, two or three letters are used for a vowel, where the first can be y or w and the rest can be a, e, i, o, or u in some combination). Some letter combinations possible here are impossible or very rare in correctly-Romanized actual Korean, such as the rare occurrence of a single 'l' before a vowel (it normally only appears in Romanized text before a consonant or at the end of a word).
Hyeop euryam, sonyon muk tyeok aengyankeon, koelgwaelmwak. -
ALIEN_E
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species. This alien language emphasizes hard sounds and prefers the vowels 'e' and 'a', sometimes with two of the same vowel, like 'ee', but never with two different vowels in one syllable, like with 'ea'. This language is meant to use click sounds, if pronunciation is given, where 'q' modifies a consonant to form a click, such as 'tq'. This is like how 'h' modifies letters in English to make 'th' different from 't' or 'h'. This may be ideal for a species with a beak (or one that lacks lips for some other reason), since it avoids using sounds that require lips (some clicks might be approximated by other species using their lips if this uses some alien-specific clicking organ).
Reds zasg izqekkek zagtsarg ukaard ac ots as! -
ALIEN_I
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species. This alien language emphasizes "liquid" sounds such as 'l', 'r', and mixes with those and other consonants, and prefers the vowels 'i' and 'o', never with two of the same vowel, like 'ee', nor with two different vowels in one syllable, like with 'ea'; it uses accent marks heavily and could be a tonal language. It sometimes splits vowels with a single apostrophe, and rarely has large consonant clusters.
Asherzhäl zlómór ìsiv ázá nralthóshos, zlôbùsh. -
ALIEN_O
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species. This alien language emphasizes large clusters of vowels, typically with 2 or 3 vowel sounds between consonants, though some vowel groups could be interpreted in multiple ways (such as English "maim" and "bail", which also have regional differences in pronunciation). As the name would suggest, it strongly prefers using the vowel "o", with it present in about half the groups, but doesn't have any preference toward or against the other vowels it uses, "a", "e", "i", and "u". The consonants completely avoid hard sounds like "t" and "k", medium-hard sounds like "g" and "b", and also sibilants like "s" and "z". This should be fairly hard to pronounce, but possible.
Foiuhoeorfeaorm novruol naionouffeu meuif; hmoieloreo naemriou. -
ALIEN_U
Fantasy/sci-fi language that could be spoken by some very-non-human culture that would typically be fitting for an alien species. This alien language is meant to have an abrupt change mid-word for many words, with the suffix of roughly half of words using the letter "e", which is absent from the rest of the language; these suffixes can also use consonant clusters, which are similarly absent elsewhere. The suffixes would make sense as a historical relic or as a linguistic holdout from a historical merger. As the name would suggest, it strongly prefers using the vowel "u", with it present in about half the groups, and can use the umlaut accent "ü" on some vowels. The consonants completely avoid hard sounds like "t" and "k", and don't cluster; they often have special marks. This should be relatively easy to pronounce for an alien language, though the words are rather long.
Üweħid vuŕeħid deẃul leŋul waloyeyür; äyovavü... -
DRAGON
Fantasy language that tries to sound like the speech of a powerful and pompous dragon, using long, complex words and a mix of hard consonants like "t" and "k", "liquid" consonants like "l" and "r", and sometimes vowel groups like "ie" and "aa". It frequently uses consonant clusters involving "r". It uses no accented characters.
Vokegodzaaz kigrofreth ariatarkioth etrokagik deantoznik hragriemitaaz gianehaadaz... -
KOBOLD
Fantasy language based closely on DRAGON, but with much shorter words normally and closing syllables that may sound "rushed" or "crude", though it has the same general frequency of most consonants and vowels. This means it still uses lots of "t", "k", and "r", can group two vowels sometimes, and when there's a consonant in the middle of a word, it is often accompanied by an "r" on one or both sides. If used with NaturalLanguageCipher, this will look very similar to DRAGON, because the syllable lengths aren't determined by this object but by the text being ciphered. Still, the ends of words are often different. It is called KOBOLD because, even though the original kobold myth was that of a goblin-like spirit that haunted cobalt mines, the modern RPG treatment of kobolds frequently describes them as worshippers of dragons or in some way created by dragons, but generally they're a sort of failure to live up to a dragon's high expectations. The feel of this language is meant to be something like a dragon's speech, but much less "fancy" and rather curt.
Thritriz, laazak gruz kokak thon lut... -
INSECT
Fantasy/sci-fi language that would typically be fitting for an insect-like species without a close equivalent to human lips. This language emphasizes hard sounds such as 't' and 'k', uses some sibilants such as 's', 'sh', and 'x', uses lots of 'r' sounds, includes trill sounds using 'rr' (as in Spanish), and uses primarily 'a' and 'i' for vowels, with low complexity on vowels. Differs from ALIEN_E by not having harder-to-explain click sounds, and adjusting vowels/sibilants a fair bit.
Ritars tsarraxgits, krit trir istsak! -
MAORI
Imitation text from an approximation of the Maori language, spoken in New Zealand both today and historically, and closely related to some other Polynesian languages. This version uses the current orthographic standard of representing a long "a" with the letter "ā" (adding a macron diacritic).
Māuka whapi enāongupe worute, moa noepo? -
SPANISH
Imitation text from an approximation of Spanish (not using the variations spoken in Spain, but closer to Latin American forms of Spanish). This isn't as close as possible, but it abides by most of the orthographic rules that Spanish uses. It uses the acute accent on the vowels á, é, í, ó, and ú, as well as the consonant ñ.
Jamos daí oñuezqui, luarbezquisdas canga ombiurta irri hoño resda! -
DEEP_SPEECH
Fantasy/sci-fi language that would potentially be fitting for a trade language spoken by various very-different groups, such as creatures with tentacled faces who need to communicate with spider-elves and living crystals. This language tries to use relatively few sounds so vocally-restricted species can speak it or approximate it, but some of its sounds are uncommon. It uses "ng" as Vietnamese does, as a sound that can be approximated with "w" but more accurately is like the sound at the end of "gong". It uses a breathy sound in many vowels, represented by "h", and this is separate from (and can be combined with) lengthening the vowel by doubling it ("a", "ah", "aa", and "aah" are different). The "x" sound can be approximated by any of the "kh" or "q" sounds used in various human languages, or with its usage in English for "ks". This does separate some vowels with "'", which can be a glottal stop as in Hawaiian or various other languages, or approximated with a brief pause.
Zrolmolurz, voluu, nguu yuh'ongohng! -
NORSE_SIMPLIFIED
Somewhat close to Old Norse, which is itself very close to Icelandic, but changed to avoid letters not on a US-ASCII keyboard. Not to be confused with the language(s) of Norway, where the Norwegian languages are called norsk, and are further distinguished into Bokmål and Nynorsk. This just applies FakeLanguageGen.Modifier.SIMPLIFY_NORSE to NORSE. This replaces eth ('Ðð') and thorn ('Þþ') with 'th' unless preceded by 's' (where 'sð' or 'sþ' becomes 'st') or followed by 'r' (where 'ðr' or 'þr' becomes 'fr'). It replaces 'Æ' or 'æ' with 'Ae' or 'ae', and replaces 'Ö' or 'ö' with 'Ou' or 'ou', which can change the length of a String relative to NORSE. It removes all other accent marks (since the two-dot umlaut accent has already been changed, this only affects acute accents). It also changes some of the usage of "j" where it means the English "y" sound, making "fjord" into "fyord", which is closer to familiar uses from East Asia like "Tokyo" and "Pyongyang".
Leyrk tyour stomri kna sno aed frepdapa, prygso? -
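The replacement rules described for NORSE_SIMPLIFIED can be sketched in plain Java. This is only an illustration of the stated rules, not the code behind FakeLanguageGen.Modifier.SIMPLIFY_NORSE: the class name NorseSimplifySketch is hypothetical, and the rule used for "j" (a "j" between a consonant and a vowel becomes "y") is an assumption, since the text only says that "some" uses of "j" change. The source is kept ASCII-only by writing the special letters as Unicode escapes.

```java
import java.text.Normalizer;

// Hypothetical sketch of the NORSE -> NORSE_SIMPLIFIED rules; not SquidLib code.
final class NorseSimplifySketch {
    // Escapes used below: \u00F0 eth, \u00FE thorn, \u00D0 and \u00DE their capitals,
    // \u00E6 and \u00C6 the ae ligature, \u00F6 and \u00D6 o with umlaut.
    static String simplify(String text) {
        String r = text;
        // 's' followed by eth or thorn becomes "st"
        r = r.replaceAll("([sS])[\u00F0\u00FE\u00D0\u00DE]", "$1t");
        // eth or thorn followed by 'r' becomes "fr"
        r = r.replaceAll("[\u00F0\u00FE]r", "fr").replaceAll("[\u00D0\u00DE]r", "Fr");
        // any remaining eth or thorn becomes "th"
        r = r.replaceAll("[\u00F0\u00FE]", "th").replaceAll("[\u00D0\u00DE]", "Th");
        // ligature and umlaut replacements; these can lengthen the String
        r = r.replace("\u00C6", "Ae").replace("\u00E6", "ae")
             .replace("\u00D6", "Ou").replace("\u00F6", "ou");
        // strip every other accent by decomposing and dropping combining marks
        r = Normalizer.normalize(r, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // ASSUMPTION: 'j' between a consonant and a vowel is the English "y" sound
        r = r.replaceAll("(?<=[b-df-hk-np-tv-xzB-DF-HK-NP-TV-XZ])j(?=[aeiou])", "y");
        return r;
    }

    public static void main(String[] args) {
        System.out.println(simplify("fjord")); // prints "fyord"
    }
}
```

Applied to the NORSE sample sentence, this sketch yields the NORSE_SIMPLIFIED sample shown above, which is a reasonable sanity check on the order of the steps; the umlaut replacement has to run before the general accent stripping, or 'ö' would become a plain 'o'.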
HLETKIP
A fictional language that could ostensibly be spoken by some group of humans, but that isn't closely based on any one real-world language. It is meant to have a mix of hard and flowing sounds, roughly like Hebrew or Turkish, but with a very different set of consonants and consonant blends. Importantly, consonant sounds are always paired here except for the final consonant of a word, which is always one consonant sound if it is used at all. The choices of consonant sounds are designed to be unusual, like "hl", "pkh", and "zhg" (which can all start a word).
Nyep khruv kwolbik psesh klulzhanbik psahzahwuth bluryup; hnish zhrim? -
ANCIENT_EGYPTIAN
A (necessarily) very rough anglicization of Old Egyptian, a language that has no precisely known pronunciation rules and was written with hieroglyphics. This is meant to serve as an analogue for any ancient language with few contemporary speakers.
Thenamses upekha efe emesh nabasu ahakhepsut! -
CROW
A rough imitation of the Crow language of the American Midwest, using some tone marks. Some of the orthography rules aren't clear across Internet information about the language, so this really is a "fake" language it will be generating, not the real thing at all. This considers 'x' to be the rough back-of-throat noise that isn't in English other than in loanwords, like the Scottish "loch," and in names like the German "Bach." Doubled (to use the linguistic term, geminated) consonants are pronounced for a longer time, and doubled vowels with the same accent mark or no accent mark are also lengthened. An un-accented vowel has a normal tone, an accented vowel has a high tone, and an accented vowel followed by an un-accented vowel has a falling tone. This last feature is the least common among languages here, and is a good way of distinguishing imitation Crow from other languages.
Pashu-umíkiki; chinébúlu ak kóokutú shu-eníí-a ipíimúu heekokáakoku? -
IMP
A fantasy language meant for obnoxious, screeching, annoying enemies more so than for intelligent friends or foes. Uses accented vowels to mean "louder or higher-pitched" and up to three repeats of any vowel to lengthen it.
Siii-aghak fítríííg dú-úgh ru-úúk, grííírá! -
MALAY
An approximation of the Malay language or any of its close relatives, such as Indonesian. This differs from Malay as it is normally written by using "ch" for what Malay writes as "c" (it is pronounced like the start of "chow"), and "sh" for what Malay writes as "sy" (pronounced like the start of "shoe").
Kashanyah satebok bisal bekain akinuk an as, penah lukul... -
CELESTIAL
Fantasy language that is meant to sound like it could be spoken by divine or (magical) otherworldly beings. Sometimes uses the breve mark (as in ăĕĭŏ) over vowels and rarely splits consonants with '. Uses very few harsh sounds, and may be easy to confuse with ELF (this tends to use much shorter words). This happens to sound a little like Hebrew, but since this doesn't have some consonants that are commonly used in Hebrew, and because this uses accented vowels that aren't in Hebrew, they should be different enough that this language can seem "not of this world."
Emŏl ebin hanzi'ab, isharar omrihrel nevyăd. -
CHINESE_ROMANIZED
An approximation of Hanyu Pinyin, a Romanization technique used for Mandarin Chinese that has been in common use since the 1980s. This makes some slight changes so the vulgarity filters this uses can understand how some letters sound; Pinyin's letter c becomes ts, and this replaces the u with umlaut, ü, in all cases with yu.
Tuàn tiāzhǎn dér, ǔngínbǔng xōr shàū kán nu tsīn. -
CHEROKEE_ROMANIZED
A rough imitation of the Cherokee language, using an attempt at romanizing the syllabary the language is often written with, using only the parts of the language that are usually written down. Some of the orthography rules aren't clear across Internet information about the language, so this really is a "fake" language it will be generating, not the real thing at all. The vowel 'ü' is used in place of the 'v' that the normal transliteration uses, to help with profanity-checking what this generates; it is pronounced like in the French word "un".
Dah utugü tsahnahsütoi gohü usütahdi asi tsau dah tashi. -
VIETNAMESE
A very rough imitation of the Vietnamese language, without using the accurate characters Vietnamese really uses but that are rare in fonts. Since so many letters in correct Vietnamese aren't available in most fonts, this can't represent most of the accented vowels in the language, but it tries, with 6 accents for each of a, e, i, o, and u, though none for y. It also uses 'ð' from Icelandic in place of the correct d with bar. This could also maybe be used as an approximation of (badly) Romanized Thai, since Thai normally uses its own script but also has many tones (which would be indicated by the accents here).
Bach trich, nŏ ngiukh nga cä tran ngonh... -
registered
An array that stores all the hand-made FakeLanguageGen constants; it does not store randomly-generated languages nor does it store modifications or mixes of languages. The order these are stored in is related to the numeric codes for languages in the serializeToString() output, but neither is dependent on the other if this array is changed for some reason (which is not recommended, but not out of the question). If this is modified, then it is probably a bad idea to assign null to any elements in registered; special care is taken to avoid null elements in its original state, so some code may rely on the items being usable and non-null.
-
registeredNames
-
romanizedHumanLanguages
FakeLanguageGen constants that are meant to sound like specific real-world languages, and that all use the Latin script (like English) with maybe some accents.
-
-
Constructor Details
-
FakeLanguageGen
public FakeLanguageGen()
Zero-arg constructor for a FakeLanguageGen; produces a FakeLanguageGen equivalent to FakeLanguageGen.ENGLISH.
-
FakeLanguageGen
public FakeLanguageGen(String[] openingVowels, String[] midVowels, String[] openingConsonants, String[] midConsonants, String[] closingConsonants, String[] closingSyllables, String[] vowelSplitters, int[] syllableLengths, double[] syllableFrequencies, double vowelStartFrequency, double vowelEndFrequency, double vowelSplitFrequency, double syllableEndFrequency)
This is a very complicated constructor! For examples, look at the calls to this that initialize static members of this class, such as LOVECRAFT and GREEK_ROMANIZED.
Parameters:
openingVowels - String array where each element is a vowel or group of vowels that may appear at the start of a word or in the middle; elements may be repeated to make them more common
midVowels - String array where each element is a vowel or group of vowels that may appear in the middle of the word; all openingVowels are automatically copied into this internally. Elements may be repeated to make them more common
openingConsonants - String array where each element is a consonant or consonant cluster that can appear at the start of a word; elements may be repeated to make them more common
midConsonants - String array where each element is a consonant or consonant cluster that can appear between vowels; all closingConsonants are automatically copied into this internally. Elements may be repeated to make them more common
closingConsonants - String array where each element is a consonant or consonant cluster that can appear at the end of a word; elements may be repeated to make them more common
closingSyllables - String array where each element is a syllable starting with a vowel and ending in whatever the word should end in; elements may be repeated to make them more common
vowelSplitters - String array where each element is a mark that goes between vowels, so if "-" is in this, then "a-a" may be possible; elements may be repeated to make them more common
syllableLengths - int array where each element is a possible number of syllables a word can use; closely tied to syllableFrequencies
syllableFrequencies - double array where each element corresponds to an element in syllableLengths and represents how often each syllable count should appear relative to other counts; the numbers do not need to add up to any particular total
vowelStartFrequency - a double between 0.0 and 1.0 that determines how often words start with vowels; higher numbers yield more words starting with vowels
vowelEndFrequency - a double between 0.0 and 1.0 that determines how often words end with vowels; higher numbers yield more words ending in vowels
vowelSplitFrequency - a double between 0.0 and 1.0 that, if vowelSplitters is not empty, determines how often a vowel will be split into two vowels separated by one of those splitters
syllableEndFrequency - a double between 0.0 and 1.0 that determines how often an element of closingSyllables is used instead of ending normally
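The relationship between syllableLengths and syllableFrequencies is a weighted choice: each weight only matters relative to the others, which is why the weights need not sum to any fixed value. The following is a minimal sketch of that idea using java.util.Random; the class SyllableCountSketch and method pickLength are hypothetical names, and this is not claimed to be how FakeLanguageGen selects a syllable count internally.

```java
import java.util.Random;

// Hypothetical sketch of a weighted pick over parallel arrays; not SquidLib code.
final class SyllableCountSketch {
    // Returns one entry of lengths, chosen with probability proportional to
    // the weight at the same index.
    static int pickLength(int[] lengths, double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double roll = rng.nextDouble() * total; // uniform in [0, total)
        for (int i = 0; i < lengths.length; i++) {
            roll -= weights[i];
            if (roll < 0.0) {
                return lengths[i];
            }
        }
        return lengths[lengths.length - 1]; // only reached through rounding error
    }

    public static void main(String[] args) {
        int[] lengths = {1, 2, 3};
        double[] weights = {2.0, 5.0, 1.0}; // relative weights; they sum to 8, not 1
        Random rng = new Random(42L);
        int[] seen = new int[4];
        for (int i = 0; i < 8000; i++) {
            seen[pickLength(lengths, weights, rng)]++;
        }
        // Counts should come out roughly in a 2 : 5 : 1 ratio.
        System.out.println(seen[1] + " " + seen[2] + " " + seen[3]);
    }
}
```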
-
FakeLanguageGen
public FakeLanguageGen(String[] openingVowels, String[] midVowels, String[] openingConsonants, String[] midConsonants, String[] closingConsonants, String[] closingSyllables, String[] vowelSplitters, int[] syllableLengths, double[] syllableFrequencies, double vowelStartFrequency, double vowelEndFrequency, double vowelSplitFrequency, double syllableEndFrequency, regexodus.Pattern[] sane, boolean clean)
This is a very complicated constructor! For examples, look at the calls to this that initialize static members of this class, such as LOVECRAFT and GREEK_ROMANIZED.
Parameters:
openingVowels - String array where each element is a vowel or group of vowels that may appear at the start of a word or in the middle; elements may be repeated to make them more common
midVowels - String array where each element is a vowel or group of vowels that may appear in the middle of the word; all openingVowels are automatically copied into this internally. Elements may be repeated to make them more common
openingConsonants - String array where each element is a consonant or consonant cluster that can appear at the start of a word; elements may be repeated to make them more common
midConsonants - String array where each element is a consonant or consonant cluster that can appear between vowels; all closingConsonants are automatically copied into this internally. Elements may be repeated to make them more common
closingConsonants - String array where each element is a consonant or consonant cluster that can appear at the end of a word; elements may be repeated to make them more common
closingSyllables - String array where each element is a syllable starting with a vowel and ending in whatever the word should end in; elements may be repeated to make them more common
vowelSplitters - String array where each element is a mark that goes between vowels, so if "-" is in this, then "a-a" may be possible; elements may be repeated to make them more common
syllableLengths - int array where each element is a possible number of syllables a word can use; closely tied to syllableFrequencies
syllableFrequencies - double array where each element corresponds to an element in syllableLengths and represents how often each syllable count should appear relative to other counts; the numbers do not need to add up to any particular total
vowelStartFrequency - a double between 0.0 and 1.0 that determines how often words start with vowels; higher numbers yield more words starting with vowels
vowelEndFrequency - a double between 0.0 and 1.0 that determines how often words end with vowels; higher numbers yield more words ending in vowels
vowelSplitFrequency - a double between 0.0 and 1.0 that, if vowelSplitters is not empty, determines how often a vowel will be split into two vowels separated by one of those splitters
syllableEndFrequency - a double between 0.0 and 1.0 that determines how often an element of closingSyllables is used instead of ending normally
sane - an array of RegExodus Pattern objects used to perform sanity checks for sounds that are pronounceable by most English speakers, replacing many words that are impossible to say; slows down generation slightly, irrelevant for non-Latin alphabets
clean - true to perform vulgarity/obscenity checks on the word, replacing it if it is too close to a common English vulgarity, obscenity, or slur/epithet; slows down generation slightly
-
-
Method Details
-
removeAccents
Removes accented Latin-script characters from a string; if the "base" characters are non-English anyway then the result won't be an ASCII string, but otherwise it probably will be.
Parameters:
str - a string that may contain accented Latin-script characters
Returns:
a string with all accented characters replaced with their (possibly ASCII) counterparts
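One common way to get this behavior with only the JDK is Unicode decomposition followed by removal of combining marks. The sketch below assumes that approach; it is not taken from the library, the class name AccentSketch is hypothetical, and the real removeAccents may handle characters that do not decompose (such as 'ð' or 'ø') differently.

```java
import java.text.Normalizer;

// Hypothetical JDK-only sketch of accent removal; not the library's implementation.
final class AccentSketch {
    static String removeAccents(String str) {
        // NFD splits a letter such as e-acute into 'e' plus a combining accent;
        // \p{M} then matches and removes the combining marks.
        return Normalizer.normalize(str, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // \u00E9 is e-acute, \u00E8 is e-grave
        System.out.println(removeAccents("\u00E9l\u00E8ve")); // prints "eleve"
    }
}
```

As the description notes, a non-Latin base letter survives this process: an accented Greek alpha loses its accent but is still a Greek alpha, so the result is not ASCII.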
-
get
If a FakeLanguageGen is known and is in registered, this allows you to look up that FakeLanguageGen by name (using a name from registeredNames).
Parameters:
name - a String name such as "English", "Korean Romanized", or "Russian Authentic"
Returns:
a FakeLanguageGen corresponding to the given name, or null if none was found
-
getAt
If a FakeLanguageGen is known and is in registered, this allows you to look up that FakeLanguageGen by index, from 0 to FakeLanguageGen.registered.length - 1.
Parameters:
index - an int from 0 to FakeLanguageGen.registered.length - 1
Returns:
a FakeLanguageGen corresponding to the given index, or null if none was found
-
nameAt
If a FakeLanguageGen is known and is in registered, this allows you to look up that FakeLanguageGen's name by index, from 0 to FakeLanguageGen.registeredNames.length - 1.
Parameters:
index - an int from 0 to FakeLanguageGen.registeredNames.length - 1
Returns:
the name of the FakeLanguageGen at the given index, or null if none was found
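The three lookups above (get, getAt, nameAt) read like accessors over two parallel arrays, one of languages and one of names. Here is a small sketch of that pattern, with Strings standing in for FakeLanguageGen instances; RegistrySketch, NAMES, and ITEMS are hypothetical, and treating an out-of-range index as "none was found" is an assumption rather than documented behavior.

```java
// Hypothetical parallel-array registry; Strings stand in for FakeLanguageGen objects.
final class RegistrySketch {
    static final String[] NAMES = {"English", "Korean Romanized", "Russian Authentic"};
    static final String[] ITEMS = {"<english>", "<korean>", "<russian>"};

    // Lookup by name; returns null when the name is not registered.
    static String get(String name) {
        for (int i = 0; i < NAMES.length; i++) {
            if (NAMES[i].equals(name)) {
                return ITEMS[i];
            }
        }
        return null;
    }

    // Lookup by index; ASSUMPTION: an out-of-range index yields null.
    static String getAt(int index) {
        return (index >= 0 && index < ITEMS.length) ? ITEMS[index] : null;
    }

    // Name lookup by index, mirroring getAt over the names array.
    static String nameAt(int index) {
        return (index >= 0 && index < NAMES.length) ? NAMES[index] : null;
    }

    public static void main(String[] args) {
        System.out.println(get(nameAt(1))); // prints "<korean>"
    }
}
```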
-
randomLanguage
-
randomLanguage
-
checkAll
-
checkVulgarity
Checks a CharSequence, such as a String, against an overzealous vulgarity filter, returning true if the text could contain vulgar elements or words that could seem vulgar or juvenile. The idea here is that false positives are OK as long as there are very few false negatives (missed vulgar words). Does not check punctuation or numbers that could look like letters.
Parameters:
testing - the text, as a CharSequence such as a String, to check
Returns:
true if the text could contain a vulgar or juvenile element; false if it probably doesn't
-
word
Generate a word from this FakeLanguageGen, using and changing the current seed.
Parameters:
capitalize - true if the word should start with a capital letter, false otherwise
Returns:
a word in the fake language as a String
-
word
Generate a word from this FakeLanguageGen using the specified long seed to use for a shared StatefulRNG. If seed is the same, a FakeLanguageGen should produce the same word every time with this method.
Parameters:
seed - the seed, as a long, to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
Returns:
a word in the fake language as a String
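The guarantee that the same seed gives the same word follows from seeding the random number generator before any choices are made. A toy illustration, using java.util.Random as a stand-in for StatefulRNG and a deliberately tiny sound inventory; SeededWordSketch and its arrays are hypothetical and unrelated to any real FakeLanguageGen constant.

```java
import java.util.Random;

// Hypothetical toy generator showing seed-determinism; not SquidLib code.
final class SeededWordSketch {
    static final String[] CONSONANTS = {"k", "l", "m", "r", "s", "t"};
    static final String[] VOWELS = {"a", "e", "i", "o", "u"};

    static String word(long seed, boolean capitalize) {
        Random rng = new Random(seed); // every choice below derives from the seed
        int syllables = 1 + rng.nextInt(3); // 1 to 3 consonant-vowel pairs
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < syllables; i++) {
            sb.append(CONSONANTS[rng.nextInt(CONSONANTS.length)]);
            sb.append(VOWELS[rng.nextInt(VOWELS.length)]);
        }
        if (capitalize) {
            sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both calls print the same word because the seed is identical.
        System.out.println(word(123L, true));
        System.out.println(word(123L, true));
    }
}
```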
-
word
Generate a word from this FakeLanguageGen using the specified RNG.
Parameters:
rng - the RNG to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
Returns:
a word in the fake language as a String
-
word
Generate a word from this FakeLanguageGen with an approximate number of syllables using the specified long seed to use for a shared StatefulRNG. If seed and the other parameters are the same, a FakeLanguageGen should produce the same word every time with this method.
Parameters:
seed - the seed, as a long, to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
approxSyllables - the approximate number of syllables to produce in the word; there may be more syllables
Returns:
a word in the fake language as a String
-
word
Generate a word from this FakeLanguageGen using the specified RNG with an approximate number of syllables.
Parameters:
rng - the RNG to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
approxSyllables - the approximate number of syllables to produce in the word; there may be more syllables
Returns:
a word in the fake language as a String
-
word
public String word(long seed, boolean capitalize, int approxSyllables, regexodus.Pattern[] additionalChecks)
Generate a word from this FakeLanguageGen with an approximate number of syllables using the specified long seed to use for a shared StatefulRNG. This takes an array of Pattern objects (from RegExodus, not java.util.regex) that should match invalid outputs, such as words that shouldn't be generated in some context due to vulgarity or cultural matters. If seed and the other parameters are the same, a FakeLanguageGen should produce the same word every time with this method.
Parameters:
seed - the seed, as a long, to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
approxSyllables - the approximate number of syllables to produce in the word; there may be more syllables
additionalChecks - an array of RegExodus Pattern objects that match invalid words (these may be additional vulgarity checks, for example)
Returns:
a word in the fake language as a String
-
word
public String word(IRNG rng, boolean capitalize, int approxSyllables, regexodus.Pattern[] additionalChecks)
Generate a word from this FakeLanguageGen using the specified RNG with an approximate number of syllables. This takes an array of Pattern objects (from RegExodus, not java.util.regex) that should match invalid outputs, such as words that shouldn't be generated in some context due to vulgarity or cultural matters.
Parameters:
rng - the RNG to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
approxSyllables - the approximate number of syllables to produce in the word; there may be more syllables
additionalChecks - an array of RegExodus Pattern objects that match invalid words (these may be additional vulgarity checks, for example)
Returns:
a word in the fake language as a String
-
word
Generate a word from this FakeLanguageGen using the specified StatefulRNG with an approximate number of syllables, potentially setting the state of rng mid-way through the word to another seed from reseeds, possibly more than once if the word is long enough. This overload is unlikely to be needed often.
Parameters:
rng - the StatefulRNG to use for the randomized string building
capitalize - true if the word should start with a capital letter, false otherwise
approxSyllables - the approximate number of syllables to produce in the word; there may be more syllables
reseeds - an array or varargs of additional long seeds to seed rng with mid-generation
Returns:
a word in the fake language as a String
-
sentence
Generate a sentence from this FakeLanguageGen, using and changing the current seed, with the length in words between minWords and maxWords, both inclusive. This can use commas and semicolons between words, and can end a sentence with ".", "!", "?", or "...".
Parameters:
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
Returns:
a sentence in the fake language as a String
-
sentence
Generate a sentence from this FakeLanguageGen, using the given seed as a long, with the length in words between minWords and maxWords, both inclusive. This can use commas and semicolons between words, and can end a sentence with ".", "!", "?", or "...".
Parameters:
seed - the seed, as a long, for the randomized string building
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
Returns:
a sentence in the fake language as a String
-
sentence
Generate a sentence from this FakeLanguageGen, using the given RNG, with the length in words between minWords and maxWords, both inclusive. This can use commas and semicolons between words, and can end a sentence with ".", "!", "?", or "...".
Parameters:
rng - the RNG to use for the randomized string building
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
Returns:
a sentence in the fake language as a String
-
sentence
public String sentence(int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency)
Generate a sentence from this FakeLanguageGen, using and changing the current seed. The sentence's length in words will be between minWords and maxWords, both inclusive. It will put one of the punctuation Strings from midPunctuation between two words (before the space) at a frequency of midPunctuationFrequency (between 0 and 1), and will end the sentence with one String chosen from endPunctuation.
Parameters:
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
midPunctuation - a String array where each element is a comma, semicolon, or the like that goes before a space in the middle of a sentence
endPunctuation - a String array where each element is a period, question mark, or the like that goes at the very end of a sentence
midPunctuationFrequency - a double between 0.0 and 1.0 that determines how often Strings from midPunctuation should be inserted before spaces
Returns:
a sentence in the fake language as a String
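The punctuation rules described above can be pictured with a small stand-alone sketch. The SentenceSketch class and its vocabulary parameter are hypothetical stand-ins (a fixed word list replaces the fake-language word generator), and the assembly order is an assumption for illustration rather than the library's own code.

```java
import java.util.Random;

// Hypothetical illustration of the documented punctuation rules; not part of SquidLib.
final class SentenceSketch {
    /**
     * Picks a word count in [minWords, maxWords], joins words with spaces,
     * sometimes places a mid-sentence mark before a space, and always
     * finishes with one mark chosen from endPunctuation.
     */
    static String sentence(Random rng, String[] vocabulary, int minWords, int maxWords,
                           String[] midPunctuation, String[] endPunctuation,
                           double midPunctuationFrequency) {
        int span = Math.max(0, maxWords - minWords);
        int count = minWords + rng.nextInt(span + 1); // both bounds inclusive
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(vocabulary[rng.nextInt(vocabulary.length)]);
            if (i < count - 1) {
                // The mark, if any, goes directly after the word and before the space.
                if (midPunctuation.length > 0 && rng.nextDouble() < midPunctuationFrequency) {
                    sb.append(midPunctuation[rng.nextInt(midPunctuation.length)]);
                }
                sb.append(' ');
            }
        }
        sb.append(endPunctuation[rng.nextInt(endPunctuation.length)]);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] vocab = {"ka", "to", "mi"};
        System.out.println(sentence(new Random(7L), vocab, 3, 5,
                new String[]{",", ";"}, new String[]{".", "!", "?", "..."}, 0.2));
    }
}
```

A frequency of 0.0 never inserts a mid-sentence mark in this sketch, and 1.0 inserts one after every word except the last.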
-
sentence
public String sentence(long seed, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency)
Generate a sentence from this FakeLanguageGen, using the given seed as a long. The sentence's length in words will be between minWords and maxWords, both inclusive. It will put one of the punctuation Strings from midPunctuation between two words (before the space) at a frequency of midPunctuationFrequency (between 0 and 1), and will end the sentence with one String chosen from endPunctuation.
Parameters:
seed - the seed, as a long, for the randomized string building
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
midPunctuation - a String array where each element is a comma, semicolon, or the like that goes before a space in the middle of a sentence
endPunctuation - a String array where each element is a period, question mark, or the like that goes at the very end of a sentence
midPunctuationFrequency - a double between 0.0 and 1.0 that determines how often Strings from midPunctuation should be inserted before spaces
Returns:
a sentence in the fake language as a String
-
sentence
public String sentence(IRNG rng, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency)
Generate a sentence from this FakeLanguageGen using the specified RNG. The sentence's length in words will be between minWords and maxWords, both inclusive. It will put one of the punctuation Strings from midPunctuation between two words (before the space) at a frequency of midPunctuationFrequency (between 0 and 1), and will end the sentence with one String chosen from endPunctuation.
Parameters:
rng - the RNG to use for the randomized string building
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
midPunctuation - a String array where each element is a comma, semicolon, or the like that goes before a space in the middle of a sentence
endPunctuation - a String array where each element is a period, question mark, or the like that goes at the very end of a sentence
midPunctuationFrequency - a double between 0.0 and 1.0 that determines how often Strings from midPunctuation should be inserted before spaces
Returns:
a sentence in the fake language as a String
-
sentence
public String sentence(int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency, int maxChars)
Generate a sentence from this FakeLanguageGen that fits in the given length limit. The sentence's length in words will be between minWords and maxWords, both inclusive, unless it would exceed maxChars, in which case it is truncated. It will put one of the punctuation Strings from midPunctuation between two words (before the space) at a frequency of midPunctuationFrequency (between 0 and 1), and will end the sentence with one String chosen from endPunctuation.
Parameters:
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
midPunctuation - a String array where each element is a comma, semicolon, or the like that goes before a space in the middle of a sentence
endPunctuation - a String array where each element is a period, question mark, or the like that goes at the very end of a sentence
midPunctuationFrequency - a double between 0.0 and 1.0 that determines how often Strings from midPunctuation should be inserted before spaces
maxChars - the longest string length this can produce; should be at least 6 * minWords
Returns:
a sentence in the fake language as a String
-
sentence
public String sentence(long seed, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency, int maxChars)
Generate a sentence from this FakeLanguageGen that fits in the given length limit, using the given seed as a long. The sentence's length in words will be between minWords and maxWords, both inclusive, unless it would exceed maxChars, in which case it is truncated. It will put one of the punctuation Strings from midPunctuation between two words (before the space) at a frequency of midPunctuationFrequency (between 0 and 1), and will end the sentence with one String chosen from endPunctuation.
Parameters:
seed - the seed, as a long, for the randomized string building
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
midPunctuation - a String array where each element is a comma, semicolon, or the like that goes before a space in the middle of a sentence
endPunctuation - a String array where each element is a period, question mark, or the like that goes at the very end of a sentence
midPunctuationFrequency - a double between 0.0 and 1.0 that determines how often Strings from midPunctuation should be inserted before spaces
maxChars - the longest string length this can produce; should be at least 6 * minWords
Returns:
a sentence in the fake language as a String
-
sentence
public String sentence(IRNG rng, int minWords, int maxWords, String[] midPunctuation, String[] endPunctuation, double midPunctuationFrequency, int maxChars)
Generate a sentence from this FakeLanguageGen using the given RNG that fits in the given length limit. The sentence's length in words will be between minWords and maxWords, both inclusive, unless it would exceed maxChars, in which case it is truncated. It will put one of the punctuation Strings from midPunctuation between two words (before the space) at a frequency of midPunctuationFrequency (between 0 and 1), and will end the sentence with one String chosen from endPunctuation.
Parameters:
rng - the RNG to use for the randomized string building
minWords - an int for the minimum number of words in a sentence; should be at least 1
maxWords - an int for the maximum number of words in a sentence; should be at least equal to minWords
midPunctuation - a String array where each element is a comma, semicolon, or the like that goes before a space in the middle of a sentence
endPunctuation - a String array where each element is a period, question mark, or the like that goes at the very end of a sentence
midPunctuationFrequency - a double between 0.0 and 1.0 that determines how often Strings from midPunctuation should be inserted before spaces
maxChars - the longest string length this can produce; should be at least 6 * minWords
Returns:
a sentence in the fake language as a String
-
merge1000
-
accentVowels
-
accentConsonants
-
accentBoth
protected String[] accentBoth(IRNG rng, String[] me, double vowelInfluence, double consonantInfluence)
-
mix
Makes a new FakeLanguageGen that mixes this object with other, mingling the consonants and vowels they use as well as any word suffixes or other traits, and favoring the qualities in other by otherInfluence, which will value both languages evenly if it is 0.5.
You should generally prefer mix(double, FakeLanguageGen, double, Object...) or mixAll(Object...) if you ever mix 3 or more languages. Chaining this mix() method can be very counter-intuitive because the weights are relative, while in the other mix() and mixAll() they are absolute.
Parameters:
other - another FakeLanguageGen to mix along with this one into a new language
otherInfluence - how much other should affect the pair, with 0.5 being equal and 1.0 being only other used
Returns:
a new FakeLanguageGen with traits from both languages
-
mix
public FakeLanguageGen mix(double myWeight, FakeLanguageGen other1, double weight1, Object... pairs)
Produces a FakeLanguageGen by mixing this FakeLanguageGen with one or more other FakeLanguageGen objects. Takes a weight for this, another FakeLanguageGen, a weight for that FakeLanguageGen, then a possibly-empty group of FakeLanguageGen parameters and the weights for those parameters. If other1 is null or if pairs has been given a value of null instead of the normal (possibly empty) array of Objects, then this simply returns a copy of this FakeLanguageGen. Otherwise, it will at least mix this language with other1 using the given weights for each. If pairs is not empty, it has special requirements for what types it allows and in what order, but does no type checking. Specifically, pairs requires the first Object to be a FakeLanguageGen, the next to be a number of some kind that will be the weight for the previous FakeLanguageGen (this method can handle non-Double weights, and converts them to Double if needed), and every two parameters after that to follow the same order and pattern (FakeLanguageGen, then number, then FakeLanguageGen, then number...). Weights are absolute and don't depend on earlier weights, unlike when chaining the mix(FakeLanguageGen, double) method, where they do. This makes reasoning about the ideal weights for multiple mixed languages easier; to mix 3 languages equally you can use 3 equal weights with this, whereas with mix chaining you would need to mix the first two with 0.5 and the third with 0.33.
It's up to you whether you want to use mixAll(Object...) or this method; they call the same code and produce the same result, including the summary for serialization support. You probably shouldn't use mix(FakeLanguageGen, double) with two arguments in new code, since it's easy to make mistakes when mixing three or more languages (calling that twice or more).
Parameters:
myWeight - the weight to assign this FakeLanguageGen in the mix
other1 - another FakeLanguageGen to mix in; if null, this method will abort and return copy()
weight1 - the weight to assign other1 in the mix
pairs - may be empty, not null; otherwise must alternate between FakeLanguageGen and number (weight) elements
Returns:
a FakeLanguageGen produced by mixing this with any FakeLanguageGen arguments by the given weights
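The difference between relative (chained) and absolute weights comes down to simple arithmetic, sketched below. This models each blend only as a list of shares per source language, which is an assumption made for illustration; the MixWeights class and its methods are hypothetical and say nothing about how FakeLanguageGen merges its internal data.

```java
import java.util.Arrays;

// Hypothetical arithmetic model of mixing weights; not part of SquidLib.
final class MixWeights {
    /** Chained, relative mixing: the existing blend keeps (1 - otherInfluence) of the total. */
    static double[] chain(double[] blend, double[] other, double otherInfluence) {
        double[] out = new double[blend.length];
        for (int i = 0; i < blend.length; i++) {
            out[i] = blend[i] * (1.0 - otherInfluence) + other[i] * otherInfluence;
        }
        return out;
    }

    /** Absolute weights: each share is its weight divided by the sum of all weights. */
    static double[] absolute(double... weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] / total;
        }
        return out;
    }

    public static void main(String[] args) {
        // Each array holds the shares of languages A, B, and C in a blend.
        double[] a = {1, 0, 0}, b = {0, 1, 0}, c = {0, 0, 1};
        double[] ab = chain(a, b, 0.5);
        System.out.println(Arrays.toString(chain(ab, c, 0.5)));       // C ends up with half
        System.out.println(Arrays.toString(chain(ab, c, 1.0 / 3.0))); // roughly equal thirds
        System.out.println(Arrays.toString(absolute(1, 1, 1)));       // equal thirds
    }
}
```

Chaining twice with 0.5 leaves the first two languages at a quarter each, which is why the second chained call needs about 0.33 to reach an even three-way blend.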
-
mixAll
Produces a FakeLanguageGen from a group of FakeLanguageGen parameters and the weights for those parameters. Requires the first Object in pairs to be a FakeLanguageGen, the next to be a number of some kind that will be the weight for the previous FakeLanguageGen (this method can handle non-Double weights, and converts them to Double if needed), and every two parameters after that to follow the same order and pattern (FakeLanguageGen, then number, then FakeLanguageGen, then number...). There should be at least 4 elements in pairs, half of them languages and half of them weights, for this to do any mixing, but it can produce a result with as few as one FakeLanguageGen (returning a copy of the first FakeLanguageGen). Weights are absolute and don't depend on earlier weights, unlike when chaining the mix(FakeLanguageGen, double) method, where they do. This makes reasoning about the ideal weights for multiple mixed languages easier; to mix 3 languages equally you can use 3 equal weights with this, whereas with mix chaining you would need to mix the first two with 0.5 and the third with 0.33.
This is probably the most intuitive way to mix languages here, though there's also mix(double, FakeLanguageGen, double, Object...), which is very similar but doesn't take its parameters in quite the same way (it isn't static, and treats the FakeLanguageGen object like the first item in pairs here). Used internally in the deserialization code.
Parameters:
pairs - should have at least one item, and must alternate between FakeLanguageGen and number (weight) elements
Returns:
a FakeLanguageGen produced by mixing any FakeLanguageGen arguments by the given weights
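Since the alternating language/weight layout of an Object... argument is not type-checked, a caller may want to validate it before mixing. The sketch below is a hypothetical helper, not part of the library: PairCheckSketch and countLanguages are made-up names, the language type is passed in as a Class so the example stays self-contained, and it is deliberately stricter than the documented minimum by requiring complete pairs.

```java
// Hypothetical caller-side validation of an alternating (language, weight, ...) list.
final class PairCheckSketch {
    /**
     * Verifies that pairs alternates between instances of languageType and
     * Number weights, and returns how many languages were supplied.
     */
    static int countLanguages(Class<?> languageType, Object... pairs) {
        if (pairs == null || pairs.length == 0 || pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected language, weight, language, weight, ...");
        }
        for (int i = 0; i < pairs.length; i += 2) {
            if (!languageType.isInstance(pairs[i])) {
                throw new IllegalArgumentException(
                        "element " + i + " should be a " + languageType.getSimpleName());
            }
            if (!(pairs[i + 1] instanceof Number)) {
                throw new IllegalArgumentException(
                        "element " + (i + 1) + " should be a numeric weight");
            }
        }
        return pairs.length / 2;
    }

    public static void main(String[] args) {
        // Strings stand in for language objects here; int and double weights both box to Number.
        System.out.println(countLanguages(String.class, "A", 1, "B", 2.5));
    }
}
```

With the real class, the same check would presumably be called with FakeLanguageGen.class as the first argument, though that usage is untested here.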
-
addAccents
Produces a new FakeLanguageGen like this one but with extra vowels and/or consonants possible, adding from a wide selection of accented vowels (if vowelInfluence is above 0.0) and/or consonants (if consonantInfluence is above 0.0). This may produce a gibberish-looking language with no rhyme or reason to the accents, and generally consonantInfluence should be very low if it is above 0 at all.
Parameters:
vowelInfluence - between 0.0 and 1.0; if 0.0 will not affect vowels at all
consonantInfluence - between 0.0 and 1.0; if 0.0 will not affect consonants at all
Returns:
a new FakeLanguageGen with modifications to add accented vowels and/or consonants
-
removeAccents
Useful for cases with limited fonts, this produces a new FakeLanguageGen like this one but with all accented characters removed (including almost all non-ASCII Latin-alphabet characters, but only some Greek and Cyrillic characters). This will replace letters like "A with a ring" with just "A". Some of the letters chosen as replacements aren't exact matches.
Returns:
a new FakeLanguageGen like this one but without accented letters
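The general idea of replacing "A with a ring" with plain "A" can be shown on an ordinary String using the JDK's Unicode normalizer. This is a different technique from whatever replacement table the library uses, and it only handles characters that have a canonical decomposition; AccentStripSketch and stripAccents are hypothetical names.

```java
import java.text.Normalizer;

// Hypothetical illustration of accent removal on plain Strings; not part of SquidLib.
final class AccentStripSketch {
    /** Decomposes accented letters, then drops the combining marks that carried the accents. */
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The escapes below are A with a ring above and o with a diaeresis.
        System.out.println(stripAccents("\u00C5ngstr\u00F6m"));
    }
}
```

Letters with no canonical decomposition, such as the Scandinavian o with a stroke, pass through this sketch unchanged, which is one reason a hand-written replacement table can cover more characters.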
-
getName
Returns the name of this FakeLanguageGen, such as "English" or "Deep Speech", if one was registered for this. In the case of hybrid languages produced by mix(FakeLanguageGen, double) or related methods, this should produce a String like "English/French" (or "English/French/Maori" if more are mixed together). If no name was registered, this will return "Nameless Language".
Returns:
the human-readable name of this language, or "Nameless Language" if none is known
-
addModifiers
Adds the specified Modifier objects from a Collection to a copy of this FakeLanguageGen and returns it. You can obtain a Modifier with the static constants in the FakeLanguageGen.Modifier nested class, the FakeLanguageGen.modifier() method, or Modifier's constructor.
Parameters:
mods - a Collection of Modifier objects
Returns:
a copy of this with the Modifiers added
-
addModifiers
Adds the specified Modifier objects to a copy of this FakeLanguageGen and returns it. You can obtain a Modifier with the static constants in the FakeLanguageGen.Modifier nested class, the FakeLanguageGen.modifier() method, or Modifier's constructor.
Parameters:
mods - an array or vararg of Modifier objects
Returns:
a copy of this with the Modifiers added
-
removeModifiers
Creates a copy of this FakeLanguageGen with no modifiers.
Returns:
a copy of this FakeLanguageGen with modifiers removed
-
modifier
Convenience method that just calls Modifier(String, String).
Parameters:
pattern - a String that will be interpreted as a regex pattern using Pattern
replacement - a String that will be interpreted as a replacement string for pattern; can include "$1" and the like if pattern has groups
Returns:
a Modifier that can be applied to a FakeLanguageGen
-
modifier
Convenience method that just calls Modifier(String, String, double).
Parameters:
pattern - a String that will be interpreted as a regex pattern using Pattern
replacement - a String that will be interpreted as a replacement string for pattern; can include "$1" and the like if pattern has groups
chance - the chance, as a double between 0 and 1, that the Modifier will take effect
Returns:
a Modifier that can be applied to a FakeLanguageGen
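A pattern, a replacement with group references, and a chance can be combined as in the sketch below. It uses java.util.regex rather than RegExodus, and it rolls the chance once per match, which is an assumption for illustration; the ChanceReplaceSketch class and replaceWithChance method are hypothetical and are not the Modifier implementation.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a chance-gated regex replacement; not part of SquidLib.
final class ChanceReplaceSketch {
    /** Replaces each match of regex with replacement, but only when a random roll is under chance. */
    static String replaceWithChance(String text, String regex, String replacement,
                                    double chance, Random rng) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            if (rng.nextDouble() < chance) {
                // "$1" and similar group references are expanded here.
                m.appendReplacement(out, replacement);
            } else {
                // Keep the matched text as-is, escaping it so it is not read as a group reference.
                m.appendReplacement(out, Matcher.quoteReplacement(m.group()));
            }
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Swap every "a" for "o", keeping an optional following "n" via group 1.
        System.out.println(replaceWithChance("banana", "a(n?)", "o$1", 1.0, new Random(1L)));
        System.out.println(replaceWithChance("banana", "a(n?)", "o$1", 0.5, new Random(1L)));
    }
}
```

A chance of 1.0 always replaces and 0.0 never does, because the roll is drawn from the range 0.0 (inclusive) to 1.0 (exclusive).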
-
equals
-
hashCode
-
hash64
-
toString
-
copy
-
serializeToString
-
deserializeFromString
-