Class MarkovChar
java.lang.Object
com.github.yellowstonegames.text.MarkovChar
A simple Markov chain text generator; call
analyze(CharSequence) once on a large sample text, then you can
call chain(long) many times to get odd-sounding "remixes" of the sample text, one char at a time. This is
meant to allow easy serialization of the necessary data to call chain(); if you can store the chars and
processed arrays in some serialized form, then you can reassign them to the same fields to avoid calling
analyze(). One way to do this conveniently is to use stringSerialize() after calling analyze() once and to
save the resulting String; then, rather than calling analyze() again on future runs, you would call
stringDeserialize(String) to create the MarkovChar without needing any repeated analysis.
Field Summary
Fields
Modifier and Type | Field | Description
char[] | chars | All chars (case-sensitive, and counting only chars that are letters in Unicode, plus "'") that this encountered during the latest call to analyze(CharSequence).
com.github.tommyettinger.ds.IntIntMap | pairs | Map of all pairs of chars encountered to the position in the order they were encountered.
int[][] | processed | Complicated data that mixes probabilities of chars using their indices in chars and the indices of char pairs in pairs, generated during the latest call to analyze(CharSequence).
Constructor Summary
Constructors
Constructor | Description
MarkovChar() | Creates an empty MarkovChar; you should call analyze(CharSequence) before doing anything else with this new object.
Method Summary
Modifier and Type | Method | Description
void | analyze(CharSequence corpus) | This is the main necessary step before using a MarkovChar; you must call this method at some point before you can call any other methods.
String | chain(long seed) | Generate a word-like String based on the previously analyzed corpus text (using analyze(CharSequence)) that terminates when a non-letter character other than "'" is encountered, or once the length would be greater than 200 characters without stopping.
String | chain(long seed, int maxLength) | Generate a word-like String based on the previously analyzed corpus text (using analyze(CharSequence)) that terminates when a non-letter character other than "'" is encountered, or once the length would be greater than maxLength characters without stopping.
 | copy() | 
static MarkovChar | stringDeserialize(String data) | Recreates an already-analyzed MarkovChar given a String produced by stringSerialize().
String | stringSerialize() | Returns a representation of this MarkovChar as a String; use stringDeserialize(String) to get a MarkovChar back from this String.
Field Details
chars
public char[] chars
All chars (case-sensitive, and counting only chars that are letters in Unicode, plus "'") that this encountered during the latest call to analyze(CharSequence). Will be null if analyze(CharSequence) was never called.
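The filtering rule for chars (Unicode letters plus the apostrophe, case-sensitive) can be expressed with Character.isLetter. The sketch below is a minimal illustration, not the library's code; the class and method names are hypothetical, and keeping the chars in first-encountered order is an assumption, since the field description does not state an order.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CharFilterSketch {
    // Per the chars field docs: Unicode letters plus the apostrophe are kept.
    public static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '\'';
    }

    // Collects each distinct kept char once, case-sensitively.
    // First-encountered order is an assumption made by this sketch.
    public static char[] distinctWordChars(CharSequence text) {
        Set<Character> seen = new LinkedHashSet<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWordChar(c)) {
                seen.add(c);
            }
        }
        char[] result = new char[seen.size()];
        int n = 0;
        for (char c : seen) {
            result[n++] = c;
        }
        return result;
    }

    public static void main(String[] args) {
        // 'D' and 'd' would be distinct; spaces, commas and '!' are dropped
        System.out.println(new String(distinctWordChars("Don't stop, Dora!"))); // Don'tspra
    }
}
```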
pairs
public com.github.tommyettinger.ds.IntIntMap pairs
Map of all pairs of chars encountered to the position in the order they were encountered. Each pair is stored as a single int key built from the 16-bit indices of its two chars in chars: the first char's index occupies the most-significant 16 bits and the second char's index the least-significant 16 bits. The size of this map is likely to be larger than the length of the char array chars, but should be equal to processed.length. Will be null if analyze(CharSequence) was never called.
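The key layout described for pairs (first index in the high 16 bits, second in the low 16 bits) is plain bit packing. A minimal sketch follows; the helper names are hypothetical and not part of the MarkovChar API.

```java
public class PairKeySketch {
    // First char's index in the high 16 bits, second char's index in the low 16 bits.
    public static int packPair(int firstIndex, int secondIndex) {
        return (firstIndex << 16) | (secondIndex & 0xFFFF);
    }

    // Unsigned shift so indices up to 0xFFFF survive even when the int is negative.
    public static int firstOf(int key) {
        return key >>> 16;
    }

    public static int secondOf(int key) {
        return key & 0xFFFF;
    }

    public static void main(String[] args) {
        int key = packPair(3, 7);
        System.out.println(Integer.toHexString(key));           // 30007
        System.out.println(firstOf(key) + " " + secondOf(key)); // 3 7
    }
}
```

Using >>> rather than >> matters once the first index reaches 0x8000, because the packed key then has its sign bit set.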
processed
public int[][] processed
Complicated data that mixes probabilities of chars using their indices in chars and the indices of char pairs in pairs, generated during the latest call to analyze(CharSequence). This is a jagged 2D array. Will be null if analyze(CharSequence) was never called.
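The docs do not specify how processed encodes its probabilities, so the following is only an illustration of why per-context probability data naturally forms a jagged array. It assumes a hypothetical row layout of alternating (char index, cumulative weight) entries and draws a candidate in proportion to its weight; this is not the library's format.

```java
import java.util.Random;

public class JaggedRowSketch {
    // Hypothetical row layout (not the library's documented format):
    // {charIndex0, cumulativeWeight0, charIndex1, cumulativeWeight1, ...}
    // Rows can hold different numbers of candidates, so the array is jagged.
    public static int sample(int[] row, Random rng) {
        int total = row[row.length - 1];     // last cumulative weight is the row total
        int r = rng.nextInt(total);          // uniform in [0, total)
        for (int i = 1; i < row.length; i += 2) {
            if (r < row[i]) {
                return row[i - 1];           // first candidate whose cumulative weight exceeds r
            }
        }
        throw new IllegalStateException("row is not well-formed");
    }

    public static void main(String[] args) {
        int[][] rows = {
            {0, 5},        // one candidate: char index 0
            {1, 2, 4, 3}   // char index 1 with weight 2, char index 4 with weight 1
        };
        Random rng = new Random(42L);
        System.out.println(sample(rows[0], rng)); // 0
        System.out.println(sample(rows[1], rng)); // 1 or 4, about two-thirds of the time 1
    }
}
```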
Constructor Details
MarkovChar
public MarkovChar()
Creates an empty MarkovChar; you should call analyze(CharSequence) before doing anything else with this new object.
Method Details
analyze
This is the main necessary step before using a MarkovChar; you must call this method at some point before you can call any other methods. You can serialize this MarkovChar after calling this method to avoid needing to call it again on later runs, or even include serialized MarkovChar objects with a game so that this only needs to be called during pre-processing. This method analyzes the pairings of chars in a (typically large) corpus text. It uses only the two preceding chars to determine the subsequent char. When it finishes processing, it stores the results in chars and processed, which allows other methods to be called (they will throw a NullPointerException if analyze() hasn't been called).
Parameters:
corpus - a typically large sample text in the style that should be mimicked
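To make "two preceding chars determine the subsequent char" concrete, here is a minimal order-2 counting sketch. It assumes a simple nested-map table instead of the library's chars/pairs/processed layout, and it treats every char that is not a letter or apostrophe as a word boundary (the BOUNDARY constant is hypothetical); each word starts from a context of two boundaries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class Order2AnalyzeSketch {
    // Hypothetical marker standing in for any char that is not a letter or apostrophe.
    public static final char BOUNDARY = ' ';

    // Maps each two-char context to counts of the chars that followed it.
    public static Map<String, Map<Character, Integer>> analyze(CharSequence corpus) {
        Map<String, Map<Character, Integer>> counts = new HashMap<>();
        char a = BOUNDARY, b = BOUNDARY; // the two preceding chars
        for (int i = 0; i < corpus.length(); i++) {
            char c = corpus.charAt(i);
            boolean wordChar = Character.isLetter(c) || c == '\'';
            if (wordChar) {
                record(counts, a, b, c);
                a = b;
                b = c;
            } else {
                if (b != BOUNDARY) {
                    record(counts, a, b, BOUNDARY); // a word just ended
                }
                a = BOUNDARY; // every word starts from the same empty context
                b = BOUNDARY;
            }
        }
        if (b != BOUNDARY) {
            record(counts, a, b, BOUNDARY); // corpus ended mid-word
        }
        return counts;
    }

    private static void record(Map<String, Map<Character, Integer>> counts, char a, char b, char next) {
        // TreeMap keeps candidates in a stable order for later weighted picks
        counts.computeIfAbsent("" + a + b, k -> new TreeMap<>()).merge(next, 1, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Map<Character, Integer>> counts = analyze("abab ab");
        System.out.println(counts.get("ab")); // { =2, a=1}
    }
}
```

In "abab ab", the context "ab" is followed once by 'a' and twice by a word end, which is exactly the kind of frequency data a generator later samples from.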
chain
Generate a word-like String based on the corpus text previously analyzed with analyze(CharSequence). Generation terminates when a non-letter character other than "'" is encountered, or once the length would exceed 200 characters.
Parameters:
seed - the seed for the random decisions this makes, as a long; any long can be used
Returns:
a word generated from the analyzed corpus text's char placement
chain
Generate a word-like String based on the corpus text previously analyzed with analyze(CharSequence). Generation terminates when a non-letter character other than "'" is encountered, or once the length would exceed maxLength characters.
Parameters:
seed - the seed for the random decisions this makes, as a long; any long can be used
maxLength - the maximum length for the generated String, in number of characters
Returns:
a word generated from the analyzed corpus text's char placement
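The two stopping rules (a non-word char, or reaching maxLength) can be sketched as a seeded walk over an order-2 count table. This is an illustration under stated assumptions, not the library's implementation: the nested-map table and BOUNDARY marker are hypothetical, and java.util.Random stands in for whatever RNG MarkovChar uses, so the same seed will not reproduce the library's output.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ChainSketch {
    // Hypothetical marker for "any char that is not a letter or apostrophe".
    public static final char BOUNDARY = ' ';

    // Walks an order-2 table: the two most recent chars select a row of candidate
    // next chars, and one is drawn with probability proportional to its count.
    public static String chain(Map<String, Map<Character, Integer>> counts, long seed, int maxLength) {
        Random rng = new Random(seed); // stand-in RNG; the library's own RNG will differ
        StringBuilder word = new StringBuilder();
        char a = BOUNDARY, b = BOUNDARY;
        while (word.length() < maxLength) { // never exceed maxLength chars
            Map<Character, Integer> row = counts.get("" + a + b);
            if (row == null || row.isEmpty()) {
                break; // context never seen in the corpus
            }
            int total = 0;
            for (int n : row.values()) {
                total += n;
            }
            int r = rng.nextInt(total);
            char next = BOUNDARY;
            for (Map.Entry<Character, Integer> e : row.entrySet()) {
                r -= e.getValue();
                if (r < 0) {
                    next = e.getKey();
                    break;
                }
            }
            if (next == BOUNDARY) {
                break; // a non-word char ends the word
            }
            word.append(next);
            a = b;
            b = next;
        }
        return word.toString();
    }

    public static void main(String[] args) {
        // A table that never ends a word, so only maxLength stops it
        Map<String, Map<Character, Integer>> loop = new HashMap<>();
        loop.put("  ", new TreeMap<>(Map.of('a', 1)));
        loop.put(" a", new TreeMap<>(Map.of('b', 1)));
        loop.put("ab", new TreeMap<>(Map.of('a', 1)));
        loop.put("ba", new TreeMap<>(Map.of('b', 1)));
        System.out.println(chain(loop, 0L, 5)); // ababa

        // A table whose only path reaches a boundary after "hi"
        Map<String, Map<Character, Integer>> hi = new HashMap<>();
        hi.put("  ", new TreeMap<>(Map.of('h', 1)));
        hi.put(" h", new TreeMap<>(Map.of('i', 1)));
        hi.put("hi", new TreeMap<>(Map.of(BOUNDARY, 1)));
        System.out.println(chain(hi, 42L, 10)); // hi
    }
}
```

By the method summary, chain(long seed) corresponds to this behavior with a limit of 200 characters.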
stringSerialize
Returns a representation of this MarkovChar as a String; use stringDeserialize(String) to get a MarkovChar back from this String. The chars and processed fields must have been given values by direct assignment, by calling analyze(CharSequence), or by building this MarkovChar with the aforementioned stringDeserialize(String) method. Items are separated using commas and semicolons, and fields are separated using tabs.
Returns:
a String that can be used to store the analyzed chars and frequencies in this MarkovChar
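The separator scheme described above (tabs between fields, commas and semicolons between items) can be illustrated with a round trip over a char[] and a jagged int[][]. The exact layout below is a hypothetical format chosen for the sketch, not the one stringSerialize() actually emits, and the helper names are illustrative only.

```java
import java.util.Arrays;

public class SerializeSketch {
    // Hypothetical layout, not the library's actual format: the chars, a tab,
    // then processed rows joined by ';' with each row's items joined by ','.
    // A tab is safe as a field separator because chars holds only letters and '\''.
    public static String serialize(char[] chars, int[][] processed) {
        StringBuilder sb = new StringBuilder().append(chars).append('\t');
        for (int i = 0; i < processed.length; i++) {
            if (i > 0) sb.append(';');
            for (int j = 0; j < processed[i].length; j++) {
                if (j > 0) sb.append(',');
                sb.append(processed[i][j]);
            }
        }
        return sb.toString();
    }

    public static char[] deserializeChars(String data) {
        return data.substring(0, data.indexOf('\t')).toCharArray();
    }

    public static int[][] deserializeRows(String data) {
        String body = data.substring(data.indexOf('\t') + 1);
        if (body.isEmpty()) {
            return new int[0][];
        }
        String[] rows = body.split(";", -1); // -1 keeps empty trailing rows
        int[][] result = new int[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].isEmpty()) {
                result[i] = new int[0];
                continue;
            }
            String[] items = rows[i].split(",");
            result[i] = new int[items.length];
            for (int j = 0; j < items.length; j++) {
                result[i][j] = Integer.parseInt(items[j]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', '\''};
        int[][] processed = {{0, 5}, {1, 2, 4, 3}};
        String saved = serialize(chars, processed);
        System.out.println(saved.replace("\t", "\\t"));                 // ab'\t0,5;1,2,4,3
        System.out.println(Arrays.deepToString(deserializeRows(saved))); // [[0, 5], [1, 2, 4, 3]]
    }
}
```

Storing a String like this once and parsing it on later runs is the workflow the class description recommends over re-running analyze(CharSequence).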
stringDeserialize
Recreates an already-analyzed MarkovChar given a String produced by stringSerialize().
Parameters:
data - a String returned by stringSerialize()
Returns:
a MarkovChar that is ready to generate text with chain(long)
copy