Package squidpony
Class MarkovChar
java.lang.Object
squidpony.MarkovChar
- All Implemented Interfaces:
Serializable
public class MarkovChar extends Object implements Serializable
A simple Markov chain text generator; call analyze(CharSequence) once on a large sample text, then you can call chain(long) many times to get odd-sounding "remixes" of the sample text. This is meant to allow easy serialization of the data needed to call chain(); if you can store the chars and processed arrays in some serialized form, then you can reassign them to the same fields to avoid calling analyze(). One convenient way to do this is to call serializeToString() after calling analyze() once and save the resulting String; then, rather than calling analyze() again on future runs, you would call deserializeFromString(String) to create the MarkovChar without needing any repeated analysis.
Created by Tommy Ettinger on 1/30/2018.
- See Also:
- Serialized Form
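The analyze-once, deserialize-later workflow described above can be sketched as a small file cache. This is a minimal sketch under stated assumptions: AnalysisCache and loadOrCompute are hypothetical names, and the Supplier stands in for "create a MarkovChar, call analyze(corpus), then return serializeToString()".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical helper: runs the expensive analysis once and reuses its serialized form afterwards. */
final class AnalysisCache {
    private AnalysisCache() {}

    /**
     * Returns the saved String from cacheFile if that file exists; otherwise calls
     * analyzeAndSerialize (assumed to analyze a corpus and serialize the result),
     * writes its output to cacheFile, and returns it.
     */
    static String loadOrCompute(Path cacheFile, Supplier<String> analyzeAndSerialize) {
        try {
            if (Files.exists(cacheFile)) {
                return Files.readString(cacheFile, StandardCharsets.UTF_8);
            }
            String data = analyzeAndSerialize.get();
            Files.writeString(cacheFile, data, StandardCharsets.UTF_8);
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the documented API, the supplier could be a lambda that creates a MarkovChar, calls analyze(corpus), and returns serializeToString(); the String this helper returns would then be passed to MarkovChar.deserializeFromString(String) on every run.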
Field Summary
Fields:
- char[] chars: All chars (case-sensitive and only counting chars that are letters in Unicode) that this encountered during the latest call to analyze(CharSequence).
- IntIntOrderedMap pairs: Map of all pairs of chars encountered to the position in the order they were encountered.
- int[][] processed: Complicated data that mixes probabilities of chars using their indices in chars and the indices of char pairs in pairs, generated during the latest call to analyze(CharSequence).
Constructor Summary
Constructors:
- MarkovChar()
Method Summary
Methods:
- void analyze(CharSequence corpus): This is the main necessary step before using a MarkovChar; you must call this method at some point before you can call any other methods.
- String chain(long seed): Generates a roughly sentence-sized piece of text based on the corpus text previously analyzed with analyze(CharSequence). Generation stops when stop punctuation is used (".", "!", "?", or "..."), or once the length would exceed 200 characters without encountering stop punctuation, in which case the sentence is terminated with "." or "...".
- String chain(long seed, int maxLength): Generates a roughly sentence-sized piece of text based on the corpus text previously analyzed with analyze(CharSequence). Generation stops when stop punctuation is used (".", "!", "?", or "..."), or once adding another word would exceed maxLength, in which case the sentence is terminated with "." or "...".
- MarkovChar copy(): Returns a copy of this MarkovChar.
- static MarkovChar deserializeFromString(String data): Recreates an already-analyzed MarkovChar given a String produced by serializeToString().
- String serializeToString(): Returns a representation of this MarkovChar as a String; use deserializeFromString(String) to get a MarkovChar back from this String.
Field Details
chars
All chars (case-sensitive and only counting chars that are letters in Unicode) that this encountered during the latest call to analyze(CharSequence). Will be null if analyze(CharSequence) was never called.
pairs
Map of all pairs of chars encountered to the position in the order they were encountered. Each pair is stored as a single int key built from the two chars' 16-bit indices in chars, with the first char's index in the most-significant bits and the second char's index in the least-significant bits. The size of this IntIntOrderedMap is likely to be larger than the char array chars, but should be equal to processed.length. Will be null if analyze(CharSequence) was never called.
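The key packing described above can be sketched as follows, assuming each index fits in 16 bits (0 to 65535). PairKeys, pack, first, and second are hypothetical names for illustration, not members of MarkovChar.

```java
/** Hypothetical helper: packs two 16-bit char indices into one int key, first index in the high bits. */
final class PairKeys {
    private PairKeys() {}

    /** Places firstIndex in bits 16-31 and secondIndex in bits 0-15; both must be in 0..65535. */
    static int pack(int firstIndex, int secondIndex) {
        if ((firstIndex & ~0xFFFF) != 0 || (secondIndex & ~0xFFFF) != 0) {
            throw new IllegalArgumentException("indices must fit in 16 bits");
        }
        return (firstIndex << 16) | secondIndex;
    }

    /** Recovers the first index; the unsigned shift avoids sign extension when firstIndex >= 32768. */
    static int first(int key) {
        return key >>> 16;
    }

    /** Recovers the second index from the low 16 bits. */
    static int second(int key) {
        return key & 0xFFFF;
    }

    public static void main(String[] args) {
        int key = pack(3, 7);
        System.out.println(key);                            // 196615
        System.out.println(first(key) + " " + second(key)); // 3 7
    }
}
```

Because the first index occupies the high bits, keys with a large first index are negative ints; that is harmless for use as map keys as long as they are unpacked with the unsigned shift.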
processed
Complicated data that mixes probabilities of chars using their indices in chars and the indices of char pairs in pairs, generated during the latest call to analyze(CharSequence). This is a jagged 2D array. Will be null if analyze(CharSequence) was never called.
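The exact layout of processed is not spelled out here. Purely as an illustration of how a jagged int[][] can encode per-context probabilities, the sketch below assumes each row belongs to one context (for example, one entry of pairs) and stores alternating (next char index, weight) values; WeightedRows and pick are hypothetical names, and MarkovChar's real encoding may well differ.

```java
import java.util.Random;

/** Hypothetical jagged-array layout: row = {nextIndex0, weight0, nextIndex1, weight1, ...}. */
final class WeightedRows {
    private WeightedRows() {}

    /**
     * Returns a next-char index from row with probability proportional to its weight.
     * Weights must be non-negative with a positive sum.
     */
    static int pick(int[] row, Random rng) {
        int total = 0;
        for (int i = 1; i < row.length; i += 2) {
            total += row[i];
        }
        int r = rng.nextInt(total); // uniform in [0, total); throws if total is not positive
        for (int i = 0; i + 1 < row.length; i += 2) {
            r -= row[i + 1];
            if (r < 0) {
                return row[i];
            }
        }
        throw new IllegalStateException("negative weight in row");
    }

    public static void main(String[] args) {
        // Rows can have different lengths: context 0 has two candidates, context 1 has one.
        int[][] processed = {
            {2, 3, 5, 1}, // char index 2 with weight 3, char index 5 with weight 1
            {4, 10}       // char index 4 is the only option
        };
        Random rng = new Random(42L);
        int[] counts = new int[6];
        for (int n = 0; n < 4000; n++) {
            counts[pick(processed[0], rng)]++;
        }
        System.out.println(counts[2] + " vs " + counts[5]); // about 3:1
        System.out.println(pick(processed[1], rng));        // 4
    }
}
```

A jagged array fits this kind of data because different contexts are followed by different numbers of distinct chars, so each row only needs to be as long as its own candidate list.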
Constructor Details
MarkovChar
public MarkovChar()
Method Details
analyze
This is the main necessary step before using a MarkovChar; you must call this method at some point before you can call any other methods. You can serialize this MarkovChar after calling this method to avoid needing to call it again on later runs, or even include serialized MarkovChar objects with a game so this only needs to be called during pre-processing. This method analyzes the pairings of words in a (typically large) corpus text, including some punctuation as part of words and some kinds as their own "words." It only uses one preceding word to determine the subsequent word. When it finishes processing, it stores the results in chars and processed, which allows other methods to be called (they will throw a NullPointerException if analyze() hasn't been called).
- Parameters:
corpus - a typically-large sample text in the style that should be mimicked
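As a rough, self-contained illustration of the analyze-then-chain idea at the character level, the sketch below records, for every pair of consecutive letters in a corpus, which letter followed that pair, then walks those records with a seeded Random. ToyCharChain and its behavior (a two-letter context, letters only, non-letters treated as word breaks, output capped at maxLength) are assumptions made for this example; MarkovChar's real analysis may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical order-2 character Markov chain; a sketch, not MarkovChar's implementation. */
final class ToyCharChain {
    // Two-letter context -> every letter seen right after it (repeats encode frequency).
    private final Map<String, List<Character>> followers = new HashMap<>();
    // Opening two-letter pairs of words, used to start each generated word.
    private final List<String> starts = new ArrayList<>();

    /** Records letter transitions inside each word; any run of non-letters separates words. */
    void analyze(CharSequence corpus) {
        followers.clear();
        starts.clear();
        for (String word : corpus.toString().split("[^\\p{L}]+")) {
            if (word.length() < 3) {
                continue; // too short to contribute a pair plus a follower
            }
            starts.add(word.substring(0, 2));
            for (int i = 2; i < word.length(); i++) {
                followers.computeIfAbsent(word.substring(i - 2, i), k -> new ArrayList<>())
                         .add(word.charAt(i));
            }
        }
    }

    /** Walks the chain with a seeded Random; stops at an unseen context or at maxLength chars. */
    String chain(long seed, int maxLength) {
        if (starts.isEmpty()) {
            throw new IllegalStateException("call analyze() with a usable corpus first");
        }
        if (maxLength < 2) {
            throw new IllegalArgumentException("maxLength must be at least 2");
        }
        Random rng = new Random(seed); // same seed and same analysis give the same output
        StringBuilder sb = new StringBuilder(starts.get(rng.nextInt(starts.size())));
        while (sb.length() < maxLength) {
            List<Character> next = followers.get(sb.substring(sb.length() - 2));
            if (next == null) {
                break; // this pair never had a follower in the corpus
            }
            sb.append(next.get(rng.nextInt(next.size())).charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyCharChain chain = new ToyCharChain();
        chain.analyze("banana bandana cabana canal panama");
        for (long seed = 0; seed < 3; seed++) {
            System.out.println(chain.chain(seed, 12));
        }
    }
}
```

Storing repeated followers in a list keeps the sketch short, since a uniform pick from that list is automatically frequency-weighted; a compact structure like the processed array trades that simplicity for less memory and easier serialization.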
chain
Generates a roughly sentence-sized piece of text based on the corpus text previously analyzed with analyze(CharSequence). Generation stops when stop punctuation is used (".", "!", "?", or "..."), or once the length would exceed 200 characters without encountering stop punctuation, in which case the sentence is terminated with "." or "...".
- Parameters:
seed - the seed for the random decisions this makes, as a long; any long can be used
- Returns:
a String generated from the analyzed corpus text's word placement, usually a small sentence
chain
Generates a roughly sentence-sized piece of text based on the corpus text previously analyzed with analyze(CharSequence). Generation stops when stop punctuation is used (".", "!", "?", or "..."), or once adding another word would exceed maxLength, in which case the sentence is terminated with "." or "...".
- Parameters:
seed - the seed for the random decisions this makes, as a long; any long can be used
maxLength - the maximum length for the generated String, in number of characters
- Returns:
a String generated from the analyzed corpus text's word placement, usually a small sentence
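The stopping rule shared by both chain overloads can be sketched independently of the Markov model: append generated tokens until one ends in stop punctuation, or until the next one would push the text past the length limit, then close the text with "...". This is a hedged sketch: SentenceLimiter and build are hypothetical names, the one-argument form delegating with a limit of 200 is an assumption based on the descriptions above, and always choosing "..." (rather than sometimes ".") for a cut-off sentence is a simplification. The token source is abstracted as a Function<Random, String> so the example runs without any corpus.

```java
import java.util.Random;
import java.util.function.Function;

/** Hypothetical sketch of the stopping rule described for chain(seed) and chain(seed, maxLength). */
final class SentenceLimiter {
    private SentenceLimiter() {}

    /** Assumed to mirror chain(long): the same rule with the documented 200-character limit. */
    static String build(Function<Random, String> nextToken, long seed) {
        return build(nextToken, seed, 200);
    }

    /** Appends space-separated tokens until one ends in stop punctuation or the limit would be hit. */
    static String build(Function<Random, String> nextToken, long seed, int maxLength) {
        if (maxLength < 3) {
            throw new IllegalArgumentException("maxLength must leave room for \"...\"");
        }
        Random rng = new Random(seed); // same seed gives the same token choices
        StringBuilder sb = new StringBuilder();
        while (true) {
            String token = nextToken.apply(rng);
            if (token.isEmpty()) {
                throw new IllegalArgumentException("tokens must be non-empty");
            }
            // "..." also ends with '.', so it counts as stop punctuation here.
            boolean stops = token.endsWith(".") || token.endsWith("!") || token.endsWith("?");
            int added = (sb.length() == 0 ? 0 : 1) + token.length(); // optional space + token
            int reserve = stops ? 0 : 3; // non-final tokens must leave room for a closing "..."
            if (sb.length() + added + reserve > maxLength) {
                return sb.append("...").toString(); // hypothetical choice of terminator
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token);
            if (stops) {
                return sb.toString();
            }
        }
    }
}
```

Reserving three characters before appending any token that does not end the sentence guarantees the result never exceeds maxLength, even when "..." has to be added afterwards.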
serializeToString
Returns a representation of this MarkovChar as a String; use deserializeFromString(String) to get a MarkovChar back from this String. The chars and processed fields must have been given values by direct assignment, by calling analyze(CharSequence), or by building this MarkovChar with deserializeFromString(String). Uses spaces to separate words and a tab to separate the two fields.
- Returns:
a String that can be used to store the analyzed words and frequencies in this MarkovChar
deserializeFromString
Recreates an already-analyzed MarkovChar given a String produced by serializeToString().
- Parameters:
data - a String returned by serializeToString()
- Returns:
a MarkovChar that is ready to generate text with chain(long)
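The serialization described above only says that spaces separate items and a tab separates the two fields. The round trip below is a hedged sketch of such a format for a char[] and a jagged int[][], with two extra assumptions that do not come from this page: chars are written as decimal codes, and rows of the 2D array are separated by commas. ToyFormat, serialize, and deserialize are hypothetical names, and the exact text MarkovChar produces may differ.

```java
/** Hypothetical text format for a char[] plus a jagged int[][]; not MarkovChar's actual format. */
final class ToyFormat {
    private ToyFormat() {}

    /** Holder for the two deserialized fields. */
    static final class Data {
        final char[] chars;
        final int[][] processed;

        Data(char[] chars, int[][] processed) {
            this.chars = chars;
            this.processed = processed;
        }
    }

    /** chars as decimal codes joined by spaces, a tab, then rows joined by ',' (values by spaces). */
    static String serialize(char[] chars, int[][] processed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append((int) chars[i]); // numeric codes cannot collide with the separators
        }
        sb.append('\t');
        for (int r = 0; r < processed.length; r++) {
            if (r > 0) sb.append(',');
            for (int c = 0; c < processed[r].length; c++) {
                if (c > 0) sb.append(' ');
                sb.append(processed[r][c]);
            }
        }
        return sb.toString();
    }

    /** Parses the output of serialize back into both arrays. */
    static Data deserialize(String data) {
        String[] fields = data.split("\t", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected exactly one tab");
        }
        int[] codes = parseInts(fields[0]);
        char[] chars = new char[codes.length];
        for (int i = 0; i < codes.length; i++) {
            chars[i] = (char) codes[i];
        }
        // Caveat: zero rows and a single empty row both serialize to "", read back as zero rows.
        if (fields[1].isEmpty()) {
            return new Data(chars, new int[0][]);
        }
        String[] rows = fields[1].split(",", -1); // -1 keeps empty rows
        int[][] processed = new int[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            processed[r] = parseInts(rows[r]);
        }
        return new Data(chars, processed);
    }

    private static int[] parseInts(String text) {
        if (text.isEmpty()) return new int[0];
        String[] parts = text.split(" ");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }
}
```

With the documented API itself, the equivalent round trip is a call to serializeToString() on an analyzed instance followed by MarkovChar.deserializeFromString(String) on the saved result.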
copy
Copies the char array chars and the 2D jagged int array processed into a new MarkovChar. None of the arrays in the copy will be the same references as the arrays in this MarkovChar.
- Returns:
a copy of this MarkovChar
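A copy with no shared arrays needs one clone() per row of a jagged array, because clone() on an int[][] only copies the outer array of row references. The sketch below is a hypothetical stand-alone helper (ArrayCopies, copyChars, and deepCopy are illustrative names), not MarkovChar's source.

```java
/** Hypothetical helpers showing a deep copy of the kinds of arrays copy() duplicates. */
final class ArrayCopies {
    private ArrayCopies() {}

    /** A char[] holds primitives, so a single clone() is already a full copy. */
    static char[] copyChars(char[] chars) {
        return chars == null ? null : chars.clone();
    }

    /** Copies a jagged int[][] row by row so no row array is shared with the original. */
    static int[][] deepCopy(int[][] processed) {
        if (processed == null) return null;
        int[][] out = new int[processed.length][];
        for (int i = 0; i < processed.length; i++) {
            out[i] = processed[i] == null ? null : processed[i].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] original = {{1, 2}, {3}};
        int[][] shallow = original.clone(); // outer array copied, rows shared
        int[][] deep = deepCopy(original);
        original[0][0] = 99;
        System.out.println(shallow[0][0]); // 99: the row array is shared
        System.out.println(deep[0][0]);    // 1: the row array was copied
    }
}
```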