Class MarkovChar

java.lang.Object
com.github.yellowstonegames.text.MarkovChar

@Beta public class MarkovChar extends Object
A simple Markov chain text generator. Call analyze(CharSequence) once on a large sample text; after that, you can call chain(long) many times to get odd-sounding "remixes" of the sample text, generated one char at a time. The class is designed so that the data chain() needs is easy to serialize: if you store the chars and processed arrays in some serialized form, you can later reassign them to the same fields and avoid calling analyze() again. A convenient way to do this is to call stringSerialize() after calling analyze() once and save the resulting String; on future runs, call stringDeserialize(String) to create the MarkovChar without repeating the analysis.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    char[]
    chars
    All chars (case-sensitive and only counting chars that are letters in Unicode, plus "'") that this encountered during the latest call to analyze(CharSequence).
    com.github.tommyettinger.ds.IntIntMap
    pairs
    Map of all pairs of chars encountered to the position in the order they were encountered.
    int[][]
    processed
    Complicated data that mixes probabilities of chars using their indices in chars and the indices of char pairs in pairs, generated during the latest call to analyze(CharSequence).
  • Constructor Summary

    Constructors
    Constructor
    Description
    MarkovChar()
    Creates an empty MarkovChar; you should call analyze(CharSequence) before doing anything else with this new object.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    analyze(CharSequence corpus)
    This is the main necessary step before using a MarkovChar; you must call this method at some point before you can call any other methods.
    String
    chain(long seed)
    Generate a word-like String based on the previously analyzed corpus text (using analyze(CharSequence)) that terminates when a non-letter character other than "'" is encountered, or once the length would be greater than 200 characters without stopping.
    String
    chain(long seed, int maxLength)
    Generate a word-like String based on the previously analyzed corpus text (using analyze(CharSequence)) that terminates when a non-letter character other than "'" is encountered, or once the length would be greater than maxLength characters without stopping.
    MarkovChar
    copy()
    Copies the char array chars, the IntIntMap pairs, and the 2D jagged int array processed into a new MarkovChar.
    static MarkovChar
    stringDeserialize(String data)
    Recreates an already-analyzed MarkovChar given a String produced by stringSerialize().
    String
    stringSerialize()
    Returns a representation of this MarkovChar as a String; use stringDeserialize(String) to get a MarkovChar back from this String.

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • chars

      public char[] chars
      All chars (case-sensitive and only counting chars that are letters in Unicode, plus "'") that this encountered during the latest call to analyze(CharSequence). Will be null if analyze(CharSequence) was never called.
    • pairs

      public com.github.tommyettinger.ds.IntIntMap pairs
      Map of all pairs of chars encountered to the position in the order they were encountered. Each pair is stored as a single int key: the 16-bit index of the first char in chars goes in the most-significant bits, and the 16-bit index of the second char goes in the least-significant bits. The size of this IntIntMap is likely to be larger than the length of the char array chars, but should be equal to processed.length. Will be null if analyze(CharSequence) was never called.
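The key layout described for pairs can be illustrated with a small bit-packing sketch. This is only an illustration of the described layout, not code taken from MarkovChar; the class name PairPacking and its methods are hypothetical.

```java
// Hypothetical helper, not part of the library: shows how two 16-bit
// indices can share one int, first index high, second index low.
final class PairPacking {
    /** Packs firstIndex into the upper 16 bits and secondIndex into the lower 16 bits. */
    public static int pack(int firstIndex, int secondIndex) {
        return (firstIndex & 0xFFFF) << 16 | (secondIndex & 0xFFFF);
    }

    /** Recovers the index stored in the upper 16 bits. */
    public static int first(int packed) {
        return packed >>> 16;
    }

    /** Recovers the index stored in the lower 16 bits. */
    public static int second(int packed) {
        return packed & 0xFFFF;
    }

    public static void main(String[] args) {
        int key = pack(3, 7);
        System.out.println(Integer.toHexString(key)); // prints 30007
        System.out.println(first(key) + ", " + second(key)); // prints 3, 7
    }
}
```

Packing both indices into one int lets a primitive-keyed map such as IntIntMap look up a pair without allocating a key object.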
    • processed

      public int[][] processed
      Complicated data that mixes probabilities of chars using their indices in chars and the indices of char pairs in pairs, generated during the latest call to analyze(CharSequence). This is a jagged 2D array. Will be null if analyze(CharSequence) was never called.
  • Constructor Details

    • MarkovChar

      public MarkovChar()
      Creates an empty MarkovChar; you should call analyze(CharSequence) before doing anything else with this new object.
  • Method Details

    • analyze

      public void analyze(CharSequence corpus)
      This is the main necessary step before using a MarkovChar; you must call this method at some point before you can call any other methods. You can serialize this MarkovChar after calling this method to avoid needing to call it again on later runs, or even ship serialized MarkovChar objects with a game so that analysis only happens during pre-processing. This method analyzes the pairings of chars in a (typically large) corpus text, using only the two preceding chars to determine each subsequent char. When it finishes processing, it stores the results in chars and processed, which allows other methods to be called (they will throw a NullPointerException if analyze() hasn't been called).
      Parameters:
      corpus - a typically-large sample text in the style that should be mimicked
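The idea of using two preceding chars to determine the next one can be sketched with ordinary collections. This is a minimal illustration of order-2 analysis, assuming a plain map of counts; it does not reproduce the chars, pairs, or processed layout that MarkovChar uses, and it does not filter out non-letter chars. The class name PairAnalysisSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the library's implementation.
final class PairAnalysisSketch {
    /** Counts, for every two-char context in the corpus, how often each char follows it. */
    public static Map<String, Map<Character, Integer>> countFollowers(CharSequence corpus) {
        Map<String, Map<Character, Integer>> counts = new LinkedHashMap<>();
        for (int i = 2; i < corpus.length(); i++) {
            // The two chars before position i form the context.
            String context = corpus.subSequence(i - 2, i).toString();
            char next = corpus.charAt(i);
            counts.computeIfAbsent(context, k -> new LinkedHashMap<>())
                  .merge(next, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each context with the counts of the chars that followed it.
        System.out.println(countFollowers("banana bandana"));
    }
}
```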
    • chain

      public String chain(long seed)
      Generate a word-like String based on the previously analyzed corpus text (using analyze(CharSequence)) that terminates when a non-letter character other than "'" is encountered, or once the length would be greater than 200 characters without stopping.
      Parameters:
      seed - the seed for the random decisions this makes, as a long; any long can be used
      Returns:
      a word generated from the analyzed corpus text's char placement
    • chain

      public String chain(long seed, int maxLength)
      Generate a word-like String based on the previously analyzed corpus text (using analyze(CharSequence)) that terminates when a non-letter character other than "'" is encountered, or once the length would be greater than maxLength characters without stopping.
      Parameters:
      seed - the seed for the random decisions this makes, as a long; any long can be used
      maxLength - the maximum length for the generated String, in number of characters
      Returns:
      a word generated from the analyzed corpus text's char placement
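The stopping rules described for chain (stop at a non-letter other than "'", or when the length limit is reached) can be sketched as a seeded walk over a follower table. This is a hedged illustration only: the class ChainSketch, its table layout, the explicit start parameter, and the use of java.util.Random are all assumptions made for the example, not how MarkovChar itself chooses chars.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not the library's implementation.
final class ChainSketch {
    /** Maps each two-char context to the chars seen after it; repeats are kept so frequent followers are likelier. */
    public static Map<String, List<Character>> table(String corpus) {
        Map<String, List<Character>> t = new HashMap<>();
        for (int i = 2; i < corpus.length(); i++) {
            t.computeIfAbsent(corpus.substring(i - 2, i), k -> new ArrayList<>())
             .add(corpus.charAt(i));
        }
        return t;
    }

    /** Extends start one char at a time; the same seed always gives the same result. */
    public static String chain(Map<String, List<Character>> table, String start, long seed, int maxLength) {
        if (start.length() < 2) throw new IllegalArgumentException("start needs at least two chars");
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(start);
        // Never append past maxLength (start itself is not truncated).
        while (sb.length() < maxLength) {
            List<Character> followers = table.get(sb.substring(sb.length() - 2));
            if (followers == null) break; // context never seen in the corpus
            char next = followers.get(random.nextInt(followers.size()));
            // Stop at any non-letter other than an apostrophe.
            if (!Character.isLetter(next) && next != '\'') break;
            sb.append(next);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<Character>> t = table("the theme then there thence ");
        System.out.println(chain(t, "th", 42L, 12));
    }
}
```

Because every random decision comes from the seed, calling the sketch twice with the same arguments yields the same word, which mirrors the role of the seed parameter described above.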
    • stringSerialize

      public String stringSerialize()
      Returns a representation of this MarkovChar as a String; use stringDeserialize(String) to get a MarkovChar back from this String. The chars and processed fields must have been given values by direct assignment, by calling analyze(CharSequence), or by building this MarkovChar with the aforementioned stringDeserialize(String) method. Items are separated by commas and semicolons; fields are separated by tabs.
      Returns:
      a String that can be used to store the analyzed chars and frequencies in this MarkovChar
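A delimiter-based round trip of this kind can be sketched as follows. The exact format MarkovChar writes is not specified here, so the layout below (chars as raw text, a tab, then rows terminated by semicolons with comma-separated ints) is an assumption chosen for illustration, and the class SerializeSketch is hypothetical. It also assumes chars contains no tab characters.

```java
// Hypothetical sketch; the real stringSerialize() format may differ.
final class SerializeSketch {
    public char[] chars;
    public int[][] table;

    public SerializeSketch(char[] chars, int[][] table) {
        this.chars = chars;
        this.table = table;
    }

    /** Writes chars, a tab, then each row as comma-separated ints ending in a semicolon. */
    public String serialize() {
        StringBuilder sb = new StringBuilder();
        sb.append(chars).append('\t');
        for (int[] row : table) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(row[i]);
            }
            sb.append(';'); // terminator, so empty rows stay distinguishable
        }
        return sb.toString();
    }

    /** Rebuilds an instance from a String produced by serialize(). */
    public static SerializeSketch deserialize(String data) {
        String[] fields = data.split("\t", -1);
        String[] rows = fields[1].split(";", -1);
        // The last piece after the final ';' is always empty, so skip it.
        int[][] table = new int[rows.length - 1][];
        for (int r = 0; r < table.length; r++) {
            String[] items = rows[r].isEmpty() ? new String[0] : rows[r].split(",");
            table[r] = new int[items.length];
            for (int i = 0; i < items.length; i++) {
                table[r][i] = Integer.parseInt(items[i]);
            }
        }
        return new SerializeSketch(fields[0].toCharArray(), table);
    }

    public static void main(String[] args) {
        SerializeSketch original = new SerializeSketch(
                new char[]{'a', 'b', '\''}, new int[][]{{1, 2, 3}, {}, {-4}});
        String saved = original.serialize();
        SerializeSketch restored = deserialize(saved);
        System.out.println(saved.equals(restored.serialize())); // prints true
    }
}
```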
    • stringDeserialize

      public static MarkovChar stringDeserialize(String data)
      Recreates an already-analyzed MarkovChar given a String produced by stringSerialize().
      Parameters:
      data - a String returned by stringSerialize()
      Returns:
      a MarkovChar that is ready to generate text with chain(long)
    • copy

      public MarkovChar copy()
      Copies the char array chars, the IntIntMap pairs, and the 2D jagged int array processed into a new MarkovChar. None of the arrays or objects will be equivalent references (this is a deep copy).
      Returns:
      a copy of this MarkovChar
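For a jagged array such as processed, a deep copy means cloning every row, not just the outer array. A minimal sketch of that idea follows; the class DeepCopySketch is hypothetical and is not the code copy() uses.

```java
// Hypothetical sketch, not the library's implementation.
final class DeepCopySketch {
    /** Copies a jagged 2D int array so that no row is shared with the source. */
    public static int[][] deepCopy(int[][] source) {
        int[][] result = new int[source.length][];
        for (int i = 0; i < source.length; i++) {
            // clone() on a one-dimensional primitive array copies its elements.
            result[i] = source[i].clone();
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] original = {{1, 2}, {3}};
        int[][] duplicate = deepCopy(original);
        duplicate[0][0] = 99;
        System.out.println(original[0][0]); // prints 1, the original is unaffected
    }
}
```

Calling clone() on the outer int[][] alone would share the inner rows between the two arrays, so a change made through one would be visible through the other.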