Class StringTools

java.lang.Object
com.github.yellowstonegames.core.StringTools

public final class StringTools extends Object
Various utility functions for handling readable natural-language text. This has tools to wrap long CharSequences to fit in a maximum width, and generally tidy up generated text. This last step includes padding left and right (including a "strict" option that truncates Strings that are longer than the padded size), Capitalizing Each Word, Capitalizing the first word in a sentence, replacing "a improper usage of a" with "an improved replacement using an," etc. This also has a lot of predefined categories of chars that are considered widely enough supported by fonts, like COMMON_PUNCTUATION and LATIN_LETTERS_UPPER.
  • Field Details

    • whitespacePattern

      public static final regexodus.Pattern whitespacePattern
    • nonSpacePattern

      public static final regexodus.Pattern nonSpacePattern
    • PERMISSIBLE_CHARS

      public static final String PERMISSIBLE_CHARS
      A constant containing only chars that are reasonably likely to be supported by broad fonts and thus displayable. This assumes the font supports the Latin, Greek, and Cyrillic alphabets, with good support for extended Latin (at least for European languages); the font is not required to be complete enough to support the very large Vietnamese set of extensions to Latin, nor any International Phonetic Alphabet (IPA) chars. It also assumes box drawing characters are supported, along with a handful of common dingbats, such as male and female signs. It does not include the tab, newline, or carriage return characters, since these don't usually make sense on a grid of chars.
    • BOX_DRAWING_SINGLE

      public static final String BOX_DRAWING_SINGLE
    • BOX_DRAWING_DOUBLE

      public static final String BOX_DRAWING_DOUBLE
    • BOX_DRAWING

      public static final String BOX_DRAWING
    • VISUAL_SYMBOLS

      public static final String VISUAL_SYMBOLS
    • DIGITS

      public static final String DIGITS
    • MARKS

      public static final String MARKS
    • GROUPING_SIGNS_OPEN

      public static final String GROUPING_SIGNS_OPEN
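      The index of an opening char in this String matches the index of its closing counterpart in GROUPING_SIGNS_CLOSE, so an index found here can be used to look up the closing char (the lookup only works in this direction).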
    • GROUPING_SIGNS_CLOSE

      public static final String GROUPING_SIGNS_CLOSE
      An index in GROUPING_SIGNS_OPEN can be used here to find the closing char for that opening one.
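      The index pairing described for GROUPING_SIGNS_OPEN and GROUPING_SIGNS_CLOSE can be sketched as below. The OPEN and CLOSE Strings are hypothetical stand-ins chosen for illustration; the contents of the real constants are not listed on this page.

```java
// Sketch only: OPEN and CLOSE are hypothetical stand-ins, not the real constants.
final class GroupingSketch {
    static final String OPEN = "([{<";
    static final String CLOSE = ")]}>";

    /** Returns the closing char paired with open, or open itself if it is not an opening sign. */
    static char closingFor(char open) {
        int idx = OPEN.indexOf(open);
        return idx < 0 ? open : CLOSE.charAt(idx);
    }

    public static void main(String[] args) {
        System.out.println(closingFor('[')); // prints ]
    }
}
```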
    • COMMON_PUNCTUATION

      public static final String COMMON_PUNCTUATION
    • MODERN_PUNCTUATION

      public static final String MODERN_PUNCTUATION
    • UNCOMMON_PUNCTUATION

      public static final String UNCOMMON_PUNCTUATION
    • TECHNICAL_PUNCTUATION

      public static final String TECHNICAL_PUNCTUATION
    • PUNCTUATION

      public static final String PUNCTUATION
    • CURRENCY

      public static final String CURRENCY
    • SPACING

      public static final String SPACING
    • ENGLISH_LETTERS_UPPER

      public static final String ENGLISH_LETTERS_UPPER
    • ENGLISH_LETTERS_LOWER

      public static final String ENGLISH_LETTERS_LOWER
    • ENGLISH_LETTERS

      public static final String ENGLISH_LETTERS
    • LATIN_EXTENDED_LETTERS_UPPER

      public static final String LATIN_EXTENDED_LETTERS_UPPER
    • LATIN_EXTENDED_LETTERS_LOWER

      public static final String LATIN_EXTENDED_LETTERS_LOWER
    • LATIN_EXTENDED_LETTERS

      public static final String LATIN_EXTENDED_LETTERS
    • LATIN_LETTERS_UPPER

      public static final String LATIN_LETTERS_UPPER
    • LATIN_LETTERS_LOWER

      public static final String LATIN_LETTERS_LOWER
    • LATIN_LETTERS

      public static final String LATIN_LETTERS
    • GREEK_LETTERS_UPPER

      public static final String GREEK_LETTERS_UPPER
      Includes the letter Sigma, 'Σ', twice because it has two lower-case forms in GREEK_LETTERS_LOWER. This lets you use one index for both lower and upper case, like with Latin and Cyrillic.
    • GREEK_LETTERS_LOWER

      public static final String GREEK_LETTERS_LOWER
      Includes both lower-case forms of Sigma, 'ς' and 'σ', which line up with the two upper-case Sigma chars in GREEK_LETTERS_UPPER. This lets you use one index for both lower and upper case, like with Latin and Cyrillic.
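      A minimal sketch of why the doubled Sigma matters: with both Strings the same length, one index converts between cases. UPPER and LOWER here are short hypothetical excerpts written with Unicode escapes, not the full constants.

```java
// Sketch only: UPPER and LOWER are hypothetical excerpts of the Greek letter constants.
final class GreekCaseSketch {
    // Alpha, Beta, Sigma, Sigma (U+0391, U+0392, U+03A3, U+03A3)
    static final String UPPER = "\u0391\u0392\u03A3\u03A3";
    // alpha, beta, final sigma, sigma (U+03B1, U+03B2, U+03C2, U+03C3)
    static final String LOWER = "\u03B1\u03B2\u03C2\u03C3";

    /** Maps a lower-case char to the upper-case char at the same index; other chars pass through. */
    static char toUpper(char c) {
        int idx = LOWER.indexOf(c);
        return idx < 0 ? c : UPPER.charAt(idx);
    }

    public static void main(String[] args) {
        // Both lower-case sigma forms land on the same upper-case Sigma.
        System.out.println(toUpper('\u03C2') == toUpper('\u03C3')); // prints true
    }
}
```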
    • GREEK_LETTERS

      public static final String GREEK_LETTERS
    • CYRILLIC_LETTERS_UPPER

      public static final String CYRILLIC_LETTERS_UPPER
    • CYRILLIC_LETTERS_LOWER

      public static final String CYRILLIC_LETTERS_LOWER
    • CYRILLIC_LETTERS

      public static final String CYRILLIC_LETTERS
    • LETTERS_UPPER

      public static final String LETTERS_UPPER
    • LETTERS_LOWER

      public static final String LETTERS_LOWER
    • LETTERS

      public static final String LETTERS
    • LETTERS_AND_NUMBERS

      public static final String LETTERS_AND_NUMBERS
    • ALL_UNICODE_LETTER_SET

      public static final com.github.tommyettinger.ds.CharBitSetFixedSize ALL_UNICODE_LETTER_SET
      A CharBitSetFixedSize containing every letter char in the Unicode BMP as an index. You can check if a char c is in this set with ALL_UNICODE_LETTER_SET.contains(c).
    • ALL_UNICODE_UPPERCASE_LETTER_SET

      public static final com.github.tommyettinger.ds.CharBitSetFixedSize ALL_UNICODE_UPPERCASE_LETTER_SET
      A CharBitSetFixedSize containing every upper-case letter char in the Unicode BMP as an index. You can check if a char c is in this set with ALL_UNICODE_UPPERCASE_LETTER_SET.contains(c).
    • ALL_UNICODE_LOWERCASE_LETTER_SET

      public static final com.github.tommyettinger.ds.CharBitSetFixedSize ALL_UNICODE_LOWERCASE_LETTER_SET
      A CharBitSetFixedSize containing every lower-case letter char in the Unicode BMP as an index. You can check if a char c is in this set with ALL_UNICODE_LOWERCASE_LETTER_SET.contains(c).
  • Method Details

    • join

      @Deprecated public static String join(CharSequence delimiter, CharSequence... elements)
      Deprecated.
      Use TextTools.join(CharSequence, Object[]) instead.
    • joinArrays

      public static String joinArrays(CharSequence delimiter, char[]... elements)
    • join

      public static String join(CharSequence delimiter, long... elements)
    • join

      public static String join(CharSequence delimiter, double... elements)
    • join

      public static String join(CharSequence delimiter, int... elements)
    • join

      public static String join(CharSequence delimiter, float... elements)
    • join

      public static String join(CharSequence delimiter, short... elements)
    • join

      public static String join(CharSequence delimiter, char... elements)
    • join

      public static String join(CharSequence delimiter, byte... elements)
    • join

      public static String join(CharSequence delimiter, boolean... elements)
    • joinReadably

      public static String joinReadably(CharSequence delimiter, long... elements)
      Like Base.join(CharSequence, long[]), but this appends an 'L' to each number, so they can be read in by Java. Replaced by Base.joinReadable(CharSequence, long[]) in most circumstances.
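      The 'L' suffix behavior can be sketched in plain Java as below. This is not the library source; returning "" for a null or empty array is an assumption made for the sketch.

```java
// Sketch only: mimics the described output format, not the library implementation.
final class JoinReadablySketch {
    static String joinReadably(CharSequence delimiter, long... elements) {
        // Assumption: null or empty input yields an empty String.
        if (elements == null || elements.length == 0) return "";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) sb.append(delimiter);
            sb.append(elements[i]).append('L'); // suffix makes each item a valid Java long literal
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinReadably(", ", 1L, -2L, 3L)); // prints 1L, -2L, 3L
    }
}
```

      The result can be pasted between braces to form a long[] initializer in Java source.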
    • appendJoinedReadably

      public static StringBuilder appendJoinedReadably(StringBuilder sb, CharSequence delimiter, long... elements)
      Like Base.appendJoined(CharSequence, CharSequence, long[]), but this appends an 'L' to each number so they can be read in by Java. Replaced by Base.appendJoinedReadable(CharSequence, CharSequence, long[]).
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, CharSequence... elements)
    • appendJoinedArrays

      public static StringBuilder appendJoinedArrays(StringBuilder sb, CharSequence delimiter, char[]... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, long... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, double... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, int... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, float... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, short... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, char... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, byte... elements)
    • appendJoined

      public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, boolean... elements)
    • joinDense

      @Deprecated public static String joinDense(boolean... elements)
      Deprecated.
      Joins the boolean array elements without delimiters into a String, using "1" for true and "0" for false. This is "dense" because it doesn't have any delimiters between elements. Using TextTools.joinDense(boolean...) is recommended instead.
      Parameters:
      elements - an array or vararg of booleans
      Returns:
      a String using 1 for true elements and 0 for false, or the empty string if elements is null or empty
    • joinDense

      @Deprecated public static String joinDense(char t, char f, boolean... elements)
      Deprecated.
      Joins the boolean array elements without delimiters into a String, using the char t for true and the char f for false. This is "dense" because it doesn't have any delimiters between elements. Using TextTools.joinDense(char, char, boolean...) is recommended instead.
      Parameters:
      t - the char to write for true values
      f - the char to write for false values
      elements - an array or vararg of booleans
      Returns:
      a String using t for true elements and f for false, or the empty string if elements is null or empty
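      A minimal sketch of the two joinDense overloads as described above, written in plain Java rather than copied from the library:

```java
// Sketch only: follows the documented behavior of both joinDense overloads.
final class JoinDenseSketch {
    static String joinDense(char t, char f, boolean... elements) {
        if (elements == null || elements.length == 0) return "";
        StringBuilder sb = new StringBuilder(elements.length);
        for (boolean b : elements) sb.append(b ? t : f); // one char per element, no delimiter
        return sb.toString();
    }

    static String joinDense(boolean... elements) {
        return joinDense('1', '0', elements);
    }

    public static void main(String[] args) {
        System.out.println(joinDense(true, false, true));  // prints 101
        System.out.println(joinDense('T', 'F', true, false)); // prints TF
    }
}
```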
    • appendJoinedDense

      @Deprecated public static StringBuilder appendJoinedDense(StringBuilder sb, boolean... elements)
      Deprecated.
      Joins the boolean array elements without delimiters into a StringBuilder, using "1" for true and "0" for false. This is "dense" because it doesn't have any delimiters between elements. Using TextTools.appendJoinedDense(CharSequence, boolean...) is recommended instead.
      Parameters:
      sb - a StringBuilder that will be modified in-place
      elements - an array or vararg of booleans
      Returns:
      sb after modifications (if elements was non-null)
    • appendJoinedDense

      @Deprecated public static StringBuilder appendJoinedDense(StringBuilder sb, char t, char f, boolean... elements)
      Deprecated.
      Joins the boolean array elements without delimiters into a StringBuilder, using the char t for true and the char f for false. This is "dense" because it doesn't have any delimiters between elements. Using TextTools.appendJoinedDense(CharSequence, char, char, boolean...) is recommended instead.
      Parameters:
      sb - a StringBuilder that will be modified in-place
      t - the char to write for true values
      f - the char to write for false values
      elements - an array or vararg of booleans
      Returns:
      sb after modifications (if elements was non-null)
    • join

      @Deprecated public static String join(CharSequence delimiter, Object[] elements)
      Deprecated.
      Joins the items in elements by calling their toString method on them (or just using the String "null" for null items), and separating each item with delimiter. Unlike other join methods in this class, this does not take a vararg of Object items, since that would cause confusion with the overloads that take one object; it takes a non-vararg Object array instead. Using TextTools.join(CharSequence, Object[]) is recommended instead.
      Parameters:
      delimiter - the String or other CharSequence to separate items in elements with; if null, uses ""
      elements - the Object items to stringify and join into one String; if the array is null or empty, this returns an empty String, and if items are null, they are shown as "null"
      Returns:
      the String representations of the items in elements, separated by delimiter and put in one String
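      The null handling described here (null delimiter treated as "", null items shown as "null") can be sketched as follows. This is an illustration in plain Java, not the library code.

```java
// Sketch only: reproduces the documented null handling for an Object[] join.
final class JoinObjectsSketch {
    static String join(CharSequence delimiter, Object[] elements) {
        if (elements == null || elements.length == 0) return "";
        String d = delimiter == null ? "" : delimiter.toString();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) sb.append(d);
            sb.append(elements[i]); // StringBuilder.append(Object) writes "null" for a null item
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(", ", new Object[]{1, null, "x"})); // prints 1, null, x
    }
}
```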
    • join

      @Deprecated public static String join(CharSequence delimiter, Iterable<?> elements)
      Deprecated.
      Joins the items in elements by calling their toString method on them (or just using the String "null" for null items), and separating each item with delimiter. This can take any Iterable of any type for its elements parameter. Using TextTools.join(CharSequence, Iterable) is recommended instead.
      Parameters:
      delimiter - the String or other CharSequence to separate items in elements with; if null, uses ""
      elements - the Object items to stringify and join into one String; if the Iterable is null or empty, this returns an empty String, and if items are null, they are shown as "null"
      Returns:
      the String representations of the items in elements, separated by delimiter and put in one String
    • appendJoined

      @Deprecated public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, Object[] elements)
      Deprecated.
      Joins the items in elements by calling their toString method on them (or just using the String "null" for null items), and separating each item with delimiter. Unlike other join methods in this class, this does not take a vararg of Object items, since that would cause confusion with the overloads that take one object; it takes a non-vararg Object array instead. Using TextTools.appendJoined(CharSequence, CharSequence, Object[]) is recommended instead.
      Parameters:
      sb - a StringBuilder that will be modified in-place
      delimiter - the String or other CharSequence to separate items in elements with; if null, uses ""
      elements - the Object items to stringify and append to sb; if the array is null or empty, nothing is appended, and if items are null, they are shown as "null"
      Returns:
      sb after modifications (if elements was non-null)
    • appendJoined

      @Deprecated public static StringBuilder appendJoined(StringBuilder sb, CharSequence delimiter, Iterable<?> elements)
      Deprecated.
      Joins the items in elements by calling their toString method on them (or just using the String "null" for null items), and separating each item with delimiter. This can take any Iterable of any type for its elements parameter. Using TextTools.appendJoined(CharSequence, CharSequence, Iterable) is recommended instead.
      Parameters:
      sb - a StringBuilder that will be modified in-place
      delimiter - the String or other CharSequence to separate items in elements with; if null, uses ""
      elements - the Object items to stringify and append to sb; if the Iterable is null or empty, nothing is appended, and if items are null, they are shown as "null"
      Returns:
      sb after modifications (if elements was non-null)
    • contains

      @Deprecated public static boolean contains(CharSequence text, CharSequence search)
      Deprecated.
      Searches text for the exact contents of the CharSequence search; returns true if text contains search. Use TextTools.contains(CharSequence, CharSequence) instead.
      Parameters:
      text - a CharSequence, such as a String or StringBuilder, that might contain search
      search - a CharSequence to try to find in text
      Returns:
      true if search was found
    • containsPart

      @Deprecated public static int containsPart(CharSequence text, CharSequence search)
      Deprecated.
      Tries to find as much of the CharSequence search as it can in the CharSequence text, always starting from the beginning of search (if the beginning isn't found, then it finds nothing), and returns the length of the found part of search (0 if not found). Use TextTools.containsPart(CharSequence, CharSequence) instead.
      Parameters:
      text - a CharSequence to search in
      search - a CharSequence to look for
      Returns:
      the length of the searched-for CharSequence that was found
    • contains

      @Deprecated public static boolean contains(CharSequence text, char[] search)
      Deprecated.
      Searches text for the exact contents of the char array search; returns true if text contains search. Use TextTools.contains(CharSequence, char[]) instead.
      Parameters:
      text - a CharSequence, such as a String or StringBuilder, that might contain search
      search - a char array to try to find in text
      Returns:
      true if search was found
    • containsPart

      @Deprecated public static int containsPart(CharSequence text, char[] search)
      Deprecated.
      Tries to find as much of the char array search as it can in the CharSequence text, always starting from the beginning of search (if the beginning isn't found, then it finds nothing), and returns the length of the found part of search (0 if not found). Use TextTools.containsPart(CharSequence, char[]) instead.
      Parameters:
      text - a CharSequence to search in
      search - a char array to look for
      Returns:
      the length of the searched-for char array that was found
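      One way to read the containsPart description is "the longest prefix of search that occurs anywhere in text". The sketch below implements that reading only; the real method may differ in details, such as which occurrence it prefers.

```java
// Sketch only: one interpretation of containsPart, not the library implementation.
final class ContainsPartSketch {
    static int containsPart(CharSequence text, char[] search) {
        if (text == null || search == null) return 0;
        int best = 0;
        for (int start = 0; start < text.length(); start++) {
            int len = 0;
            // Extend the match from this position while chars keep agreeing with search.
            while (len < search.length && start + len < text.length()
                    && text.charAt(start + len) == search[len]) {
                len++;
            }
            best = Math.max(best, len);
        }
        return best;
    }

    public static void main(String[] args) {
        // Only "cat" from "catalog" appears in the text.
        System.out.println(containsPart("the cat sat", "catalog".toCharArray())); // prints 3
    }
}
```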
    • containsPart

      public static int containsPart(CharSequence text, char[] search, CharSequence prefix, CharSequence suffix)
      Tries to find as much of the sequence prefix search suffix as it can in text, where prefix and suffix are CharSequences for some reason and search is a char array. Returns the length of the sequence it was able to match, up to prefix.length() + search.length + suffix.length(), or 0 if no part of the looked-for sequence could be found.
      This is almost certainly too specific to be useful outside a handful of cases, but it isn't marked as deprecated because it was removed from TextTools. If you for whatever reason need this, it is here.
      Parameters:
      text - a CharSequence to search in
      search - a char array to look for, surrounded by prefix and suffix
      prefix - a mandatory prefix before search, separated for some weird optimization reason
      suffix - a mandatory suffix after search, separated for some weird optimization reason
      Returns:
      the length of the searched-for prefix+search+suffix that was found
    • replace

      @Deprecated public static String replace(CharSequence text, CharSequence before, CharSequence after)
      Deprecated.
      Use TextTools.replace(CharSequence, CharSequence, CharSequence) instead.
    • count

      public static int count(String source, String search)
      Scans repeatedly in source for the String search, not scanning the same char twice except as part of a larger String, and returns the number of instances of search that were found, or 0 if source is null or if search is null or empty.
      Parameters:
      source - a String to look through
      search - a String to look for
      Returns:
      the number of times search was found in source
    • count

      public static int count(String source, int search)
      Scans repeatedly in source for the codepoint search (which is usually a char literal), not scanning the same section twice, and returns the number of instances of search that were found, or 0 if source is null.
      Parameters:
      source - a String to look through
      search - a codepoint or char to look for
      Returns:
      the number of times search was found in source
    • count

      public static int count(String source, String search, int startIndex, int endIndex)
      Scans repeatedly in source (only using the area from startIndex, inclusive, to endIndex, exclusive) for the String search, not scanning the same char twice except as part of a larger String, and returns the number of instances of search that were found, or 0 if source or search is null or if the searched area is empty. If endIndex is negative, this will search from startIndex until the end of the source.
      Parameters:
      source - a String to look through
      search - a String to look for
      startIndex - the first index to search through, inclusive
      endIndex - the last index to search through, exclusive; if negative this will search the rest of source
      Returns:
      the number of times search was found in source
    • count

      public static int count(String source, int search, int startIndex, int endIndex)
      Scans repeatedly in source (only using the area from startIndex, inclusive, to endIndex, exclusive) for the codepoint search (which is usually a char literal), not scanning the same section twice, and returns the number of instances of search that were found, or 0 if source is null or if the searched area is empty. If endIndex is negative, this will search from startIndex until the end of the source.
      Parameters:
      source - a String to look through
      search - a codepoint or char to look for
      startIndex - the first index to search through, inclusive
      endIndex - the last index to search through, exclusive; if negative this will search the rest of source
      Returns:
      the number of times search was found in source
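      The non-overlapping scan used by the String-based count overloads can be sketched as below. It assumes an occurrence must lie entirely before endIndex to be counted, which the description does not state outright.

```java
// Sketch only: non-overlapping occurrence counting with an optional range.
final class CountSketch {
    static int count(String source, String search, int startIndex, int endIndex) {
        if (source == null || search == null || search.isEmpty()) return 0;
        if (endIndex < 0) endIndex = source.length(); // negative means "to the end"
        int amount = 0;
        int idx = source.indexOf(search, startIndex);
        // Assumption: a match only counts if it ends at or before endIndex.
        while (idx >= 0 && idx + search.length() <= endIndex) {
            amount++;
            idx = source.indexOf(search, idx + search.length()); // skip past the match
        }
        return amount;
    }

    static int count(String source, String search) {
        return count(source, search, 0, -1);
    }

    public static void main(String[] args) {
        System.out.println(count("aaaa", "aa"));         // prints 2
        System.out.println(count("banana", "an", 0, 3)); // prints 1
    }
}
```

      Because the scan resumes after each match, "aaaa" holds two instances of "aa" rather than three.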
    • safeSubstring

      public static String safeSubstring(String source, int beginIndex, int endIndex)
      Like String.substring(int, int) but returns "" instead of throwing any sort of Exception. This delegates to TextTools.safeSubstring(String, int, int).
      Parameters:
      source - the String to get a substring from
      beginIndex - the first index, inclusive; will be treated as 0 if negative
      endIndex - the index after the last character (exclusive); if negative this will be source.length()
      Returns:
      the substring of source between beginIndex and endIndex, or "" if any parameters are null/invalid
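      A sketch of the index handling described for safeSubstring. Clamping an endIndex larger than source.length() is an assumption of this sketch; the description only promises that no Exception is thrown.

```java
// Sketch only: exception-free substring following the documented index rules.
final class SafeSubstringSketch {
    static String safeSubstring(String source, int beginIndex, int endIndex) {
        if (source == null || source.isEmpty()) return "";
        if (beginIndex < 0) beginIndex = 0;
        // Negative endIndex means "to the end"; clamping an oversized endIndex is an assumption.
        if (endIndex < 0 || endIndex > source.length()) endIndex = source.length();
        if (beginIndex >= endIndex) return "";
        return source.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        System.out.println(safeSubstring("hello", -3, 2)); // prints he
        System.out.println(safeSubstring("hello", 2, -1)); // prints llo
    }
}
```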
    • split

      public static String[] split(String source, String delimiter)
      Like String.split(String) but doesn't use any regex for splitting (the delimiter is a literal String).
      Parameters:
      source - the String to get split-up substrings from
      delimiter - the literal String to split on (not a regex); will not be included in the returned String array
      Returns:
      a String array consisting of at least one String (the entirety of source if nothing was split)
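      The difference from String.split(String) shows up with delimiters that are regex metacharacters, such as ".". The sketch below splits on a literal delimiter with indexOf; keeping trailing empty Strings is an assumption, since the description does not cover that case.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: literal (non-regex) splitting, not the library implementation.
final class LiteralSplitSketch {
    static String[] split(String source, String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) return new String[]{source};
        List<String> parts = new ArrayList<>();
        int from = 0, idx;
        while ((idx = source.indexOf(delimiter, from)) >= 0) {
            parts.add(source.substring(from, idx));
            from = idx + delimiter.length();
        }
        parts.add(source.substring(from)); // remainder; the whole source if nothing was split
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(split("a.b.c", ".").length);   // prints 3
        System.out.println("a.b.c".split(".").length);    // prints 0, because "." is a regex wildcard
    }
}
```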
    • padRight

      public static String padRight(String text, int minimumLength)
      If text is shorter than the given minimumLength, returns a String with text padded on the right with spaces until it reaches that length; otherwise it simply returns text.
      Parameters:
      text - the text to pad if necessary
      minimumLength - the minimum length of String to return
      Returns:
      text, potentially padded with spaces to reach the given minimum length
    • padRight

      public static String padRight(String text, char padChar, int minimumLength)
      If text is shorter than the given minimumLength, returns a String with text padded on the right with padChar until it reaches that length; otherwise it simply returns text.
      Parameters:
      text - the text to pad if necessary
      padChar - the char to use to pad text, if necessary
      minimumLength - the minimum length of String to return
      Returns:
      text, potentially padded with padChar to reach the given minimum length
    • padRightStrict

      public static String padRightStrict(String text, int totalLength)
      Constructs a String with exactly the given totalLength by taking text (or a substring of it) and padding it on its right side with spaces until totalLength is reached. If text is longer than totalLength, this only uses the portion of text needed to fill totalLength, and no more.
      Parameters:
      text - the String to pad if necessary, or truncate if too long
      totalLength - the exact length of String to return
      Returns:
      a String with exactly totalLength for its length, made from text and possibly extra spaces
    • padRightStrict

      public static String padRightStrict(String text, char padChar, int totalLength)
      Constructs a String with exactly the given totalLength by taking text (or a substring of it) and padding it on its right side with padChar until totalLength is reached. If text is longer than totalLength, this only uses the portion of text needed to fill totalLength, and no more.
      Parameters:
      text - the String to pad if necessary, or truncate if too long
      padChar - the char to use to fill any remaining length
      totalLength - the exact length of String to return
      Returns:
      a String with exactly totalLength for its length, made from text and possibly padChar
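      The contrast between padRight and padRightStrict can be sketched as below. The sketch assumes non-null text and a non-negative length, and assumes the strict variant keeps the leading chars when it truncates.

```java
// Sketch only: minimum-length padding versus exact-length ("strict") padding.
final class PadRightSketch {
    static String padRight(String text, char padChar, int minimumLength) {
        if (text.length() >= minimumLength) return text; // never truncates
        StringBuilder sb = new StringBuilder(minimumLength).append(text);
        while (sb.length() < minimumLength) sb.append(padChar);
        return sb.toString();
    }

    static String padRightStrict(String text, char padChar, int totalLength) {
        // Assumption: truncation keeps the first totalLength chars.
        if (text.length() > totalLength) return text.substring(0, totalLength);
        return padRight(text, padChar, totalLength);
    }

    public static void main(String[] args) {
        System.out.println(padRight("abcdef", '.', 3));       // prints abcdef
        System.out.println(padRightStrict("abcdef", '.', 3)); // prints abc
        System.out.println(padRightStrict("ab", '.', 5));     // prints ab...
    }
}
```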
    • padLeft

      public static String padLeft(String text, int minimumLength)
      If text is shorter than the given minimumLength, returns a String with text padded on the left with spaces until it reaches that length; otherwise it simply returns text.
      Parameters:
      text - the text to pad if necessary
      minimumLength - the minimum length of String to return
      Returns:
      text, potentially padded with spaces to reach the given minimum length
    • padLeft

      public static String padLeft(String text, char padChar, int minimumLength)
      If text is shorter than the given minimumLength, returns a String with text padded on the left with padChar until it reaches that length; otherwise it simply returns text.
      Parameters:
      text - the text to pad if necessary
      padChar - the char to use to pad text, if necessary
      minimumLength - the minimum length of String to return
      Returns:
      text, potentially padded with padChar to reach the given minimum length
    • padLeftStrict

      public static String padLeftStrict(String text, int totalLength)
      Constructs a String with exactly the given totalLength by taking text (or a substring of it) and padding it on its left side with spaces until totalLength is reached. If text is longer than totalLength, this only uses the portion of text needed to fill totalLength, and no more.
      Parameters:
      text - the String to pad if necessary, or truncate if too long
      totalLength - the exact length of String to return
      Returns:
      a String with exactly totalLength for its length, made from text and possibly extra spaces
    • padLeftStrict

      public static String padLeftStrict(String text, char padChar, int totalLength)
      Constructs a String with exactly the given totalLength by taking text (or a substring of it) and padding it on its left side with padChar until totalLength is reached. If text is longer than totalLength, this only uses the portion of text needed to fill totalLength, and no more.
      Parameters:
      text - the String to pad if necessary, or truncate if too long
      padChar - the char to use to fill any remaining length
      totalLength - the exact length of String to return
      Returns:
      a String with exactly totalLength for its length, made from text and possibly padChar
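      Left padding is the usual way to right-align values in a fixed-width column. The sketch below covers only the non-strict padLeft; it leaves out padLeftStrict because the description does not say which portion of an over-long text is kept.

```java
// Sketch only: left padding to a minimum length, assuming non-null text.
final class PadLeftSketch {
    static String padLeft(String text, char padChar, int minimumLength) {
        StringBuilder sb = new StringBuilder();
        // Adds nothing when text is already long enough.
        for (int i = text.length(); i < minimumLength; i++) sb.append(padChar);
        return sb.append(text).toString();
    }

    public static void main(String[] args) {
        System.out.println(padLeft("7", '0', 3));    // prints 007
        System.out.println(padLeft("1234", '0', 3)); // prints 1234
    }
}
```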
    • wrap

      public static List<String> wrap(CharSequence longText, int width)
      Word-wraps the given String (or other CharSequence, such as a StringBuilder) so it is split into zero or more Strings as lines of text, with the given width as the maximum width for a line. This correctly splits most (all?) text in European languages on spaces (treating all whitespace characters matched by the regex '\\s' as breaking), and also uses the English-language rule (probably used in other languages as well) of splitting on hyphens and other dash characters (Unicode category Pd) in the middle of a word. This means for a phrase like "UN Secretary General Ban-Ki Moon", if the width was 12, then the Strings in the List returned would be
      "UN Secretary"
      "General Ban-"
      "Ki Moon"
      
      Spaces are not preserved if they were used to split something into two lines, but dashes are.
      Parameters:
      longText - a probably-large piece of text that needs to be split into multiple lines with a max width
      width - the max width to use for any line, removing trailing whitespace at the end of a line
      Returns:
      a List of Strings for the lines after word-wrapping
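      The wrapping rules above (break on whitespace, break after a mid-word dash, drop the breaking space but keep the dash) can be sketched with a greedy loop. This is an illustration, not the library code; a piece longer than width is simply placed on its own line here, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: greedy word wrap that also breaks after mid-word dashes (category Pd).
final class WrapSketch {
    static List<String> wrap(CharSequence longText, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : longText.toString().trim().split("\\s+")) {
            if (word.isEmpty()) continue; // happens only for empty input
            int start = 0;
            for (int i = 0; i < word.length(); i++) {
                boolean last = i == word.length() - 1;
                boolean midDash = i > 0 && !last
                        && Character.getType(word.charAt(i)) == Character.DASH_PUNCTUATION;
                if (!last && !midDash) continue;
                String piece = word.substring(start, i + 1); // a dash stays with the text before it
                // A space is only needed before the first piece of a word.
                int gap = (start == 0 && line.length() > 0) ? 1 : 0;
                if (line.length() > 0 && line.length() + gap + piece.length() > width) {
                    lines.add(line.toString());
                    line.setLength(0);
                    gap = 0; // the breaking space is not preserved
                }
                if (gap == 1) line.append(' ');
                line.append(piece);
                start = i + 1;
            }
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    public static void main(String[] args) {
        // prints [UN Secretary, General Ban-, Ki Moon]
        System.out.println(wrap("UN Secretary General Ban-Ki Moon", 12));
    }
}
```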
    • wrap

      public static List<String> wrap(List<String> receiving, CharSequence longText, int width)
      Word-wraps the given String (or other CharSequence, such as a StringBuilder) so it is split into zero or more Strings as lines of text, with the given width as the maximum width for a line; appends the word-wrapped lines to the given List of Strings and does not create a new List. This correctly splits most (all?) text in European languages on spaces (treating all whitespace characters matched by the regex '\\s' as breaking), and also uses the English-language rule (probably used in other languages as well) of splitting on hyphens and other dash characters (Unicode category Pd) in the middle of a word. This means for a phrase like "UN Secretary General Ban-Ki Moon", if the width was 12, then the Strings in the List returned would be
      "UN Secretary"
      "General Ban-"
      "Ki Moon"
      
      Spaces are not preserved if they were used to split something into two lines, but dashes are.
      Parameters:
      receiving - the List of String to append the word-wrapped lines to
      longText - a probably-large piece of text that needs to be split into multiple lines with a max width
      width - the max width to use for any line, removing trailing whitespace at the end of a line
      Returns:
      the given receiving parameter, after appending the lines from word-wrapping
    • indexOf

      public static int indexOf(CharSequence text, regexodus.Pattern regex, int beginIndex)
    • indexOf

      public static int indexOf(CharSequence text, String regex, int beginIndex)
    • indexOf

      public static int indexOf(CharSequence text, regexodus.Pattern regex)
    • indexOf

      public static int indexOf(CharSequence text, String regex)
    • capitalize

      public static String capitalize(CharSequence original)
      Capitalizes Each Word In The Parameter original, Returning A New String.
      Parameters:
      original - a CharSequence, such as a StringBuilder or String, which could have CrAzY capitalization
      Returns:
      A String With Each Word Capitalized At The Start And The Rest In Lower Case
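      A minimal sketch of per-word capitalization. It assumes a word starts at the beginning of the input or after whitespace; the real method may define word boundaries differently.

```java
// Sketch only: capitalizes the first letter of each whitespace-separated word.
final class CapitalizeSketch {
    static String capitalize(CharSequence original) {
        StringBuilder sb = new StringBuilder(original.length());
        boolean startOfWord = true;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(startOfWord ? Character.toUpperCase(c) : Character.toLowerCase(c));
                startOfWord = false;
            } else {
                sb.append(c);
                // Assumption: only whitespace starts a new word, so an apostrophe does not.
                startOfWord = Character.isWhitespace(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("crAzY capitalization")); // prints Crazy Capitalization
    }
}
```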
    • sentenceCase

      public static String sentenceCase(CharSequence original)
      Attempts to scan for sentences in original, capitalizes the first letter of each sentence, and otherwise leaves the CharSequence untouched as it returns it as a String. Sentences are detected with a crude heuristic of "does it have periods, exclamation marks, or question marks at the end, or does it reach the end of input? If yes, it's a sentence."
      Parameters:
      original - a CharSequence that is expected to contain sentence-like data that needs capitalization; existing upper-case letters will stay upper-case.
      Returns:
      a String where the first letter of each sentence (detected as best this can) is capitalized.
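      The crude heuristic described above can be sketched as a single pass that upper-cases the first letter seen after the start of input or after '.', '!' or '?'. This is an illustration, not the library code, and it shares the heuristic's weakness with abbreviations such as "e.g.".

```java
// Sketch only: capitalizes the first letter of each detected sentence, leaving the rest untouched.
final class SentenceCaseSketch {
    static String sentenceCase(CharSequence original) {
        StringBuilder sb = new StringBuilder(original.length());
        boolean startOfSentence = true;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (startOfSentence && Character.isLetter(c)) {
                c = Character.toUpperCase(c);
                startOfSentence = false;
            } else if (c == '.' || c == '!' || c == '?') {
                startOfSentence = true; // next letter begins a new sentence
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints The NASA probe landed. It worked.
        System.out.println(sentenceCase("the NASA probe landed. it worked."));
    }
}
```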
    • correctABeforeVowel

      public static String correctABeforeVowel(CharSequence text)
      A simple method that looks for any occurrences of the word 'a' followed by some non-zero amount of whitespace and then any vowel starting the following word (such as 'a item'), then replaces each such improper 'a' with 'an' (such as 'an item'). The regex used here isn't bulletproof, but it should be fairly robust, handling when you have multiple whitespace chars, different whitespace chars (like carriage return and newline), accented vowels in the following word (but not in the initial 'a', which is expected to use English spelling rules), and the case of the initial 'a' or 'A'. This also changes improper uses of "an" back to "a", such as by changing "an dog" to "a dog", or "an malevolent force" to "a malevolent force".
      Gotta love Regexodus; this is a two-liner that uses features specific to that regular expression library. This only matches text in the Latin script because a/an is a feature of English, and doesn't have a direct equivalent I know of in the Greek or Cyrillic scripts. There could easily be one! I just couldn't verify it.
      Parameters:
      text - the (probably generated English) multi-word text to search for 'a'/'an' in and possibly replace
      Returns:
      a new String with every improper 'a' and 'an' replaced
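      The a/an correction can be approximated with java.util.regex, as sketched below. This swaps in the JDK regex engine for Regexodus and handles only unaccented ASCII vowels and consonants, so it is narrower than the method described above; the patterns are assumptions written for this sketch.

```java
import java.util.regex.Pattern;

// Sketch only: ASCII-only approximation using java.util.regex instead of Regexodus.
final class ArticleSketch {
    // The word "a" or "A", then whitespace, then a vowel starting the next word.
    private static final Pattern A_BEFORE_VOWEL =
            Pattern.compile("\\b([Aa])(\\s+)(?=[aeiouAEIOU])");
    // The word "an" or "An", then whitespace, then a consonant starting the next word.
    private static final Pattern AN_BEFORE_CONSONANT =
            Pattern.compile("\\b([Aa])n(\\s+)(?=[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z])");

    static String correct(CharSequence text) {
        String fixed = A_BEFORE_VOWEL.matcher(text).replaceAll("$1n$2");
        return AN_BEFORE_CONSONANT.matcher(fixed).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(correct("a item and an dog")); // prints an item and a dog
    }
}
```

      Capturing the whitespace in a group preserves whatever separated the words, including newlines.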
    • decompressCategory

      public static com.github.tommyettinger.ds.CharBitSetFixedSize decompressCategory(regexodus.Category category)
      Takes the compressed bitset inside a RegExodus Category and decompresses it to a jdkgdxds CharBitSetFixedSize. This may improve lookup time for frequently-checked Categories, since a contains() check on the decompressed set is quite fast (it runs in O(1) time), while Category.contains(char) is slower (it runs in O(n) time, where n is the RLE-compressed size of the entire bitset). The decompressed set can also be modified if needed, whereas a Category cannot.
      Parameters:
      category - a RegExodus Category, such as Category.Lu for upper-case letters
      Returns:
      a new CharBitSetFixedSize storing the same contents as the given Category, but optimized for faster access
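      The trade-off between a run-length-encoded set and a decompressed bitset can be illustrated with java.util.BitSet. The encoding below (alternating run lengths, starting with a run of absent chars) is hypothetical and is not the actual Category format; it only shows why the compressed lookup is O(n) in the number of runs while the decompressed lookup is O(1).

```java
import java.util.BitSet;

// Sketch only: a hypothetical RLE layout, not the RegExodus Category internals.
final class DecompressSketch {
    /** O(n) lookup: walks the runs until the one covering c is reached. */
    static boolean containsRle(int[] runs, char c) {
        int pos = 0;
        for (int i = 0; i < runs.length; i++) {
            pos += runs[i];
            if (c < pos) return (i & 1) == 1; // odd-numbered runs hold present chars
        }
        return false;
    }

    /** Expands the runs once so later lookups are a single bit test. */
    static BitSet decompress(int[] runs) {
        BitSet bits = new BitSet(65536);
        int pos = 0;
        for (int i = 0; i < runs.length; i++) {
            if ((i & 1) == 1) bits.set(pos, pos + runs[i]);
            pos += runs[i];
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] asciiLetters = {65, 26, 6, 26}; // A-Z present, six absent, a-z present
        BitSet bits = decompress(asciiLetters);
        System.out.println(bits.get('Q') + " " + bits.get('[')); // prints true false
    }
}
```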