--------------------------------------------------------------------------[BLO]-
/AUTOTEXT (special CLI command added by Bloody)
--------------------------------------------------------------------------------

Auto-convert a .bup|.idx|.ifo|.sub|.vob subtitle to .srt|.ssa|.ttxt, only asking
for user interaction if there is a new character to be added to the matrix.

To enable /AUTOTEXT mode, SubRip must be started using "/AUTOTEXT" as the very
first commandline arg, otherwise the "classic" commandline syntax will be used.

MSYS users need to use //AUTOTEXT instead of /AUTOTEXT to circumvent the
automatic root path expansion through MSYS (where using /AUTOTEXT would cause
SubRip to abort without a proper error message explaining the problem).

Usage:

  SubRip /AUTOTEXT [options] <infile> <outfile> <matrix-file>

  (For a brief usage example, see "Examples:" below.)

  General Options:

    --force
    --no-force

      Overwrite an existing <outfile>. By default, the program aborts with an
      error message if <outfile> already exists.

      Default: disabled

    --line-breaks <type>

      Specifies if <outfile> shall be created using DOS/Windows line breaks
      (0x0D + 0x0A) or Unix line breaks (0x0A) at the end of each line.
      By default, Unix line breaks will be used, making output files a bit
      smaller, including the overall filesize of a video which includes text
      subtitles as embedded stream(s) (.mkv files, for example).

      Virtually all software (including Windows software) that deals with text-
      type subtitles can handle Unix line breaks. If you're unlucky and some
      program you're using has a problem with Unix line breaks, use the "dos"
      (or "windows") <type> instead. It is also possible to quickly convert any
      subtitle text file from/to DOS|Unix format later on, by using dos2unix (or
      unix2dos). See also: "Notes:" (below).

      <type>: the following types are defined:

        "dos"     - use DOS  line breaks (0x0D + 0x0A)
        "unix"    - use Unix line breaks (0x0A)
        "windows" - synonym for "dos" (same type)

      Default: "unix"

    --r2l
    --no-r2l

      Enable right-to-left processing for languages such as Arabic or Hebrew.

      Default: disabled

    --show-loadmatrix
    --no-show-loadmatrix

      In /AUTOTEXT mode, the progress window for loading a matrix is hidden by
      default. Use this option to show it again; otherwise, just wait while the
      matrix loads.

      Default: disabled

    --sortmatrix <mode>

      Specifies if and how to sort the matrix file before saving (if changed),
      where <mode> can be one of the following values:

        "latin"   - for max. subtitle conversion speed (latin-based languages)
        "random"  - for speed tests (font-independent)
        "none"    - don't sort the matrix

      Default: "none"

    --subtitle-index <index>

      Index of the subtitle stream to process, where 0 is the first stream.

      Min.:    0
      Max.:    31
      Default: 0

    --subtitle-language <language-code>

      Specifies the language of the subtitle. This might be used by the program
      for any purpose, although it currently only affects the Post-OCR I/l
      Correction filter.

      It is always recommended to specify the subtitle language even if you know
      for sure that the current SubRip version does not contain any specific
      code/filters for that language. By doing so, your scripts will benefit
      from future updates which might indeed add specific filters for that
      language.

      To provide a generalized interface for dealing with subtitle languages,
      /AUTOTEXT uses ISO 639-3 language codes. Alternatively, codes from
      ISO 639-2/B|/T or 639-1 (including a few obsolete codes used by older
      DVDs) can also be used. For example:

        French: "fra", "fre" or "fr"
        German: "deu", "ger" or "de"
        Hebrew: "heb", "he"  or "iw" (where "iw" is one of those obsolete codes)

      A full list of international codes can be looked up here:

        http://www-01.sil.org/iso639-3/codes.asp

      Plain text versions are available here:

        http://www-01.sil.org/iso639-3/download.asp

      Now you can specify any language code you want (for example, "swe" for
      Swedish), and if there are no special filters for Swedish, the program
      will automatically use the same settings as for the default "und" value
      (Undetermined).

      Default: "und" (Undetermined), currently exactly equal to "eng" (English)

      See also: --ocrfix-il (below).

      See also: "Examples:" (below).

    --use-idx-file-offsets
    --no-use-idx-file-offsets

      Use file offsets (for images in .sub files) from the corresponding .idx
      file, if available.

      Although the factory default for this (global) option is True, I've only
      had bad results with it, so in /AUTOTEXT mode this setting is disabled by
      default. It can be re-enabled using this option if needed.

      Ignored if <infile> is .bup|.ifo|.vob.

      Default: disabled

    --use-subtitle-map
    --no-use-subtitle-map

      Create and/or use a subtitle map (.srm) file. These files help speed up
      subtitle scanning but are superfluous if you only want to convert a single
      subtitle from the same input file(s).

      Default: enabled

    --utf8-bom
    --no-utf8-bom

      Enable or disable adding a UTF-8 Byte Order Mark (BOM) to the given
      <outfile>. A UTF-8 BOM is used to tell other software that the file is
      encoded as UTF-8 instead of some 8-bit character set (code page).

      Most modern software has no problem with this, although there might be a
      few programs which do not recognize it. Therefore, this option can be
      disabled if needed.

      Ignored if <outfile> is .ttxt (not required for XML).

      Default: enabled
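
      If a downstream program turns out to choke on the BOM after conversion,
      it can also be removed afterwards. The following is a minimal Python
      sketch, not part of SubRip; the helper names are made up for this
      illustration.

```python
import codecs

# Hypothetical helpers (not part of SubRip): detect and strip a UTF-8 BOM
# from the raw bytes of an already converted subtitle file.

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the 3-byte UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM (unchanged if none)."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

sample = codecs.BOM_UTF8 + b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"
print(has_utf8_bom(sample))                  # True
print(has_utf8_bom(strip_utf8_bom(sample)))  # False
```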

    --wake-me-up
    --no-wake-me-up

      Enable or disable beeping/flashing after some idle time in the "Add new
      character" window.

      Default: enabled

    --warn-overlap
    --no-warn-overlap

      Show subtitle time overlap warning dialog(s). Disabling this option can be
      useful if your matrix file is likely complete and you wish to convert a
      large number of subtitles unattended, fixing possible overlap issues later
      with an external utility.

      Since v1.54.0: also includes warnings about negative timings.

      Default: enabled

    --
      Can be used as explicit end-of-options marker.

  Advanced OCR Options:

    These options can be used to tweak the OCR (text/character recognition)
    behaviour.

    --ocr-line-min-height <value>

      Minimum line exploration height.

      Min.:     0
      Max.:     45
      Default:  8

    --ocr-line-min-inter-height <value>

      Minimum inter line height.

      Min.:     1
      Max.:     50
      Default:  2

    --ocr-line-max-height <value>

      Maximum line height.

      Min.:     0
      Max.:     45
      Default:  45

    --ocr-char-min-width <value>

      Minimum character width.

      Min.:     1
      Max.:     45
      Default:  2

    --ocr-char-min-height <value>

      Minimum character height.

      Min.:     1
      Max.:     45
      Default:  2

    --ocr-check-detached-char
    --no-ocr-check-detached-char

      Enable/disable checking for slightly detached characters.

      Default: enabled

    --ocr-space-min-width <value>

      Minimum gap (in pixels) between two characters to be considered a space
      (SPC) character.

      Min.:     -10
      Max.:     45
      Default:  7

    --ocr-char-diff-sensibility <value>

      Character difference sensitivity.

      Min.:     0
      Max.:     1000
      Default:  980

    --ocr-char-max-width-diff <value>

      Maximum character width difference to be considered as the same character.

      Min.:     0
      Max.:     45
      Default:  2

    --ocr-char-max-height-diff <value>

      Maximum character height difference to be considered as the same
      character.

      Min.:     0
      Max.:     45
      Default:  2

    --ocr-char-max-top-diff <value>

      Maximum character top difference (line top relative) to be considered as
      the same character.

      Min.:     0
      Max.:     45
      Default:  6

  Post-OCR Correction Options:

    --ocrfix
    --no-ocrfix

      Post-OCR Correction: globally enable or disable Post-OCR text Correction.

      After parsing subtitles, there is usually a number of issues to be fixed,
      which can be accomplished using the Post-OCR Correction feature.

      If the --ocrfix option is used, further --ocrfix-* options can be added to
      specify if and how the individual correction filters shall be applied.

      If the --no-ocrfix option is used, all Post-OCR Correction will be
      completely disabled and any other --ocrfix-* options will be ignored.

      Default: enabled

    --ocrfix-capital-letters
    --no-ocrfix-capital-letters

      Post-OCR Correction: apply case corrections for single characters at the
      beginning of a word that follows a punctuation character like ".!?:-".

      This usually won't change much since such mistakes are usually not the
      result of erroneous OCR parsing but rather mistakes by the author who
      wrote the subtitle in the first place.

      Note that although the program recognizes such cases even across subtitle
      entries, a case correction occasionally turns out to be wrong. However,
      such mistakes are quite rare, so this filter is recommended.

      If the author of the subtitle is a superhuman who never makes mistakes and
      if your matrix is correct about upper/lowercase chars (other than the I/l
      problem) then this filter would be superfluous.

      Note that if this option is enabled, it will also automatically enable the
      --ocrfix-punctuation option in order to avoid mistakes caused by un-fixed
      punctuation issues.

      Default: enabled

    --ocrfix-capital-start-lower
    --no-ocrfix-capital-start-lower

      Post-OCR Correction: this option is applied only together with option
      --ocrfix-capital-letters. If enabled, all words will be converted entirely
      to lower-case before applying the --ocrfix-capital-letters corrections.

      This will convert everything to lower-case, including names, upper-case
      abbreviations, noise descriptions in SDH subtitles etc., which causes
      major changes to the entire text and is usually a bad idea. The result
      will only contain upper-case letters for words which start a new sentence.

      This option can be useful if the author of the subtitle keeps SHOUTING all
      the time (everything in all-uppercase) which is a strange habit of some
      people. Never seen in serious DVD productions, though.

      Ignored if --ocrfix-capital-letters is disabled.

      Default: disabled

    --ocrfix-format-whole-words-only
    --no-ocrfix-format-whole-words-only

      Post-OCR Correction: this filter fixes text formatting styles like bold,
      italic etc., so that only whole words are affected by a certain style.

      For example:
        "<i>Jupit</i>e<i>r</i>" would be converted to "<i>Jupiter</i>".

      In this example, the problem might be caused by accidentally forgetting to
      tag that lower-case 'e' as italic. The filter includes several kinds of
      fixes of this nature and is therefore recommended.

      Default: enabled

    --ocrfix-il
    --no-ocrfix-il

      Post-OCR Correction: apply a number of individual corrections for cases
      where the letters 'I' and 'l' look so similar (or equal) that they are
      both mistakenly recognized as the same character. This is the case with
      most commonly used fonts, therefore this filter is enabled by default.

      Some of those corrections are meant to be used only for a certain language
      since they might have unwanted effects on text in other languages.
      Language-specific fixes are implemented for the following languages:

        ces|cze|cs (Czech)
        deu|ger|de (German)
        eng|en     (English)
        fra|fre|fr (French)
        ita|it     (Italian)
        pol|pl     (Polish)
        spa|es     (Spanish)

      Any language specified using the --subtitle-language option which is not
      listed above will cause the I/l filter to use the "und" (Undetermined)
      setting which is currently exactly identical with "eng" (English).

      Default: enabled

      See also: --subtitle-language (above).

      See also: "Examples:" (below).
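
      The fallback rule described above can be pictured as a simple lookup.
      This is only a sketch based on the language list in this section, not
      SubRip's actual code; the function name and the case-insensitive
      handling of codes are assumptions.

```python
# Hypothetical helper (not part of SubRip): map the language codes listed
# above to one ISO 639-3 key and fall back to "und" for everything else.
ALIASES = {
    "ces": "ces", "cze": "ces", "cs": "ces",   # Czech
    "deu": "deu", "ger": "deu", "de": "deu",   # German
    "eng": "eng", "en": "eng",                 # English
    "fra": "fra", "fre": "fra", "fr": "fra",   # French
    "ita": "ita", "it": "ita",                 # Italian
    "pol": "pol", "pl": "pol",                 # Polish
    "spa": "spa", "es": "spa",                 # Spanish
}

def il_filter_language(code: str) -> str:
    """Return the language whose I/l fixes would apply, or "und"."""
    # Lower-casing the code is an assumption made for this sketch.
    return ALIASES.get(code.strip().lower(), "und")

print(il_filter_language("fre"))  # fra
print(il_filter_language("dan"))  # und
```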

    --ocrfix-orthography <dict-file>

      Post-OCR Correction: apply a number of individual corrections based on a
      user-defined list of from/to strings in the given <dict-file>. If this
      option is used, the dict-based filter will be applied at the very end
      after any other internal filter(s).

      Creating such files can be useful in order to define additional fixes for
      certain special cases which are not covered (or wrongly converted) by the
      internal filter(s).

      All dict files use a ".dic" file extension and contain an even number of
      text lines where each first (odd-numbered) line defines a search string
      and each second (even-numbered) line defines the corresponding replacement
      string, case-sensitive and not limited to whole words (any match will do).
      For example:
        lnte
        inte
      ...would fix bad words like "lnternal", "lntegrity", "lntelligent", ...

      There are already a few files in the "Dict/User" folder of the SubRip
      standard distribution, but beware that some of these files might be
      partially outdated/obsolete ("English_Screwed" sounds suspicious to me).
      I'd suggest checking the result and, if there are mistakes, taking only
      what you believe is correct and putting it into your own .dic files.

      Note that files in "Dict/Internal" are for internal use by SubRip and
      should not be modified. If you need something from there, create copies in
      "Dict/User" and go from there.

      Default: disabled
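
      To preview what a .dic file would do before running a conversion, the
      from/to format described above can be mimicked in a few lines of
      Python. This is a rough sketch only: the function names are invented,
      and applying the pairs one after another in file order is an assumption
      about SubRip's behaviour.

```python
# Hypothetical helpers (not part of SubRip) that mimic the .dic format:
# odd-numbered lines are search strings, even-numbered lines replacements.

def parse_dic(text: str) -> list:
    """Split .dic content into (search, replacement) pairs."""
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        raise ValueError("a .dic file needs an even number of text lines")
    return list(zip(lines[0::2], lines[1::2]))

def apply_dic(pairs, subtitle_text: str) -> str:
    """Apply each pair in order; case-sensitive, any match is replaced."""
    for search, replacement in pairs:
        subtitle_text = subtitle_text.replace(search, replacement)
    return subtitle_text

pairs = parse_dic("lnte\ninte\n")
print(apply_dic(pairs, "lnternal lntegrity"))  # internal integrity
```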

    --ocrfix-punctuation
    --no-ocrfix-punctuation

      Post-OCR Correction: enable or disable corrections for combinations of
      spaces, punctuation and other special characters.

      This filter also applies all fixes defined in "Dict/Internal/punct.dic".

      About the Percent (%) character issue: just enter what you see (Degree|
      Slash|Degree) and the punctuation filter will do the rest (the same
      applies to quotes: if you see a double-quote and the program only
      highlights one half of it, enter what you see, i.e., a single-quote and
      let the "Double Quotes" filter do the rest).

      This filter fixes many issues reliably and is therefore recommended.

      Default: enabled

    --ocrfix-quotes
    --no-ocrfix-quotes

      Post-OCR Correction: fix problems with double/multiple quote characters.

      A common OCR parsing problem occurs when a " appears but the OCR
      recognizes only one half of it as a character, usually producing either
      two '' or "" in the translation. The filter for issues like this is quite
      simple and reliable, and therefore recommended.

      (Tech: converts all ` and ´ to ', then all '' to " and then all "" to ").

      Default: enabled
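
      The three replacement steps of the quotes filter can be sketched as
      follows. This is an illustration, not SubRip's code; it assumes that
      the two apostrophe look-alikes are the backtick and the acute accent,
      and the function name is made up.

```python
# Hypothetical re-implementation (not SubRip's code) of the quote clean-up.

def fix_quotes(text: str) -> str:
    # Step 1: turn apostrophe look-alikes into a plain single quote.
    # Treating the acute accent (U+00B4) as one of them is an assumption.
    text = text.replace("`", "'").replace("\u00b4", "'")
    # Step 2: two single quotes in a row become one double quote.
    text = text.replace("''", '"')
    # Step 3: two double quotes in a row collapse into one.
    text = text.replace('""', '"')
    return text

print(fix_quotes("He said ''hello''"))  # He said "hello"
```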

    --ocrfix-spaces-between-numbers
    --no-ocrfix-spaces-between-numbers

      Post-OCR Correction: removes spaces within numeric expressions, like
      "1 0 0", "2 . 5", "1 / 3" and so on. Such spaces occur quite often when
      parsing certain fonts, so this filter is recommended.

      Default: enabled

    --ocrfix-strip-cross-line-styles
    --no-ocrfix-strip-cross-line-styles

      Post-OCR Correction: strips redundant text style tags with a scope across
      text lines, like:

        from:
          <b><i>Text line 1...</i></b>
          <b><i>Text line 2...</i></b>
        to:
          <b><i>Text line 1...
          Text line 2...</i></b>

      This will create smaller output files which are also a bit easier to edit.

      Since there is no official .srt file format specification which explicitly
      forbids this, and because common media players have no problem with cross-
      line style scopes, this (newer) option is enabled by default.

      Default: enabled

  Output Format Options:

    These options are used with specific subtitle output formats.

    Some of these options specify default font attributes, but note that good
    media players can be configured to ignore/override whatever a subtitle tries
    to suggest/dictate and instead use what the user prefers. However, it is
    recommended to use something sane here so that people with simple media
    players can enjoy the look at least to a certain degree.

    Output Format Options are only applied if the current output format matches,
    otherwise they will be ignored. The current output format is automatically
    determined by the file extension of the given <outfile>.

    The /AUTOTEXT command currently supports the following output formats:

      3GPP (.ttxt) ("3GPP Timed Text")

        Used with some videos for 3G-compatible mobile phones, and in some cases
        also for regular MPEG-4 (.mp4) videos.

      SubRip (.srt)

        By far the most commonly used subtitle format. Not overly feature-rich,
        but certainly good enough for most "standard" subtitles, with a simple
        and compact file format (small file size, easy to edit).

      SSA (.ssa) ("Sub Station Alpha")

        Used quite often in the past, although most people now use the
        SubRip (.srt) format instead. The SSA format might offer a few more
        styles/features, but in the end, a "standard" subtitle does not really
        need much more than regular/italic text, maybe text coordinates, maybe
        text colors, all of which is also covered by the SubRip (.srt) format.

    --out-3gpp-bg-color <RRGGBBAA>

      3GPP (.ttxt): default background color, in 8 hexadecimal digits for the
      red, green, blue and alpha channels, where FFFFFFFF is white, FF0000FF is
      red, 00FF0080 is half-transparent green, and so on.

      Default: 00000000 (100% transparent)

    --out-3gpp-font-color <RRGGBBAA>

      3GPP (.ttxt): color for the default font, in 8 hexadecimal digits for the
      red, green, blue and alpha channels, where FFFFFFFF is white, FF0000FF is
      red, 00FF0080 is half-transparent green, and so on.

      Default: FFFFFFFF (white, 100% opaque)
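
      For scripts that generate these color values, the RRGGBBAA layout can
      be checked with a small helper like the one below. It is a sketch only;
      the function name is hypothetical and SubRip's own validation may
      differ.

```python
import string

# Hypothetical helper (not part of SubRip): split an RRGGBBAA string into
# its red, green, blue and alpha channel values (0..255 each).

def parse_rrggbbaa(value: str) -> tuple:
    if len(value) != 8 or any(c not in string.hexdigits for c in value):
        raise ValueError("expected exactly 8 hexadecimal digits")
    red, green, blue, alpha = (int(value[i:i + 2], 16) for i in range(0, 8, 2))
    return red, green, blue, alpha

print(parse_rrggbbaa("00FF0080"))  # (0, 255, 0, 128)
```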

    --out-3gpp-font-name <name>

      3GPP (.ttxt): name of the default font (case-sensitive).

      Default: "Arial"

    --out-3gpp-font-size <size>

      3GPP (.ttxt): size of the default font.

      Default: 14

    --out-subrip-add-coordinates
    --no-out-subrip-add-coordinates

      SubRip (.srt): add X/Y coordinates for text items, based on where the DVD
      subtitle pictures are meant to appear. This might be used to preserve
      certain cases where some hard text can be seen in the video at the same
      place where a subtitle would appear, so the original subtitle author(s)
      decided to move that specific subtitle item to a different place.

      Note that this feature is a SubRip extension which was not included in
      early SubRip versions, so some old media players might not be able to
      handle such subtitles. However, it is also possible that such players
      would simply ignore those coordinates and still display the subtitle as
      usual (at the default position).

      Default: disabled

    --out-subrip-add-font-colors
    --no-out-subrip-add-font-colors

      SubRip (.srt): add font-color tags for subtitles with varying font colors.

      This color-fiddling is not very common, but I've seen it a few times in
      the past. Use this option if you wish to preserve such tags.

      Default: disabled

    --out-subrip-index-offset <offset>

      SubRip (.srt): add a given offset to all subtitle index numbers.

      This can be used to create a subtitle which shall later be concatenated
      with another .srt file to create a joined subtitle file.

      Min.:    0
      Max.:    100000
      Default: 0
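
      When the second part of a movie is to be appended to an existing .srt
      file, a suitable <offset> is the number of entries in the first file.
      The sketch below counts them in Python. The helper name is made up, and
      it assumes that an offset of N turns the first index of the new file
      into N+1.

```python
import re

# Hypothetical helper (not part of SubRip): count the entries of an .srt
# file by looking for an index line directly followed by a timing line.
ENTRY_START = re.compile(r"(?m)^\d+\r?\n\d{2}:\d{2}:\d{2},\d{3} --> ")

def count_srt_entries(srt_text: str) -> int:
    return len(ENTRY_START.findall(srt_text))

first_part = (
    "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nWorld\n"
)
# Value to pass as --out-subrip-index-offset for the second part.
print(count_srt_entries(first_part))  # 2
```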

    --out-ssa-default-style <parameter-list>

      SSA (.ssa): comma-separated list of parameters for the default text style.

      The <parameter-list> consists of the following parameters:

      Fontname,Fontsize,PrimaryColour,SecondaryColour,TertiaryColour,BackColour,
      Bold,Italic,BorderStyle,Outline,Shadow,Alignment,MarginL,MarginR,MarginV,
      AlphaLevel,Encoding

      The easiest way to create such a <parameter-list> is to run SubRip in GUI
      mode, parse one subtitle, then click on the Output Format button, select
      SSA (Sub Station Alpha), configure all parameters as desired, save the
      subtitle, search for a text line which begins with "Style: Style1," and
      then simply copy the rest of that text line (after the "...Style1,").

      When you paste these parameters into your SubRip script file, remember to
      quote the font name if it contains any spaces.

      Default: "Arial",28,8454143,8454143,8454143,0,0,0,1,2,2,2,30,30,25,0,0

    --out-ssa-original-script <name>

      SSA (.ssa): name of the original author/source of the subtitle script.

      Default: "<unknown>"

    --out-ssa-title <title>

      SSA (.ssa): title of the subtitle script.

      Default: "<untitled>"

  Debug Options:

    These options print debug information to the console error output (stderr).

    Note: Windows users need to redirect stderr output to a file, pipe or other
    character device because Windows GUI programs do not have any stdio streams
    available, even if launched from within CMD.exe (no problem under Wine).
    For example:
      SubRip.exe /AUTOTEXT [...] 2>D:\Work\Fun\AutoText-DebugLog.txt

    --debug-ocrfix-from <entry-number>
    --debug-ocrfix-to   <entry-number>

      These options can be used to limit Post-OCR debug output to a given range
      of subtitle entries.

      Default: no limit (all entries)

    --debug-ocrfix-il-all
    --debug-ocrfix-il-il
    --debug-ocrfix-il-li
    --debug-ocrfix-il-common-all
    --debug-ocrfix-il-common-il
    --debug-ocrfix-il-common-li
    --debug-ocrfix-il-language-all
    --debug-ocrfix-il-language-il
    --debug-ocrfix-il-language-li
    --debug-ocrfix-il-skipped

      Print debug information for the Post-OCR I/l correction filter(s).

      Use one or more of the above options to debug certain I/l filter types:

        --debug-ocrfix-il-all           all I>l and l>I filters
        --debug-ocrfix-il-il            all I>l filters
        --debug-ocrfix-il-li            all l>I filters
        --debug-ocrfix-il-common-all    language-independent I>l and l>I filters
        --debug-ocrfix-il-common-il     language-independent I>l filters
        --debug-ocrfix-il-common-li     language-independent l>I filters
        --debug-ocrfix-il-language-all  language-specific I>l and l>I filters
        --debug-ocrfix-il-language-il   language-specific I>l filters
        --debug-ocrfix-il-language-li   language-specific l>I filters
        --debug-ocrfix-il-skipped       not swapped or kept by any filter

      The "-skipped" variant prints debug information for I/l chars which did
      not cause any swap or keep action by any filter, including filters not
      selected for debugging.

      The above debug options are ignored if the I/l filter is disabled.

  <infile>

    Input file (must be a .bup, .idx, .ifo, .sub, or .vob file).

    It is known that some .ifo files are very problematic due to the fact that
    some DVD manufacturers go to extraordinary lengths trying to keep everybody
    from processing their DVDs. So if you run into problems, then my advice is
    to use some external utility to extract DVD subtitles to VobSub format first
    (.sub/.idx files) and then use those files instead of the .ifo/.vob files.

    Although there are also a few cases where certain .idx/.sub files can be
    problematic, using .idx/.sub generally works much better than .ifo/.vob,
    also regarding the time stamps for subtitle items, which are sometimes
    incorrect if converted from .ifo/.vob files.

  <outfile>

    Output file (must be a .srt, .ssa or .ttxt file).

  <matrix-file>

    Character matrix file to load before proceeding (must be a .sum file).
    If the given matrix file does not exist, the program will create a new one.

    After finishing the conversion, the matrix file will be automatically saved
    if there were any changes.

Notes:

  Main Window:

    During /AUTOTEXT conversion, the program will minimize the main window so it
    won't get in the way and also to accelerate the overall processing speed.

  Output charset:

    /AUTOTEXT always saves .srt and .ssa output files in UTF-8 format (Unicode).
    If you need a different encoding (maybe you want Windows Latin1/CP1252 for
    some reason), iconv is your friend. For example:

      #
      # convert .srt file from UTF-8 to Windows-CodePage 1252 (Latin1):
      #
      iconv -f UTF8 -t CP1252 "movie-UTF8.eng.srt" >"movie-CP1252.eng.srt"

    For *nix, iconv is a standard package in pretty much any distribution.
    Windows users can download a WIN32 implementation here:
      https://code.google.com/p/win-iconv/downloads/list
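
    If iconv is not at hand, the same conversion can be done with a few lines
    of Python. This is a minimal sketch with a made-up function name; it drops
    a leading UTF-8 BOM and replaces characters that do not exist in CP1252
    with "?", which is a choice made for this illustration.

```python
# Hypothetical helper (not part of SubRip): re-encode subtitle bytes from
# UTF-8 to Windows code page 1252.

def utf8_to_cp1252(data: bytes) -> bytes:
    # "utf-8-sig" decodes UTF-8 and silently drops a leading BOM, if any.
    text = data.decode("utf-8-sig")
    # Characters without a CP1252 equivalent are replaced by "?".
    return text.encode("cp1252", errors="replace")

print(utf8_to_cp1252("caf\u00e9".encode("utf-8")))  # b'caf\xe9'
```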

  Output line breaks:

    If you find out (after subtitle conversion) that a certain Windows software
    you're using has a problem with Unix line breaks and you don't wish to
    convert all subtitles again (using the --line-breaks option), unix2dos
    (or dos2unix for the opposite direction) can be used to quickly convert all
    line breaks in any text file.
    For example:

      #
      # convert a .srt file from Unix to DOS/Windows
      #
      unix2dos -n "movie.eng.srt" "movie-dos.eng.srt"
      #
      # or just:
      #
      unix2dos "movie.eng.srt"

      #
      # convert a .ssa file from DOS/Windows to Unix:
      #
      dos2unix "movie.ita.ssa"

    Most *nix users should have a package available for dos2unix. If not, the
    program can be found here (including binaries for Windows):
      http://waterlan.home.xs4all.nl/dos2unix.html

    See also: --line-breaks (above)
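
    Where neither tool is installed, the two line break types can also be
    swapped with a short Python sketch like the following. The function names
    are hypothetical; the code works on raw bytes so that the UTF-8 content is
    left untouched.

```python
# Hypothetical helpers (not part of SubRip): switch between DOS/Windows
# (0x0D + 0x0A) and Unix (0x0A) line breaks.

def to_unix(data: bytes) -> bytes:
    """Replace every CR+LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")

def to_dos(data: bytes) -> bytes:
    """Normalize to LF first so existing CR+LF pairs are not doubled."""
    return to_unix(data).replace(b"\n", b"\r\n")

print(to_dos(b"line 1\nline 2\n"))       # b'line 1\r\nline 2\r\n'
print(to_unix(b"line 1\r\nline 2\r\n"))  # b'line 1\nline 2\n'
```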

Examples:

  #
  # Convert a subtitle from .sub to .srt, using standard Post-OCR Correction
  # settings with additional I/l correction for Danish ("dan").
  # Since there are no special filters for Danish yet, the program will
  # automatically use the "und" (Undetermined) filter which is currently exactly
  # identical with the "eng" (English) filter.
  #
  SubRip /AUTOTEXT --subtitle-language dan Fun.dan.sub Fun.dan.srt D:\ChMatrix\Fun.sum

  #
  # Convert the second subtitle stream in (a) .vob file(s) to .ssa, using
  # Post-OCR with additional I/l correction for the English language:
  #
  SubRip /AUTOTEXT --subtitle-index 1 --subtitle-language eng VTS_03_1.VOB Fun.eng.ssa D:\ChMatrix\Fun.sum
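
  For batch scripts, the command line can be assembled programmatically. The
  Python sketch below only builds the argument list and does not start SubRip;
  the function name and its parameters are invented for this illustration,
  while the options used are the ones documented above.

```python
# Hypothetical helper (not part of SubRip): build an /AUTOTEXT command line.

def autotext_cmd(infile, outfile, matrix, language="und", index=0, msys=False):
    if not 0 <= index <= 31:
        raise ValueError("--subtitle-index must be in the range 0..31")
    # MSYS shells need //AUTOTEXT to avoid the automatic path expansion.
    command = "//AUTOTEXT" if msys else "/AUTOTEXT"
    return [
        "SubRip", command,
        "--subtitle-index", str(index),
        "--subtitle-language", language,
        "--",  # explicit end-of-options marker
        infile, outfile, matrix,
    ]

print(" ".join(autotext_cmd("Fun.dan.sub", "Fun.dan.srt", "Fun.sum", "dan")))
```

  The resulting list can be handed to whatever process launcher the script
  uses.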

--------------------------------------------------------------------------------
[end] /AUTOTEXT
--------------------------------------------------------------------------------

--------------------------------------------------------------------------[BLO]-
/FINDMATRIX (special CLI command added by Bloody)
--------------------------------------------------------------------------------

This command is used to search for an existing matrix file which already
contains most (or all) glyphs for a given <infile>.

To enable /FINDMATRIX mode, SubRip must be started using "/FINDMATRIX" as the
very first commandline arg, otherwise the "classic" commandline syntax will be
used.

MSYS users need to use //FINDMATRIX instead of /FINDMATRIX to circumvent the
automatic root path expansion through MSYS (where using /FINDMATRIX would cause
SubRip to abort without a proper error message explaining the problem).

Usage:

  SubRip /FINDMATRIX [options] <infile> <matrix-dir>

  (For a brief usage example, see "Examples:" below.)

  The /FINDMATRIX command does not produce any output files, but only conducts a
  search and then prints the result(s) to stdout (console standard output), with
  filename(s) for positive matches printed with full (absolute) Windows path(s).
  Progress messages and other infos are printed to stderr.

  Introduction: the issue with huge Matrix files

  The purpose of this command is to allow using separate matrix files per font
  (or per DVD/series) in order to avoid growing huge über matrix files with
  hundreds of fonts which would eventually lead to erroneous output, like
  confusion between J<>j, o<>O|0, V<>v etc. (and even worse).
  Attempting to 'fix' such a giant matrix can be quite difficult since it does
  not necessarily contain 'wrong' glyphs, but simply too many of them.

  To improve such a matrix, one could tighten some of the Advanced OCR options
  (see --ocr-char-*diff* options), e.g. by raising --ocr-char-diff-sensibility
  to 1000, but that in turn would require adding otherwise equal chars multiple
  times just because of subtle differences, which is not only more work but also
  increases the size of the matrix even more, requiring even tighter settings,
  and so on.

  Large matrix files also have the disadvantage that they cannot be split into
  multiple small matrix files because there is no way to tell which glyphs
  belong together (same font). Once a mess, always a mess.

  With the /FINDMATRIX command, matrix files can be kept separate (per-source),
  searched for, sorted out (by finding & removing 'doubles'), more easily fixed
  (or simply deleted), and even be shared between friends to build a collection.

  Another advantage of using small matrix files instead of huge ones is that
  subtitle conversion goes much faster, which can significantly improve batch
  conversion speed.

  Note that using separate matrix files also has its disadvantages regarding
  administration effort, i.e., having to run a /FINDMATRIX command before
  converting subtitle(s). In the end, one might still choose between using one
  large matrix or several small files (or even a combination of both).

  General Options:

    --popup-results
    --no-popup-results

      Use this option to enable/disable a final popup message window showing the
      result(s), in addition to the usual text output. Useful for Windows users
      who don't want to redirect output into a file.

      Default: disabled

    --r2l
    --no-r2l

      Enable right-to-left processing for languages such as Arabic or Hebrew.

      If a matrix file was created using R2L processing, this option might be
      used here, too, especially for cases where two characters are placed so
      close together in the subtitle image that the OCR recognizes them as one
      single glyph. Although I'm not exactly sure if there really is any
      possible difference, I've added this option here as well.

      Ignored if <infile> is a .sum (matrix) file.

      Default: disabled

    --subtitle-index <index>

      Index of the subtitle stream to process, where 0 is the first stream.

      Ignored if <infile> is a .sum (matrix) file.

      Min.:    0
      Max.:    31
      Default: 0

    --use-idx-file-offsets
    --no-use-idx-file-offsets

      Use file offsets (for images in .sub files) from the corresponding .idx
      file, if available.

      Although the factory default for this (global) option is True, I've only
      had bad results with it, so in /FINDMATRIX mode this setting is disabled
      by default. It can be re-enabled using this option if needed.

      Ignored if <infile> is .bup|.ifo|.vob or a .sum (matrix) file.

      Default: disabled

    --
      Can be used as explicit end-of-options marker.

  Advanced OCR Options:

    These options can be used to tweak the OCR (text/character recognition)
    behaviour.

    Note that some of these options have no influence if <infile> is a .sum
    (matrix) file.

    --ocr-line-min-height <value>

      Minimum line exploration height.

      Min.:     0
      Max.:     45
      Default:  8

    --ocr-line-min-inter-height <value>

      Minimum inter line height.

      Min.:     1
      Max.:     50
      Default:  2

    --ocr-line-max-height <value>

      Maximum line height.

      Min.:     0
      Max.:     45
      Default:  45

    --ocr-char-min-width <value>

      Minimum character width.

      Min.:     1
      Max.:     45
      Default:  2

    --ocr-char-min-height <value>

      Minimum character height.

      Min.:     1
      Max.:     45
      Default:  2

    --ocr-check-detached-char
    --no-ocr-check-detached-char

      Enable/disable checking for slightly detached characters.

      Default: enabled

    --ocr-space-min-width <value>

      Minimum gap (in pixels) between two characters to be treated as a space
      (SPC) character.

      Min.:     -10
      Max.:     45
      Default:  7

    --ocr-char-diff-sensibility <value>

      Character difference sensitivity.

      Min.:     0
      Max.:     1000
      Default:  980

    --ocr-char-max-width-diff <value>

      Maximum character width difference to be considered as the same character.

      Min.:     0
      Max.:     45
      Default:  2

    --ocr-char-max-height-diff <value>

      Maximum character height difference to be considered as the same
      character.

      Min.:     0
      Max.:     45
      Default:  2

    --ocr-char-max-top-diff <value>

      Maximum character top difference (line top relative) to be considered as
      the same character.

      Min.:     0
      Max.:     45
      Default:  6

  Search Options:

    --all
    --no-all

      Normally, the /FINDMATRIX scan will stop after finding a 'good' matrix for
      a given <infile>. Use this option to continue scanning all files in the
      given <matrix-dir> for a complete result.

      Note that this might print multiple filenames to stdout (each file in a
      separate line).

      Default: disabled (stop at first match)
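
      As a minimal bash sketch of how a script might consume that multi-line
      output: the helper below splits captured stdout into one array element
      per matrix file and drops the trailing CR of each line (see "Result:"
      below). The function name parse_matches and the sample paths are made up
      for illustration; they are not part of SubRip.

```bash
#!/usr/bin/env bash
# parse_matches is a hypothetical helper, not a SubRip feature.
# It fills the global array "matches" with one entry per non-empty line of
# its first argument, removing the CR of DOS/Windows line breaks.
parse_matches() {
  matches=()
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}
    if [ -n "$line" ]; then
      matches+=("$line")
    fi
  done <<< "$1"
}

# Sample data standing in for: out=$(SubRip /FINDMATRIX --all ...)
sample=$'D:\\ChMatrix\\Fun.1.sum\r\nD:\\ChMatrix\\Fun.2.sum\r'
parse_matches "$sample"
printf 'match: %s\n' "${matches[@]}"
```

      An empty "matches" array then simply means that no 'good' matrix was
      found.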

    --first-matrix <matrix-file>

      This option can be used to speed up batch processing if multiple subtitles
      are to be converted which are using the same font. If so, the result from
      the last /FINDMATRIX search can be specified here as the <matrix-file>
      parameter in order to check that matrix file before any others.

      If the given <matrix-file> is located within the given <matrix-dir>, it
      will be automatically excluded from checking again.

      Since it is not a problem if the given <matrix-file> could not be loaded
      (e.g. if it doesn't exist), /FINDMATRIX will gently ignore this option and
      proceed with the other matrix files in <matrix-dir>, making it easier to
      write shell scripts utilizing this option.

      Note: Delphi apparently removes explicitly empty commandline args, so
      you need to pass at least something as initial 'dummy' parameter to
      satisfy this stupid language. For example:

        #!/usr/bin/env bash
        #
        # this doesn't work with Delphi...
        #
        last_good_matrix=""
        #
        # ...so we use:
        #
        last_good_matrix="anything-but-empty"
        #
        while [...]; do
          matrix=$(SubRip /FINDMATRIX --first-matrix "$last_good_matrix" [...])
          [...]
          last_good_matrix=$matrix
        done

    --full-search
    --no-full-search

      This option can be used to search for all <infile> glyphs instead of a
      limited range as normally used in typical searches. Enabling this option
      yields the total number of <infile> glyphs found in other matrix files,
      which helps to decide about possible 'double' matrix candidates.

      Note that --full-search takes more time, and if <infile> is a subtitle, it
      will have to be fully scanned before the search begins.

      Windows users need to redirect stderr to a file to see all output because
      the --popup-results option would only show full matches in this case (all
      glyphs found).

      Default: disabled

    --search-set <num>

      Specifies the minimum number of unique glyphs in the given <infile> to be
      scanned before starting the matrix search.

      Ignored if --full-search is enabled.

      Min.:    15
      Max.:    99999
      Default: 60

    --match-set <num>

      Specifies the minimum number of unique glyphs in the given <infile> (from
      the given --search-set) that must match one of the glyphs in a character
      matrix to consider the matrix to be a 'good' one.

      The given <num> value must be <= the --search-set value.

      The default value might still be a bit conservative (i.e., quite high),
      but beware that very small numbers might cause 'weak' hits, i.e., report
      'good' matrix files where in fact only a few simple characters match
      (things like l . , : - ' etc.) while the important (more unique)
      characters might be quite different and should therefore be put into a
      separate matrix. I imagine that a value of 35 or even 30 might still be
      okay, though. Just remember: better too high than too low.

      Ignored if --full-search is enabled.

      Min.:    10
      Max.:    99999
      Default: 40

    --tiny-match-percent <percent>

      Used if <infile> is so small that it contains fewer than the --search-set
      number of unique glyphs, e.g. from a short 'bonus' clip or a tiny 'Forced'
      subtitle. If this happens, /FINDMATRIX will reduce the --search-set to the
      total number of glyphs in <infile> and set --match-set to a percentage of
      that number (rounded up).

      For example, if <percent> is 75 and --search-set is 60 but <infile> only
      contains 23 unique glyphs, --search-set will be reduced to 23 and
      --match-set will be set to 18 (rounded up from 23 * 0.75 = 17.25).

      Ignored if --full-search is enabled.

      Min.:     1 (careful: 1 would add the glyphs to virtually any matrix)
      Max.:   100
      Default: 75
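
      The rounding rule above can be sketched in shell integer arithmetic. The
      function name tiny_match_set is hypothetical; it only mirrors the
      documented calculation and says nothing about how SubRip implements it.

```bash
#!/usr/bin/env bash
# tiny_match_set <unique-glyphs> <percent>
# Hypothetical helper: prints ceil(unique_glyphs * percent / 100) using
# integer arithmetic only (adding 99 before dividing rounds up).
tiny_match_set() {
  local glyphs=$1 percent=$2
  echo $(( (glyphs * percent + 99) / 100 ))
}

# The example above: 23 unique glyphs at 75 percent (17.25 rounded up)
tiny_match_set 23 75   # prints 18
```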

    --search-order <value>

      Specifies the (secondary) sort order for the matrix file search.

      The primary order is to always search in all matrix files within the same
      dir first before diving deeper into sub directories. If there are multiple
      subdirs in the same folder, those subdirs are always scanned in
      alphabetical order (ascending, not case-sensitive).

      Now, the --search-order option specifies a secondary sort order which only
      applies to matrix files within the same (sub-)dir. It can be set to one of
      the following values:

        "aname"   - sort files by name (ascending, not case-sensitive)
        "dname"   - sort files by name (descending, not case-sensitive)
        "asize"   - sort files by size (ascending)
        "dsize"   - sort files by size (descending)
        "none"    - no sort order for matrix files

      The default "asize" mode will search through small matrix files first
      before taking on the larger ones (within the same (sub-)directory).
      This makes it possible to gradually complete the smaller ones by
      occasionally adding a few missing glyphs from the next subtitle(s). It
      also speeds up batch conversion in cases where a large multi-font matrix
      sits alongside small ones within the same (sub-)dir.

      Using a clear sort order gives full control over which kinds of matrix
      files are searched first. For example, more reliable ones should be
      scanned before imported ones from external sources, and you may also
      prefer to have extremely large matrix files always scanned last.
      An example directory structure might look like this:

        ChMatrix/
           10-Good/
              MovieX.sum
              MovieY.sum
           20-BuddyX/
              MovieA.sum
              MovieB.sum
           50-SomeWebsite/
              MovieC.sum
           ZZ-VeryLarge/
              LotsOf-1.sum
           MyMovieA.sum
           MyMovieB.sum

      In the above example, MyMovieA.sum and MyMovieB.sum will always be checked
      first because they are placed in the top-level dir. With --search-order
      "dname", MyMovieB.sum would be the first of the two. Then, 10-Good will be
      scanned, followed by the other subdirs in alphabetical order, so the very
      last file checked will be LotsOf-1.sum.

      Default: "asize"
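
      To preview the order in which a directory tree would be visited, the
      described traversal can be approximated in bash. This is only a sketch
      for the "aname" mode, based on the description above: the function name
      list_matrix_files is hypothetical, and -maxdepth/-mindepth are find
      extensions (GNU/BSD), not POSIX.

```bash
#!/usr/bin/env bash
# list_matrix_files <dir>
# Hypothetical helper: prints .sum files in the documented order, i.e. the
# files of a directory first (by name, ignoring case), then each of its
# subdirectories in alphabetical order (ignoring case), recursively.
list_matrix_files() {
  local dir=$1 sub
  # 1) matrix files directly inside this directory
  find "$dir" -maxdepth 1 -type f -name '*.sum' | LC_ALL=C sort -f
  # 2) then dive into the subdirectories, one after the other
  find "$dir" -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort -f |
  while IFS= read -r sub; do
    list_matrix_files "$sub"
  done
}

# Usage (directory name as in the example structure above):
# list_matrix_files ChMatrix
```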

  <infile>

    Must be either a .bup|.idx|.ifo|.sub|.vob file or a .sum (matrix) file.

    Specifying a .sum file here is useful to find out if a new matrix file
    (usually from an external source) contains glyphs for a font/DVD which are
    already part of another matrix file within the given <matrix-dir>.

  <matrix-dir>

    The given <matrix-dir> will be searched recursively, so it is possible to
    add external matrix files in sub directories in order to keep them separate
    from your own files and/or from each other.
    See also: --search-order (above)

    If <infile> is located somewhere within the given <matrix-dir>, it will be
    automatically excluded from a comparison against itself, so it is not
    necessary to move a certain matrix file from/to the matrix dir in order to
    search for 'doubles'.

Result:

  The /FINDMATRIX command will return with an exit code of 1 in case of an
  error, otherwise 0 (even if no 'good' matrix was found). To test if a positive
  match was found, check if there was any output to stdout (status info and/or
  error messages go to stderr).

  *nix users: note that text output lines (incl. the 'good' matches) end with a
  DOS/Windows line break, so after assigning output to a shell variable, the
  last character (CR) needs to be removed. For example:

    #
    # bash shell (removes the trailing CR only if there is one):
    #
    line=${line%$'\r'}

    #
    # other (POSIX) shells:
    #
    line=$(printf '%s\n' "$line" | tr -d '\r')

  Windows users will either have to use the --popup-results option or redirect
  stdout to a file, pipe or other character device because Windows GUI programs
  don't have any stdio streams available, even if launched from within CMD.exe
  (no problem under Wine). If you also wish to log all progress and other info,
  you'll need to redirect the stderr output stream, too. For example:

    #
    # just log the result(s)
    #
    SubRip /FINDMATRIX Fun.eng.sub D:\ChMatrix >Results-simple.txt

    #
    # log everything
    #
    SubRip /FINDMATRIX Fun.eng.sub D:\ChMatrix >Results-full.txt 2>&1

Notes:

  Main Window:

    During /FINDMATRIX search, the program will minimize the main window so it
    won't get in the way and also to accelerate the overall processing speed.

Examples:

  #
  # Find a matrix file for a given subtitle
  #
  SubRip /FINDMATRIX --popup-results D:\Work\Fun.sub D:\ChMatrix

  #
  # Find out if a certain matrix file is already covered by another one
  # (simple example)
  #
  SubRip /FINDMATRIX --popup-results D:\Work\Fun.sum D:\ChMatrix
  #
  # Advanced version: same purpose as above, but using a full search in all
  # matrix files, writing more detailed statistics to an output text file.
  # This helps to decide about 'double' candidates for potential removal, or
  # alternatively, joining using the /JOINMATRIX cmd (below).
  #
  SubRip /FINDMATRIX --all --full-search --search-order aname D:\Work\Fun.sum D:\ChMatrix >D:\Work\Fun-MatrixStats.txt 2>&1

Example Scripts:

  Example Script 1:

  A simple bash shell script for searching 'doubles' amongst matrix files (using
  Wine e.g. on Linux) might look like this:

    #!/usr/bin/env bash
    #
    # *nix path to a directory to store the result info file(s)
    #
    results_dir=./Results
    mkdir -p "$results_dir/filtered"
    if [ $? -ne 0 ]; then
      echo "ERROR: failed to create results dir: $results_dir" >&2
      exit 1
    fi
    #
    # Windows paths to SubRip.exe and the base matrix dir where your permanent
    # collection is stored (in this example, an own matrix dir within the user's
    # home directory).
    #
    subrip_cmd="D:\\SubRip\\SubRip.exe"
    matrix_dir=$(winepath -w ~/.config/SubRip/ChMatrix)
    #
    # Windows path to the matrix test dir which contains matrix files to check
    # against existing ones in $matrix_dir. These two dirs ($matrix_dir and
    # $matrix_testdir) can also be identical if you simply wish to check the
    # base $matrix_dir for 'doubles' amongst each other, like:
    #
    # matrix_testdir=$matrix_dir
    #
    matrix_testdir=$(winepath -w ./NewMatrixFiles)
    #
    # setup wine to be silent
    #
    export WINEDEBUG="err-all,warn-all,fixme-all"
    #
    # Full/deep recursive scan for 'doubles' (we sort by aname just for readability)
    #
    # We add a counter to result text filenames to create separate output files in
    # case $matrix_testdir contains multiple matrix files with the same name
    # within different subdirs.
    #
    echo "Counting matrix files..."
    src_files=$(find "$(winepath -u "$matrix_testdir")" -name "*.sum" -type f | wc -l)
    dest_files=$(find "$(winepath -u "$matrix_dir")" -name "*.sum" -type f | wc -l)
    echo "Checking $src_files file(s) against $dest_files file(s) in the base matrix dir..."
    if [ $src_files  -le 0 ] \
    || [ $dest_files -le 0 ]; then
      echo "Nothing to check, aborting..."
      exit 0
    fi
    matrix_testdir_u=$(winepath -u "$matrix_testdir")
    count=1
    find "$matrix_testdir_u" -name "*.sum" -type f | while IFS= read -r file; do
      echo "File [$count/$src_files]: ${file:((1+${#matrix_testdir_u}))}"
      bname=$(basename "$file")
      wine "$subrip_cmd" /FINDMATRIX \
                       --all \
                       --full-search \
                       --search-order aname \
                       "$(winepath -w "$file")" \
                       "$matrix_dir" \
                       >"$results_dir/$bname.$count.txt" 2>&1
      #
      # create filtered output for some more relevant details
      #
      header=0
      cat "$results_dir/$bname.$count.txt" | while read -r line; do
        if echo "$line" | grep '+++ ' >/dev/null \
        || echo "$line" | grep '+-+ ' >/dev/null ; then
          if [ $header -eq 0 ]; then
            echo -e "Filtered results for matrix file: $file\015" >>"$results_dir/filtered/$bname.$count.txt"
            header=1
          fi
          echo "$old_line"  >>"$results_dir/filtered/$bname.$count.txt"
          echo "$line"      >>"$results_dir/filtered/$bname.$count.txt"
        fi
        old_line=$line
      done
      ((count++))
    done

  Example Script 2:

  A simple bash shell script for batch-processing with /FINDMATRIX and /AUTOTEXT
  (using Wine e.g. on Linux). It converts all *.???.sub files in the current dir
  to .srt, where ??? is the language code, as in "Title00.01.eng.sub".
                                                             ^^^
  Using the above filename pattern (??? for the language code) allows the script
  to automatically set the --subtitle-language option to the right language for
  Post-OCR Correction.

  Note: to keep your matrix files as small (and distinct) as possible, run this
  script at least once per subtitle language. Some DVD subtitles are produced by
  multiple authors (per-language) so there could be multiple fonts involved.
  Therefore I'd recommend putting all subtitles for a certain language in one
  directory, another language in a separate directory etc., then running the
  script once for each of those directories.

    #!/usr/bin/env bash
    #
    # New matrix filename (if no 'good' one was found)
    #
    # The ".1" means that it's our first matrix for this DVD. After switching to the
    # next language, use ".2" if the ".1" matrix was indeed created, for the case
    # when the subtitles for the next language turn out to be using a different
    # font. This will keep your matrix files as small as possible.
    #
    # To turn this into a commandline arg for the script itself, you could use:
    #
    #new_matrix_file="$1"
    #
    new_matrix_file="Fun.1.sum"
    #
    # Matrix sort mode (in case the matrix is changed during the conversion),
    # which can be one of: "latin", "random" or "none".
    #
    # This might also be turned into a commandline arg, like:
    #
    #sort_mode="$2"
    #
    sort_mode="latin"
    #
    # The following needs to be configured once:
    #
    # Windows paths to SubRip.exe and matrix dir (in this example, an own matrix
    # dir within the user's home directory).
    #
    subrip_cmd="D:\\SubRip\\SubRip.exe"
    matrix_dir=$(winepath -w ~/.config/SubRip/ChMatrix)
    #
    # (end of configuration)
    #
    # Setup wine to be silent
    #
    export WINEDEBUG="err-all,warn-all,fixme-all"
    #
    # Make sure that the "new" matrix file really doesn't exist:
    #
    if [ -e "$(winepath -u "$matrix_dir\\$new_matrix_file")" ]; then
      echo "ERROR: new matrix file already exists: $matrix_dir\\$new_matrix_file" >&2
      exit 1
    fi
    #
    # Use the largest subtitle as test sample for /FINDMATRIX (most reliable)
    #
    sample_subtitle=$(ls -1 -S *.???.sub | head -n 1)
    #
    # find a 'good' matrix file or use a new one
    #
    matrix_file=$(wine "$subrip_cmd" /FINDMATRIX "$sample_subtitle" "$matrix_dir")
    if [ $? -ne 0 ]; then
      echo "ERROR: SubRip failed due to an error during /FINDMATRIX" >&2
      exit 1
    fi
    if [ -n "$matrix_file" ]; then
      # found a 'good' one: remove CR (from DOS/Windows line break)
      matrix_file=${matrix_file%$'\r'}
    else
      # no output to stdout == no 'good' matrix found, so use a new one
      matrix_file="$matrix_dir\\$new_matrix_file"
    fi
    #
    # Convert all *.???.sub files in the current dir to .srt, where ??? is the
    # subtitle language code, as in "Title00.01.eng.sub".
    #                                           ^^^
    # We sort by filesize (big files first), so when we arrive at the small subs, an
    # appropriate matrix will already exist by then. This avoids creating new matrix
    # files which may be so tiny that they could end up being skipped in future
    # /FINDMATRIX runs because they won't qualify for typical --match-set values.
    #
    ls -1 -S *.???.sub | while IFS= read -r file; do
      winfile=$(winepath -w "$file")
      wine "$subrip_cmd" /AUTOTEXT \
                       --no-use-subtitle-map \
                       --subtitle-language "${file:(-7):3}" \
                       --sortmatrix "$sort_mode" \
                       "$winfile" \
                       "${winfile%.sub}.srt" \
                       "$matrix_file" \
                       2>&1 | tee -a AutoText.log
    done

--------------------------------------------------------------------------------
[end] /FINDMATRIX
--------------------------------------------------------------------------------

--------------------------------------------------------------------------[BLO]-
/JOINMATRIX (special CLI command added by Bloody)
--------------------------------------------------------------------------------

This cmd helps to clean out some (rather small) matrix files if most of their
glyphs are already part of another matrix (as shown by the /FINDMATRIX cmd).

To enable /JOINMATRIX mode, SubRip must be started using "/JOINMATRIX" as the
very first commandline arg, otherwise the "classic" commandline syntax will be
used.

MSYS users need to use //JOINMATRIX instead of /JOINMATRIX to circumvent the
automatic root path expansion through MSYS (where using /JOINMATRIX would cause
SubRip to abort without a proper error message explaining the problem).

Usage:

  SubRip /JOINMATRIX [options] <matrix1> <matrix2> <output-matrix>

  (For a brief usage example, see "Examples:" below.)

  Introduction: the emergence of 'double' matrix files

    This command is a somewhat more 'geeky' maintenance utility; not exactly
    required reading for 'normal' users who just want to get the job done, i.e.,
    convert DVD subtitles to text using /FINDMATRIX and /AUTOTEXT only, but if
    you're interested in a bit more technical stuff, read on...

    Assume you just ran the following commands:

      cd D:\ChMatrix
      SubRip /FINDMATRIX --all --full-search --search-order aname matrix1.sum . >matrix1-results.txt 2>&1

    The matrix1-results.txt file might show something like this:

      matrix [1/1]: D:\ChMatrix\matrix2.sum
      +-+ (part)    72/83/83, 159 total

    What we can see here is that our very small matrix1.sum with only 83 glyphs
    is mostly covered by matrix2.sum (which contains 159 glyphs in total).

    72 of the 83 glyphs have been identified in matrix2.sum, so it's pretty safe
    to assume that those two files contain glyphs for the very same font only.

    One could now simply delete matrix1.sum (the smaller one) and be done with
    it, or one could be more pedantic and use the /JOINMATRIX cmd to preserve
    those 11 glyphs from matrix1.sum by adding them to matrix2.sum, making it
    a bit more likely to be identified as a 'good' one in the next /FINDMATRIX
    search, and of course, to get rid of matrix1.sum.
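
    The arithmetic behind such a result line can be sketched in bash. The
    function name coverage is hypothetical, and the field meaning is an
    assumption taken from the description above: the first number is the count
    of glyphs identified in the other matrix, the second the count of glyphs
    that were checked. The third field is not interpreted here.

```bash
#!/usr/bin/env bash
# coverage <result-line>
# Hypothetical helper. Reads one result line of the form
#   +-+ (part)    M/N/N, T total
# and assumes M = glyphs found, N = glyphs checked (see lead-in).
coverage() {
  local re='([0-9]+)/([0-9]+)/[0-9]+, +([0-9]+) total'
  if [[ $1 =~ $re ]]; then
    local found=${BASH_REMATCH[1]} checked=${BASH_REMATCH[2]}
    # integer percentage, rounded down
    echo "missing=$(( checked - found )) percent=$(( found * 100 / checked ))"
  else
    echo "unrecognized line" >&2
    return 1
  fi
}

coverage '+-+ (part)    72/83/83, 159 total'   # prints: missing=11 percent=86
```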

    Why do 'double' matrix files occur in the first place?

    There are several possible reasons for this:

      - matrix1.sum is so small that it is reasonable to assume that it only
        contains regular (non-italic) glyphs. If another subtitle is converted
        which is using the very same font but begins with italic-style text,
        those italic glyphs (the Search Set) would not be identified within the
        smaller matrix1.sum, therefore a new (separate) matrix2.sum is created.

      - the subtitle that created matrix2.sum contained multiple glyphs early on
        which belong to a language not yet covered by matrix1.sum, and/or multi-
        char glyphs (the usual stuff like "tt", "rv" etc., glued so tightly
        together that the OCR engine perceives them as one single glyph), also
        not yet part of matrix1.sum, or it contained identical glyphs but with
        certain subtle differences, so they were considered to be 'different'.
        The latter sometimes happens with crappy, poorly-rendered fonts (to be
        blunt, I haven't seen any really great-looking DVD subtitles yet)...

      - matrix2.sum was imported from an external source.

    In any case, 'double' matrix files are indeed sometimes created, depending
    on how many very small files (<100-150 glyphs) are compared with each other.
    More 'mature' matrix files (>250-300+ glyphs) tend to attract glyphs for
    known fonts more reliably, therefore making it less likely that new matrix
    files are created for the very same font.

    It would be tricky to entirely avoid the creation of 'doubles'. One would
    have to either lower the MatchSet value used with the /FINDMATRIX cmd
    which, if too low, would add glyphs to a matrix they don't really belong to,
    or alternatively raise both SearchSet/MatchSet values. That in turn would
    not only significantly slow down the /FINDMATRIX search, but also raise the
    question of how large two matrix files may get before joining them is no
    longer justified, compared to simply keeping both.
    For example, assume the following output:

      matrix [1/1]: D:\ChMatrix\matrix2.sum
      +-+ (part)    224/391/391, 542 total

    In this case we have two bigger matrix files where it can't be said with
    certainty that neither of them already contains more than one font, which
    is exactly what we try to avoid since we use multiple matrix files to avoid
    problems with huge ones (with thousands of glyphs).

    Sometimes it happens that a matrix file contains more than one font, like
    when a subtitle contains certain entries which were added later by another
    author, even if sometimes just rendered slightly differently or using a
    slightly different font size, or a number of 'bold' glyphs which just so
    happen to look like the entire font of another subtitle to be converted
    in a future session.

    Joining those two matrix files would create an even bigger <output-matrix>
    which would only serve as a "magnet" for even more multi-font subtitles, and
    in the end, such a matrix could grow so large that it would not only slow
    down future subtitle conversions but also start to cause problems of a
    different nature. This topic is covered in the /FINDMATRIX documentation,
    under "the issue with huge Matrix files" (above).

    Also, the number of identical glyphs in the above case (224 of 391) is far
    too low for my taste, i.e., those two matrix files are not really that
    similar.

    Therefore I wouldn't advise joining bigger matrix files because at some
    point they would require tighter "advanced OCR settings" and so on.
    Better just keep those files as-is, and if one of them later turns out to be
    compromised (erroneous glyphs or simply grown too big), then it could simply
    be deleted while the other matrix is still preserved, saving at least some
    glyph typing in future subtitle conversions.

    In the end, small 'double' matrix files are not a real problem compared to
    huge ones (with thousands of glyphs), so it's better not to over-use this
    command. Remember: once joined, a matrix file can never be split again
    because there is no way to tell which glyphs belong together (same font).

    So in a nutshell, the /JOINMATRIX cmd is useful to get rid of some of the
    smallest 'double' matrix files rather than joining bigger ones.

  General Options:

    --delete-input-files
    --no-delete-input-files

      Delete <matrix1> and <matrix2> after successful joining.

      Default: disabled

    --force
    --no-force

      Overwrite an existing <output-matrix>. By default, the program would abort
      with an error msg if <output-matrix> already exists.

      Default: disabled

    --sortmatrix <mode>

      Specifies if and how to sort the <output-matrix> file before saving, where
      <mode> can be one of the following values:

        "latin"   - for max. subtitle conversion speed (latin-based languages)
        "random"  - for speed tests (font-independent)
        "none"    - don't sort the matrix

      Default: "none"

    --
      Can be used as explicit end-of-options marker.

  Advanced OCR Options:

    These options can be used to tweak the OCR (text/character recognition)
    behaviour.

    Note that some of these options have no real influence on .sum (matrix)
    files, but some are certainly required for a reasonable glyph comparison,
    such as --ocr-char-*diff* options. For small(-ish) matrix files (less than
    thousands of glyphs), the defaults are usually fine, though.

    --ocr-line-min-height <value>

      Minimum line exploration height.

      Min.:     0
      Max.:     45
      Default:  8

    --ocr-line-min-inter-height <value>

      Minimum inter line height.

      Min.:     1
      Max.:     50
      Default:  2

    --ocr-line-max-height <value>

      Maximum line height.

      Min.:     0
      Max.:     45
      Default:  45

    --ocr-char-min-width <value>

      Minimum character width.

      Min.:     1
      Max.:     45
      Default:  2

    --ocr-char-min-height <value>

      Minimum character height.

      Min.:     1
      Max.:     45
      Default:  2

    --ocr-check-detached-char
    --no-ocr-check-detached-char

      Enable/disable checking for slightly detached characters.

      Default: enabled

    --ocr-space-min-width <value>

      Minimum gap (in pixels) between two characters to be treated as a space
      (SPC) character.

      Min.:     -10
      Max.:     45
      Default:  7

    --ocr-char-diff-sensibility <value>

      Character difference sensitivity.

      Min.:     0
      Max.:     1000
      Default:  980

    --ocr-char-max-width-diff <value>

      Maximum character width difference to be considered as the same character.

      Min.:     0
      Max.:     45
      Default:  2

    --ocr-char-max-height-diff <value>

      Maximum character height difference to be considered as the same
      character.

      Min.:     0
      Max.:     45
      Default:  2

    --ocr-char-max-top-diff <value>

      Maximum character top difference (line top relative) to be considered as
      the same character.

      Min.:     0
      Max.:     45
      Default:  6

  <matrix1>
  <matrix2>

    Must both be .sum (matrix) files.

    The order in which those two files are specified does not matter, i.e., the
    /JOINMATRIX cmd will always take the bigger matrix first (the one with more
    glyphs) and then add the missing glyphs from the other one to the joined
    <output-matrix> file.

    Any 'double' glyphs identified in both <matrix1> and <matrix2> will be
    skipped, i.e., only added once to the <output-matrix> file.

  <output-matrix>

    Must also be a .sum (matrix) file. If the output file already exists, the
    default action is to abort with an error message.

    See also: --force (above).

Notes:

  Main Window:

    During a /JOINMATRIX operation, the program will minimize the main window so
    it won't get in the way and also to accelerate the overall processing speed.

Examples:

  #
  # Join matrix1.sum and matrix2.sum into a new matrix3.sum file
  #
  SubRip /JOINMATRIX matrix1.sum matrix2.sum matrix3.sum

--------------------------------------------------------------------------------
[end] /JOINMATRIX
--------------------------------------------------------------------------------

--------------------------------------------------------------------------[BLO]-
/MATRIXINFO (special CLI command added by Bloody)
--------------------------------------------------------------------------------

This cmd prints information about a matrix file.

To enable /MATRIXINFO mode, SubRip must be started using "/MATRIXINFO" as the
very first commandline arg, otherwise the "classic" commandline syntax will be
used.

MSYS users need to use //MATRIXINFO instead of /MATRIXINFO to circumvent the
automatic root path expansion through MSYS (where using /MATRIXINFO would cause
SubRip to abort without a proper error message explaining the problem).

Usage:

  SubRip /MATRIXINFO [options] <matrix-file>

  (For a brief usage example, see "Examples:" below.)

  General Options:

    --type <value>

      Specifies the type of information to be printed, where <value> can be one
      of the following:

        "bold"      - number of bold-style glyphs in the matrix
        "comment"   - the user comment (if any)
        "italic"    - number of italic-style glyphs in the matrix
        "regular"   - number of regular (non-style) glyphs in the matrix
        "total"     - overall number of glyphs in the matrix
        "underline" - number of underline-style glyphs in the matrix

      Default: "total"

    --
      Can be used as explicit end-of-options marker.

  <matrix-file>

    An existing .sum (matrix) file to retrieve information about.

Notes:

  Results are printed to stdout, while everything else (incl. headlines and
  other output) goes to stderr.

  Windows users need to redirect stdout output to a file, pipe or other
  character device because Windows GUI programs do not have any stdio streams
  available, even if launched from within CMD.exe (no problem under Wine).
  For example:

    SubRip.exe /MATRIXINFO MyMatrix.sum >D:\Work\Fun\MyMatrix-Info.txt

    Or (to log all output):

    SubRip.exe /MATRIXINFO MyMatrix.sum >D:\Work\Fun\MyMatrix-Info.txt 2>&1

  *nix users: note that all text output lines end with a DOS line break, so
  after assigning output to a shell variable, the last character (CR) must be
  removed before numeric output can be processed. For example:

    #
    # bash shell (removes the trailing CR only if there is one):
    #
    line=${line%$'\r'}

    #
    # other (POSIX) shells:
    #
    line=$(printf '%s\n' "$line" | tr -d '\r')

Examples:

  #
  # Find out how many glyphs are in a matrix file
  #
  SubRip /MATRIXINFO --type total MyMatrix.sum
  #
  # ...or just (since "total" is the default type):
  #
  SubRip /MATRIXINFO MyMatrix.sum
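
  The per-type queries can be combined into a small report. This is only a
  sketch for a *nix shell (e.g. SubRip running under Wine), assuming that a
  launcher named SubRip is on the PATH; the SUBRIP variable and the
  matrix_report function are hypothetical names. MSYS users would have to
  replace /MATRIXINFO with //MATRIXINFO.

```shell
# Hypothetical wrapper: print one "type: count" line per glyph type.
# SUBRIP may name a different launcher (assumption: "SubRip" is on the PATH).
matrix_report() {
  for type in regular bold italic underline total; do
    # Results arrive on stdout; headlines on stderr are discarded.
    # tr removes the CR of the DOS line break.
    count=$("${SUBRIP:-SubRip}" /MATRIXINFO --type "$type" -- "$1" \
            2>/dev/null | tr -d '\r')
    printf '%s: %s\n' "$type" "$count"
  done
}

# Usage: matrix_report MyMatrix.sum
```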

--------------------------------------------------------------------------------
[end] /MATRIXINFO
--------------------------------------------------------------------------------

--------------------------------------------------------------------------[BLO]-
/SORTMATRIX (special CLI command added by Bloody)
--------------------------------------------------------------------------------

This command is used to [batch-]sort existing/imported matrix file(s).

To enable /SORTMATRIX mode, SubRip must be started using "/SORTMATRIX" as the
very first commandline arg, otherwise the "classic" commandline syntax will be
used.

MSYS users need to use //SORTMATRIX instead of /SORTMATRIX to circumvent the
automatic root path expansion through MSYS (where using /SORTMATRIX would cause
SubRip to abort without a proper error message explaining the problem).

Usage:

  SubRip /SORTMATRIX [options] <matrix-file>

  (For a brief usage example, see "Examples:" below.)

  The /SORTMATRIX command loads, sorts and then saves a given <matrix-file>.

  General Options:

    --mode <mode>

      Specifies how to sort the matrix, where <mode> can be one of the following
      values:

        "latin"   - for max. subtitle conversion speed (latin-based languages)
        "random"  - for speed tests (font-independent)

      Default: "latin"

    --
      Can be used as explicit end-of-options marker.

  <matrix-file>

    A character matrix (.sum) file.

Examples:

  #
  # Sort a matrix for max. subtitle conversion speed (latin-based languages)
  #
  SubRip /SORTMATRIX --mode latin D:\ChMatrix\Fun.sum
  #
  # or just: (since "latin" is the default for --mode)
  #
  SubRip /SORTMATRIX D:\ChMatrix\Fun.sum
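
  Since /SORTMATRIX takes a single <matrix-file>, batch-sorting means calling
  it once per file. A sketch for a *nix shell, assuming that a launcher named
  SubRip is on the PATH; SUBRIP and sort_all_matrices are hypothetical names.

```shell
# Hypothetical helper: sort every .sum file in a directory, one call per file.
# SUBRIP may name a different launcher (assumption: "SubRip" is on the PATH).
sort_all_matrices() {
  for f in "$1"/*.sum; do
    [ -e "$f" ] || continue          # directory contains no .sum files
    printf 'sorting %s\n' "$f" >&2
    "${SUBRIP:-SubRip}" /SORTMATRIX --mode latin -- "$f"
  done
}

# Usage: sort_all_matrices /path/to/ChMatrix
```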

--------------------------------------------------------------------------------
[end] /SORTMATRIX
--------------------------------------------------------------------------------

--------------------------------------------------------------------------[BLO]-
--version option (added by Bloody)
--------------------------------------------------------------------------------

Usage:

  SubRip --version

Output:

  The first output line contains "SubRip <Major>.<Minor>.<Build>".

  Windows users will get a popup window by default. If you wish to parse the
  version from a script, you'll need to redirect stdout to a file, pipe or other
  character device because Windows GUI programs do not have any stdio streams
  available, even if launched from within CMD.exe (no problem under Wine).
  For example:

    SubRip --version >SubRip-Version.txt

Notes:

  SubRip 1.50beta7 printed "SubRip 1.50b7" (the --version option was introduced
  with the 1.50b7 release).

  Older SubRip versions will try to execute this command in "classic" mode and
  simply fail & quit without printing anything.

  In a nutshell:

    if (VersionOutput = "" or VersionOutput = "SubRip 1.50b7") then
      OldVersion = true
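
  The same check as a shell sketch. The function name is_old_subrip is
  hypothetical; it compares the first output line against the string quoted in
  the note above ("SubRip 1.50b7") and removes a CR first, on the assumption
  that --version output may end with a DOS line break as well.

```shell
# Hypothetical helper: succeed if the captured --version output belongs to
# an old release (nothing printed at all, or the 1.50b7 string).
is_old_subrip() {
  first=$(printf '%s\n' "$1" | head -n 1 | tr -d '\r')
  case $first in
    ''|'SubRip 1.50b7') return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes a launcher named SubRip on the PATH):
#   if is_old_subrip "$(SubRip --version 2>/dev/null)"; then ...; fi
```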

--------------------------------------------------------------------------------
[end] --version option
--------------------------------------------------------------------------------

---------------------------------------------------------------------------v1.1-
PhASE #1 (optional)
obtain info about available subtitle streams from idx/IFO
--------------------------------------------------------------------------------

SYNTAX:

 SubRip /SSSINFO in-file out-file

 /SSSINFO     Save SubPicture Stream INFOrmation,
              i.e. the content of SubRip's lng stream combobox
 in-file      input IFO/idx file (including path)
 out-file     output txt-filename (including path)


--------------------------------------------------------------------------------
PhASE #2
build subtitle script, (palette file) & bmps
--------------------------------------------------------------------------------

SYNTAX:

 SubRip ini-file in-file out-file stream out-format dropflag

 ini-file     INI file (including path)
 in-file      input IFO/idx file (including path)
 out-file     output filename prefix (including path)
 stream       subtitle stream to process (0 ... 31)
 out-format   output subtitle format (0 ... 5, in the same order as in SubRip:
              I-Author, Philips, Sonic DVD, Scenarist, DVDMaestro, Impression)
 dropflag     1 = DROP / 0 = NON DROP (ignored for PAL source)

 Hints (these apply to PhASE #1 as well):
 - all CLI parameters are required and must be given in this order,
 - remember to use double quotes if a path/name contains a space character,
 - if an error occurs, SubRip exits,
 - if the selected subtitle stream does not exist (according to the IFO/idx
   file), SubRip exits,
 - all output files are overwritten w/o prompt (if they already exist).
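
 Because SubRip simply exits on a bad argument, a calling script may want to
 check the value ranges listed above before launching it. A sketch for a *nix
 shell; check_phase2_args is a hypothetical name, and only the ranges stated
 in this section are tested (file existence is not).

```shell
# Hypothetical pre-flight check for the PhASE #2 argument list:
#   ini-file in-file out-file stream out-format dropflag
check_phase2_args() {
  [ "$#" -eq 6 ] || { echo "need exactly 6 arguments" >&2; return 1; }
  case $4 in
    ''|*[!0-9]*) echo "stream must be 0 ... 31" >&2; return 1 ;;
  esac
  [ "$4" -le 31 ] || { echo "stream must be 0 ... 31" >&2; return 1; }
  case $5 in
    [0-5]) ;;
    *) echo "out-format must be 0 ... 5" >&2; return 1 ;;
  esac
  case $6 in
    0|1) ;;
    *) echo "dropflag must be 0 or 1" >&2; return 1 ;;
  esac
}

# Usage: check_phase2_args "$@" && SubRip "$@"
```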

--------------------------------------------------------------------------------

INI FILE:

 Here is an example of an INI file for Scenarist output (default values). The
 section is named according to the out-format number (BMPOUT 0 ... BMPOUT 5).
 Items are named according to SubRip's Bitmap Adjustment window. The first part
 of each name equals the tab name, the second part is named after the
 corresponding control.

 [BMPOUT 3]
 Cropping_Enabled=1  //Allow BMP Cropping (1 = yes, 0 = no)
 Cropping_MinX=720  //Minimum Picture Width
 Cropping_MinY=574  //Minimum Picture Height
 Cropping_DivX=8  //Width must be a multiple of
 Cropping_DivY=2  //Height must be a multiple of
 Cropping_XPosMode=1  //Horizontal Alignment (0 = Left, 1 = Center, 2 = Right)
 Cropping_YPosMode=1  //Vertical Alignment (0 = Top, 1 = Center, 2 = Bottom)
 Cropping_PosShift.X=0  //+
 Cropping_PosShift.Y=0  //+
 Cropping_DivL=1  //Left must be a multiple of
 Cropping_DivT=1  //Top must be a multiple of
 Colors_Custom=0  //Custom Colors and Contrast (1 = yes, 0 = no)
 Colors_CustColor_0=0  //Custom Color #1 (BGR order)
 Colors_CustColor_1=16777215  //Custom Color #2 (BGR order)
 Colors_CustColor_2=16711680  //Custom Color #3 (BGR order)
 Colors_CustColor_3=8421504  //Custom Color #4 (BGR order)
 Colors_CustContrast_0=0  //Custom Contrast #1
 Colors_CustContrast_1=15  //Custom Contrast #2
 Colors_CustContrast_2=15  //Custom Contrast #3
 Colors_CustContrast_3=15  //Custom Contrast #4
 Colors_BitPerPixel=4 //bits per pixel, BMPOUT 1, 3, 4 only (4 = 4bit, 8 = 8bit)
 Colors_Compress=0 //Compress Bitmaps (1 = yes, 0 = no)
 Positioning_Enabled=0  //Allow BMP position change (1 = yes, 0 = no)
 Positioning_XPosMode=0  //Left Position (0 = Keep, 1 = Center, 2 = Right)
 Positioning_SetTo.X=0  //Left Position - Set to
 Positioning_PosShift.X=0  //Left Position - Then Add/Remove
 Positioning_YPosMode=0  //Top Position (0 = Keep, 1 = Center, 2 = Bottom)
 Positioning_SetTo.Y=0  //Top Position - Set to
 Positioning_PosShift.Y=0  //Top Position - Then Add/Remove
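
 The Colors_CustColor_* values are plain decimal integers. A sketch of
 composing one from 8-bit components, assuming that "BGR order" means blue
 occupies the most significant byte (as in a hexadecimal BBGGRR value); under
 that assumption the default 16711680 (hex FF0000) is pure blue. bgr_value is
 a hypothetical helper name.

```shell
# Hypothetical helper: bgr_value <red> <green> <blue>, each 0 ... 255.
# Assumption: the integer is laid out as 0xBBGGRR.
bgr_value() {
  echo $(( ($3 << 16) | ($2 << 8) | $1 ))
}

# Usage: echo "Colors_CustColor_1=$(bgr_value 255 255 255)"
```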

 In the INI file, the variables need not be in the order shown here, and it is
 not required to set all values (or even whole sections): if a variable (or
 section) is missing, its default value is used. In other words, you can use
 the INI file to override only some default value(s). To see the default values
 for all graphical outputs, delete SubRip.ini in the SubRip folder and execute
 SubRip. In the Bitmap Adjustment window, use "Save current Profile as" to
 store the default profile (profile = your INI file). Now you can adjust your
 INI using e.g. Notepad, or set your values in the Bitmap Adjustment window and
 then store it again as a new profile.
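
 As an illustration of such a partial INI file, the following sketch writes a
 profile that overrides only two Scenarist settings and leaves everything else
 at its default. The file name override.ini is hypothetical; the keys and
 values are taken from the listing above.

```shell
# Write a profile that overrides only two [BMPOUT 3] (Scenarist) settings.
# "override.ini" is a hypothetical file name.
cat > override.ini <<'EOF'
[BMPOUT 3]
Colors_BitPerPixel=8
Colors_Compress=1
EOF
```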

 Hints:
 - Cropping_MinY is set to 574 by default. You need not change this value for
   NTSC DVDs; it is adjusted automatically (and vice versa), but only for
   478 and 480, resp. 574 and 576.

--------------------------------------------------------------------------------

EXAMPLES:

 SubRip /SSSINFO C:\VIDEO_TS\VTS_01_0.IFO C:\WINDOWS\TEMP\ssi.txt
 - SubRip opens VTS_01_0.IFO and saves the content of the lng stream combobox
 (as you see it in the "What to do?" window) into ssi.txt in the selected
 folder.

 SubRip E:\dvds\maestro.ini C:\VIDEO_TS\VTS_01_0.IFO "D:\DVD Maest\movie" 0 4 1
 - SubRip reads the settings for the selected DVDMaestro output from
 maestro.ini ([BMPOUT 4] section), opens VTS_01_0.IFO and the corresponding
 VOBs (w/o the menu VOB VTS_01_0.VOB), processes subtitle stream #0 and saves
 the output files movie#####.bmp, movie.son and movie.spf into the
 D:\DVD Maest folder. In case of an NTSC source, the DropFrame flag is set to
 DROP.

 SubRip "C:\Program Files\my.ini" C:\VobSubs\divx-some.idx D:\SST\some 12 3 0
 - SubRip reads the settings for the selected Scenarist output from my.ini
 ([BMPOUT 3] section), opens divx-some.idx and the corresponding divx-some.sub,
 processes subtitle stream #12 and saves the output files some.#.bmp and
 some.sst into the D:\SST folder. In case of an NTSC source, the DropFrame flag
 is set to NON DROP.

--------------------------------------------------------------------------------

HiSTORY:

1.1
+ Cropping_DivL, Cropping_DivT
+ Colors_BitPerPixel for Scenarist, DVDMaestro
+ Colors_Compress

1.0
* initial version

--------------------------------------------------------------------------------
