1. Character Sets | Input charsets and output formats
2. Invoking Ocrad | Command Line Interface
3. Reporting Bugs
Concept Index | Index of Concepts
Copyright © 2003, 2004, 2005 Antonio Diaz Diaz.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
1. Character Sets
The character set internally used by ocrad is ISO 10646, also known as UCS (Universal Character Set), which can represent over two thousand million characters (2^31).
As it is impractical to try to recognize one among so many different characters, you can tell ocrad which character sets to recognize. You do this with the `--charset' option.
If the input page contains characters from only one character set, say
`ISO-8859-15', you can use the default `byte' output
format. But in a page with both `ISO-8859-9' and
`ISO-8859-15' characters, you can't tell whether the byte 0xFD
represents a 'latin small letter dotless i' or a 'latin small letter y
with acute'. In that case you should use `--format=utf8' instead.
Of course, you may request UTF-8 output in any case.
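The ambiguity of the byte 0xFD can be checked outside ocrad. The following is a small sketch in Python, which is not part of ocrad; it only uses Python's standard codecs to decode the same byte under both charsets and to show that the two characters get distinct encodings in UTF-8.

```python
# Byte 0xFD means different things in the two charsets named above.
raw = bytes([0xFD])

as_latin5 = raw.decode("iso-8859-9")    # latin small letter dotless i, U+0131
as_latin9 = raw.decode("iso-8859-15")   # latin small letter y with acute, U+00FD

# In UTF-8 the two characters are encoded as different byte sequences,
# so the output is no longer ambiguous.
utf8_latin5 = as_latin5.encode("utf-8")
utf8_latin9 = as_latin9.encode("utf-8")

print(hex(ord(as_latin5)), utf8_latin5)
print(hex(ord(as_latin9)), utf8_latin9)
```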
NOTE: Don't believe everything Usamericans tell you. A billion is a million millions (million^2), a trillion is a million million millions (million^3), and so on. Please, respect the meaning of prefixes to make communication among all people possible. Thanks.
2. Invoking Ocrad
The format for running ocrad is:
ocrad [options] [files]
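As an illustration of this format, here is a minimal Python sketch that assembles an ocrad command line from the `--charset' and `--format' options described in this manual. The helper name `build_ocrad_command' and the file name `page.pgm' are hypothetical; the sketch only builds and prints the argument list and does not run ocrad.

```python
import shlex

def build_ocrad_command(files, charsets=(), output_format=None):
    """Hypothetical helper: assemble 'ocrad [options] [files]' as a list."""
    args = ["ocrad"]
    # --charset may be repeated, once per character set to recognize.
    for name in charsets:
        args.append(f"--charset={name}")
    # --format accepts 'byte' or 'utf8'; omit it to get the default.
    if output_format is not None:
        args.append(f"--format={output_format}")
    # Input files come after the options.
    args.extend(files)
    return args

cmd = build_ocrad_command(
    ["page.pgm"],                              # hypothetical input file
    charsets=["iso-8859-9", "iso-8859-15"],
    output_format="utf8",
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run' on a system where ocrad is installed.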
Ocrad supports the following options:
Print an informative help message describing the options and exit.
Print the version number of ocrad on the standard output and exit.
Append generated text to the output file instead of overwriting it.
Process only the specified text block. Blocks are numbered starting from 1. This option is only useful in conjunction with layout analysis (see below).
Enable recognition of the characters belonging to the given character set.
You can repeat this option multiple times with different names for
processing a page with characters from different character sets.
If no charset is specified, `iso-8859-15' (latin9) is assumed.
Try `--charset=help' for a list of valid charset names.
Force overwrite of output file.
Select the output format. The valid names are `byte' and `utf8'.
If no output format is specified, `byte' (8 bit) is assumed.
Invert image levels (white on black).
Enable page layout analysis. The meaning of mode is:
`0' no analysis at all, `1' column separation, `2' full analysis.
Place the output into file instead of into the standard output.
Scale the input image by value before layout analysis and recognition. If value is negative, the input image is scaled down by -value.
Perform given transformation (rotation or mirroring) on the input image
before scaling, layout analysis and recognition.
Try `--transform=help' for a list of valid transformation names.
Set the binarization threshold for pgm files or for the `--scale' option (only for scaled down images). value should be a rational number between 0 and 1, and may be given as a percentage (50%), a fraction (1/2), or a decimal value (0.5). Image values greater than the threshold are converted to white. The default value is 0.5.
Verbose mode.
Write (export) the OCR Results File to file. `-x -' writes to stdout, overriding text output unless the text output has also been redirected with the -o option.
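The three spellings accepted for the binarization threshold (percentage, fraction, decimal) all denote the same kind of number. The following Python sketch shows one way such values could be normalized; it is not ocrad's actual parser. The names `parse_threshold' and `is_white' are hypothetical, and the sketch assumes that image values have already been normalized to the range 0 to 1.

```python
from fractions import Fraction

def parse_threshold(text):
    """Hypothetical parser: accept '50%', '1/2' or '0.5' and return a float."""
    text = text.strip()
    if text.endswith("%"):
        value = Fraction(text[:-1]) / 100   # percentage form
    else:
        value = Fraction(text)              # handles both '1/2' and '0.5'
    if not 0 <= value <= 1:
        raise ValueError(f"threshold out of range 0..1: {text!r}")
    return float(value)

def is_white(level, threshold=0.5):
    """Assumes 'level' is normalized to 0..1; values above threshold are white."""
    return level > threshold

for spelling in ("50%", "1/2", "0.5"):
    print(spelling, "->", parse_threshold(spelling))
```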
3. Reporting Bugs
If you find a bug in GNU Ocrad, please send electronic mail to bug-ocrad@gnu.org. Include the version number, which you can find by running `ocrad --version'.
This document was generated on December, 27 2005 using texi2html 1.76.