
GNU Ocrad


Copyright © 2003, 2004, 2005 Antonio Diaz Diaz.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.



1. Character Sets

The character set internally used by ocrad is ISO 10646, also known as UCS (Universal Character Set), which can represent over two thousand million characters (2^31).

As it is impractical to try to recognize one among so many different characters, you can tell ocrad which character sets to recognize. You do this with the `--charset' option.

If the input page contains characters from only one character set, say `ISO-8859-15', you can use the default `byte' output format. But in a page that mixes `ISO-8859-9' and `ISO-8859-15' characters, you can't tell whether a byte of 0xFD represents a 'latin small letter dotless i' (its meaning in ISO-8859-9) or a 'latin small letter y with acute' (its meaning in ISO-8859-15). In that case you should use `--format=utf8' instead.
Of course, you may request UTF-8 output in any case.
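
The ambiguity of the byte 0xFD can be checked with a few lines of Python. This is only an illustrative sketch, not part of ocrad; it assumes the `iso8859_9` and `iso8859_15` codecs shipped with the Python standard library.

```python
import unicodedata

# One and the same byte value, as it would appear in `byte' output.
raw = bytes([0xFD])

as_8859_9 = raw.decode("iso8859_9")     # meaning under ISO-8859-9
as_8859_15 = raw.decode("iso8859_15")   # meaning under ISO-8859-15

for label, ch in (("ISO-8859-9", as_8859_9), ("ISO-8859-15", as_8859_15)):
    # Show the UCS code point, its name, and its UTF-8 encoding.
    print(label, hex(ord(ch)), unicodedata.name(ch), ch.encode("utf-8"))
```

The two characters have different UCS code points and therefore different UTF-8 encodings, which is why UTF-8 output stays unambiguous on a mixed page.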


NOTE: `two thousand million' is used above instead of `billion' because the latter is ambiguous. In the long scale, a billion is a million millions (million^2), a trillion is a million million millions (million^3), and so on. Please respect the meaning of the prefixes to make communication among all people possible. Thanks.



2. Invoking Ocrad

The format for running ocrad is:

 
ocrad [options] [files]
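
As a rough sketch of how such a command line might be assembled from a script, the following Python fragment builds an argument list using only options documented in this chapter. The helper `ocrad_command` and the file names `page.pgm` and `page.txt` are hypothetical, and nothing is executed; check `--charset=help' for the exact charset names your version accepts.

```python
import shlex

def ocrad_command(files, charsets=(), fmt=None, output=None):
    """Build an argument list for ocrad (hypothetical helper; runs nothing)."""
    argv = ["ocrad"]
    for name in charsets:
        argv.append(f"--charset={name}")   # the option may be repeated
    if fmt is not None:
        argv.append(f"--format={fmt}")     # `byte' or `utf8'
    if output is not None:
        argv += ["-o", output]             # write text here, not to stdout
    argv += list(files)
    return argv

cmd = ocrad_command(
    ["page.pgm"],
    charsets=["iso-8859-9", "iso-8859-15"],
    fmt="utf8",
    output="page.txt",
)
print(shlex.join(cmd))
```

If ocrad is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.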

Ocrad supports the following options:

`--help'
`-h'

Print an informative help message describing the options and exit.

`--version'
`-V'

Print the version number of ocrad on the standard output and exit.

`--append'
`-a'

Append generated text to the output file instead of overwriting it.

`--block=number'
`-b number'

Process only the specified text block. Blocks are numbered beginning from 1. This option is only useful in conjunction with layout analysis (see `--layout' below).

`--charset=name'
`-c name'

Enable recognition of the characters belonging to the given character set. You can repeat this option multiple times with different names for processing a page with characters from different character sets.
If no charset is specified, `iso-8859-15' (latin9) is assumed.
Try `--charset=help' for a list of valid charset names.

`--force'
`-f'

Force overwrite of output file.

`--format=name'
`-F name'

Select the output format. The valid names are `byte' and `utf8'.
If no output format is specified, `byte' (8 bit) is assumed.

`--invert'
`-i'

Invert the image levels. Use this for images with white text on a black background.

`--layout=mode'
`-l mode'

Enable page layout analysis. The meaning of mode is:
`0' no analysis at all, `1' column separation, `2' full analysis.

`-o file'

Place the output into file instead of into the standard output.

`--scale=value'
`-s value'

Scale the input image by value before layout analysis and recognition. If value is negative, the input image is scaled down by -value.

`--transform=name'
`-t name'

Perform the given transformation (rotation or mirroring) on the input image before scaling, layout analysis and recognition.
Try `--transform=help' for a list of valid transformation names.

`--threshold=value'
`-T value'

Set the binarization threshold used for pgm files and for the `--scale' option (only when the image is scaled down). value should be a rational number between 0 and 1, and may be given as a percentage (50%), a fraction (1/2), or a decimal value (0.5). Image values greater than the threshold are converted to white. The default value is 0.5.

`--verbose'
`-v'

Verbose mode.

`-x file'

Write (export) an OCR Results File to file. `-x -' writes to the standard output, overriding the text output unless the text output has also been redirected with the `-o' option.
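
The three notations accepted by `--threshold' (percentage, fraction, decimal) all denote the same kind of rational number. The Python sketch below shows one way to normalize them. It is not ocrad's own parser, and the function name `parse_threshold` is made up for this illustration.

```python
from fractions import Fraction

def parse_threshold(text):
    """Convert '50%', '1/2' or '0.5' to an exact Fraction in [0, 1]."""
    text = text.strip()
    if text.endswith("%"):
        value = Fraction(text[:-1]) / 100   # percentage notation
    else:
        value = Fraction(text)              # accepts '1/2' and '0.5'
    if not 0 <= value <= 1:
        raise ValueError(f"threshold out of range: {text!r}")
    return value

for spelling in ("50%", "1/2", "0.5"):
    print(spelling, "->", float(parse_threshold(spelling)))
```

All three spellings yield the default threshold of 0.5.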



3. Reporting Bugs

If you find a bug in GNU Ocrad, please send electronic mail to bug-ocrad@gnu.org. Include the version number, which you can find by running `ocrad --version'.



Concept Index

Index Entry       Section

B
bugs              3. Reporting Bugs

G
getting help      3. Reporting Bugs

I
input charsets    1. Character Sets
invoking          2. Invoking Ocrad

O
options           2. Invoking Ocrad
output format     1. Character Sets

P
problems          3. Reporting Bugs

U
usage             2. Invoking Ocrad

V
version           2. Invoking Ocrad


About This Document

This document was generated on December 27, 2005 using texi2html 1.76.
