Chapter 6 - HTML Reference

This chapter defines all of the HTML elements and attributes that are recognized and supported by HTMLDOC.

General Usage

There are two types of HTML files: structured documents using headings (H1, H2, etc.), which HTMLDOC calls "books", and unstructured documents that do not use headings, which HTMLDOC calls "web pages".

A very common mistake is to try converting a web page as if it were a book, which will likely produce a PDF file with no pages. To convert web page files you must use the --webpage option on the command line or choose Web Page in the input tab of the GUI.
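As a rough illustration of the distinction, the first fragment below is organized the way a book is, with an H1 chapter heading and an H2 section heading. The titles and text are invented for the example.

```html
<HTML>
<HEAD><TITLE>Example Book</TITLE></HEAD>
<BODY>
<H1>Getting Started</H1>
<P>Introductory text for the chapter.</P>
<H2>Installation</H2>
<P>Text for the first section.</P>
</BODY>
</HTML>
```

The second fragment has no headings at all, so it would be treated as a web page and would need the --webpage option (or the Web Page setting in the GUI).

```html
<HTML>
<HEAD><TITLE>Example Web Page</TITLE></HEAD>
<BODY>
<P>A page of text with no H1 to H6 headings.</P>
</BODY>
</HTML>
```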

HTMLDOC does not support HTML 4.0 elements, attributes, stylesheets, or scripting.

Elements

The following HTML elements are recognized by HTMLDOC:
Element     Version  Supported?  Notes
----------  -------  ----------  ------------------------
!DOCTYPE    3.0      Yes         DTD is ignored
A           1.0      Yes         See Below
ACRONYM     2.0      Yes         No font change
ADDRESS     2.0      Yes
AREA        2.0      No
B           1.0      Yes
BASE        2.0      No
BASEFONT    1.0      No
BIG         2.0      Yes
BLINK       2.0      No
BLOCKQUOTE  2.0      Yes
BODY        1.0      Yes
BR          2.0      Yes
CAPTION     2.0      Yes         See Below
CENTER      2.0      Yes
CITE        2.0      Yes         Italic/Oblique
CODE        2.0      Yes         Courier
DD          2.0      Yes
DEL         2.0      Yes         Strikethrough
DFN         2.0      Yes         Helvetica
DIR         2.0      Yes
DIV         3.2      Yes
DL          2.0      Yes
DT          2.0      Yes         Italic/Oblique
EM          2.0      Yes         Italic/Oblique
EMBED       2.0      Yes         HTML Only
FONT        2.0      Yes         See Below
FORM        2.0      No
FRAME       3.2      No
FRAMESET    3.2      No
H1          1.0      Yes         Boldface, See Below
H2          1.0      Yes         Boldface, See Below
H3          1.0      Yes         Boldface, See Below
H4          1.0      Yes         Boldface, See Below
H5          1.0      Yes         Boldface, See Below
H6          1.0      Yes         Boldface, See Below
HEAD        1.0      Yes
HR          1.0      Yes         See Below
HTML        1.0      Yes
I           1.0      Yes
IMG         1.0      Yes         See Below
INPUT       2.0      No
INS         2.0      Yes         Underline
ISINDEX     2.0      No
KBD         2.0      Yes         Courier Bold
LI          2.0      Yes
LINK        2.0      No
MAP         2.0      No
MENU        2.0      Yes
META        2.0      Yes         See Below
MULTICOL    N3.0     No
NOBR        1.0      No
NOFRAMES    3.2      No
OL          2.0      Yes
OPTION      2.0      No
P           1.0      Yes
PRE         1.0      Yes
S           2.0      Yes         Strikethrough
SAMP        2.0      Yes         Courier
SCRIPT      2.0      No
SELECT      2.0      No
SMALL       2.0      Yes
SPACER      N3.0     Yes
STRIKE      2.0      Yes
STRONG      2.0      Yes         Boldface Italic/Oblique
SUB         2.0      Yes         Reduced Fontsize
SUP         2.0      Yes         Reduced Fontsize
TABLE       2.0      Yes         See Below
TD          2.0      Yes
TEXTAREA    2.0      No
TH          2.0      Yes         Boldface Center
TITLE       2.0      Yes
TR          2.0      Yes
TT          2.0      Yes         Courier
U           1.0      Yes
UL          2.0      Yes
VAR         2.0      Yes         Helvetica Oblique
WBR         1.0      No

Comments

HTMLDOC supports five special HTML comments to initiate page breaks:
<!-- HALF PAGE -->
Break to the next half page.
<!-- PAGE BREAK -->
Break to the next page.
<!-- NEW PAGE -->
Break to the next page.
<!-- NEW SHEET -->
Break to the next sheet.
<!-- NEED length -->
Break if there is less than length units left on the current page. The length value defaults to points but can be suffixed by in, mm, or cm to convert from the corresponding units.
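A minimal sketch of how these comments might be placed in a document follows. The headings and text are invented for the example, and the 2in length is an arbitrary value chosen to show a unit suffix.

```html
<HTML>
<HEAD><TITLE>Page Break Example</TITLE></HEAD>
<BODY>
<H1>Results</H1>
<P>Summary text for the first page.</P>

<!-- NEW PAGE -->
<H2>Detailed Figures</H2>
<P>This section starts on the next page.</P>

<!-- NEED 2in -->
<PRE>
A short listing that should have at least 2 inches available.
</PRE>
</BODY>
</HTML>
```

Here the NEED comment only forces a break when less than 2 inches remain on the current page; otherwise the listing continues where it is.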

FONT Attributes

Typeface specification is currently limited to the following fonts to ensure portability across platforms and compatibility with older PostScript printers:
Requested Font  Actual Font
--------------  -----------
Arial           Helvetica
Courier         Courier
Helvetica       Helvetica
Monospace       Courier
Sans-Serif      Helvetica
Serif           Times
Symbol          Symbol
Times           Times
All other typefaces are unrecognized and are silently ignored.
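The mapping can be illustrated with a few FONT elements. This sketch assumes the typeface is requested through the standard FACE attribute; the "Comic Sans MS" name is only an example of a typeface that is not in the table.

```html
<P><FONT FACE="Arial">Requested Arial, set in Helvetica.</FONT></P>
<P><FONT FACE="Monospace">Requested Monospace, set in Courier.</FONT></P>
<P><FONT FACE="Comic Sans MS">Unrecognized typeface, request ignored.</FONT></P>
```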

Headings

Currently HTMLDOC supports a maximum of 10000 headings and 100 chapters. These limits can be increased by changing the constants in the config.h file included with the source code.

All chapters start with a top-level heading (H1) markup. Any headings within a chapter must be of a lower level (H2 to H6). Each chapter starts a new page or the next odd-numbered page if duplexing is selected.

The headings you use within a chapter must start at level 2 (H2). If you skip levels, the heading will be shown in the generated table-of-contents under the last level that was known. For example, an H4 heading that directly follows an H2 heading is listed under that H2 heading.
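The following hierarchy is a small sketch of the skipped-level case; the heading text is invented for the example.

```html
<H1>Chapter Heading</H1>
<H2>Section Heading 1</H2>
<H3>Sub-Section Heading 1</H3>
<H2>Section Heading 2</H2>
<H4>Sub-Sub-Section Heading 1</H4>
```

Because no H3 appears between "Section Heading 2" and the H4, the rule above suggests that "Sub-Sub-Section Heading 1" would be listed directly under "Section Heading 2" in the table-of-contents.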

Numbered Headings

When the numbered headings option is enabled, HTMLDOC recognizes the following additional attributes for all heading elements:
VALUE="#"
Specifies the starting value for this heading level (default is "1" for all new levels).
TYPE="1"
Specifies that decimal numbers should be generated for this heading level.
TYPE="a"
Specifies that lowercase letters should be generated for this heading level.
TYPE="A"
Specifies that uppercase letters should be generated for this heading level.
TYPE="i"
Specifies that lowercase roman numerals should be generated for this heading level.
TYPE="I"
Specifies that uppercase roman numerals should be generated for this heading level.
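As a hedged example of these attributes, the fragment below requests uppercase roman numerals for the top-level headings and uppercase letters for the second level, and sets the starting value of a later top-level heading to 5. The heading text and the chosen values are illustrative only, and the exact appearance of the generated numbers is left to HTMLDOC.

```html
<H1 TYPE="I">Introduction</H1>
<H2 TYPE="A">Background</H2>
<H2>Scope</H2>
<H1 VALUE="5">Reference Material</H1>
```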

Images

HTMLDOC supports loading of BMP, GIF, JPEG, and PNG image files. EPS and other types of image files are not supported at this time.

Links

Currently HTMLDOC supports a maximum of 20000 links within a document. This limit can be increased by changing the constant in the config.h file included with the source code.

External URL and internal (#target and filename.html) links are fully supported for HTML and PDF output.

When generating PDF files, local PDF file links will be converted to external file links for the PDF viewer instead of URL links. That is, you can link directly to another local PDF file from your HTML document with an ordinary A element whose HREF names the PDF file.
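The three kinds of links described above might look like the following sketch. The file names reference.pdf and chapter2.html and the install target are hypothetical.

```html
<P>See the <A HREF="reference.pdf">reference manual</A> for details.</P>
<P>See <A HREF="#install">Installation</A> or
<A HREF="chapter2.html">Chapter 2</A>.</P>
<H2><A NAME="install">Installation</A></H2>
```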

META Attributes

HTMLDOC supports the following META attributes for the title page and document information:
<META NAME="AUTHOR" CONTENT="...">
Specifies the document author.
<META NAME="COPYRIGHT" CONTENT="...">
Specifies the document copyright.
<META NAME="DOCNUMBER" CONTENT="...">
Specifies the document number.
<META NAME="GENERATOR" CONTENT="...">
Specifies the application that generated the HTML file.
<META NAME="KEYWORDS" CONTENT="...">
Specifies document search keywords.
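A document header using several of these attributes could be sketched as follows. All of the CONTENT values and the title are placeholders invented for the example.

```html
<HTML>
<HEAD>
<TITLE>Example Manual</TITLE>
<META NAME="AUTHOR" CONTENT="Jane Doe">
<META NAME="COPYRIGHT" CONTENT="Copyright Example Corp.">
<META NAME="DOCNUMBER" CONTENT="EX-001">
<META NAME="KEYWORDS" CONTENT="example, manual, reference">
</HEAD>
<BODY>
<H1>Introduction</H1>
<P>Body text of the first chapter.</P>
</BODY>
</HTML>
```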

Page Breaks

HTMLDOC supports special HTML comments to specify page breaks (see Comments). In addition, the older BREAK attribute is still supported by the HR element. Support for the BREAK attribute is deprecated and will be removed in a future release of HTMLDOC.
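For comparison, the sketch below shows the deprecated attribute form next to the comment form. It assumes the BREAK attribute is written on HR without a value; the paragraph text is invented.

```html
<P>Last paragraph before the break.</P>
<HR BREAK>
<P>First paragraph after the deprecated form of the break.</P>

<!-- NEW PAGE -->
<P>First paragraph after the comment form of the break.</P>
```

Since the attribute is deprecated, new documents are probably better served by the comment form.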

Tables

Currently HTMLDOC supports a maximum of 200 columns within a single table. This limit can be increased by changing the MAX_COLUMNS constant in the config.h file included with the source code. HTMLDOC supports HTML 3.0 tables with the following exceptions:

HTMLDOC does not support HTML 4.0 table elements or attributes, such as TBODY, THEAD, TFOOT, or RULES.
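A table that stays within the elements marked as supported in the element list (TABLE, CAPTION, TR, TH, and TD) might look like this sketch. The BORDER attribute is assumed to be honored, and the cell contents are taken from the Images section only as sample data.

```html
<TABLE BORDER="1">
<CAPTION>Image Formats</CAPTION>
<TR><TH>Format</TH><TH>Supported?</TH></TR>
<TR><TD>PNG</TD><TD>Yes</TD></TR>
<TR><TD>JPEG</TD><TD>Yes</TD></TR>
<TR><TD>EPS</TD><TD>No</TD></TR>
</TABLE>
```

Note that the rows are placed directly inside TABLE rather than inside TBODY or THEAD, since those HTML 4.0 elements are not supported.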