Major Document Formats
| | Plain Text | XHTML | DocBook | OpenDocument Text | Word | PDF |
|---|---|---|---|---|---|---|
| Extension | txt | html / htm | xml | odt | doc | pdf |
| Text or binary | text | text (based on XML) | text (based on XML) | text (based on XML) | binary | binary |
| Open or proprietary | open | open | open | open | proprietary | open |
| Data on document structure | - | + | + (with styles) | + | + (with styles) | - |
| Data on document layout | - | Transitional: +; Strict: - (assumed by CSS) | - (assumed by XSL-FO) | + | + | + |
| Metadata | - | + | + | + | + | + |
| Main usage | when there is no need for layout | online publishing | print publishing | print publishing | print publishing | print publishing |
| Open software for authoring | text editors | text editors / HTML editors | text editors / XML editors | OpenOffice.org Writer | OpenOffice.org Writer | OpenOffice.org Writer / PDF converters |
| Proprietary software for authoring | text editors | text editors / HTML editors | text editors / XML editors | StarOffice Writer | Word | Acrobat / PDF converters |
| Backward and future compatibility | very high | high | rather high | rather high | dubious | rather high |
| Compatibility between different environments | absolute | full | full | full | problematic | full |
| Degree of layout preservation | (no layout) | flexible depending on the environment | reasonable | reasonable | reasonable | complete |
| Distribution | wide, esp. among computer geeks | wide | popular for technical documentation among computer geeks | still limited but widening | still very wide, esp. among naive Windows users | relatively wide |
How to Choose a Document Format
- General criteria
  - Text format > binary format
  - Open format > proprietary format
- Algorithm
  - Is layout important?
    - No --> plain text (preferably in UTF-8 without BOM): priority level 1 (Geeks prefer plain text)
    - Yes -->
      - How does the document need to be published?
        - Online -->
          - XHTML Strict (with CSS): priority level 2
          - XHTML Transitional (without CSS): priority level 3
        - Print -->
          - As a pivot -->
            - Non-WYSIWYG authoring environment --> DocBook: priority level 2
            - WYSIWYG authoring environment --> OpenDocument Text: priority level 2
          - For complete layout preservation --> PDF: priority level 3
          - For the consumption of those who are still stuck in software called Word --> Word: priority level 10 (Word is not a document exchange format; we can put an end to Word attachments)
          - For those who want to make sure that their documents will be illegible in the future --> Word: priority level 1 ;-)
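
The decision list above can be sketched as a small function. This is only an illustration: the function name `choose_format` and its parameters (`medium`, `purpose`, `wysiwyg`, `css`) are hypothetical names made up for this sketch, and the tongue-in-cheek last branch is left out.

```python
from typing import Optional, Tuple


def choose_format(
    layout_important: bool,
    medium: Optional[str] = None,   # hypothetical: "online" or "print"
    purpose: Optional[str] = None,  # hypothetical, print only: "pivot", "layout", "word_users"
    wysiwyg: bool = False,          # print pivot only: is the authoring environment WYSIWYG?
    css: bool = True,               # online only: is layout delegated to CSS?
) -> Tuple[str, int]:
    """Return (format, priority level) following the decision list."""
    if not layout_important:
        return ("plain text (UTF-8 without BOM)", 1)

    if medium == "online":
        # Strict leaves layout to CSS; Transitional carries layout itself.
        return ("XHTML Strict", 2) if css else ("XHTML Transitional", 3)

    if medium == "print":
        if purpose == "pivot":
            return ("OpenDocument Text" if wysiwyg else "DocBook", 2)
        if purpose == "layout":
            return ("PDF", 3)
        if purpose == "word_users":
            return ("Word", 10)
        raise ValueError("purpose must be 'pivot', 'layout', or 'word_users'")

    raise ValueError("medium must be 'online' or 'print'")


if __name__ == "__main__":
    print(choose_format(False))
    print(choose_format(True, medium="online"))
    print(choose_format(True, medium="print", purpose="pivot", wysiwyg=True))
```

A lower priority level means a more preferred format, so Word at level 10 is a last resort.
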
Some Stupid Idiosyncrasies of DocBook and OpenDocument Text
- DocBook
  - Elements
    - firstname and surname should be renamed as givenname and familyname for linguistic and cultural neutrality.
- OpenDocument Text
  - The first character of each word comprising a style name is generally capitalized (e.g., Bibliography Heading), but in the following style names only the first character of the first word is capitalized (e.g., Complementary close, which should be Complementary Close). It is a shame that the format has become an ISO standard with such an inconsistent naming convention.
    - Paragraph styles: Complementary close, First line indent, Footer left, Footer right, Frame contents, Hanging indent, Header left, Header right, Object index 1, Object index heading, Table index 1, Table index heading, Text body, Text body indent
    - Character styles: Endnote anchor, Line numbering, Main index entry
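
The capitalization rule described above is easy to check mechanically. The following is a minimal sketch, assuming style names are plain space-separated words; the helper names `is_consistently_capitalized` and `suggest_name` are hypothetical and not part of any OpenDocument tool.

```python
def is_consistently_capitalized(style_name: str) -> bool:
    """True if every word that starts with a letter starts with an uppercase one."""
    return all(
        word[0].isupper()
        for word in style_name.split()
        if word[0].isalpha()  # skip numeric parts such as the "1" in "Object index 1"
    )


def suggest_name(style_name: str) -> str:
    """Capitalize the first character of each word, leaving the rest untouched."""
    return " ".join(word[:1].upper() + word[1:] for word in style_name.split())


if __name__ == "__main__":
    for name in ["Bibliography Heading", "Complementary close", "Object index 1"]:
        if not is_consistently_capitalized(name):
            print(f"{name} -> {suggest_name(name)}")
```
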