
Typesetting and Word Processing

If you're coming to Linux with a Microsoft Windows or Apple MacOS background, or from some other non-Unix computing environment, you are likely used to one approach to "word processing." In these environments, most writing is done in word processors -- large programs that offer a vast array of formatting options and that store their output in proprietary file formats. Most people use word processors no matter where the intended output will go (even if it's just your diary).

Word processors, from complete suites like StarOffice to commercial favorites like WordPerfect, are available for Linux -- and have been for years. However, the standard personal-computing paradigm known as "word processing" has never really taken off on Linux -- or, for that matter, on Unix-like operating systems in general. With Linux, most writing is done in a text editor, and files are kept in plain text.

When you keep a file in plain text, you can use command-line tools to format the pages and paragraphs; add page numbers and headers; check the spelling, style, and usage; count the lines, words, and characters it contains; convert it to HTML and other formats; and even print the text in a font of your choosing -- all of which are described in the recipes in this book. The text can be formatted, analyzed, cut, chopped, sliced, diced, and otherwise processed by the vast array of Linux command-line tools that work on text -- over 750 in an average installation.

This approach may seem primitive at first -- especially to those weaned in a computing environment that dictates that all writing must be set in a typeface from the moment of creation -- but the word-processing approach can be excessive compared to what Linux provides. You can, if you like, view or print plain text in a font, with a single command -- which is what ninety percent of people want to do with a word processor ninety percent of the time, anyway; to do this, see Converting Plain Text for Output.

It's my opinion that word processing is not a forward-thinking direction for the handling of text, especially on Linux systems and especially now that text is not always destined for printed output: text can end up on a Web page, in an "eBook," in an email message, or possibly in print. The best common source for these formats is plain text. Word processing programs, and the special file formats they require, are anathema to the generalized, tools-based and plain-text philosophy of Unix and Linux (see Unix and the Tools Philosophy). "Word processing" itself may be an obsolete idea of the 1980s personal computing environment, and it may no longer be a necessity in the age of the Web and email -- mediums in which plain text content is more native than proprietary word processor formats.

If you do need to design a special layout for hardcopy, you can typeset the text. One could write a book on the subject of Linux typesetting; unfortunately, no such book has yet been written, but this chapter contains recipes for producing typeset text. They were selected as being the easiest to prepare or most effective for their purpose.

NOTE: For more information on this subject, I recommend Christopher B. Browne's excellent overview, "Word Processors for Linux".

Choosing the Right Typesetting System for the Job

Choosing the proper typesetting system to use when you are about to begin a project can be daunting: each has its own drawbacks and abilities, and to the less experienced it may not be immediately clear which is most appropriate for a particular document or project.

The following table can help you determine which system is best for a particular task. There isn't one way of doing such things, of course -- these are only my recommendations. The first column lists the kind of output you intend, the second gives examples of the kind of documents, and the third suggests the typesetting system(s) to use. These systems are described in the remaining sections of this chapter.

INTENDED OUTPUT                                           EXAMPLES                                                 TYPESETTING SYSTEM
Printed, typeset output and electronic HTML or text file  Internet FAQ, white paper, dissertation                  enscript; Texinfo; SGMLtools
Printed, typeset output and text file                     man page, command reference card                         groff
Printed, typeset output                                   Letter or other correspondence, report, book manuscript  LaTeX or LyX
Printed, typeset output                                   Brochure or newsletter with multiple columns and images  LyX
Printed, typeset output                                   Envelope, mailing label, other specialized document      TeX
Printed text output in a font                             Grocery list, saved email message, to-do list            enscript
Printed, typeset output                                   Poster, sign                                             enscript; HTML; LyX; TeX
Large printed text output                                 Long banners for parties or other occasions              banner

NOTE: If you really don't need a document to be typeset, then don't bother! Just keep it a plain text file, and use a text editor to edit it (see Text Editing). Do this for writing notes, email messages, Web pages, Usenet articles, and so forth. If you ever do need to typeset it later, you will still be able to do so. And you can, if you like, view or print plain text in nice fonts (see Outputting Text in a Font).

Converting Plain Text for Output

Debian: `enscript'
WWW: http://www.iki.fi/~mtr/genscript/


The simplest way to typeset plain text is to convert it to PostScript. This is often done to prepare text for printing; the original source text file remains as unformatted text, but the text of the printed output is formatted in basic ways, such as being set in a font.

The main tool for converting text to PostScript is called enscript; it converts the text file that is specified as an argument into PostScript, making any number of formatting changes in between. It's great for quickly making nice output from a plain text file -- you can use it to do things such as output text in a font of your choosing, or paginate text with graphical headers at the top of each page.

By default, enscript paginates its input, outputs it in a 10-point Courier font, and puts a simple header at the top of each page containing the file name, date and time, and page number in bold. Use the `-B' option to omit this header.

If you have a PostScript printer connected to your system, enscript can be set up to spool its output right to the printer. You can verify whether your system is set up this way by looking at the enscript configuration file, `/etc/enscript.cfg'. The line

DefaultOutputMethod: printer


specifies that output is spooled directly to the printer; changing the value from `printer' to `stdout' sends the output to the standard output instead.
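
The setting described above can also be read with standard tools rather than by opening the file in an editor. The following is only a sketch: the function name enscript_output_method is made up for illustration, and it assumes the configuration file uses the "DefaultOutputMethod: value" form shown above.

```shell
# Print the value of DefaultOutputMethod from an enscript configuration file.
# The function name is hypothetical; the default path is the one named above.
enscript_output_method() {
    cfg="${1:-/etc/enscript.cfg}"
    # Print whatever follows "DefaultOutputMethod:" on each matching line.
    sed -n 's/^DefaultOutputMethod:[[:space:]]*//p' "$cfg"
}

# Usage: report the setting only if the file is present on this system.
if [ -r /etc/enscript.cfg ]; then
    enscript_output_method /etc/enscript.cfg
fi
```

If the function prints printer, enscript spools its output; if it prints stdout, the PostScript is written to the standard output.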

Even if your default printer does not natively understand PostScript, it may be able to take enscript output anyway. Most Linux installations these days have print filters set up so that PostScript spooled for printing is automatically converted to a format the printer understands (if your system doesn't have this set up for some reason, convert the PostScript to a format recognized by your printer with the gs tool, and then print that -- see Converting PostScript).

• To convert the text file `saved-mail' to PostScript, with default formatting, and spool the output right to the printer, type:
$ enscript saved-mail [RET]

To write the output to a file instead of spooling it, give the name of the file you want to output as an argument to the `-p' option. This is useful when you don't have a PostScript printer and you need to convert the output first, when you just want to make a PostScript image file from some text, or when you want to preview the output before you print it. In the last case, you can view it on the display screen with a PostScript viewer application such as ghostview (see Previewing a PostScript File).

• To write the text file `saved-mail' to a PostScript file, `saved-mail.ps', and then preview it in X, type:
$ enscript -p saved-mail.ps saved-mail [RET]
$ ghostview saved-mail.ps [RET]

The following recipes show how to use enscript to output text with different effects and properties.

NOTE: Once you make a PostScript file from text input, you can use any of the tools to format this new PostScript file, including rearranging and resizing its pages (see PostScript).

Outputting Text in a Font

To output text in a particular PostScript font, use enscript and give the name of the font you want to use as a quoted argument to the `-f' option.

Specify both the font family and size in points: give the capitalized name of the font family (with hyphens to indicate spaces between words) followed by the size in points. For example, `Courier14' outputs text in the Courier font at 14 points, and `Times-Roman12.2' outputs text in the Times Roman font at 12.2 points.

Some of the available font names are listed in the file `/usr/share/enscript/afm/font.map'; the enscript man page describes how to use additional fonts that might be installed on your system.

• To print the contents of the text file `saved-mail' on a PostScript printer, with text set in the Helvetica font at 12 points, type:
$ enscript -B -f "Helvetica12" saved-mail [RET]

• To make a PostScript file called `saved-mail.ps' containing the contents of the text file `saved-mail', with text set in the Helvetica font at 12 points, type:
$ enscript -B -f "Helvetica12" -p saved-mail.ps saved-mail [RET]

The `-B' option was used in the preceding examples to omit the output of a header on each page. When headers are used, they're normally output in 10-point Courier Bold; to specify a different font for the text in the header, give its name as an argument to the `-F' option.

• To print the contents of the text file `saved-mail' to a PostScript printer, with text set in 10-point Times Roman and header text set in 18-point Times Bold, type:
$ enscript -f "Times-Roman10" -F "Times-Bold18" saved-mail [RET]

• To make a PostScript file called `saved-mail.ps' containing the contents of the text file `saved-mail', with text and headers both set in 16-point Palatino Roman, type:
$ enscript -f "Palatino-Roman16" -F "Palatino-Roman16" -p saved-mail.ps saved-mail [RET]

Outputting Text as a Poster or Sign

You can output any text you type directly to the printer (or to a PostScript file) by omitting the name of the input file; enscript will read the text on the standard input until you type C-d on a new line.

This is especially useful for making a quick-and-dirty sign or poster -- to do this, specify a large font for the text, such as Helvetica Bold at 72 points, and omit the display of default headers.

• To print a sign in 72-point Helvetica Bold type to a PostScript printer, type:
$ enscript -B -f "Helvetica-Bold72" [RET]
[RET]
CAUTION [RET]
[RET]
WET PAINT! [RET]
C-d



72-point type is very large; with longer lines of text, use the `--word-wrap' option to wrap lines at word boundaries. You might need this option because at these larger font sizes, you run the risk of making lines that are too long to fit on the page. You can also use the `-r' option to print the text in landscape orientation, as described in Outputting Text in Landscape Orientation.

• To print a sign in 63-point Helvetica Bold across the long side of the page, type:


$ enscript -B -r --word-wrap -f "Helvetica-Bold63" [RET]
[RET]
[RET]
CAUTION -- WET PAINT! [RET]
C-d

NOTE: To make a snazzier or more detailed message or sign, you would create a file in a text editor and justify the words on each line in the file as you want them to print, with blank lines where necessary. If you're getting that complicated with it, it would also be wise to use the `-p' option once to output to a file first, and preview the file before printing it (see Previewing a PostScript File).

Outputting Text with Language Highlighting

The enscript tool currently recognizes the formatting of more than forty languages and formats, from the Perl and C programming languages to HTML, email, and Usenet news articles; enscript can highlight portions of the text based on its syntax. In Unix-speak, this is called pretty-printing.

The following table lists the names of some of the language filters that are available at the time of this writing and describes the languages or formats they're used for.

FILTER      LANGUAGE OR FORMAT
ada         Ada95 programming language.
asm         Assembler listings.
awk         AWK programming language.
bash        Bourne-Again shell programming language.
c           C programming language.
changelog   ChangeLog files.
cpp         C++ programming language.
csh         C-Shell script language.
delphi      Delphi programming language.
diff        Normal "difference reports" made from diff.
diffu       Unified "difference reports" made from diff.
elisp       Emacs Lisp programming language.
fortran     Fortran77 programming language.
haskell     Haskell programming language.
html        HyperText Markup Language (HTML).
idl         IDL (CORBA Interface Definition Language).
java        Java programming language.
javascript  JavaScript programming language.
ksh         Korn shell programming language.
m4          M4 macro processor programming language.
mail        Electronic mail and Usenet news articles.
makefile    Rule files for make.
nroff       Manual pages formatted with nroff.
objc        Objective-C programming language.
pascal      Pascal programming language.
perl        Perl programming language.
postscript  PostScript programming language.
python      Python programming language.
scheme      Scheme programming language.
sh          Bourne shell programming language.
skill       Cadence Design Systems Lisp-like language.
sql         Sybase 11 SQL.
states      Definition files for states.
synopsys    Synopsys dc shell scripting language.
tcl         Tcl programming language.
tcsh        TC-Shell script language.
vba         Visual Basic (for Applications).
verilog     Verilog hardware description language.
vhdl        VHSIC Hardware Description Language (VHDL).
vrml        Virtual Reality Modeling Language (VRML97).
zsh         Z-shell programming language.

To pretty-print a file, give the name of the filter to use as an argument to the `-E' option, without any whitespace between the option and argument.

• To pretty-print the HTML file `index.html', type:
$ enscript -Ehtml index.html [RET]

• To pretty-print an email message saved to the file `important-mail', and output it with no headers to a file named `important-mail.ps', type:
$ enscript -B -Email -p important-mail.ps important-mail [RET]

Use the special `--help-pretty-print' option to list the languages supported by the copy of enscript you have.

• To peruse a list of currently supported languages, type:
$ enscript --help-pretty-print | less [RET]
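
When pretty-printing many files, the filter name can be picked from the file name extension. The sketch below maps a few common extensions to filter names taken from the table above; the function names and the choice of extensions are assumptions made for illustration, and a file with an unrecognized extension is simply converted without a filter. It only prints the enscript command it would use, so it can be tried safely.

```shell
# Map a file name to an enscript filter name from the table above.
# Hypothetical helper: the extension list is illustrative, not complete.
enscript_filter_for() {
    case "$1" in
        *.c)           echo c ;;
        *.cpp|*.cc)    echo cpp ;;
        *.html|*.htm)  echo html ;;
        *.java)        echo java ;;
        *.pl)          echo perl ;;
        *.py)          echo python ;;
        *.sh)          echo sh ;;
        *.tcl)         echo tcl ;;
        *)             return 1 ;;   # unknown: the caller omits -E
    esac
}

# Print the enscript command that would pretty-print a file to PostScript.
pretty_print_command() {
    if filter=$(enscript_filter_for "$1"); then
        echo "enscript -E$filter -p $1.ps $1"
    else
        echo "enscript -p $1.ps $1"
    fi
}

pretty_print_command index.html
```

To see which filters your own copy of enscript supports before relying on a mapping like this, check the output of the `--help-pretty-print' option.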


To output text with fancy graphic headers, where the header text is set in blocks of various shades of gray, use enscript with the `-G' option.

• To print the contents of the text file `saved-mail' with fancy headers on a PostScript printer, type:
$ enscript -G saved-mail [RET]

• To make a PostScript file called `saved-mail.ps' containing the contents of the text file `saved-mail', with fancy headers, type:
$ enscript -G -p saved-mail.ps saved-mail [RET]


Without the `-G' option, enscript outputs text with a plain header in bold text, printing the file name and the time it was last modified. The `-B' option, as described earlier, omits all headers.

You can customize the header text by quoting the text you want to use as an argument to the `-b' option. Use the special symbol `$%' to specify the current page number in the header text.

• To print the contents of the text file `saved-mail' with a custom header label containing the current page number, type:
$ enscript -b "Page $% of the saved email archive" saved-mail [RET]

NOTE: You can create your own custom fancy headers, too -- this is described in the `CUSTOMIZATION' section of the enscript man page.

Outputting Text in Landscape Orientation

To output text in landscape orientation, where text is rotated 90 degrees counter-clockwise, use the `-r' option.

• To print the contents of the text file `saved-mail' to a PostScript printer, with text set in 28-point Times Roman and oriented in landscape orientation, type:
$ enscript -f "Times-Roman28" -r saved-mail [RET]


The `-r' option is useful for making horizontal banners by passing output of the figlet tool to enscript (see Horizontal Text Fonts).

• To output the text `A long banner' in a figlet font and write it to the default printer with text set at 18-point Courier and in landscape orientation, type:
$ figlet "A long banner" | enscript -B -r -f "Courier18" [RET]

Outputting Multiple Copies of Text

To output multiple copies of text when sending to the printer with enscript, give the number as an argument to the `-#' option. This option doesn't work when sending to a file, but note that lpr takes the same option (see Printing Multiple Copies of a Job).

• To print three copies of the text file `saved-mail' to a PostScript printer with the default enscript headers, type:
$ enscript -#3 saved-mail [RET]


Selecting the Pages of Text to Output

To specify which pages of a text are output with enscript, give the range of page numbers as an argument to the `-a' option.

• To print pages two through ten of file `saved-mail' with the default enscript headers, type:
$ enscript -a2-10 saved-mail [RET]

To print just the odd or even pages, use the special `odd' and `even' arguments. This is good for printing double-sided pages: first print the odd-numbered pages, and then feed the output pages back into the printer and print the even-numbered pages.

• To print the odd-numbered pages of the file `saved-mail' with the default headers, type:
$ enscript -a odd saved-mail [RET]

• To print the even-numbered pages of the file `saved-mail' with the default headers, type:
$ enscript -a even saved-mail [RET]

In this example, grep didn't return any matches, so it's safe to assume that `gentle.tex' is a TeX file and not a LaTeX file.

NOTE: For more on grep and searching for regular expressions, see Regular Expressions -- Matching Text Patterns.

Processing TeX Files

Use tex to process TeX files. It takes as an argument the name of the TeX source file to process, and it writes an output file in DVI ("DeVice Independent") format, with the same base file name as the source file, but with a `.dvi' extension.

• To process the file `gentle.tex', type:
$ tex gentle.tex [RET]
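
Because the output name follows from the source name, a script can work out which DVI file to expect. A small sketch of that naming rule; the function name dvi_name is hypothetical.

```shell
# Print the name of the DVI file expected for a given TeX source file:
# the same base file name, with `.dvi' in place of `.tex'.
dvi_name() {
    base=$(basename "$1" .tex)
    echo "$base.dvi"
}

dvi_name gentle.tex
```

This prints gentle.dvi, the file written by the recipe above.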


Once you have produced a DVI output file with this method, you can do the following with it:

Processing LaTeX Files

The latex tool works just like tex, but is used to process LaTeX files.

• To process the LaTeX file `lshort.tex', type:
$ latex lshort.tex [RET]

This command writes a DVI output file called `lshort.dvi'.

You may need to run latex on a file several times consecutively. LaTeX documents sometimes have indices and cross references, which, because of the way that LaTeX works, take two (and in rare cases three or more) runs through latex to be fully processed. Should you need to run a file through latex more than once in order to generate the proper references, you'll see a message in the latex processing output after you process it the first time instructing you to process it again.

• To ensure that all of the cross references in `lshort.tex' have been generated properly, run the input file through latex once more:
$ latex lshort.tex [RET]

The `lshort.dvi' file will be rewritten with an updated version containing the proper page numbers in the cross reference and index entries. You can then view, print, or convert this DVI file as described in the previous recipe for processing TeX files.
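
The repeated runs can be automated by checking the log file that latex writes alongside the DVI file. This is a sketch under assumptions: it relies on the log containing the phrase "Rerun to get", which is how LaTeX words its cross-reference warning, and the function names and the limit of four passes are choices made for illustration. It does not run any separate index-building step.

```shell
# Succeed if a LaTeX log file asks for another pass.
needs_rerun() {
    grep -q 'Rerun to get' "$1"
}

# Run latex on a file, repeating while the log asks for it (at most 4 passes).
latex_until_stable() {
    src=$1
    log="$(basename "$src" .tex).log"
    pass=1
    while [ "$pass" -le 4 ]; do
        latex "$src" || return 1
        needs_rerun "$log" || return 0
        pass=$((pass + 1))
    done
    echo "still asking for a rerun after 4 passes" >&2
    return 1
}

# Usage (requires latex to be installed):
#   latex_until_stable lshort.tex
```

The pass limit is there so that a document whose references never settle does not loop forever.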

Writing Documents with TeX and LaTeX

WWW: ftp://ctan.tug.org/tex-archive/documentation/gentle.tex
WWW: ftp://ctan.tug.org/tex-archive/documentation/lshort/


To create a document with TeX or LaTeX, you generally use your favorite text editor to write an input file containing the text in TeX or LaTeX formatting. Then, you process this TeX or LaTeX input file to create an output file in the DVI format, which you can preview, convert, or print.

It's an old tradition among programmers introducing a programming language to give a simple program that just outputs the text `Hello, world' to the screen; such a program is usually just detailed enough to give those unfamiliar with the language a feel for its basic syntax.

We can do the same with document processing languages like TeX and LaTeX. Here's the "Hello, world" for a TeX document:

Hello, world
\end


If you processed this input file with tex, it would output a DVI file that displayed the text `Hello, world' in the default TeX font, on a default page size, and with default margins.

Here's the same "Hello, world" for LaTeX:

\documentclass{article}
\begin{document}
Hello, world
\end{document}


Even though the TeX example is much simpler, LaTeX is generally easier to use fresh "out of the box" for writing certain kinds of structured documents -- such as correspondence and articles -- because it comes with predefined document classes which control the markup for the structural elements the document contains. Plain TeX, on the other hand, is better suited for more experimental layouts or specialized documents.

The TeX and LaTeX markup languages are worth a book each, and providing an introduction to their use is well out of the scope of this text. To learn how to write input for them, I suggest two excellent tutorials, Michael Doob's A Gentle Introduction to TeX, and Tobias Oetiker's The Not So Short Introduction to LaTeX -- each available on the WWW at the URLs listed above. Each of these files is in the format it describes; in order to read them, you must process them first, as described in the two previous recipes.

Good LaTeX documentation in HTML format can be found installed on many Linux systems in the `/usr/share/texmf/doc/latex/latex2e-html/' directory; use the lynx browser to view it (see Browsing Files).

Some other typesetting systems, such as LyX, SGMLtools, and Texinfo (all described elsewhere in this chapter), write TeX or LaTeX output, too -- so you can use those systems to produce said output without actually learning the TeX and LaTeX input formats. (This book was written in Emacs in Texinfo format, and the typeset output was later generated by TeX.)

NOTE: The Oetiker text consists of several separate LaTeX files in the `lshort' directory; download and save all of these files.

TeX and LaTeX Document Templates

WWW: http://dsl.org/comp/templates/


A collection of sample templates for typesetting certain kinds of documents in TeX and LaTeX can be found at the URL listed above. These templates include those for creating letters and correspondence, articles and term papers, envelopes and mailing labels, and fax cover sheets. If you're interested in making typeset output with TeX and LaTeX, these templates are well worth exploring.

To write a document with a template, insert the contents of the template file into a new file that has a `.tex' or `.ltx' extension, and edit that. (Use your favorite text editor to do this.)
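
Starting from a copy, rather than editing the template directly, can also be done from the shell. A minimal sketch, assuming a template file such as letter.ltx has been saved in the current directory; the function name and the file name thanks.ltx are hypothetical, and the function refuses to replace a file that already exists.

```shell
# Start a new document from a template without touching the template itself.
new_from_template() {
    template=$1
    newfile=$2
    if [ -e "$newfile" ]; then
        echo "$newfile already exists; not overwriting" >&2
        return 1
    fi
    # A copy of a write-protected template is itself read-only,
    # so make the new file writable by its owner.
    cp "$template" "$newfile" && chmod u+w "$newfile"
}

# Usage:
#   new_from_template letter.ltx thanks.ltx
```

The new file can then be edited and processed as usual, while the template stays unchanged for the next document.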

To make sure that you don't accidentally overwrite the actual template files, you can write-protect them (see Write-Protecting a File):

$ chmod a-w template-file-names [RET]

In the templates themselves, the bracketed, uppercase text explains what kind of text belongs there; fill in these lines with your own text, and delete the lines you don't need. Then, process your new file with either latex or tex as appropriate, and you've got a typeset document!

The following table lists the file names of the TeX templates, and describes their use. Use tex to process files you make with these templates (see Processing TeX Files).

TEMPLATE FILE   DESCRIPTION
fax.tex         A cover sheet for sending fax messages.
envelope.tex    A No. 10 mailing envelope.
label.tex       A single mailing label for printing on standard 15-up sheets.

The following table lists the file names of the LaTeX templates, and describes their use. Use latex to process files you make with these templates (see Processing LaTeX Files).

TEMPLATE FILE   DESCRIPTION
letter.ltx      A letter or other correspondence.
article.ltx     An article or a research or term paper.
manuscript.ltx  A book manuscript.

There are more complex template packages available on the net that you might want to look at:

Writing Documents with SGMLtools

Debian: `sgml-tools'
WWW: http://www.sgmltools.org/

With the SGMLtools package, you can write documents and generate output in many different kinds of formats -- including HTML, plain text, PDF, and PostScript -- all from the same plain text input file.

SGML ("Standard Generalized Markup Language") is not an actual format, but a specification for writing markup languages; the markup language "formats" themselves are called DTDs ("Document Type Definition"). When you write a document in an SGML DTD, you write input as a plain text file with markup tags.

The various SGML packages on Linux are currently in a state of transition. The original SGML-Tools package (known as LinuxDoc-SGML in another life; now SGMLtools v1) is considered obsolete and is no longer being developed; however, the newer SGMLtools v2 (a.k.a. "SGMLtools Next Generation" and "SGMLtools '98") is still alpha software, as is SGMLtools-lite, a new subset of SGMLtools.

In the interim, if you want to dive in and get started making documents with the early SGMLtools and the LinuxDoc DTD, it's not hard to do. While the newer DocBook DTD has become very popular, it may be best suited for technical books and other very large projects -- for smaller documents written by individual authors, such as a multi-part essay, FAQ, or white paper, the LinuxDoc DTD still works fine. And since the Linux HOWTOs are still written in LinuxDoc, the Debian project has decided to maintain the SGMLtools 1.0 package independently.

The SGML-Tools User's Guide comes installed with the `sgml-tools' package, and is available in several formats in the `/usr/doc/sgml-tools' directory. These files are compressed; if you want to print or convert them, you have to uncompress them first (see Compressed Files).

• To peruse the compressed text version of the SGML-Tools guide, type:
$ zless /usr/doc/sgml-tools/guide.txt.gz [RET]

• To print a copy of the PostScript version of the SGML-Tools guide to the default printer, type:


Generating Output from SGML

The following table lists the SGML converter tools that come with SGMLtools, and describes the kind of output they generate. All take the name of the SGML file to work on as an argument, and they write a new file with the same base file name and the file name extension of their output format.

TOOL        DESCRIPTION
sgml2html   Generates HTML files.
sgml2info   Generates a GNU Info file.
sgml2lyx    Generates a LyX input file.
sgml2latex  Generates a LaTeX input file (useful for printing; first process as in Processing LaTeX Files, and then print the resultant DVI or PostScript output file).
sgml2rtf    Generates a file in Microsoft's "Rich Text Format."
sgml2txt    Generates plain text format.
sgml2xml    Generates XML format.

• To make a plain text file from `myfile.sgml', type:
$ sgml2txt myfile.sgml [RET]

This command writes a plain text file called `myfile.txt'.

To make a PostScript or PDF file from an SGML file, first generate a LaTeX input file, run it through LaTeX to make a DVI output file, and then process that to make the final output.

• To make a PostScript file from `myfile.sgml', type:
$ sgml2latex myfile.sgml [RET]
$ latex myfile.latex [RET]
$ dvips -t letter -o myfile.ps myfile.dvi [RET]
$

In this example, sgml2latex writes a LaTeX input file from the SGML source file, and then the latex tool processes the LaTeX file to make DVI output, which is processed with dvips to get the final output: a PostScript file called `myfile.ps' with a paper size of US letter.

To make a PDF file from the PostScript file, you need to take one more step and use ps2pdf, part of the gs or Ghostscript package; this converts the PostScript to PDF.

• To make a PDF file from the PostScript file `myfile.ps', type:
$ ps2pdf myfile.ps myfile.pdf [RET]
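
The whole chain from SGML to PDF can be collected into one function. This sketch only prints the commands by default, so it can be tried without the tools installed; the function name sgml_to_pdf and the RUN variable are assumptions made for illustration, and the commands and file names are those from the recipes above.

```shell
# Convert an SGML file to PostScript and PDF by way of LaTeX and DVI.
# With RUN unset, each command is printed instead of executed.
sgml_to_pdf() {
    base=$(basename "$1" .sgml)
    for step in \
        "sgml2latex $base.sgml" \
        "latex $base.latex" \
        "dvips -t letter -o $base.ps $base.dvi" \
        "ps2pdf $base.ps $base.pdf"
    do
        if [ -n "${RUN:-}" ]; then
            # Word splitting is intended here; file names containing
            # spaces are not supported by this sketch.
            $step || return 1
        else
            echo "$step"
        fi
    done
}

sgml_to_pdf myfile.sgml
```

To run the steps rather than print them, set the variable for one invocation, as in RUN=1 sgml_to_pdf myfile.sgml; the function stops at the first step that fails.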


Other Word Processors and Typesetting Systems

The following table describes other popular word processors and typesetting tools available for Linux. Those systems not in general use have been silently omitted.

SYSTEM      DESCRIPTION

AbiWord
    A graphical, WYSIWYG-style word processor for Linux systems. It can read Microsoft Word files.
    WWW: http://www.abisource.com/

groff
    GROFF is the latest in a line of phototypesetting systems that have been available on Unix-based systems for years; the original in this line was roff ("runoff," meaning that it was for files to be run off to the printer). groff is used in the typesetting of man pages, but it's possible to use it to create other kinds of documents, and it has a following of staunch adherents.

    • To output the tutorial file included with the groff distribution to a DVI file called `intro.dvi', type:
    $ zcat /usr/doc/groff/me-intro.me.gz | groff -me -T dvi > intro.dvi [RET]

    Debian: `groff'

Maxwell
    A graphical word processor for use in X.
    WWW: http://www.eeyore-mule.demon.co.uk/

PostScript
    The PostScript language is generally considered to be a format generated by software, but some people write straight PostScript! Converting Plain Text for Output has recipes on creating PostScript output from text, including outputting text in a font.

    People have written PostScript template files for creating all kinds of documents -- from desktop calendars to mandalas for meditation. The Debian `cdlabelgen' and `cd-circleprint' packages contain tools for writing labels for compact discs. Also of interest are Jamie Zawinski's templates for printing label inserts for video and audio tapes; edit the files in a text editor and then view or print them as you would any PostScript file.
    WWW: http://www.jwz.org/audio-tape.ps
    WWW: http://www.jwz.org/video-tape.ps

StarWriter
    A traditional word processor for Linux systems, part of the StarOffice application suite. It can also read Microsoft Word files.
    WWW: http://www.sun.com/staroffice/

Texinfo
    Texinfo is the GNU Project's documentation system and is an excellent system for writing FAQs or technical manuals. It allows for the inclusion of in-line EPS images and can produce TeX-based, HTML, and Info output -- use it if this matches your needs.
    Debian: `tetex-base'
    WWW: http://www.texinfo.org/