Introduction

This tutorial is aimed at both people who want to learn PostScript and those who simply want to be able to modify the behaviour of PostScript programs (e.g., change paper sizes, or scale and shift the output of pre-generated PostScript).


Overview

The tutorial is generally organised into sections which should be attempted in the order presented if you are interested in learning the language.

The tutorial covers a large chunk of what you need to know, but it deals almost exclusively with PostScript Level 1, not the newer features of Level 2.

If you are interested only in finding recipes to address specific problems, feel free to look at the Common Tasks sections without looking at the rest of the tutorial - all of the examples are commented and are available as separate files within the examples directory.


PostScript History

PostScript was created in 1984 at a time when most printers were only able to deal with strings of binary escape sequences to tell them how to make marks on the page.

Printers of the time generally didn't have the computing power to do any page layout themselves - they simply followed the bitstream that they were given and translated that to the page directly.

PostScript was a big change. Apple and Adobe teamed up to make the first LaserWriter range which included PostScript as the render engine.

The LaserWriter needed enough memory to store all the paths and strokes made on the page and had to be able to render them quickly enough to keep up with the laser engine. To do this, the LaserWriter rendered the page in discrete bands, of perhaps an inch at a time, then fed the raw data into the laser engine while the processor went to work on the rest of the page.

Naturally this complexity meant that the printer cost a great deal more than the competition at the time. However, because it was so flexible, PostScript quickly became the de facto standard for the publishing industry.

Since the LaserWriter, PostScript printers have been increasing in memory and resolution. It is now common for a printer to have enough render space to hold two or more complete pages in memory. The processing power has also increased, almost to the point of rivalling desktop PCs in terms of raw computing speed for some of the top of the range PostScript printers.


Products

PostScript has now grown into several other products, including Portable Document Format (PDF) and the Display PostScript System (DPS).

Freely available versions of the interpreter also exist, most notably GhostScript, which allow further development within the language without the need for a PostScript printer.


Interpreter Environment I

PostScript is an interpreted language. Tokens are parsed from an input stream and are interpreted immediately. There is no form of compiled PostScript (except perhaps a byte-coded form for DPS and PDF, however these are still interpreted and are not native machine code).

PostScript is also stack-based. The language grew out of research work done at about the same time as other stack-based languages, such as Forth.

There are some notable differences with Forth, however: Forth only allows integers to be pushed and manipulated on the stack, whilst PostScript allows any object, making for a much richer programming environment.


File Format

PostScript files are generally plain-text encoded rather than binary. This allows for greater portability, since there is no need to be concerned about escaping dangerous characters when moving or copying PostScript data around a network.

Lines of text are usually kept to below 80 characters, but there is no hard limit on this since PostScript places no significance on file layout - linefeeds and carriage returns are treated in exactly the same way as tabs and spaces.

Commenting PostScript code is supported and encouraged - any text following after an unprotected percent mark ('%') is ignored.

Traditionally, PostScript files have begun with the magic '%!' string to identify them. Strings immediately following the '%!' mark are used to denote which form of PostScript follows - without one, PostScript level 1 is assumed.
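As a rough sketch of the points above, a small file might start like this. The name msg and its text are made up for illustration, and def and print are only described later in this tutorial.

```postscript
%!
% Anything after an unprotected percent mark is a comment.
/msg (50% done\n) def     % the % inside the string is protected, so it is kept

% Layout is not significant: these two tokens could sit on one line.
msg
   print
```

Run through an interpreter such as GhostScript, this should simply print the text held in msg.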

Adobe have also defined a set of comments that are placed within PostScript files known as the Document Structuring Conventions (DSC). These are used to denote where procedures are defined, where the page environment is set up and where each individual page begins and ends.

DSC comments are primarily aimed at allowing print spool engines to choose the best printer to handle the PostScript file. Other specialist functions can also be managed, such as printing pages in reverse or discrete orders, and printing in 2-up or duplex modes. These comments are discussed in more depth later on in this tutorial.


PostScript Objects

PostScript programs tend to work on collections of objects via the operand stack.

Objects are regarded as either simple or complex, depending on whether the data they contain requires any extra memory - integers and reals do not require any extra storage and are simply stored within the object.

Other objects, such as strings, arrays, procedures and dictionaries, store their data in main memory and these objects are used to refer to them - the data for these objects is manipulated indirectly through pointers.

Usually the difference isn't noticeable, however there are instances where these properties can come in handy, which we'll get to later on.

Numbers

Numbers may be real or integer within PostScript. Real numbers use the usual exponent form (eg, 123.4e5).

There is also the radix number form, which allows specifying numbers of different base, usually used for binary (2#10010), octal (8#7732) and hexadecimal (16#FFFE).
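The number forms above can be tried out with something like the following. The names big, binval, octval and hexval are only for illustration, and the = operator used to print them is covered under Output Operators.

```postscript
/big    123.4e5  def    % a real in exponent form
/binval 2#10010  def    % binary, the same value as 18
/octval 8#7732   def    % octal, the same value as 4058
/hexval 16#FFFE  def    % hexadecimal, the same value as 65534

binval =                % prints 18
octval =                % prints 4058
hexval =                % prints 65534
big type =              % prints realtype
```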

Strings

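Strings are specified by three methods:

literal text, enclosed in ( and )

hexadecimal data, enclosed in < and >

ASCII base-85 data, enclosed in <~ and ~> (Level 2 only)

For literal text, the following C-like escape sequences are supported:

 \n    linefeed
 \r    carriage return
 \t    horizontal tab
 \b    backspace
 \f    form feed
 \\    backslash
 \(    left parenthesis
 \)    right parenthesis
 \ddd  character code ddd (octal)

Linefeeds within strings may be escaped with a backslash at the end of the line, otherwise they are treated as \n. If a backslash precedes a character that is not part of the list above, the backslash is ignored.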

For hexadecimal data, the one catch to be wary of is having an odd number of digits in the string, in which case the final digit is assumed to be followed by a 0. For instance, <901FA3> is properly formed, however <901FA> would be interpreted as a string with the hexadecimal characters 90, 1F and A0.
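A short sketch of the literal and hexadecimal forms follows. The names plain, hexstr and odd are hypothetical, and the operators used (def, eq, length, get and =) are described later in the tutorial.

```postscript
/plain  (Hello\tthere\n)  def   % literal text using the \t and \n escapes
/hexstr <48656C6C6F>      def   % hexadecimal form of the text Hello
/odd    <901FA>           def   % odd digit count: read as 90 1F A0

hexstr (Hello) eq =     % prints true
odd length =            % prints 3
odd 2 get =             % prints 160, which is 16#A0
```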

Names

Any token that cannot be interpreted as a number will be treated as a name object. Name objects may contain any character apart from spaces and other delimiters (eg brackets, < and >, %). This means that punctuation may also be used in names (eg @, ., #, and $ are all valid name characters).

Take care when choosing names that look like numbers in either the real or radix form - 23A1 is a valid name, however 23E1 is interpreted as a real, and 23#1 as a radix number.

A name may be prepended with a literal / character to indicate that the following is a literal name. Names that begin with // are termed immediately evaluated names.

The difference between the forms of names is discussed later in the section PostScript Name Resolution.
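To see how the interpreter classifies the tokens mentioned above, a session along these lines may help. It is only a sketch: the string value is made up, and type, load and = are operators that this tutorial has not introduced yet.

```postscript
/23A1 (just a name) def   % 23A1 cannot be read as a number, so it is a name
/23A1 load type =         % prints stringtype
23E1 type =               % prints realtype
23#1 type =               % prints integertype
/foo type =               % prints nametype
```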

Arrays

Arrays are delimited by the [ and ] characters. Each token after the first [ is parsed and executed one at a time until the last ] is encountered, at which point the complete array is constructed and placed on the operand stack.

An example of an array could be

 [ /blah (foo) 7331 ]
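Because each token is executed as the array is built, operators may appear between the brackets. The following sketch (the name a is arbitrary) stores the result of 1 2 add rather than the three tokens themselves:

```postscript
/a [ /blah (foo) 7331 1 2 add ] def   % 1 2 add runs during construction
a length =        % prints 4
a 3 get =         % prints 3
```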

Procedures

Procedures are delimited by the { and } characters.

In essence, procedures are merely executable arrays, however they are constructed quite differently. Whereas the simple array is evaluated element for element until the final ] is encountered, procedure content is essentially treated as literal data until the final } is found.

Whilst constructing a procedure, the interpreter pushes elements onto the operand stack, but does not execute them. Once the } is encountered, the array is constructed then marked as executable and pushed back onto the operand stack.

An example procedure would be

 { (blah) print }
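The deferred execution described above can be observed with a sketch like this one, where the name p is arbitrary and def and load are explained later on:

```postscript
/p { 1 2 add } def     % nothing is added yet, the tokens are just stored
/p load length =       % prints 3, one element per token
p =                    % executing p runs the tokens, then = prints 3
```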

Dictionaries

Dictionaries are one of the most fundamental constructs of the PostScript language. Dictionaries are similar to Perl's hashes in that each element in a dictionary has a name and a value, and individual items can be retrieved from a dictionary by their name.
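As a minimal illustration, with made-up names and values, a dictionary can be created, filled and queried like this (the dict, put, get and known operators are standard, but are not described until later):

```postscript
/ages 3 dict def               % a dictionary with room for 3 entries
ages /alice 30 put
ages /bob   25 put
ages /alice get =              % prints 30
ages /carol known =            % prints false
```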

Miscellaneous Objects

There are several other PostScript object types which fill out the language, although numbers, strings, names, arrays and dictionaries are probably the most important. Some of these are:

boolean

These are the values true and false.

mark

The mark object is used primarily for array construction.

save

This object type is used as a token for memory states.

file

Any filestream that the interpreter has access to can be accessed through these.


Stacks

As mentioned earlier, PostScript is a stack-based language. Most of the operations undertaken by PostScript programs are concerned with the operand and dictionary stacks.

The PostScript interpreter manages its own state on the execution stack, however this will not be covered in this tutorial as it is generally not useful for a PostScript program to modify.

For graphical work, the graphics stack is also used, and this is covered in the Graphics Stack section later in the tutorial.

Operand Stack

The operand stack is the main stack used by PostScript programs to perform operations. Elements are pushed onto the operand stack in the correct order, operators are invoked to manipulate them, then the results are pushed back onto the operand stack.

Dictionary Stack

The dictionary stack is quite different from the operand stack and is mainly used for looking up the values of defined names.

Dictionaries can be created and manipulated on the operand stack as any other object is, however the begin and end operators are used to push and pop individual dictionaries onto and off the dictionary stack.

PostScript Name Resolution

When the interpreter finds a name object to execute, it first looks for the relevant value in the uppermost dictionary on the dictionary stack. If it finds the appropriate entry, the value is pushed onto the operand stack and executed.

If the uppermost dictionary does not contain a value to match the required name, the interpreter will consult the dictionaries further down the stack, one at a time, until the name is resolved or the dictionary stack is exhausted, in which case the interpreter returns with an error.
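The lookup order can be demonstrated with a sketch such as the following, in which the names x and inner are hypothetical. The same name resolves to different values depending on what is on top of the dictionary stack:

```postscript
/x (outer) def            % defined in the current dictionary
/inner 1 dict def         % a second, initially empty dictionary
inner begin               % push it onto the dictionary stack
  /x (inner) def          % def writes into the topmost dictionary
  x =                     % prints inner
end                       % pop it off again
x =                       % prints outer
```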

The interpreter may look up names in the dictionary stack at several points when parsing a PostScript program, depending on the type of name being resolved:

literal names

Literal names are not resolved by the interpreter.

executable names

Any name object which is not literal is looked up by the interpreter unless the name is found during the construction of a procedure, in which case the name lookup is deferred.

immediately evaluated names (Level 2 only)

These names are looked up and resolved from the dictionary stack as soon as they are encountered by the interpreter, regardless of whether a procedure is being constructed.
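The difference between executable and immediately evaluated names shows up when a value changes after a procedure has been built. In this sketch (Level 2 interpreters only, names chosen for illustration) the two procedures differ only in the // prefix:

```postscript
/n 1 def
/deferred  { n }   def    % n is looked up each time the procedure runs
/immediate { //n } def    % n is replaced by its value, 1, straight away
/n 2 def
deferred =                % prints 2
immediate =               % prints 1
```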


Operators I

PostScript defines some 400 operators to manipulate objects on the stack, graphics state and other interpreter environment settings.

Most operators are given arguments and place their output on the operand stack, and the following notation has been used to illustrate their behaviour:

operator

operand_1 operand_2 ... operand_n => result_1 result_2 ... result_n

Here, stack contents are laid out horizontally. Conceptually, the bottom of the stack is to the left, the top to the right.

In cases where the operator takes no arguments or produces no output, -- is used as the operand or result list.

Where the stack bottom is encountered, the symbol |-- is used. The following shorthand is used to denote which type is used as arguments or output:

any

operand may be of any type

Some operators are polymorphic in that they will do different things depending on the arguments they are presented with. For instance, the length operator returns the lengths of either strings, dictionaries or arrays. The copy operator is used to copy n stack items when given an integer, whereas if given two strings, it will copy the content of the first string into the first portion of the second.
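The polymorphism of length and copy described above might be exercised as follows. This is a sketch only, and dst is a hypothetical name for the destination string:

```postscript
(hello) length =            % string: prints 5
[ 1 2 3 ] length =          % array: prints 3

1 2 2 copy                  % integer operand: the stack is now 1 2 1 2
count =                     % prints 4
clear

/dst 10 string def          % ten-character destination string
(abc) dst copy              % two strings: copies abc into the start of dst
=                           % prints abc, the part of dst that was written
```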

Operators that are polymorphic may be presented more than once in the following sections, depending on the type of the arguments being presented.

Please note that this is by no means a complete list of all of the operators available - these are the ones that are the most generally useful. For a complete list, see the PostScript Language Reference Manual, or the appropriate PDFs from the Adobe website.


Output Operators

Output operators are a programmer's best friend in PostScript - with these you can dump stack contents, individual values or even normal program output.

If you are using GhostScript, all output will appear on standard out.

print

string => --

Print takes the string and prints it to the current output filestream (usually stdout). Print supports the literal string escape sequences detailed earlier, so the command

 (Hello\nthere) print

would produce

 Hello
 there

as output.

=

any => --

Prints a string representation of the object. For objects that can't be converted directly using cvs it will print '--nostringval--'.

==

any => --

This is a more useful command for dumping the content of objects as it attempts to produce simple PostScript code which will reproduce the object. Like =, it isn't able to produce text to recreate things like dictionaries or packed arrays, in which case it will print the type of the object (for example, '-dict-').

stack

-- => --

Dumps the current content of the operand stack using the = operator. Elements are listed vertically - topmost first - until the stack is exhausted. This operator does not affect the stack in any way.

pstack

-- => --

This is similar to the original stack operator, however it uses the == operator to print the results.

=only

any => --

This is the same as the = operator, however it does not produce the trailing \n character.

==only

any => --

This is the same as the == operator, however it does not produce the trailing \n character.
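To compare the output operators side by side, something like the following can be typed at a GhostScript prompt. The expected output in the comments assumes GhostScript's formatting, which other interpreters may not match exactly.

```postscript
(a string) =            % prints a string
(a string) ==           % prints (a string)
[ 1 (two) /three ] =    % prints --nostringval--
[ 1 (two) /three ] ==   % prints [1 (two) /three]
1 (two) /three pstack   % prints /three, (two) and 1 on separate lines
clear                   % pstack left the three objects in place
```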


Stack Operators

pop

any => --

Removes and discards the top stack element.

exch

any_1 any_2 => any_2 any_1

Exchanges the top two elements on the stack.

dup

any_1 => any_1 any_1

Duplicates the top element on the stack.

copy

any_1 ... any_n n => any_1 ... any_n any_1 ... any_n

Duplicates the top n elements on the stack.

index

any_n ... any_0 n => any_n ... any_0 any_n

Duplicates the nth element on the stack, counting from 0 at the top.

roll

any_(n-1) ... any_0 n j => any_((j-1)%n) ... any_0 any_(n-1) ... any_(j%n)

Roll takes n elements and shifts them j times on the stack. For example, for a stack with

 (foo) (bar) 123 (seven) 11

here is what happens with the following rolls:

 4 1 roll => (foo) 11 (bar) 123 (seven)
 4 -1 roll => (foo) 123 (seven) 11 (bar)
 4 6 roll => (foo) (seven) 11 (bar) 123

count

|-- any_1 ... any_n => |-- any_1 ... any_n n

Counts the number of items on the stack.

clear

|-- any_1 ... any_n => |--

Removes all the elements from the stack.

mark

-- => mark

This operator pushes a mark object onto the stack.

counttomark

mark any_1 ... any_n => mark any_1 ... any_n n

Counts the number of items on the stack down to the first mark object.

cleartomark

mark any_1 ... any_n => --

Removes the elements on the stack down to (and including) the first mark object.
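The stack operators are easiest to follow by tracing the stack after each step. In this sketch the comment on each line shows the stack after that line has executed, with the top of the stack at the right. The name depth is hypothetical.

```postscript
1 2 3          % stack: 1 2 3
exch           % stack: 1 3 2
dup            % stack: 1 3 2 2
pop            % stack: 1 3 2
2 index        % stack: 1 3 2 1
4 1 roll       % stack: 1 1 3 2
mark 7 8 9     % stack: 1 1 3 2 mark 7 8 9
counttomark    % stack: 1 1 3 2 mark 7 8 9 3
pop            % stack: 1 1 3 2 mark 7 8 9
cleartomark    % stack: 1 1 3 2
/depth count def
depth =        % prints 4
clear          % stack is now empty
```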


Math Operators

add

number_1 number_2 => sum

Adds two numbers.

sub

number_1 number_2 => difference

Subtracts number_2 from number_1.

div

number_1 number_2 => quotient

Divides number_1 by number_2. The result is of the real type.

idiv

int_1 int_2 => quotient

Performs an integer division of int_1 by int_2. Both operands must be integers. The result is an int.

mod

int_1 int_2 => remainder

Returns the remainder of dividing int_1 by int_2. Both operands must be integers. The result is an int.

mul

number_1 number_2 => product

Multiplies number_1 and number_2. The result is int or real depending on the types of number_1 and number_2.

neg

number => -number

Negates the number given (ie. multiplies by -1)
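A few sample calculations, mainly to show the difference between div and idiv and how the result type of mul follows its operands:

```postscript
7 2 div =      % prints 3.5
7 2 idiv =     % prints 3
7 2 mod =      % prints 1
6 2 div =      % prints 3.0, since div always gives a real
3 4 mul =      % prints 12
3 4.0 mul =    % prints 12.0
10 3 sub neg = % prints -7
```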


String Operators

string

n => string

Creates an empty (consists entirely of NULL characters) string of length n

length

string => length

Returns the length (in characters) of the given string.

get

string index => int

Retrieves the character at position index in string as an integer. Note that the beginning of the string is index 0.

Also, note that string is not left on the stack after this operation. If string is important to you, make a copy using dup.

put

string index int => --

Puts the supplied int at position index inside string.

Like get, the original string is not left on the stack.
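Since neither get nor put leaves the string on the stack, it is convenient to give the string a name first. A small sketch, using the hypothetical name s:

```postscript
/s 3 string def     % three NULL characters
s 0 72 put          % 72 is the character code for H
s 1 105 put         % 105 is i
s 2 33 put          % 33 is !
s =                 % prints Hi!
s 1 get =           % prints 105
s length =          % prints 3
```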


Array Operators

[

-- => mark

The [ operator is essentially the same as the mark operator, however it is preferred in cases where you are building an array.

]

mark any_1 ... any_n => array

This operator is used to construct arrays. Counts all objects down to the first mark object, creates an array of the appropriate size, then puts the objects into the array.

Note that this operator is essentially the same as the following code:

 counttomark array astore exch pop

array

n => array

Creates an empty (all elements are null objects) array of the given size and places it on the stack.

length

array => n

Returns the number of elements in array.

get

array index => any

Gets the element at position index from array. Like strings, array elements are numbered from 0.

put

array index any => --

Puts the given object into the array at the appropriate position.

astore

any_0 ... any_(n-1) array => array

For an array of length n, takes the top n elements from the stack and places them in the array, in the same order as they appear on the stack.

aload

array => any_0 ... any_(n-1) array

Unloads the contents of the given array onto the stack in the same order as they appear in the array. Note that the array remains the topmost element on the stack, and may need to be removed.
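The array operators can be combined as in the following sketch, where a is an arbitrary name. The last two lines spell out the long-hand equivalent of ] given earlier.

```postscript
/a 3 array def           % three null objects
10 20 30 a astore pop    % fill a from the stack, then drop the returned array
a 1 get =                % prints 20
a 1 (x) put              % a now holds 10 (x) 30
a aload pop              % stack: 10 (x) 30
count =                  % prints 3
clear

mark 1 2 3 counttomark array astore exch pop   % same result as [ 1 2 3 ]
length =                 % prints 3
```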


Programming I

We've covered the main operators of PostScript in a fairly bland way, and haven't put them together in any code as yet. Let's do so ...


Development Environment

PostScript interpreters are generally hard to talk to as they are typically buried inside a print engine and don't normally interact with the outside world directly - usually they will be given a PostScript job by the print queue manager and simply execute it.

Some printers do allow you to connect to the interpreter directly. Usually this is via the serial port (on older laser printers, such as the Mannesmann Tally) or they allow you to connect to a particular network port and get to the interpreter that way (such as most of the HP range of printers).

Whilst you are talking to the interpreter directly, however, you probably won't be able to submit jobs for printing, so don't do this to your office printer.

For development work, PostScript interpreters such as GhostScript are used instead. These offer a number of benefits for the PostScript programmer in that they generally allow an unlimited amount of memory for the interpreter and put their output to file or to the display, saving a considerable amount in paper costs.

Usually there are a few enhancements or shortcuts that are taken, or other little differences between the environment as presented to a PostScript program that executes in a print engine interpreter and that found in something like GhostScript, so it is a good idea not to rely on these functions in your programming.

Such functions include the ability to run other PostScript programs directly from a filesystem, being able to read shell environment variables, and other special modes (like being able to tinker with some system parameters).


Tools

As with most languages, in order to develop for them you will need an editor (I generally use vim as it has PostScript syntax highlighting), and something to run your code on (I use GhostScript for this).


Program Layout

I tend to use a style similar to the One True Brace style for formatting C code. Program segments will look something like this:

 /Example {           % string number  ==>  --

   SomeVar eq {       % string

     Process          % --

   } {                % string

     pop              % --

   } ifelse

 } def


GhostScript Command Mode

GhostScript is generally run on individual files rather than at a prompt for direct PostScript input, but the prompt mode is very useful for testing theories about the language or other experimenting. The GhostScript prompt will look like the following:

 mangala[~] 6>: gs -sDEVICE=nullpage
 GNU Ghostscript 5.10 (1998-12-17)
 Copyright (C) 1997 Aladdin Enterprises, Menlo Park, CA.  All rights reserved.
 This software comes with NO WARRANTY: see the file COPYING for details.
 GS>

There are a few differences between the command line mode and the file mode:

For example, the above code becomes a lot less readable on the command line:

 GS>/Example { SomeVar eq { Process } { pop } ifelse } def


Operators II

The operators given here are probably more useful than those mentioned earlier and cover things like tests, control structures and dictionary manipulation.


Defining Procedures

Procedures are created as noted earlier, however in order to be more useful, they would be better associated with names in the topmost dictionary on the dictionary stack. That way, the procedure can be called by name rather than rewriting it every time it's needed. This is what the def operator is for.

def

key any => --

Associates any with key and creates an entry in the topmost dictionary on the dictionary stack.

For example, here is a GhostScript session showing its use:

 GS>/foo { (blah\n) print } def
 GS>foo
 blah
 GS>

Note that anything may be associated with names in the current dictionary - not just procedures. This allows us to create variables and other data and refer to them by name:

 GS>/i 0 def
 GS>/j { i == /i i 1 add def } def
 GS>j j j
 0
 1
 2
 GS>


Boolean and Relational Operators

true

-- => true

Pushes a boolean true object onto the stack.

false

-- => false

Pushes a boolean false object onto the stack.

eq

any_1 any_2 => { true | false }

Determines whether any_1 and any_2 are equal and returns true or false.

Note that complex objects (ie, not numbers) are generally considered to be unequal unless both objects reference the same data. The one exception is two strings that match each other, even if they reside in different areas of memory. This is shown below:

 1 0 eq => false
 1 1 eq => true

 (blah) (blah) eq => true
 (blah) (foo) eq => false

 /blah /blah eq => true

 [ 1 ] [ 1 ] eq => false
 [ 1 ] dup eq => true

 1 dict 1 dict eq => false
 1 dict dup eq => true

ne

any_1 any_2 => { true | false }

Determines whether any_1 and any_2 are unequal and returns true or false.

Follows the same equality measure mentioned above for eq.

ge

number_1 number_2 => { true | false }

string_1 string_2 => { true | false }

Tests whether number_1 is greater than or equal to number_2 or that string_1 is lexically greater than or equal to string_2. Returns a boolean.

gt

number_1 number_2 => { true | false }

string_1 string_2 => { true | false }

Tests whether number_1 is greater than number_2 or that string_1 is lexically greater than string_2. Returns a boolean.

le

number_1 number_2 => { true | false }

string_1 string_2 => { true | false }

Tests whether number_1 is less than or equal to number_2 or that string_1 is lexically less than or equal to string_2. Returns a boolean.

lt

number_1 number_2 => { true | false }

string_1 string_2 => { true | false }

Tests whether number_1 is less than number_2 or that string_1 is lexically less than string_2. Returns a boolean.

and

bool_1 bool_2 => bool_3

int_1 int_2 => int_3

Performs logical and when presented with boolean arguments, returning boolean. Performs bitwise and when presented with integer arguments, returning an integer.

not

bool_1 => bool_2

int_1 => int_2

Performs logical not when presented with a boolean argument, returning boolean. Performs bitwise not when presented with an integer argument, returning an integer.

or

bool_1 bool_2 => bool_3

int_1 int_2 => int_3

Performs logical or when presented with boolean arguments, returning boolean. Performs bitwise or when presented with integer arguments, returning an integer.

xor

bool_1 bool_2 => bool_3

int_1 int_2 => int_3

Performs logical exclusive or when presented with boolean arguments, returning boolean. Performs bitwise exclusive or when presented with integer arguments, returning an integer.

bitshift

int_1 n => int_2

Shifts int_1 n bits to the left (negative values for n shift to the right).
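Some sample results for the logical and bitwise forms, with the binary values given in the comments for reference:

```postscript
true false and =      % prints false
true false or =       % prints true
true not =            % prints false
12 10 and =           % prints 8   (2#1100 and 2#1010 give 2#1000)
12 10 or =            % prints 14  (2#1110)
12 10 xor =           % prints 6   (2#0110)
1 4 bitshift =        % prints 16
16 -2 bitshift =      % prints 4
(abc) (abd) lt =      % prints true
```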


Control Operators

if

bool proc => --

Executes procedure proc if boolean bool is true.

For example:

 GS>10 10.0 eq { (they're equal!\n) print } if
 they're equal!
 GS>

ifelse

bool proc_1 proc_2 => --

Executes procedure proc_1 if boolean bool is true, otherwise executes procedure proc_2. For example:

 GS>(blah) (blah) eq { (equal\n) print } { (unequal\n) print } ifelse
 equal
 GS>30 (llama) eq { (equal\n) print } { (unequal\n) print } ifelse
 unequal
 GS>

for

start delta finish proc => --

The PostScript for is very similar to the for loops of most other languages, however PostScript implements things a little differently. Each time proc is executed, the current value of the index is pushed onto the stack beforehand. This allows proc complete access to the stack while it executes. For example:

 GS>0 1 5 { == } for
 0
 1
 2
 3
 4
 5
 GS>

forall

array proc => --

string proc => --

dict proc => --

This operator is used to traverse arrays, strings and dictionaries and apply proc to each item within the object.

In the case of arrays, proc is executed on an individual object. When strings are traversed, proc is executed for an individual character (represented as an integer). For dictionaries, proc is supplied with each paired key and value in turn. For example:

forall on arrays:

 GS>[ 1 2 3 4 ] { == } forall
 1
 2
 3
 4

forall on strings:

 GS>(1234\n) { == } forall
 49
 50
 51
 52
 10

forall on dictionaries:

 GS>4 dict dup begin /one 1 def /two 2 def /three 3 def end
 GS<1>{ exch ==only ( ) print == } forall
 /three 3
 /one 1
 /two 2
 GS>

repeat

count proc => --

Executes proc count times. For example:

 GS>4 { (blah\n) print } repeat
 blah
 blah
 blah
 blah
 GS>

loop

proc => --

exit

-- => --

The loop and exit operators are paired to create loops with arbitrary exit conditions.

For example:

 GS>/i 0 def
 GS>{ i 4 eq { exit } if i == /i i 1 add def } loop
 0
 1
 2
 3
 GS>

stopped

proc => bool

stop

-- => --
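The notes give no description for this pair, so the following sketch leans on the behaviour documented in the PostScript Language Reference Manual: stopped executes proc and pushes true if proc was ended early by stop (or by an error), and false if proc ran to completion.

```postscript
{ (about to stop\n) print stop (never printed\n) print } stopped
=                      % prints true: proc was ended by stop
{ 1 2 add pop } stopped
=                      % prints false: proc ran to completion
```

This makes stopped useful as a simple error trap around code that might fail.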


Dictionary Operators

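length

dict => n

Returns the number of entries in the dictionary.

load

key => any

Looks up key in the dictionary stack and pushes its value without executing it.

get

dict key => any

Retrieves the value associated with key in dict.

put

dict key any => --

Associates any with key in dict.

known

dict key => { true | false }

Tests whether key is present in dict.

where

key => dict true

key => false

Searches the dictionary stack for key. If it is found, pushes the dictionary containing it and true, otherwise pushes false.

currentdict

-- => dict

Pushes the topmost dictionary on the dictionary stack.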
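As a sketch of how the dictionary operators fit together (behaviour as documented in the PostScript Language Reference Manual, with d, width, height and no-such-name being made-up names):

```postscript
/d 5 dict def
d /width 612 put            % store a value under the key width
d /width get =              % prints 612
d length =                  % prints 1
d /height known =           % prints false
d begin                     % make d the current dictionary
  currentdict d eq =        % prints true
  /width load =             % prints 612, found via the dictionary stack
  /width where { pop (found) = } if                    % prints found
  /no-such-name where { pop } { (missing) = } ifelse   % prints missing
end
```

Note that where leaves the dictionary it found underneath the boolean, which is why the sketch pops it inside the procedure.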


End of part 1

Unfortunately this was as far as I wrote when I presented this talk to the Linux Users of Victoria Programmers SIG on 27th November 2001.

I will be writing the rest and presenting a followup tutorial early in 2002.

Here's what is yet to come:


Further Reading

Apart from the references given below, I would recommend having a look through the Adobe site for more information on PostScript language programming.

Have a good look around for the PostScript operator guide which should be available from somewhere on the site (this is essentially a PDF version of the operator summary given in the Language Reference Manual) - it is a very useful thing to have around whilst programming.

The PostScript Language Tutorial and Cookbook has a number of well commented examples showing how the language works. Electronic copies of these programs are available on the Adobe site, and they give a good grounding in the language.

There are also a large number of tutorials on the web and in newsgroups. The newsgroup comp.lang.postscript generally has an up-to-date FAQ for resources, hints and tips.


References

Adobe Systems Incorporated, PostScript Language Reference Manual, 2nd Edition, Addison-Wesley, 1990, ISBN 0-201-18127-4

Adobe Systems Incorporated, PostScript Language Reference Manual, Addison-Wesley, 1985, ISBN 0-201-10174-2

Adobe Systems Incorporated, PostScript Language Tutorial and Cookbook, Addison-Wesley, 1985, ISBN 0-201-10179-3