This tutorial is aimed at both people who want to learn PostScript and those who simply want to be able to modify the behaviour of PostScript programs (e.g. change paper sizes, or scale and shift the output of pre-generated PostScript).
The tutorial is generally organised into sections which should be attempted in the order presented if you are interested in learning the language.
The tutorial covers a large chunk of what you need to know; however, it deals almost exclusively with PostScript Level 1, not the newer features of Level 2.
If you are interested only in finding recipes to address specific problems, feel free to look at the Common Tasks sections without looking at the rest of the tutorial - all of the examples are commented and are available as separate files within the examples directory.
PostScript was created in 1984 at a time when most printers were only able to deal with strings of binary escape sequences to tell them how to make marks on the page.
Printers of the time generally didn't have the computing power to do any page layout themselves - they simply followed the bitstream that they were given and translated that to the page directly.
PostScript was a big change. Apple and Adobe teamed up to make the first LaserWriter range which included PostScript as the render engine.
The LaserWriter needed enough memory to store all the paths and strokes made on the page and had to be able to render them quickly enough to keep up with the laser engine. To do this, the LaserWriter rendered the page in discrete bands, of perhaps an inch at a time, then fed the raw data into the laser engine while the processor went to work on the rest of the page.
Naturally this complexity meant that the printer cost a great deal more than the competition at the time. However, because it was so flexible, PostScript quickly became the de facto standard for the publishing industry.
Since the LaserWriter, PostScript printers have been increasing in memory and resolution. It is now common for a printer to have enough render space to hold two or more complete pages in memory. The processing power has also increased, almost to the point of rivalling desktop PCs in terms of raw computing speed for some of the top of the range PostScript printers.
PostScript has now grown into several other products, including Portable Document Format and the Display PostScript System.
Freely available versions of the interpreter also exist, most notably GhostScript, which allow further development within the language without the need for a PostScript printer.
PostScript is an interpreted language. Tokens are parsed from an input stream and are interpreted immediately. (There is no form of compiled PostScript, except perhaps a byte-coded form for DPS and PDF; however, these are still interpreted and are not native machine code.)
PostScript is also stack-based. The language grew out of research work done at about the same time as other stack-based languages, such as Forth.
There are some notable differences with Forth, however: Forth only allows integers to be pushed and manipulated on the stack, whilst PostScript allows any object, making for a much richer programming environment.
PostScript files are generally plain-text encoded rather than binary. This allows for greater portability, since there is no need to be concerned about escaping dangerous characters when moving or copying PostScript data around a network.
Lines of text are usually kept to below 80 characters, but there is no hard limit on this since PostScript places no significance on file layout - linefeeds and carriage returns are treated in exactly the same way as tabs and spaces.
Commenting PostScript code is supported and encouraged - any text following after an unprotected percent mark ('%') is ignored.
Traditionally, PostScript files have begun with the magic '%!' string to identify them. Strings immediately following the '%!' mark are used to denote which form of PostScript follows - without one, PostScript level 1 is assumed.
Adobe have also defined a set of comments that are placed within PostScript files, known as the Document Structuring Conventions (DSC). These are used to denote where procedures are defined, where the page environment is set up and where each individual page begins and ends.
DSC comments are primarily aimed at allowing print spool engines to choose the best printer to handle the PostScript file. Other specialist functions can also be managed, such as printing pages in reverse or discrete orders, and printing in 2-up or duplex modes. These comments are discussed in more depth later on in this tutorial.
PostScript programs tend to work on collections of objects via the operand stack.
Objects are regarded as either simple or complex, depending on whether the data they contain requires any extra memory - integers and reals do not require any extra storage and are simply stored within the object.
Other objects, such as strings, arrays, procedures and dictionaries, store their data in main memory and the objects themselves merely refer to it - the data for these objects is manipulated indirectly through pointers.
Usually the difference isn't noticeable, however there are instances where these properties can come in handy, which we'll get to later on.
Numbers
Numbers may be real or integer within PostScript. Real numbers use the usual exponent form (eg, 123.4e5).
There is also the radix number form, which allows specifying numbers in a different base, usually used for binary (2#10010), octal (8#7732) and hexadecimal (16#FFFE).
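A quick way to check how a numeric token is read is to store it and print it back. This is only a sketch, assuming an interactive GhostScript session; the names bin, oct, hex and big are made up for illustration.

```postscript
% Hypothetical names; each radix literal is read as an ordinary integer.
/bin 2#10010 def      % binary
/oct 8#7732 def       % octal
/hex 16#FFFE def      % hexadecimal
/big 123.4e5 def      % real number in exponent form

bin ==                % prints 18
oct ==                % prints 4058
hex ==                % prints 65534
```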
Strings
Strings are specified by three methods:
As literal text enclosed between ( and ) (eg, (hello))
As hexadecimal data enclosed between < and > (eg, <DEADBEEF>)
As ASCII-85 encoded data enclosed between <~ and ~> (Level 2 only)
For literal text, the following C-like escape sequences are supported:
\n - linefeed
\r - carriage return
\t - horizontal tab
\b - backspace
\f - formfeed
\\ - backslash
\( - left parenthesis
\) - right parenthesis
\ddd - character with octal code ddd
Linefeeds within strings may be escaped with a backslash at the end of the line, otherwise they are treated as \n. If a backslash precedes a character that is not part of the list above, the backslash is ignored.
For hexadecimal data, the one catch to be wary of is having an odd number of digits in the string, in which case the missing final digit is assumed to be 0. For instance, <901FA3> is properly formed, however <901FA> would be interpreted as a string with the hexadecimal characters 90, 1F and A0.
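The string forms can be checked from the prompt. The sketch below assumes a GhostScript session and uses the made-up names even and odd; it relies on the rule that a hexadecimal string with an odd number of digits is padded with a trailing 0.

```postscript
% Hypothetical names for two hexadecimal strings.
/even <901FA3> def          % three bytes: 16#90 16#1F 16#A3
/odd  <901FA> def           % also three bytes: the missing digit is taken as 0

even length ==              % prints 3
odd length ==               % prints 3
odd 2 get ==                % prints 160, which is 16#A0
(a\101\(b\)) length ==      % prints 5: a, A (octal 101), (, b and )
```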
Names
Any token that cannot be interpreted as a number will be treated as a name object. Name objects may contain any character apart from spaces and other delimiters (eg brackets, < and >, %). This means that punctuation may also be used in names (eg @, ., #, and $ are all valid name characters).
Take care when choosing names that look like numbers in either the real or radix form - 23A1 is a valid name, however 23E1 is interpreted as a real, and 23#1 as a radix number.
A name may be prepended with a literal / character to indicate that the following is a literal name. Names that begin with // are termed immediately evaluated names.
The difference between the forms of names is discussed later in the section PostScript Name Resolution.
Arrays
Arrays are delimited by the [ and ] characters. Each token after the first [ is parsed and executed one at a time until the last ] is encountered, at which point the complete array is constructed and placed on the operand stack.
An example of an array could be
[ /blah (foo) 7331 ]
Procedures
Procedures are delimited by the { and } characters.
In essence, procedures are merely executable arrays, however they are constructed quite differently. Whereas the simple array is evaluated element for element until the final ] is encountered, procedure content is essentially treated as literal data until the final } is found.
Whilst constructing a procedure, the interpreter pushes elements onto the operand stack, but does not execute them. Once the } is encountered, the array is constructed then marked as executable and pushed back onto the operand stack.
An example procedure would be
{ (blah) print }
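The difference between the two constructions shows up as soon as an operator appears between the delimiters. A minimal sketch, assuming a GhostScript session (the names a and p are illustrative only):

```postscript
/a [ 1 2 add ] def      % add runs during construction, so a holds the single value 3
/p { 1 2 add } def      % nothing runs yet, so p holds three objects: 1, 2 and add

a length ==             % prints 1
a 0 get ==              % prints 3
/p load length ==       % prints 3 (load fetches the procedure without executing it)
p ==                    % executes the procedure now and prints 3
```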
Dictionaries
Dictionaries are one of the most fundamental constructs of the PostScript language. Dictionaries are similar to Perl's hashes in that each element in a dictionary has a name and a value, and individual items can be retrieved from a dictionary by their name.
Miscellaneous Objects
There are several other PostScript object types which fill out the language, although numbers, strings, names, arrays and dictionaries are probably the most important. Some of these are:
boolean - These are the values true and false
mark - The mark object is used primarily for array construction
save - This object type is used as a token for memory states
file - Any filestream that the interpreter has access to can be accessed through these
As mentioned earlier, PostScript is a stack-based language. Most of the operations undertaken by PostScript programs are concerned with the operand and dictionary stacks.
The PostScript interpreter manages its own state on the execution stack, however this will not be covered in this tutorial as it is generally not useful for a PostScript program to modify.
For graphical work, the graphics stack is also used, and this is covered in the Graphics Stack section later in the tutorial.
Operand Stack
The operand stack is the main stack used by PostScript programs to perform operations. Elements are pushed onto the operand stack in the correct order, operators are invoked to manipulate them, then the results are pushed back onto the operand stack.
Dictionary Stack
The dictionary stack is quite different from the operand stack and is mainly used for looking up the values of defined names.
Dictionaries can be created and manipulated on the operand stack as any other object is, however the begin and end operators are used to push and pop individual dictionaries onto and off the dictionary stack.
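As a rough illustration of begin and end, the sketch below defines the same name in two dictionaries. It assumes a GhostScript session in which userdict is the current dictionary; scratch and x are hypothetical names.

```postscript
/x 1 def                % defined in the current dictionary (normally userdict)
/scratch 5 dict def     % scratch is a made-up name for a new dictionary

scratch begin           % push scratch onto the dictionary stack
  /x 2 def              % this x lives in scratch and hides the outer one
  x ==                  % prints 2
end                     % pop scratch off the dictionary stack again

x ==                    % prints 1: the outer definition is visible once more
scratch /x get ==       % prints 2: the inner value is still stored in scratch
```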
PostScript Name Resolution
When the interpreter finds a name object to execute, it first looks for the relevant value in the uppermost dictionary on the dictionary stack. If it finds the appropriate entry, the value is pushed onto the operand stack and executed.
If the uppermost dictionary does not contain a value to match the required name, the interpreter will consult each dictionary further down the stack in turn until the name is resolved or the dictionary stack is exhausted, in which case the interpreter returns with an error.
The interpreter may look up names in the dictionary stack at several points when parsing a PostScript program, depending on the type of name being resolved:
Literal names are not resolved by the interpreter.
Executable names - Any name object which is not literal is looked up by the interpreter unless the name is found during the construction of a procedure, in which case the name lookup is deferred until the procedure is executed.
Immediately evaluated names - These names are looked up and resolved from the dictionary stack as soon as they are encountered by the interpreter, regardless of whether a procedure is being constructed.
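The timing of the lookup matters when a name is redefined after a procedure has been built. This is a small sketch, assuming a Level 2 (or later) interpreter such as GhostScript, since the // form is not part of Level 1; n, deferred and immediate are hypothetical names.

```postscript
/n 1 def
/deferred  { n } def      % n is looked up each time deferred runs
/immediate { //n } def    % n is replaced by its current value, 1, while the procedure is built
/n 2 def

deferred ==               % prints 2
immediate ==              % prints 1
/n ==                     % prints /n: a literal name is just data and is never looked up
```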
PostScript defines some 400 operators to manipulate objects on the stack, graphics state and other interpreter environment settings.
Most operators are given arguments and place their output on the operand stack, and the following notation has been used to illustrate their behaviour:
operand_1 operand_2 ... operand_n => result_1 result_2 ... result_n
Here, stack contents are laid out horizontally. Conceptually, the bottom of the stack is to the left, the top to the right.
In cases where the operator takes no arguments or produces no output, -- is used as the operand or result list.
Where the stack bottom is encountered, the symbol |-- is used. The following shorthand is used to denote which type is used as arguments or output:
any - operand may be of any type
Some operators are polymorphic in that they will do different things depending on the arguments they are presented with. For instance, the length operator returns the lengths of either strings, dictionaries or arrays. The copy operator is used to copy n stack items when given an integer, whereas if given two strings, it will copy the content of the first string into the first portion of the second.
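Both behaviours can be seen side by side in a short session. The following is only a sketch, assuming GhostScript and an empty operand stack to begin with; dst is a made-up name.

```postscript
(hello) length ==               % prints 5
[ 1 2 3 ] length ==             % prints 3

1 2 2 copy                      % integer form: stack is now 1 2 1 2
4 array astore ==               % prints [1 2 1 2]

/dst (----------) def           % hypothetical ten-character target string
(abc) dst copy                  % string form: copies abc into the start of dst
==                              % prints (abc), the part of dst that was overwritten
dst ==                          % prints (abc-------)
```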
Operators that are polymorphic may be presented more than once in the following sections, depending on the type of the arguments being presented.
Please note that this is by no means a complete list of all of the operators available - these are the ones that are the most generally useful. For a complete list, see the PostScript Language Reference Manual, or the appropriate PDFs from the Adobe website.
Output operators are a programmer's best friend in PostScript - with these you can dump stack contents, individual values or even normal program output.
If you are using GhostScript, all output will appear on standard out.
print
string => --
Print takes the string and prints it to the current output filestream (usually stdout). The escape sequences detailed earlier are translated when the string literal is read, so the command
(Hello\nthere) print
would produce
Hello
there
as output.
=
any => --
Prints a string representation of the object. For objects that can't be converted directly using cvs it will print '--nostringval--'.
==
any => --
This is a more useful command for dumping the content of objects as it attempts to produce simple PostScript code which will reproduce the object. Like =, it isn't able to produce text to recreate things like dictionaries or packed arrays, in which case it will print the type of the object, for example '-dict-'.
stack
-- => --
Dumps the current content of the operand stack using the = operator. Elements are listed vertically - topmost first - until the stack is exhausted. This operator does not affect the stack in any way.
pstack
-- => --
This is similar to the stack operator, however it uses the == operator to print the results.
=only
any => --
This is the same as the = operator, however it does not produce the trailing \n character.
==only
any => --
This is the same as the == operator, however it does not produce the trailing \n character.
pop
any => --
Removes and discards the top stack element.
exch
any_1 any_2 => any_2 any_1
Exchanges the top two elements on the stack.
dup
any_1 => any_1 any_1
Duplicates the top element on the stack.
copy
any_1 ... any_n n => any_1 ... any_n any_1 ... any_n
Duplicates the top n elements on the stack.
index
any_n ... any_0 n => any_n ... any_0 any_n
Duplicates the nth element on the stack, counting from 0 at the top.
roll
any_(n-1) ... any_0 n j => any_((j-1)%n) ... any_0 any_(n-1) ... any_(j%n)
Roll takes the top n elements and rotates them j places on the stack. A positive j rotates upwards (the top element wraps around to the bottom of the group), a negative j rotates downwards. For example, for a stack with
(foo) (bar) 123 (seven) 11
here is what happens with the following rolls:
4 1 roll => (foo) 11 (bar) 123 (seven)
4 -1 roll => (foo) 123 (seven) 11 (bar)
4 6 roll => (foo) (seven) 11 (bar) 123
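The two most common uses of roll are probably rotating the top three items one place in either direction. A short sketch, assuming GhostScript and an empty stack at the start:

```postscript
(a) (b) (c) 3 -1 roll      % stack is now (b) (c) (a): the third item has come to the top
3 array astore ==          % prints [(b) (c) (a)]

(a) (b) (c) 3 1 roll       % stack is now (c) (a) (b): the top item has gone to third place
3 array astore ==          % prints [(c) (a) (b)]
```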
count
|-- any_1 ... any_n => |-- any_1 ... any_n n
Counts the number of items on the stack.
clear
|-- any_1 ... any_n => |--
Removes all the elements from the stack.
mark
-- => mark
This operator pushes a mark object onto the stack.
counttomark
mark any_1 ... any_n => mark any_1 ... any_n n
Counts the number of items on the stack down to the first mark object.
cleartomark
mark any_1 ... any_n => --
Removes the elements on the stack down to (and including) the first mark object.
add
number_1 number_2 => sum
Adds two numbers.
sub
number_1 number_2 => difference
Subtracts number_2 from number_1.
div
number_1 number_2 => quotient
Divides number_1 by number_2. The result is of the real type.
idiv
int_1 int_2 => quotient
Performs an integer division of int_1 by int_2. Both operands must be integers, and the result is an int.
mod
int_1 int_2 => remainder
Performs the modulus operation on int_1 and int_2. Both operands must be integers, and the result is an int.
mul
number_1 number_2 => product
Multiplies number_1 and number_2. The result is int or real depending on the types of number_1 and number_2.
neg
number => -number
Negates the number given (ie. multiplies by -1).
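The differences between the division operators are easiest to see with a few values. This is a sketch for an interactive GhostScript session:

```postscript
7 2 div ==      % prints 3.5
7 2 idiv ==     % prints 3
7 2 mod ==      % prints 1
-7 2 idiv ==    % prints -3: idiv truncates towards zero
-7 2 mod ==     % prints -1: the remainder takes the sign of the first operand
3 4 mul ==      % prints 12, an integer
3 4.0 mul ==    % prints 12.0, a real
5 neg ==        % prints -5
```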
string
n => string
Creates an empty string of length n (it consists entirely of NULL characters).
length
string => length
Returns the length (in characters) of the given string.
get
string index => int
Retrieves the character at position index from string as an integer. Note that the beginning of the string is index 0.
Also, note that string is not left on the stack after this operation. If string is important to you, make a copy using dup.
put
string index int => --
Puts the supplied int at position index inside string.
Like get, the original string is not left on the stack.
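Because get and put both consume the string, one simple approach is to keep it under a name. The sketch below assumes a GhostScript session; s is a made-up name.

```postscript
/s 3 string def          % hypothetical name for a string of three NULL characters
s 0 72 put               % 72 is the character code for H
s 1 105 put              % 105 is the character code for i
s 2 33 put               % 33 is the character code for !

s ==                     % prints (Hi!)
s 0 get ==               % prints 72
s length ==              % prints 3
```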
[
-- => mark
The [ operator is essentially the same as the mark operator, however it is preferred in cases where you are building an array.
]
mark any_1 ... any_n => array
This operator is used to construct arrays. It counts all objects down to the first mark object, creates an array of the appropriate size, then puts the objects into the array.
Note that this operator is essentially the same as the following code:
counttomark array astore exch pop
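To see that the two forms really do produce the same result, both can be run next to each other. A minimal sketch, assuming GhostScript; viaBracket and viaMark are hypothetical names.

```postscript
/viaBracket [ 1 2 3 ] def
/viaMark mark 1 2 3 counttomark array astore exch pop def

viaBracket ==      % prints [1 2 3]
viaMark ==         % prints [1 2 3]
```

In the second line, astore leaves the filled array above the mark, so exch pop is what finally discards the mark.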
array
n => array
Creates an empty (all elements are null objects) array of the given size and places it on the stack.
length
array => n
Returns the length of the array.
get
array index => any
Gets the element at position index from array. Like strings, array elements are numbered from 0.
put
array index any => --
Puts the given object into the array at position index.
astore
any_0 ... any_(n-1) array => array
For an array of length n, takes the top n elements from the stack and places them in the array, in the same order as they appear on the stack.
aload
array => any_0 ... any_(n-1) array
Unloads the contents of the given array onto the stack in the same order as they appear in the array. Note that the array remains the topmost element on the stack, and may need to be removed.
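As a small worked example, astore and aload can be combined to swap the two elements of an array in place. This is only a sketch, assuming GhostScript; pair is a made-up name.

```postscript
/pair 2 array def            % hypothetical two-element array
(x) (y) pair astore pop      % fill pair; astore leaves the array on the stack, so pop it
pair ==                      % prints [(x) (y)]

pair aload pop               % stack is now (x) (y); pop discards the array left on top
exch                         % stack is now (y) (x)
pair astore ==               % stores them back and prints [(y) (x)]
```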
We've covered the main operators of PostScript in a fairly bland way, and haven't put them together in any code as yet. Let's do so ...
PostScript interpreters are generally hard to talk to as they are typically buried inside a print engine and don't normally interact with the outside world directly - usually they will be given a PostScript job by the print queue manager and simply execute it.
Some printers do allow you to connect to the interpreter directly. Usually this is via the serial port (on older laser printers, such as the Mannesmann Tally) or via a particular network port (as on most of the HP range of printers).
Whilst you are talking to the interpreter directly, you probably won't be able to submit jobs for printing however, so don't do this to your office printer.
For development work, PostScript interpreters such as GhostScript are used instead. These offer a number of benefits for the PostScript programmer in that they generally allow an unlimited amount of memory for the interpreter and put their output to file or to the display, saving a considerable amount in paper costs.
Usually there are a few enhancements or shortcuts that are taken, or other little differences between the environment as presented to a PostScript program that executes in a print engine interpreter and that found in something like GhostScript, so it is a good idea not to rely on these functions in your programming.
Such functions include the ability to run other PostScript programs directly from a filesystem, being able to read shell environment variables, and other special modes (like being able to tinker with some system parameters).
As with most languages, in order to develop for them you will need an editor (I generally use vim as it has PostScript syntax highlighting), and something to run your code on (I use GhostScript for this).
I tend to use a style similar to the One True Brace style for formatting C code. Program segments will look something like this:
/Example {                  % string number ==> --
    SomeVar eq {            % string
        Process             % --
    } {                     % string
        pop                 % --
    } ifelse
} def
GhostScript is generally run on individual files, rather than presenting a prompt for direct PostScript input, however this mode is very useful for testing theories about the language or other experimenting. The GhostScript prompt will look like the following:
mangala[~] 6>: gs -sDEVICE=nullpage
GNU Ghostscript 5.10 (1998-12-17)
Copyright (C) 1997 Aladdin Enterprises, Menlo Park, CA. All rights reserved.
This software comes with NO WARRANTY: see the file COPYING for details.
GS>
There are a few differences between the command line mode and the file mode:
In command mode, the prompt will change to show how many objects are on the operand stack:
GS>1 2 3 4
GS<4>
Procedures must be complete within one line. This is probably the most annoying problem with command mode, however if you have cut-and-paste it's not too bad.
There is no line editor. The functions that provide a prompt, read a line and get the interpreter to execute the code are all written in PostScript, so cursor control and editing capabilities are not available beyond those provided by your terminal emulator.
For example, the above code becomes a lot less readable on the command line:
GS>/Example { SomeVar eq { Process } { pop } ifelse } def
The operators given here are probably more useful than those mentioned earlier and cover things like tests, control structures and dictionary manipulation.
Procedures are created as noted earlier, however in order to be more useful, they would be better associated with names in the topmost dictionary on the dictionary stack. That way, the procedure can be called by name rather than rewriting it every time it's needed. This is what the def operator is for.
def
key any => --
Associates any with key by creating an entry in the topmost dictionary on the dictionary stack.
For example, here is a GhostScript session showing its use:
GS>/foo { (blah\n) print } def
GS>foo
blah
GS>
Note that anything may be associated with names in the current dictionary - not just procedures. This allows us to create variables and other data and refer to them by name:
GS>/i 0 def
GS>/j { i == /i i 1 add def } def
GS>j j j
0
1
2
GS>
true
-- => true
Pushes a boolean true object onto the stack.
false
-- => false
Pushes a boolean false object onto the stack.
eq
any_1 any_2 => { true | false }
Determines whether any_1 and any_2 are equal and returns true or false.
Note that complex objects (such as arrays and dictionaries) are generally considered to be unequal unless both objects reference the same data. The one exception is two strings that match each other, even if they reside in different areas of memory. This is shown below:
1 0 eq => false
1 1 eq => true
(blah) (blah) eq => true
(blah) (foo) eq => false
/blah /blah eq => true
[ 1 ] [ 1 ] eq => false
[ 1 ] dup eq => true
1 dict 1 dict eq => false
1 dict dup eq => true
ne
any_1 any_2 => { true | false }
Determines whether any_1 and any_2 are unequal and returns true or false.
Follows the same equality measure mentioned above for eq.
ge
number_1 number_2 => { true | false }
string_1 string_2 => { true | false }
Tests whether number_1 is greater than or equal to number_2 or that string_1 is lexically greater than or equal to string_2. Returns a boolean.
gt
number_1 number_2 => { true | false }
string_1 string_2 => { true | false }
Tests whether number_1 is greater than number_2 or that string_1 is lexically greater than string_2. Returns a boolean.
le
number_1 number_2 => { true | false }
string_1 string_2 => { true | false }
Tests whether number_1 is less than or equal to number_2 or that string_1 is lexically less than or equal to string_2. Returns a boolean.
lt
number_1 number_2 => { true | false }
string_1 string_2 => { true | false }
Tests whether number_1 is less than number_2 or that string_1 is lexically less than string_2. Returns a boolean.
and
bool_1 bool_2 => bool_3
int_1 int_2 => int_3
Performs logical and when presented with boolean arguments, returning boolean. Performs bitwise and when presented with integer arguments, returning an integer.
not
bool_1 => bool_2
int_1 => int_2
Performs logical not when presented with a boolean argument, returning boolean. Performs bitwise not when presented with an integer argument, returning an integer.
or
bool_1 bool_2 => bool_3
int_1 int_2 => int_3
Performs logical or when presented with boolean arguments, returning boolean. Performs bitwise or when presented with integer arguments, returning an integer.
xor
bool_1 bool_2 => bool_3
int_1 int_2 => int_3
Performs logical exclusive or when presented with boolean arguments, returning boolean. Performs bitwise exclusive or when presented with integer arguments, returning an integer.
bitshift
int_1 n => int_2
Shifts int_1 n bits to the left (negative values for n shift to the right).
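The same operator names cover both the logical and the bitwise cases, which the following sketch for a GhostScript session tries to show; the binary values in the comments are written in radix form.

```postscript
true false and ==      % prints false
true false or ==       % prints true
true false xor ==      % prints true
true not ==            % prints false

12 10 and ==           % prints 8:  2#1100 and 2#1010 is 2#1000
12 10 or ==            % prints 14: 2#1100 or 2#1010 is 2#1110
12 10 xor ==           % prints 6:  2#1100 xor 2#1010 is 2#0110
1 4 bitshift ==        % prints 16: shifted four bits to the left
16 -2 bitshift ==      % prints 4:  shifted two bits to the right
```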
if
bool proc => --
Executes procedure proc if boolean bool is true.
For example:
GS>10 10.0 eq { (they're equal!\n) print } if
they're equal!
GS>
ifelse
bool proc_1 proc_2 => --
Executes procedure proc_1 if boolean bool is true, otherwise executes procedure proc_2. For example:
GS>(blah) (blah) eq { (equal\n) print } { (unequal\n) print } ifelse
equal
GS>30 (llama) eq { (equal\n) print } { (unequal\n) print } ifelse
unequal
GS>
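Conditionals become more useful once they are wrapped in a named procedure. As a minimal sketch, the hypothetical procedure max2 below leaves the larger of two numbers on the stack, using only operators covered so far.

```postscript
/max2 {                 % number_1 number_2 ==> larger
    2 copy lt {         % number_1 number_2   (true if number_1 < number_2)
        exch            % number_2 number_1   (move the larger one down)
    } if
    pop                 % larger
} def

3 7 max2 ==             % prints 7
9 2 max2 ==             % prints 9
```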
for
start delta finish proc => --
The PostScript for is very similar to the for loop of most other languages, however it implements things a little differently. Each time proc is executed, the current value of the index is pushed onto the stack beforehand. This allows proc complete access to the stack while it executes. For example:
GS>0 1 5 { == } for
0
1
2
3
4
5
GS>
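Since the index arrives on the stack, proc has to consume it one way or another. The sketch below, assuming a GhostScript session, accumulates the index into the made-up name total and then shows a negative delta.

```postscript
/total 0 def                                % hypothetical accumulator
1 1 10 { total add /total exch def } for    % add each index to total
total ==                                    % prints 55

10 -3 0 { == } for                          % prints 10, 7, 4 and 1 on separate lines
```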
forall
array proc => --
string proc => --
dict proc => --
This operator is used to traverse arrays, strings and dictionaries and apply proc to each item within the object.
In the case of arrays, proc is executed with each element in turn on the stack. When strings are traversed, proc is executed for each character (represented as an integer). For dictionaries, proc is supplied with each paired key and value in turn. For example:
forall on arrays:
GS>[ 1 2 3 4 ] { == } forall
1
2
3
4
forall on strings:
GS>(1234\n) { == } forall
49
50
51
52
10
forall on dictionaries:
GS>4 dict dup begin /one 1 def /two 2 def /three 3 def end
GS<1>{ exch ==only ( ) print == } forall
/three 3
/one 1
/two 2
GS>
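The procedure given to forall can also accumulate a result instead of printing. A short sketch, assuming GhostScript; sum and spaces are hypothetical names.

```postscript
/sum 0 def
[ 3 4 5 ] { sum add /sum exch def } forall       % add each element to sum
sum ==                                           % prints 12

/spaces 0 def
(a b c d) {                                      % each character arrives as an integer
    32 eq { /spaces spaces 1 add def } if        % 32 is the character code for a space
} forall
spaces ==                                        % prints 3
```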
repeat
count proc => --
Executes proc count times. For example:
GS>4 { (blah\n) print } repeat
blah
blah
blah
blah
GS>
loop
proc => --
exit
-- => --
The loop and exit operators are paired to create loops with arbitrary exit conditions: loop executes proc repeatedly, and exit breaks out of the innermost enclosing loop.
For example:
GS>/i 0 def
GS>{ i 4 eq { exit } if i == /i i 1 add def } loop
0
1
2
3
GS>
Unfortunately this was as far as I wrote when I presented this talk to the Linux Users of Victoria Programmers SIG on 27th November 2001.
I will be writing the rest and presenting a followup tutorial early in 2002.
Here's what is yet to come:
Creating Dictionaries
Dictionaries as Libraries
Dictionaries for Local Storage
Predefined Dictionaries
userdict
systemdict
statusdict
Encapsulating Existing Procedures
Debugging
Adobe Document Structure Convention
Graphics Environment I
Graphics State
Graphics State Operators
Page Operators
Graphics Stack
Miscellaneous
Common Tasks I
Printing EPS Files
Converting PS to EPS
Changing paper sizes
Transforming Layout
Graphics Environment II
Path Construction
Painting
Simple Font Use
Colour Manipulation
Bitmap Images
Interpreter Environment II
Operators II
Type Operators
File Operators
Memory Management
Common Tasks II
Page Watermark
Forms
Simple Graphing
Text Block Layout
Graphics Environment III
CTM operations
Colour Spaces
Pattern Spaces
Creating New Fonts
Common Tasks III
N-Up
Programming II
Virtual Memory
Complex Object References
File Operators
Error Handlers
Common Tasks IV
Printing ASCII
Apart from the references given below, I would recommend having a look through the Adobe site for more information on PostScript language programming.
Have a good look around for the PostScript operator guide which should be available from somewhere on the site (this is essentially a PDF version of the operator summary given in the Language Reference Manual) - it is a very very useful thing to have around whilst programming.
The PostScript Language Tutorial and Cookbook has a number of well commented examples showing how the language works. Electronic copies of these programs are available on the Adobe site, and they give a good grounding in the language.
There are also a large number of tutorials on the web and in newsgroups. The newsgroup alt.lang.postscript generally has an up-to-date FAQ for resources, hints and tips.
Adobe Systems Incorporated, PostScript Language Reference Manual, 2nd Edition, Addison-Wesley, 1990, ISBN 0-201-18127-4
Adobe Systems Incorporated, PostScript Language Reference Manual, Addison-Wesley, 1985, ISBN 0-201-10174-2
Adobe Systems Incorporated, PostScript Language Tutorial and Cookbook, Addison-Wesley, 1985, ISBN 0-201-10179-3