
Background

When submitting papers to journals or online archives, one of the common submission methods is to write your paper in LaTeX and submit the figures as EPS files.

LaTeX was originally designed to produce a "Device-Independent" .dvi file that could then be converted to a postscript file for printing using another program, such as dvips. In the old days, the standard way to exchange a paper was to send it as postscript (or gzipped postscript), as this contained the actual layout with embedded figures.

Recently, postscript has been eclipsed by the PDF standard, which has many more features built in, such as text compression and image compression. Many useful tools are available for converting a LaTeX .dvi file to PDF. In particular, I find that dvipdfm is the most effective, as it automatically includes the fonts as vector outlines instead of bitmaps, a problem that plagued the dvips/ps2pdf conversion option. It also supports many LaTeX extensions that take advantage of PDF's advanced features, such as hyperlinking and embedding movies.

Natively, PDF supports many different options for compressing images in order to reduce file size. Two of the common ones are FlateEncode, a lossless Zlib-based compression, and DCTEncode, a lossy JPEG-based compression.

To show what the difference between these two looks like, compare the following two images:

Lossless (FlateEncode/Zlib) Compression (470 bytes):

Lossy (DCTEncode/jpeg) compression with low quality factor (919 bytes):

The DCT compressed image is fuzzier, and shows "ringing" noise around the edges of the text. Note also that for images such as these that have only a few colors, DCTEncoding actually produces a larger file than FlateEncoding.

For more details about these and other image compression settings included in the PDF standard, I would recommend reading the "Using the Image Settings" section of the Adobe PDF Creation Settings manual available from Adobe's Acrobat SDK Documentation developer's resource page. The Adobe Distiller Parameters manual is also quite useful.

One of the problems with PDF conversion is that most PDF converters ("distillers") are configured by default to always use DCTEncoding for color and grayscale images. For a scientific paper, this produces very poor results: line plots and text labels come out fuzzy, with ringing noise around their edges.

Note that even such reputable publishers as Science and Nature suffer from these problems: if you zoom in on a figure from an online PDF from one of these publishers, the figures clearly show the signs of DCT compression with a low quality factor.

Rasterizing Figures

This all becomes even more relevant when submitting figures to an online archive server, such as the arXiv preprint server. The arXiv server limits the size of submissions to 1MB, which can make including high quality figures difficult. In particular, postscript figures of plots with a lot of points or lines can easily vastly exceed this limit.

The best solution is to take your vector postscript figures and "rasterize" them at a fixed resolution by converting them to either PNG image files or JPEG image files with a high quality setting.

A good way to do this is using ghostscript:

$ gs -r300 -dEPSCrop -dTextAlphaBits=4 -sDEVICE=png16m -sOutputFile=fig.png -dBATCH -dNOPAUSE fig.eps

You can set the image resolution in pixels per inch using the -r flag. Make sure to include the -dEPSCrop option to crop the output to the size of the bounding box. The -dTextAlphaBits=4 option will anti-alias fonts in the EPS file so they have smooth looking edges. In general, printers are capable of at least 300 dpi, although I find you can go down to 150 dpi before it becomes really noticeable to the eye. Changing the resolution has by far the largest effect on the file size.
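To see how much the resolution matters for a particular figure, the Ghostscript command above can be wrapped in a small loop. This is only a sketch: the function names rasterize_at and compare_resolutions and the fig-150.png style of output name are made up for illustration, and it assumes gs is on your path.

```shell
# rasterize_at EPS DPI
# Render EPS to <base>-<dpi>.png with the same Ghostscript options as above
# (plus -q to silence the banner) and print the name of the PNG.
# The function name and output naming scheme are hypothetical.
rasterize_at() {
    eps=$1
    dpi=$2
    out="${eps%.eps}-${dpi}.png"
    gs -q -r"$dpi" -dEPSCrop -dTextAlphaBits=4 -sDEVICE=png16m \
       -sOutputFile="$out" -dBATCH -dNOPAUSE "$eps" || return 1
    echo "$out"
}

# compare_resolutions EPS DPI...
# Rasterize the figure once per resolution and report each PNG size in bytes.
compare_resolutions() {
    eps=$1
    shift
    for dpi in "$@"; do
        png=$(rasterize_at "$eps" "$dpi") || return 1
        printf '%s dpi: %s bytes\n' "$dpi" "$(wc -c < "$png" | tr -d ' ')"
    done
}

# Example: compare_resolutions fig.eps 150 300 600
```

Each PNG is kept on disk, so you can open them side by side and pick the lowest resolution that still looks acceptable.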

Once the figures are rasterized, the raster image can be encapsulated into an EPS file using programs like ImageMagick or imgtops. The imgtops webpage has an excellent discussion of the subtleties of this step. ImageMagick is included in Cygwin, making it easy to use on a Windows computer.

An important consideration is what postscript compatibility level you can use. As discussed in the imgtops page linked above, newer postscript versions support much better internal image formats. Level 1 uses only ascii-coded RGB values, and is very wasteful, producing very large files. Level 2 includes support for JPEG encoded images, which produces much smaller files. Level 3 includes support for Zlib compression, making it well suited for making EPS files from png files.

In general, level 3 will produce the smallest files. Level 2 provides the best compatibility, and works well with jpeg images.

If you decide to use level 2 postscript, I recommend converting first to a jpg file. The "convert" program included with ImageMagick uses a quality factor in "percent" that ranges from 0 to 100:

$ convert -quality 80 fig.png fig.jpg

I find a quality factor of 80 on high resolution images gives good compression without too much loss in quality. You can then convert the image to eps using "convert" with the eps2 settings:

$ convert fig.jpg eps2:fig.eps

If you can use level 3 postscript, you can convert directly from png to eps:

$ convert fig.png eps3:fig.eps

Using level 3 postscript from a png image file for scientific figures will often produce a very small eps file. Ghostscript is compatible with these level 3 eps files, so this is often a good way to go.
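If you are not sure which postscript level suits a given figure, it is easy to build both versions and compare them. The sketch below just chains the convert commands shown above; the function name wrap_both and the -l2/-l3 file suffixes are hypothetical.

```shell
# wrap_both PNG
# Build a level 2 EPS (via an intermediate JPEG at quality 80) and a
# level 3 EPS (directly from the PNG), then print the size of each.
# The function name and the -l2/-l3 suffixes are hypothetical.
wrap_both() {
    png=$1
    base=${png%.png}
    convert -quality 80 "$png" "$base.jpg" || return 1
    convert "$base.jpg" "eps2:$base-l2.eps" || return 1
    convert "$png" "eps3:$base-l3.eps" || return 1
    for f in "$base-l2.eps" "$base-l3.eps"; do
        printf '%s: %s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
    done
}

# Example: wrap_both fig.png
```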

By adjusting a combination of the jpeg quality factor and the image resolution at the rasterization step, you can tweak the images to get an EPS file exactly the size you need while maintaining the highest possible quality and level of detail.
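The tweaking can be partly automated. As a rough sketch, the loop below steps the jpeg quality factor down until the level 2 EPS fits a byte budget that you choose. The function name fit_eps, the list of quality values, and the budget argument are all assumptions for illustration, not a recommended recipe.

```shell
# fit_eps PNG MAX_BYTES
# Try decreasing JPEG quality factors until the resulting level 2 EPS is
# no larger than MAX_BYTES. Prints the first quality that fits.
# The function name and the quality steps are hypothetical choices.
fit_eps() {
    png=$1
    max=$2
    base=${png%.png}
    for q in 95 90 85 80 75 70 60 50; do
        convert -quality "$q" "$png" "$base.jpg" || return 1
        convert "$base.jpg" "eps2:$base.eps" || return 1
        size=$(wc -c < "$base.eps" | tr -d ' ')
        if [ "$size" -le "$max" ]; then
            echo "quality $q: $size bytes"
            return 0
        fi
    done
    # Nothing fit: the remaining knob is the rasterization resolution.
    echo "no quality setting fits in $max bytes; lower the -r resolution and retry" >&2
    return 1
}

# Example: fit_eps fig.png 200000
```

If no quality factor in the list fits, the function gives up and leaves it to you to rasterize again at a lower resolution, since that changes the file size far more than the quality factor does.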

More information about this rasterization process is available from the arXiv Bitmapping Figures page.

In general, I have found that it is easy to produce relatively high quality rasterized images this way that are small enough to squeeze inside the arXiv 1MB submission size limit.

However, this is not the end of the story...

PDF Conversion

This is where the real problem is. As I mentioned above, the default settings for most PDF distillers are to always compress color images using DCTEncoding, and the default quality factor is usually quite low.

This means that even though we have gone to all the trouble of tweaking our image resolution and quality factors to get the best quality images possible for our 1MB file limit, the images will be recompressed by the PDF conversion software when somebody downloads a pdf of our paper. Furthermore, this recompression at the PDF conversion stage will involve a low quality factor, and the figures that will be in the PDF file will be of remarkably poor quality.

Fortunately, there is a way around this. The Adobe specification also defines a special set of Postscript commands that can be used inside of a postscript file to control the settings that the postscript to pdf conversion software uses. By manually editing your EPS files to include these special postscript commands, you can tell the PDF distiller exactly which types of image compression to use. This will work for any distiller that is compatible with the Adobe specifications, which fortunately includes the PDF conversion abilities of ghostscript.

In order to get high quality figures in the converted PDF, you can either tell the PDF distiller to use FlateEncode or to use DCTEncode with a high quality factor. Here are the postscript snippets that allow you to do this:

To use these, simply open up your .eps file in a text editor such as emacs and insert the text after the end of the "%" commented area at the beginning of the file. This should automatically work with dvipdfm conversion as well as the pdf conversion software used on the arXiv server.
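If you have many figures, the editing step can be scripted instead of done by hand. The following is a minimal sketch rather than the exact snippets referred to above: it assumes the FlateEncode case, uses the setdistillerparams operator with the AutoFilterColorImages, ColorImageFilter, AutoFilterGrayImages and GrayImageFilter keys from the Adobe Distiller Parameters manual, and inserts the block right after the leading run of "%" comment lines. The function name add_distiller_params is hypothetical.

```shell
# add_distiller_params IN.eps OUT.eps
# Copy IN.eps to OUT.eps, inserting a setdistillerparams block immediately
# before the first line that does not start with "%". The block asks the
# distiller to use lossless FlateEncode for color and grayscale images.
# The function name is hypothetical; the snippet is an assumed example.
add_distiller_params() {
    awk '
        # Emit the snippet once, just before the first non-comment line.
        !done && !/^%/ {
            print "systemdict /setdistillerparams known {"
            print "<< /AutoFilterColorImages false /ColorImageFilter /FlateEncode"
            print "   /AutoFilterGrayImages false /GrayImageFilter /FlateEncode"
            print ">> setdistillerparams"
            print "} if"
            done = 1
        }
        { print }
    ' "$1" > "$2"
}

# Example: add_distiller_params fig.eps fig-flate.eps
```

The "systemdict /setdistillerparams known { ... } if" wrapper means that interpreters which do not define the operator simply skip the block. For the DCTEncode route, the filter names would become /DCTEncode and, as far as I understand the Distiller Parameters manual, the quality is set through a /QFactor entry in the ColorImageDict and GrayImageDict dictionaries, with smaller values giving higher quality.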