USGS Digital Raster Graphics
What follows is a non-technical explanation of data corruption problems in USGS Digital Raster Graphics (DRG) products. Between July and September 1996, programmers at the USGS and the Minnesota Land Management Information Center (LMIC) discovered several problems with corrupted Tagged Image File Format (TIFF) tags. The phrase "the GeoTIFF bug" is now associated with these problems, although this is not quite an accurate description. In response to these data problems, the USGS imposed a moratorium on sales of DRG CD-ROMs.
Appropriate steps have been taken to correct the GeoTIFF problems. Bugs in USGS software and procedures have been fixed, and new quality control procedures have been implemented. The sales moratorium has been lifted. Customers who have purchased DRG CD-ROMs and are unable to use the data as a result of the GeoTIFF problems may request replacement data. The USGS will replace the CD-ROM with DRG data on a Compact Disc-Recordable (CD-R).
DRG TIFF images are in most ways very typical TIFF raster data sets. The characteristics of these data include:
The internal coordinate system of such images is extremely simple. The upper left pixel is designated, by convention, as (0,0). The X axis increases to the right, and the Y axis increases down. To display or print such a data set, software must transform the image's internal coordinate space to the coordinate space of a display device or printer. In modern software systems, these transformations are completely hidden from the end user. Note that this transformation is a characteristic of software, not of the data.
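As an illustration of the image-to-device transformation that software performs, the sketch below maps the upper-left corner of a pixel to a device whose origin is at the lower left and whose Y axis points up, as in PostScript's default coordinate space of 72 units per inch. The function name, and the assumption that the image is printed at its own dots-per-inch value, are illustrative and not part of the DRG product.

```python
def pixel_to_points(col, row, height, dpi):
    """Map the upper-left corner of image pixel (col, row) to device units.

    Image space: origin at the upper-left pixel, X to the right, Y down.
    Device space (assumed): origin at lower left, X to the right, Y up,
    72 units per inch, as in PostScript's default user space.
    """
    x = col * 72.0 / dpi
    # Flip the Y axis: image row 0 lies at the top of the printed image.
    y = (height - row) * 72.0 / dpi
    return x, y


# Hypothetical 500-row image printed at 250 dots per inch.
print(pixel_to_points(0, 0, height=500, dpi=250))      # (0.0, 144.0)
print(pixel_to_points(250, 500, height=500, dpi=250))  # (72.0, 0.0)
```

The Y flip is the only non-obvious step: because image rows count downward, the top row of the image ends up at the largest device Y value.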
A cartographic raster data set must be capable of another type of coordinate transformation. It must be possible to relate the coordinate space of the image to real-world ground coordinates. This type of transformation is also performed by software, and requires additional information that must be supplied by the data or by the user.
The standard DRG product contains the information needed to transform internal coordinates to the plane coordinates of the Universal Transverse Mercator (UTM) map projection. Performing this transformation to ground coordinates is called georeferencing in the DRG documentation. The numerical coefficients needed to perform the transformation are called georeference information. Georeference information is in fact held in three separate places in the DRG product.
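In the simplest case, which is the one GeoTIFF's tie-point and pixel-scale tags describe, the georeference information amounts to a single tie point (an image position whose UTM coordinates are known) plus a ground cell size. The sketch below assumes a north-up image with no rotation; the function names and sample values are hypothetical and are not taken from a real DRG.

```python
def pixel_to_ground(col, row, tiepoint, scale):
    """Convert image coordinates to UTM easting/northing.

    tiepoint: (i, j, easting, northing), an image position (i, j) whose
              ground position is known.
    scale:    (cell_width, cell_height) in ground units (meters for UTM).
    Assumes a north-up image with no rotation.
    """
    i, j, e0, n0 = tiepoint
    sx, sy = scale
    easting = e0 + (col - i) * sx
    # Image rows increase downward, but northing increases upward.
    northing = n0 - (row - j) * sy
    return easting, northing


def ground_to_pixel(easting, northing, tiepoint, scale):
    """Inverse of pixel_to_ground; returns fractional image coordinates."""
    i, j, e0, n0 = tiepoint
    sx, sy = scale
    return i + (easting - e0) / sx, j - (northing - n0) / sy


tie = (0.0, 0.0, 500000.0, 5000000.0)  # hypothetical tie point
cell = (2.5, 2.5)                      # hypothetical 2.5 m cells
print(pixel_to_ground(100, 200, tie, cell))              # (500250.0, 4999500.0)
print(ground_to_pixel(500250.0, 4999500.0, tie, cell))   # (100.0, 200.0)
```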
The TIFF standard was designed for desktop publishing and other non-cartographic applications. The standard does not address data georeferencing. In spite of this, the USGS chose TIFF as the DRG file format. We believe that the popularity of the format more than compensates for its lack of cartographic awareness. Large numbers of commercial software packages can already read the format, which makes DRGs instantly usable.
Geographic Information Systems (GIS) that read TIFF therefore require information that is usually not contained in the TIFF file. Without this georeference information, the TIFF image cannot be properly associated with the earth's surface, and cannot be logically registered to other cartographic data sets. (The image can still be displayed over other data sets and visually referenced to them, and can also be printed.)
Frequently, georeference information must be supplied to a GIS by the user. This means that the user must know such things as the projection and ground resolution of the raster data. How this information is transmitted to the GIS depends mostly on the software design. Common choices include keyboard entry and auxiliary data files.
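One widely used auxiliary-file convention, not specific to DRGs, is the ESRI "world file": a plain-text file of six numbers that define an affine transformation from image to ground coordinates. The parser below is a minimal sketch; the function names and the sample file contents are hypothetical.

```python
def parse_world_file(text):
    """Parse the six numbers of a world file, in file order A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, got {len(values)}")
    return tuple(values)


def world_to_ground(params, col, row):
    """Apply the world-file affine transform to image coordinates (col, row).

    A, E: cell size in X and Y (E is normally negative);
    D, B: rotation terms (0 for a north-up image);
    C, F: ground X and Y of the *center* of the upper-left pixel.
    """
    a, d, b, e, c, f = params
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical world file for a north-up image with 2.5 m cells.
sample = "2.5\n0.0\n0.0\n-2.5\n500001.25\n4999998.75\n"
params = parse_world_file(sample)
print(world_to_ground(params, 0, 0))  # (500001.25, 4999998.75)
print(world_to_ground(params, 4, 2))  # (500011.25, 4999993.75)
```

A world file locates the center of the upper-left pixel, whereas a GeoTIFF tie point in the default "pixel is area" raster space commonly locates its outer corner, so the two descriptions of the same image differ by half a cell in each direction.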
The requirement to hold cartographic data in TIFF files is by no means unique to the USGS. Others have recognized the potential value of carrying georeference information in TIFF files. This would allow software to hide the TIFF-to-ground transformation from the end user, much like the TIFF-to-screen transformation is currently hidden.
The TIFF standard was designed with the flexibility needed to accommodate these kinds of new requirements. The TIFF header contains a large number of data "slots," called TIFF tags, to hold information about the image. Some of these tags were left undefined for future use. In the period 1990-1995, a number of people from industry and government worked to define a standard tag set for carrying georeference information in TIFF tags.
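To make the idea of tag "slots" concrete, the sketch below lists the tags in the first image file directory (IFD) of a classic TIFF file using only Python's standard library. It follows the baseline TIFF 6.0 layout (an 8-byte header, then 12-byte directory entries) but does not decode values stored elsewhere in the file, handle BigTIFF, or validate field types. Treat it as an illustration rather than a TIFF reader; the in-memory test file is hypothetical.

```python
import struct


def read_first_ifd(data):
    """Return {tag_id: (field_type, count, raw_value)} for a classic TIFF's first IFD.

    raw_value is the undecoded 4-byte value/offset field: for small values it
    holds the value itself, otherwise the file offset where the values are.
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (n_entries,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for k in range(n_entries):
        start = ifd_offset + 2 + 12 * k  # each directory entry is 12 bytes
        tag, field_type, count, raw = struct.unpack(
            endian + "HHII", data[start:start + 12])
        tags[tag] = (field_type, count, raw)
    return tags


# A tiny, hypothetical little-endian TIFF built in memory: an IFD at offset 8
# with two entries, ImageWidth (256) and ModelPixelScaleTag (33550).
header = b"II" + struct.pack("<HI", 42, 8)
ifd = struct.pack("<H", 2)
ifd += struct.pack("<HHII", 256, 3, 1, 100)    # SHORT, value stored inline
ifd += struct.pack("<HHII", 33550, 12, 3, 38)  # 3 DOUBLEs stored at offset 38
ifd += struct.pack("<I", 0)                    # no further IFDs
tiff = header + ifd + struct.pack("<3d", 2.5, 2.5, 0.0)
print(sorted(read_first_ifd(tiff)))  # [256, 33550]
```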
Like the TIFF standard itself, GeoTIFF is conceptually simple, but the exact specification is complex and technical. Those interested in the technical details of GeoTIFF, or the history of the GeoTIFF working group, should read the "GeoTIFF Format Specification."
By chance, the draft GeoTIFF standard was nearing completion at the same time the USGS was designing the DRG. The GeoTIFF working group and the USGS agreed that it would benefit both efforts for the DRG product to implement the draft GeoTIFF standard.
Georeference information is contained in three separate places in the standard USGS DRG product:
USGS DRGs are produced by Land Information Technology, Ltd. through an Innovative Partnership agreement. Land Info's operation, and the Innovative Partnership agreement between Land Info and the USGS, were both in place before the GeoTIFF standard was finalized. The USGS decided to add the GeoTIFF information to the Land Info data as part of the government-run data validation and packaging process. Software to add this information to the TIFF image files was written by USGS programmers.
Some time after this software had been tested and put into production, several bugs were introduced into the code through human error by USGS technical staff members. The bugs were not detected immediately because of other human errors in the design and use of the production and quality control systems. These bugs resulted in two different errors in GeoTIFF values.
Following the discovery of these problems, the USGS and several other organizations began inspecting the data more rigorously. This work exposed two additional errors in TIFF tags.
At this writing, four separate problems have been found in the way various TIFF tags are populated in DRG data. The four problems are explained in detail below.
All four problems are corruption of TIFF tag (header) data, not corruption of image data. None of these errors prevent a DRG TIFF image from being displayed, manipulated, printed or even georeferenced. Under some circumstances, they will cause some application software to become "confused," or produce incorrect or unexpected results. In most cases there are workarounds, but these depend on the capabilities of the particular software used to manipulate the data, and usually will require the user to have some understanding of the software's georeferencing process.
The following two problems have separate causes in the USGS software system, but have effects that will appear similar to the end user. For most purposes, they can be thought of as one problem.
Error 1 -- Omitted GeoTIFF tags. This problem simply makes DRGs like the overwhelming majority of other TIFF data: no georeference information. This is a violation of the DRG product specification, but not of the TIFF standard (TIFF does not require GeoTIFF implementation).
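A file affected by Error 1 can be recognized simply by the absence of GeoTIFF tags. The sketch below takes the set of tag numbers found in a TIFF header and reports what is missing. It treats a file as georeferenced only if it has the GeoKeyDirectoryTag plus either the tie-point/pixel-scale pair or the transformation-matrix tag, the forms a north-up image would normally use; the GeoTIFF specification allows other combinations (for example, several tie points), which this sketch ignores. The function name is hypothetical.

```python
# GeoTIFF tag numbers, as defined in the GeoTIFF specification.
MODEL_PIXEL_SCALE_TAG = 33550
MODEL_TIEPOINT_TAG = 33922
MODEL_TRANSFORMATION_TAG = 34264
GEO_KEY_DIRECTORY_TAG = 34735


def missing_geotiff_tags(tag_ids):
    """Return a list of problems; an empty list means the basic tags are present."""
    tag_ids = set(tag_ids)
    problems = []
    if GEO_KEY_DIRECTORY_TAG not in tag_ids:
        problems.append("GeoKeyDirectoryTag (34735) is missing")
    has_matrix = MODEL_TRANSFORMATION_TAG in tag_ids
    has_tie_and_scale = {MODEL_TIEPOINT_TAG, MODEL_PIXEL_SCALE_TAG} <= tag_ids
    if not (has_matrix or has_tie_and_scale):
        problems.append("no raster-to-model transformation tags")
    return problems


print(missing_geotiff_tags({256, 257, 258}))             # two problems
print(missing_geotiff_tags({256, 33550, 33922, 34735}))  # []
```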
Error 2 -- Incorrectly valued GeoTIFF tags. This is a violation of both the DRG product specification and the GeoTIFF standard. The most common manifestation of this error is an X-direction cell size of 0. Software that depends on a correct GeoTIFF implementation may produce incorrect coordinate transformations in response to these data.
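An X-direction cell size of 0 is easy to test for. With a zero cell width, every image column maps to the same easting, and any inverse (ground-to-image) calculation divides by zero. The sketch below checks the three values of a ModelPixelScaleTag; the function name and sample values are hypothetical.

```python
import math


def pixel_scale_problems(scale):
    """Check a ModelPixelScaleTag triple (ScaleX, ScaleY, ScaleZ).

    Returns a list of problem descriptions; an empty list means the X and Y
    cell sizes are usable. ScaleZ is not checked, since 0 is a common value
    for 2-D data.
    """
    if len(scale) != 3:
        return [f"expected 3 values, got {len(scale)}"]
    problems = []
    for axis, value in zip(("X", "Y"), scale[:2]):
        if not math.isfinite(value) or value <= 0:
            problems.append(f"{axis} cell size is {value!r}")
    return problems


print(pixel_scale_problems((2.4384, 2.4384, 0.0)))  # []
print(pixel_scale_problems((0.0, 2.4384, 0.0)))     # ['X cell size is 0.0']
```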
Errors 1 and 2 occur in the data with unpredictable frequency; the software bugs that caused the data corruption were activated only under certain circumstances.
Errors 1 and 2 will impact the end user only when using application software that depends on GeoTIFF tags. For example, ARC/INFO v7.0 and earlier does not implement GeoTIFF, so any problems using DRGs in ARC are not related to these data errors. Software that implements GeoTIFF is quite rare at this time, and the immediate impact of the problem should therefore be small. However, GeoTIFF implementations are already starting to appear in commercial graphics software, and data that contains these errors might cause serious problems for users within a few years.
Error 3 -- Incorrect projection tag. The projection tag is set to WGS84_UTM_zone_XXN rather than PCS_NAD27_UTM_zone_XXN (where XX is the UTM zone number of the data set). This error was caused by simple oversight, not a software bug. Its impacts are similar to those of Error 2 above. However, this error will probably not cause software crashes, and will probably produce georeferencing results that appear to be correct or nearly correct.
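In GeoTIFF, the projected coordinate system is stored as a numeric code in the ProjectedCSTypeGeoKey. The sketch below assumes the EPSG numbering used by GeoTIFF, in which the northern-hemisphere NAD27 UTM zones are 26700 + zone and the WGS 84 UTM zones are 32600 + zone. Because both codes use the same UTM projection and differ only in datum, the wrong code shifts coordinates rather than scrambling them, which is consistent with results that look nearly correct. The function names are hypothetical.

```python
def describe_utm_pcs(code):
    """Return (datum, zone) for a northern UTM projected CS code, or None.

    Assumes EPSG numbering: NAD27 / UTM zone NN N = 26700 + NN,
    WGS 84 / UTM zone NN N = 32600 + NN.
    """
    if 26701 <= code <= 26722:
        return ("NAD27", code - 26700)
    if 32601 <= code <= 32660:
        return ("WGS 84", code - 32600)
    return None


def check_drg_projection(code, expected_zone):
    """Hypothetical check: a DRG should be NAD27 UTM in its quadrangle's zone."""
    found = describe_utm_pcs(code)
    if found is None:
        return f"unrecognized projected CS code {code}"
    datum, zone = found
    if datum != "NAD27":
        return f"datum is {datum}, expected NAD27"
    if zone != expected_zone:
        return f"zone is {zone}, expected {expected_zone}"
    return "ok"


print(check_drg_projection(26715, expected_zone=15))  # ok
print(check_drg_projection(32615, expected_zone=15))  # datum is WGS 84, expected NAD27
```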
Error 4 -- Incorrect resolution tag. The resolution tag value is set to 72 instead of 250. This error was also caused by oversight, not a software bug. It is not technically a violation of any of the standards involved, but it does lead to confusing results in some circumstances. The tag in question is used by application software to set the print resolution of the data. A value of 250 in this tag is desirable because it allows most software to print the DRG at the correct map scale by default. There is a simple workaround for this error: set the print resolution to 250 in the application used to print the data.
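The effect of the wrong resolution value is simple arithmetic. The sketch below assumes, as the desired tag value implies, that the DRG was scanned at 250 dots per inch, and uses a 1:24,000 quadrangle as an example. Printing at 72 dots per inch makes each pixel 250/72 (about 3.5) times larger on paper, so the map prints at about 1:6,912 instead of 1:24,000. The function names are illustrative.

```python
INCH_IN_METERS = 0.0254  # exact, by definition of the international inch


def printed_scale(map_scale, scan_dpi, print_dpi):
    """Scale denominator of a print made at print_dpi from a scan at scan_dpi."""
    return map_scale * print_dpi / scan_dpi


def ground_cell_size(map_scale, scan_dpi):
    """Ground distance in meters covered by one scanned pixel."""
    return map_scale * INCH_IN_METERS / scan_dpi


print(printed_scale(24000, scan_dpi=250, print_dpi=250))  # 24000.0
print(printed_scale(24000, scan_dpi=250, print_dpi=72))   # 6912.0
print(round(ground_cell_size(24000, scan_dpi=250), 4))    # 2.4384
```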
As stated earlier, appropriate steps have been taken to correct the problems described in this paper.
Customers who have purchased DRG CD-ROMs and are unable to use the data as a result of the GeoTIFF problems may request replacement data. The USGS will replace the CD-ROM with DRG data on a Compact Disc-Recordable (CD-R).
Send specific questions and requests for assistance to email@example.com. Please include as much information as you can about your data, especially (1) the quadrangle name and scale, and (2) where and when you obtained the data.
The USGS thanks the Minnesota Land Management Information Center for extensive help in discovering and analyzing the data errors discussed in this report.