Using the R statistical data analysis language on GRASS 5.0 GIS data base files
Research report

Date
1999
Collections
- Geografi i Bergen [35]
Abstract
With the release of the open-source GIS GRASS 5.0 in early 1999, opportunities are presented for integration
with the open-source R statistical data analysis programming environment (Ihaka and Gentleman, 1996, code
obtained from [1]). In the examples presented, R is run interactively within the GRASS 5.0 environment,
transferring data by writing and reading temporary text files; the operating system here is Linux. The note
describes the implementation in R of functions needed to move data between GRASS and R, providing the user
with a basic interface between the two environments.
Development of the leading Open Source GIS, GRASS, has been moved to Baylor University in Texas,
where work on a new release incorporating floating-point raster cell values and NULL values different from zero
is now in beta testing (Byars and Clamons, 1998, Linux binary obtained from [3]). In parallel with this, the R
statistical and data analysis language, also Open Source, is maturing very rapidly, and can now execute most S and
S-PLUS code in an unmodified form. In the past, when S was available under an academic license, integration between
GRASS and S existed in a loosely coupled form for integer raster cell values sampled at points given in a site layer.
The issues involved in linking two complex and fast-changing programming environments are encapsulated in a
comprehensive way in the R functions included in the code accompanying this note. While the progress reported
in this paper is based on Open Source Unix-like operating systems, it is worth noting that both GRASS and R
have been compiled for MS Windows systems. Programming techniques for R are covered in Venables and
Ripley (1997), and in materials available at the R archive [2].
In work to date, the interface used is that of the statistical analysis system, run from within the GIS environment.
Given major design differences in memory management (GRASS uses the underlying file system, while R
maps all active objects into a static area of memory allocated when the program is started, managed by a garbage
collector) and other problems, it has been necessary to decide on a representation suiting the data analysis and
visualization tasks being performed. This means here that the statistical programming environment is run from
within GRASS, permitting GRASS command line instructions, including those requiring interaction, to be issued
from within R using the system() function.
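
The exchange described above, shell commands issued through system() with data passed in temporary text files, can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the code that accompanies the report: the helper names read_command_table() and write_table_to_command() are hypothetical, and ordinary Unix commands (printf, wc) stand in for GRASS commands so that the sketch runs outside a GRASS session.

```r
# Hypothetical helper: run a shell command, capture its standard output in a
# temporary text file, and read that file back into R as a data frame.
read_command_table <- function(command, ...) {
  tmp <- tempfile(fileext = ".txt")
  on.exit(unlink(tmp))                       # remove the temporary file afterwards
  status <- system(paste(command, ">", shQuote(tmp)))
  if (status != 0) stop("command failed: ", command)
  read.table(tmp, ...)
}

# Hypothetical helper: write a data frame to a temporary text file and feed
# that file to a shell command on standard input, returning what it prints.
write_table_to_command <- function(data, command) {
  tmp <- tempfile(fileext = ".txt")
  on.exit(unlink(tmp))
  write.table(data, tmp, quote = FALSE, row.names = FALSE, col.names = FALSE)
  system(paste(command, "<", shQuote(tmp)), intern = TRUE)
}

# Stand-in for a GRASS export command: print three "east north value" records.
cells <- read_command_table("printf '1 1 10\\n1 2 12\\n2 1 11\\n'",
                            col.names = c("east", "north", "value"))

# Stand-in for a GRASS import command: count the lines it receives.
n_lines <- write_table_to_command(cells, "wc -l")
```

Within a GRASS session, the stand-in commands would be replaced by the GRASS programs that export and import the layers of interest; because R is started from the GRASS shell, those programs inherit the environment settings they need.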
Running under Unix-family operating systems, GRASS only customizes the user’s program execution
environment, adding specific definitions needed for GRASS programs to be able to find the files and metadata
required for their work. GRASS does not then represent a major memory overhead, and R can be launched with
plenty of space for its computations. The examples reported below did not need more than 12 MB of heap memory
for analysis of a data set with 57600 raster cells, and with the judicious deletion of data objects from the heap,
less would have sufficed.
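
As a rough check on the scale involved, the R sketch below estimates the storage taken by one layer of 57600 cells and the number of such layers that would fit in 12 MB. It assumes each cell value is held as an 8-byte double-precision number, which is how R stores numeric vectors; the report does not state how the cell values were stored, and per-object overhead and copies made during computation are ignored. The last lines show one way of deleting an object that is no longer needed.

```r
n_cells <- 57600
bytes_per_cell <- 8                      # assumption: double-precision storage

# Approximate size of one layer in megabytes (1 MB taken as 1024^2 bytes).
layer_mb <- n_cells * bytes_per_cell / 1024^2

# Whole layers of this size that would fit in a 12 MB heap.
layers_in_heap <- floor(12 / layer_mb)

# Deleting an object and asking the garbage collector to reclaim its space.
layer <- numeric(n_cells)
rm(layer)
invisible(gc())
```

On these assumptions a single layer takes a little under half a megabyte, so the heap has room for a couple of dozen layers before anything has to be deleted.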
Publisher
University of Bergen. Department of Geography
Series
Geografi i Bergen 229