Using the R statistical data analysis language on GRASS 5.0 GIS data base files
With the release of the open-source GIS GRASS 5.0 in early 1999, opportunities arise for integration with the open-source R statistical data analysis programming environment (Ihaka and Gentleman, 1996, code obtained from ). In the examples presented, R is run interactively within the GRASS 5.0 environment, transferring data by writing and reading temporary text files; the operating system here is Linux. The note describes the implementation in R of the functions needed to move data between GRASS and R, providing the user with a basic interface between the two environments.
Development of the leading Open Source GIS, GRASS, has moved to Baylor University in Texas, where work on a new release incorporating floating-point raster cell values, and NULL values distinct from zero, is now in beta testing (Byars and Clamons, 1998, Linux binary obtained from ). In parallel, the R statistical and data analysis language, also Open Source, is maturing very rapidly and can now execute most S and S-PLUS code unmodified. In the past, when S was available on academic license, integration between GRASS and S existed in a loosely coupled form for integer raster cell values sampled at points given in a site layer. The issues involved in linking two complex and fast-changing programming environments are encapsulated comprehensively in the R functions included in the code accompanying this note. While the progress reported here is based on Open Source Unix-like operating systems, it is worth noting that both GRASS and R have been compiled for MS Windows systems. Programming techniques for R are covered in Venables and Ripley (1997) and in materials available at the R archive . In work to date, the interface used is that of the statistical analysis system, run from within the GIS environment.
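The temporary-text-file exchange can be illustrated with a minimal Python sketch; it is not the note's R code. It assumes a GRASS r.out.ascii-style layout (six `key: value` header lines for north, south, east, west, rows and cols, followed by whitespace-separated cell values, with `*` marking NULL cells), and the helper names `write_ascii_raster` and `read_ascii_raster` are hypothetical. Reading the NULL marker as `None` rather than 0 preserves the distinction the new GRASS release draws between missing cells and zero-valued cells.

```python
import os
import tempfile

# Assumed header layout, modelled on r.out.ascii output.
HEADER_KEYS = ("north", "south", "east", "west", "rows", "cols")

def write_ascii_raster(path, header, rows, null="*"):
    """Write cells in the assumed text layout; None is written as the NULL marker."""
    with open(path, "w") as f:
        for key in HEADER_KEYS:
            f.write(f"{key}: {header[key]}\n")
        for row in rows:
            f.write(" ".join(null if v is None else repr(v) for v in row) + "\n")

def read_ascii_raster(path, null="*"):
    """Read the header and cells back; NULL markers become None, never 0."""
    header = {}
    values = []
    with open(path) as f:
        for _ in HEADER_KEYS:
            key, val = f.readline().split(":", 1)
            header[key.strip()] = float(val)
        for line in f:
            values.extend(None if tok == null else float(tok) for tok in line.split())
    nrows, ncols = int(header["rows"]), int(header["cols"])
    if len(values) != nrows * ncols:
        raise ValueError("cell count does not match rows * cols")
    # Rebuild the row-major grid from the flat list of values.
    return header, [values[i * ncols:(i + 1) * ncols] for i in range(nrows)]

# Round trip through a temporary file, mimicking the file-based exchange.
header = {"north": 100.0, "south": 0.0, "east": 150.0, "west": 0.0, "rows": 2, "cols": 3}
cells = [[1.5, None, 2.0], [0.0, 3.25, None]]
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_ascii_raster(path, header, cells)
    hdr, back = read_ascii_raster(path)
finally:
    os.remove(path)
print(back)  # [[1.5, None, 2.0], [0.0, 3.25, None]]
```

Text files are slow compared with binary transfer, but they need no shared libraries between the two environments, which suits two fast-changing code bases.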
Given major design differences in memory management (GRASS uses the underlying file system, while R maps all active objects into a static area of memory allocated when the program is started and managed by a garbage collector) and other problems, it has been necessary to choose a representation suited to the data analysis and visualization tasks being performed. Here, this means that the statistical programming environment is run from within GRASS, so that GRASS command-line instructions, including those requiring interaction, can be issued from within R using the system() function. Under Unix-family operating systems, GRASS only customizes the user's program execution environment, adding the definitions GRASS programs need to find the files and metadata required for their work. GRASS therefore does not represent a major memory overhead, and R can be launched with plenty of space for its computations. The examples reported below needed no more than 12 MB of heap memory for the analysis of a data set of 57,600 raster cells, and with judicious deletion of data objects from the heap, less would have sufficed.
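A back-of-envelope check of the memory figure helps explain why deleting objects matters. This sketch assumes each cell is held as an 8-byte double in a plain numeric vector, ignores per-object headers, coordinates and all other objects on the heap, and takes 1 MB as 1024 × 1024 bytes; `vector_bytes` is a hypothetical helper. Under those assumptions one layer of 57,600 cells needs about 0.44 MB, so a 12 MB heap holds roughly 27 copies of it, and every intermediate copy left lying around (rather than removed, for example with R's rm()) eats into that budget.

```python
# Assumptions: one R double per cell, no object overhead, 1 MB = 1024 * 1024 bytes.
BYTES_PER_DOUBLE = 8
N_CELLS = 57600                 # raster size reported in the text
HEAP_BYTES = 12 * 1024 * 1024   # the 12 MB heap reported in the text

def vector_bytes(n_cells, bytes_per_value=BYTES_PER_DOUBLE):
    """Payload size of one numeric vector holding n_cells values."""
    return n_cells * bytes_per_value

one_layer = vector_bytes(N_CELLS)
copies = HEAP_BYTES // one_layer   # whole layers that fit in the heap
print(one_layer)  # 460800
print(copies)     # 27
```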
Publisher: University of Bergen, Department of Geography
Series: Geografi i Bergen