R libraries for geoscience
In a chat with another data-oriented geologist, they mentioned a popular R package for geochemistry data (rgr) that I had not heard of. After looking into rgr, I decided to trawl through the list of current CRAN packages and find all the geoscience-related packages. This is only a list of the libraries that are up to date on CRAN. Others may not have been kept up to date, but that doesn’t mean we can’t rally support for them. I would also be interested to learn about libraries hosted on other repositories (e.g., GitHub). If I’ve missed an R library that you use or know of, let me know! I would love to make this a more complete list!
For non-R programmers, the Comprehensive R Archive Network (CRAN) is the main repository for R packages. Packages listed on CRAN (roughly 17,200 of them at the time of writing) are checked daily on multiple systems to catch dependency issues as other packages in the R environment are updated. They have been more-or-less “peer-reviewed” by R experts, must pass CRAN’s automated checks (which run any tests and examples the package ships with), and have standardized documentation. It is a fantastic resource that benefits the entire R community.
While going through the list of packages on CRAN, I was blown away by the diversity of geoscience-related packages that are available. In this list I also wanted to highlight which packages have datasets built in. As I work on developing stronger data analysis and data science skills, I’ve found that having a variety of datasets on hand to play with is pretty handy.
General geoscience and mapping
- geoscale - Geological time scale plotting
- astrochron - A package for conducting, and learning about, (1) paleoclimate time series analysis, (2) astronomical time scale construction, and (3) the statistical integration of astrochronologies with other geochronologic/chronostratigraphic data.
- GEOmap - Topographic and geologic mapping - datasets included
- geomapdata - Set of data for use in package GEOmap - datasets included
- Globe - Plot 2D and 3D Views of the Earth - datasets included
- terrainr - Landscape Visualizations in R and ‘Unity’
- terrainmeshr - Triangulate and simplify 3D terrain meshes
- RSAGA - SAGA geoprocessing and terrain analysis - datasets included
- rayshader - Create Maps and Visualize Data in 2D and 3D
- earthtones - Derive a color palette from a particular location on Earth
Geochemistry
- rgr - Applied geochemistry exploratory data analysis
- IsoplotR - Statistical toolbox for radiometric geochronology - datasets included
- GeoChemical Data ToolKIT (GCDkit) - A system for handling and recalculating whole-rock analyses of igneous rocks, written in R.
- OrgMassSpecR - Organic/biological mass spectrometry data analysis
- CHNOSZ - Thermodynamic calculations and diagrams for geochemistry
- ggtern - An extension to ‘ggplot2’, for the creation of ternary diagrams - datasets included
- phreeqc - R Interface to Geochemical Modeling Software
Geostatistics
- geotoolsR - Tools to improve the use of geostatistics
- georob - Robust geostatistical analysis of spatial data - datasets included
- gear - Geostatistical analysis in R - datasets included
- gstat - Spatial and spatio-temporal geostatistical modeling, prediction, and simulation - datasets included
- geostats - An introduction to statistics for geoscientists - datasets included
Hydrology
- GWSDAT - Groundwater spatiotemporal data analysis tool - includes an R Shiny data exploration tool!
- streamDepletr - Estimate streamflow depletion due to groundwater pumping - datasets included
- GlacierSMBM - Glacier surface mass balance model - datasets included
Oil and Gas
- Rmbal - Estimate original hydrocarbon in place and reservoir performance
- Rpvt - Estimate the PVT properties of reservoir fluids
- zFactor - GitHub Repo - Computational tools for chemical, petrochemical and petroleum engineers
Paleontology
- tidypaleo - Provides a set of functions with a common framework for age-depth model management, stratigraphic visualization, and common statistical transformations. See Dunnington et al. (2021) doi:10.18637/jss.v101.i07.
- velociraptr - Functions for downloading, cleaning, and analyzing fossil data from the Paleobiology Database
- paleobioDB - Download and process data from the Paleobiology Database
- fossil - Palaeoecological and palaeogeographical analysis tools - datasets included
- FossilSim - Simulating taxonomy and fossil data on phylogenetic trees under mechanistic models of speciation, preservation and sampling
- strap - Stratigraphic tree analysis for paleontology - datasets included
- folio - Datasets for teaching archaeology and paleontology - datasets included
- chronosphere - A package to facilitate spatially explicit analyses of (paleo)environmental/ecological research - datasets included
- vegan - Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Provenance Analysis
- fingerPro - Sediment source fingerprinting
- provenance - Statistical toolbox for sedimentary provenance analysis - datasets included
- IsoplotR - Statistical toolbox for radiometric geochronology
Seismicity
- ETAS - Modeling earthquake data using the ‘ETAS’ model - datasets included
- bayesianETAS - Bayesian estimation of the ETAS model for earthquake occurrences
- GRTo - Tools for the analysis of Gutenberg-Richter distributions of earthquake magnitudes
- TauP.R - Earthquake traveltime calculations for 1D earth models - datasets included
Stratigraphy
- stratigrapheR - Integrated stratigraphy tools for plotting stratigraphic data
- SDAR - A tool for plotting and facilitating the analysis of stratigraphic and sedimentological data - datasets included
- coreCT - Programmatic analysis of sediment cores using computed tomography imaging - datasets included
- G2sd - Grain-size statistics and description of sediment - datasets included
- EMMAgeo - End-member modeling of grain-size data
- DecomposeR - Empirical mode decomposition for cyclostratigraphy
- stRat stat - GitHub Repo - My R-based digitizer for turning hand-drawn graphic logs into a digital/numeric format
Structural Geology
- RockFab - Rock fabric and strain analysis tools
Non-CRAN resources
- Steven Holland, of the University of Georgia, has a “Data Analysis in the Geosciences” course that incorporates R scripts for sandstone provenance, ternary plots, and rose diagrams.
- GRAN - Geological Survey R Archive Network - GRAN is a repository of curated R packages developed by USGS employees for USGS and external use
For those learning R
For those who do not know R but may have found a package that interests them, I suggest you check out the “swirl” package, which lets you learn R inside the R console! swirl was built by a student at Johns Hopkins University and houses a lot of different courses, ranging from an introduction to R, data cleaning, and visualization in R all the way to inferential statistics and regression models. I’ve used swirl in Coursera’s/Johns Hopkins Biostatistics’ “Data Science in R” course series, and it’s fantastic.
If you’re looking to pick up some basic R and then blast right into more complex topics like geostatistics, there is the “geostats” package by Pieter Vermeesch. This package is the companion code and datasets for a course run at University College London. Its CRAN page links to a short .pdf textbook for the course, which includes a brief introduction to R. The textbook also has exercises for each chapter that use datasets built into the package.
Lastly, if you’re looking to build your data analysis skills and your R skills at the same time, Steven Holland’s course “Data Analysis in the Geosciences” has lots of online material.
For those looking for datasets to play with
Geoscience-related datasets
As I said up top, having a variety of datasets to play around with is fantastic, so I wanted to highlight the packages that include datasets. These datasets are great for practicing visualization and modeling skills, as many of them have already been cleaned, but most of all, they are more interesting than mtcars…
provenance – provenance includes a large dataset composed of detrital zircon ages, compositional bulk petrography data, heavy mineral composition, and major and trace element data.
folio – Datasets for teaching quantitative approaches and modeling in archaeology and paleontology. Geological datasets include ice core data, foraminiferal δ18O data, and sea level data. The archaeological datasets are also very interesting, including radiocarbon dating examples and trace element data from ancient ceramics.
geostats – geostats includes 23 different datasets that are all used in examples and exercises throughout the .pdf textbook. Some highlights are:
- AFM - 630 calc-alkali basalts from the Cascade Mountains and 474 tholeiitic basalts from Iceland
- declustered - 28267 earthquakes between 1769 and 2016, with aftershocks and precursor events removed
- DZ – Detrital zircon U-Pb data of 13 sand samples from China.
- forams – Planktic foraminifera counts in surface sediments in the Atlantic Ocean
- fractures – A 512 × 512 pixel image of a fracture network
strap – A package for building, analyzing, and visualizing phylogenetic trees of taxa. It includes trees of lungfish (dipnoans) and trilobites (Asaphidae), and a structured version of regional stages for the Ordovician in Britain.
ggtern - A package that extends ggplot2 to create ternary plots. This package includes:
- Feldspar - feldspar composition data
- Fragments - sand composition data related to morphometric information of drainages in the Coweeta Basin, North Carolina
- SkyeLava - AFM compositions of 23 aphyric Skye lavas
Non-geoscience related
For non-geoscience-related datasets, there are three main packages that contain classic datasets. An advantage of using these is that you can easily find multiple solutions for a single dataset and see how people approached and dissected these problems. The main packages composed of datasets are:
- datasets – A package of datasets including cars, ChickWeight, and AirPassengers.
- MASS – A package of functions and datasets to support the textbook “Modern Applied Statistics with S” (Venables and Ripley, 2002).
- mlbench – A package that includes benchmark datasets for machine learning. It includes classic datasets like BostonHousing, Ionosphere, and BreastCancer.
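If you haven’t poked around these before, here is a minimal sketch of how you might browse and load bundled datasets from the R console. It only uses base R and the built-in datasets package, so nothing needs installing; the object name `ds` is just for illustration. The commented-out lines at the end assume MASS and mlbench are already installed.

```r
# List the datasets bundled with base R's 'datasets' package.
# data(package = ...) returns a "packageIQR" object; its $results matrix
# has one row per dataset, including "Item" (name) and "Title" columns.
ds <- data(package = "datasets")
head(ds$results[, c("Item", "Title")])

# The classic examples named above are available as soon as R starts.
str(cars)                                  # speed (mph) and stopping distance (ft)
summary(ChickWeight$weight)                # chick body weights (grams)
window(AirPassengers, start = c(1960, 1))  # last year of the monthly series

# The same calls work for other packages once they are installed, e.g.
# data(package = "MASS")
# data("BostonHousing", package = "mlbench")
```

The same `data(package = "...")` call is a quick way to see what datasets any of the geoscience packages above ship with.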
Last updated: 03/05/2022