Data Retriever

The Data Retriever is a package manager for data. It downloads, cleans, and stores publicly available data, so that analysts spend less time cleaning and managing data, and more time analyzing it.

“Thanks to the Data Retriever I went from idea to results in 30 minutes, and to a submitted manuscript in two months.” – Jean Philippe Gibert

Quick Start

The Data Retriever is written in Python and is run from a command line interface or through an associated R package. It installs publicly available data into a variety of databases (MySQL, PostgreSQL, SQLite, MS Access) and file formats (csv, json, xml).

Installation

If you have Python installed, use pip from the terminal (see Install below for additional instructions):

pip install retriever

To install the associated R package:

devtools::install_github("ropensci/rdataretriever")

Command line interface

List available datasets:

retriever ls

Install the Portal dataset into csv files:

retriever install csv portal
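
Once the csv install has finished, a quick way to see what was written is to summarize the resulting files. The following is a minimal Python sketch, not part of the Data Retriever itself. It assumes the csv files land in the directory you pass in (the current directory by default), that each file starts with a header row, and that the files are UTF-8 encoded. The function name summarize_csv_files is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv_files(directory="."):
    """Map each CSV file name in `directory` to (column names, data row count)."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # Assumption: UTF-8 text with a header row on the first line.
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # empty list for an empty file
            row_count = sum(1 for _ in reader)  # remaining rows are data
        summary[path.name] = (header, row_count)
    return summary


if __name__ == "__main__":
    for name, (columns, rows) in summarize_csv_files().items():
        print(f"{name}: {rows} rows, columns = {columns}")
```

Run it from the directory where you ran the install, or pass a different directory to summarize_csv_files.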

Install the iris dataset into an SQLite database named iris.sqlite:

retriever install sqlite iris -f iris.sqlite
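
After the install, the SQLite file can be inspected with Python's standard sqlite3 module. The sketch below makes no assumption about the table names the Data Retriever creates. It lists whatever tables are present and counts their rows. The helper names list_tables and count_rows are hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Return the number of rows in one table."""
    connection = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        (count,) = connection.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    finally:
        connection.close()
    return count


if __name__ == "__main__":
    # Note: sqlite3.connect creates an empty file if iris.sqlite does not exist.
    for table in list_tables("iris.sqlite"):
        print(table, count_rows("iris.sqlite", table))
```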

Available install formats are: mysql, postgres, sqlite, access, csv, json, and xml.
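
If you want to drive these commands from a script, one option is to build the argument list in Python and hand it to subprocess. This is only a sketch: it wraps the command line shown above, and the function names build_install_command and run_install are hypothetical. The -f option is used exactly as in the SQLite example. Whether the other formats accept it is not covered here.

```python
import shutil
import subprocess

# Install formats listed in the text above.
INSTALL_FORMATS = ("mysql", "postgres", "sqlite", "access", "csv", "json", "xml")


def build_install_command(fmt, dataset, db_file=None):
    """Return the argument list for `retriever install <fmt> <dataset>`."""
    if fmt not in INSTALL_FORMATS:
        raise ValueError(f"unknown install format: {fmt!r}")
    command = ["retriever", "install", fmt, dataset]
    if db_file is not None:
        # -f names the output file, as in the SQLite example above.
        command += ["-f", db_file]
    return command


def run_install(fmt, dataset, db_file=None):
    """Run the install, provided the `retriever` executable is on the PATH."""
    command = build_install_command(fmt, dataset, db_file)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("retriever is not on the PATH")
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands without running them.
    print(" ".join(build_install_command("csv", "portal")))
    print(" ".join(build_install_command("sqlite", "iris", "iris.sqlite")))
```

Validating the format before calling the command keeps typos such as "sqlite3" from reaching the command line.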

R interface

List available datasets:

rdataretriever::datasets()

Install the iris dataset into SQLite:

rdataretriever::install('iris', 'sqlite')

Download and load the iris dataset directly into R:

iris_data <- rdataretriever::fetch('iris')

See the documentation for more commands, details, and datasets.

Install

Python

The easiest way to install the Data Retriever is using Python. If you have Python installed, run:

pip install retriever

If you don’t have Python installed, we recommend installing Anaconda using the default options in the installer.

Installers

If you don’t want to install Python, you can download installers for Windows, OS X, and Ubuntu/Debian Linux from the Releases page.

Windows

  • Download the .exe file for the most recent release and run it

OS X

  • Download and unzip the .zip file for the most recent release
  • Move the .app file into Applications
  • Run the following commands from the Terminal to add the Data Retriever to your path:
echo "/Applications/retriever.app/Contents/MacOS" > retrieverapp
sudo mkdir -p /etc/paths.d
sudo mv retrieverapp /etc/paths.d
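
After updating the path, open a new Terminal session and confirm that the retriever executable can be found. One way to check, sketched here in Python with the standard library's shutil.which, is shown below. The function name find_retriever is hypothetical, and the check applies to any platform, not just OS X.

```python
import shutil


def find_retriever(search_path=None):
    """Return the full path of the `retriever` executable, or None if absent."""
    # With search_path=None, shutil.which searches the PATH environment variable.
    return shutil.which("retriever", path=search_path)


if __name__ == "__main__":
    location = find_retriever()
    if location is None:
        print("retriever not found on PATH; open a new terminal and try again")
    else:
        print(f"retriever found at {location}")
```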

Ubuntu/Debian

  • Download and run the .deb file for the most recent release

Installing from source

Either use pip to install directly from GitHub:

pip install git+https://git@github.com/weecology/retriever.git

or:

  1. Clone the repository
  2. cd into the repository directory and run pip install . (you may need to include sudo at the beginning of the command depending on your system).

More extensive documentation for those who are interested in developing the Data Retriever can be found in the project documentation.

R package

The R package wraps the command line interface, so the core Data Retriever needs to be installed first by following the instructions above. You can then install the R package using the devtools package (if you don’t have devtools installed, run install.packages('devtools') first):

devtools::install_github('ropensci/rdataretriever')

The R package will be available on CRAN (via install.packages()) shortly.

Documentation

Full documentation is available on Read the Docs and includes details on:

Contribute

The Data Retriever is an open source project and we welcome contributions of all shapes and sizes. Resources for those interested in getting involved include: