Quickstart¶
intake-sql provides quick and easy access to tabular data stored in SQL data sources.
Installation¶
To use this plugin for intake, install with the following command:
conda install -c intake intake-sql
In addition, you will need other packages, depending on the database you wish to talk to. For example, if your database is Hive, you will also need to install pyhive.
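For example, for a Hive backend you might install the driver mentioned above with something like:
pip install pyhive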
The plugins¶
intake-sql provides three data plugins to access your data, plus a catalogue plugin. These will be briefly described here, but see also the API documentation for specifics on parameter usage. All make use of sqlalchemy for the connection and data transfer; the specifics of what should appear in a connection string, the list of supported backends, and the optional dependencies can all be found in its documentation.
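For example, SQLAlchemy connection strings take the general form dialect+driver://username:password@host:port/database, so a local SQLite file might be reached with sqlite:///mydata.db and a PostgreSQL server with something like postgresql://user:pass@localhost:5432/mydb (the names here are purely illustrative).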
The plugins:
SQLSource
: this is the one-shot plugin, which requires the least configuration. It passes your parameters on to pd.read_sql, so you can specify a table name or a full-syntax query (see the usage sketch after this list). Since there is no partitioning, the query should target a small dataset, or be an aggregating query whose output fits comfortably in memory.
SQLSourceAutoPartition
: this is restricted to reading from tables, as opposed to general queries, but, given a column to use for indexing, it can automatically determine an appropriate partitioning in various ways. The index column should have high cardinality and be indexed in the database system.
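As a minimal sketch of how these two sources might be constructed directly, assume a local SQLite file example.db containing a table mytable with an indexed integer column id and a numeric column value (these names are illustrative; the exact parameter names and import path should be checked against the API documentation):

    # Hypothetical example: example.db, mytable, id and value are placeholders.
    from intake_sql import SQLSource, SQLSourceAutoPartition

    uri = "sqlite:///example.db"  # any SQLAlchemy connection string

    # One-shot: the expression is handed to pd.read_sql and the whole result
    # comes back as a single in-memory pandas DataFrame.
    src = SQLSource(uri, "SELECT id, value FROM mytable WHERE value > 0")
    df = src.read()

    # Whole-table read, automatically partitioned on a high-cardinality,
    # indexed column; to_dask() gives a lazily-loaded dask DataFrame.
    parts = SQLSourceAutoPartition(uri, "mytable", index="id")
    ddf = parts.to_dask()

The index column matters because partition boundaries are derived from its range of values, which is why it should have high cardinality and be indexed in the database.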