Getting Started¶
Installing¶
Install WeaveQ using pip (you may need to sudo):
$ pip install weaveq
Note
WeaveQ only officially supports Linux for now.
Running¶
Pivot from a CSV file to a JSON file to find bikes and cars of the same
colour, writing the output to stdout:
weaveq -q '#from "csv:bikes.csv" #as b #pivot-to "js:cars.json" #as c #where b.color = c.color'
Run the same query, but write the output to another file:
weaveq -o /path/to/out/file.jsonlines -q '#from "csv:bikes.csv" #as b #pivot-to "js:cars.json" #as c #where b.color = c.color'
Supply a configuration file and use Elasticsearch results as part of the query to join Honda bikes to cars of the same colour:
weaveq -c config.json -q '#from "el:bikes" #as b #filter |make:honda| #join-to "jsl:cars.jsonlines" #as c #where b.color = c.color'
For more details, see Running Queries
The Basics¶
WeaveQ reads data from a set of data sources and uses information you provide about the relationships between these data sources to perform ‘pivot’ or ‘join’ operations.
A pivot operation selects records from one data source based on there being related records in a second data source, and then discards the records from the second data source. A join operation merges records from one data source into related records from a second data source. For a more detailed explanation, see Running Queries
As the name suggests, a data source is a WeaveQ component that retrieves data from an external source for use in join and pivot operations. WeaveQ currently supports 4 data sources for use from within command line queries: JSON lines, JSON, CSV and Elasticsearch.
Note
You can write custom data source components for WeaveQ using the WeaveQ API. For more details, see Querying from Code
WeaveQ always outputs line-separated JSON, either to stdout or to a file
you specify using the -o command line option.
Configuring¶
WeaveQ uses a configuration file to control data source settings. Currently, only the Elasticsearch and CSV data sources have settings that can be configured.
As a result, you only need to supply WeaveQ a configuration file if either:
- You want to use Elasticsearch as a data source.
- You don’t want WeaveQ to use the first row of CSV files to determine field names (in the absence of a configuration file, WeaveQ defaults to doing this).
You must specify the configuration as a JSON file and pass it to WeaveQ using
the -c option. For example:
$ weaveq -c /path/to/config.json -q ...
An example configuration file is shown below:
{
"data_sources" :
{
"elasticsearch" :
{
"hosts" : ["10.1.1.2:9200","10.1.1.3:9200"],
"timeout" : 10,
"use_ssl" : false,
"verify_certs" : false,
"ca_certs" : "/path/to/ca/certs",
"client_cert" : "/path/to/client/cert",
"client_key" : "/path/to/client/key"
}
"csv" :
{
"first_row_contains_field_names" : true
}
}
}
| Configuration Item | Description | Required? |
|---|---|---|
| elasticsearch/hosts | An array of the host names/addresses and ports of Elasticsearch nodes. | Only if using the Elasticsearch data source within a query |
| elasticsearch/timeout | Global Elasticsearch timeout. Default = 10 | No |
| elasticsearch/use_ssl | Whether or not to use SSL in Elasticsearch communication. Default = false | No |
| elasticsearch/verify_certs | Whether or not to verify SSL certificates. Default = false | No |
| elasticsearch/ca_certs | Path to CA (certificate authority) certificate files. Default = none | No |
| elasticsearch/client_cert | Path to a PEM-formatted SSL client certificate file. Default = none | No |
| elasticsearch/client_key | Path to a PEM-formatted SSL client key. Default = none | No |
| csv/first_row_names | Whether or not the first row of CSV files should be used to define field names. If not, fields will be named column_n, where n is the index (starting at 0) of the CSV column from which the field was read. Default = true | No |