Railgun is a simple and fast data processing tool. Railgun uses:
- go-reader for opening and reading from URIs,
- go-simple-serializer (GSS) for reading/writing objects to standard formats, and
- go-dfl for filtering and transforming data.
Railgun uses the Dynamic Filter Language (DFL) through go-dfl. See the *_test files in the dfl source folder on GitHub for comprehensive examples of the syntax.
go-reader can read from stdin, http/https, the local filesystem, AWS S3, and HDFS.
go-simple-serializer (GSS) supports bson, csv, tsv, hcl, hcl2, json, jsonl, properties, toml, and yaml. The hcl and hcl2 implementations are fragile and very much in alpha.
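To show how one set of objects maps onto several of these formats, here is a standard-library Go sketch covering json, jsonl, csv, and tsv. The `serialize` function is hypothetical rather than GSS's API, and building the CSV/TSV header from the sorted union of keys is a choice made for this example. The remaining formats (bson, hcl, hcl2, properties, toml, yaml) would need additional libraries.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// serialize is a hypothetical helper (not GSS's API) that writes the same
// objects in one of several formats. Only standard-library formats are handled.
func serialize(objects []map[string]interface{}, format string) ([]byte, error) {
	var buf bytes.Buffer
	switch format {
	case "json":
		// A single JSON array containing every object.
		if err := json.NewEncoder(&buf).Encode(objects); err != nil {
			return nil, err
		}
	case "jsonl":
		// JSON Lines: one JSON object per line.
		enc := json.NewEncoder(&buf)
		for _, obj := range objects {
			if err := enc.Encode(obj); err != nil {
				return nil, err
			}
		}
	case "csv", "tsv":
		// Header row: the sorted union of all keys.
		keySet := map[string]bool{}
		for _, obj := range objects {
			for k := range obj {
				keySet[k] = true
			}
		}
		header := make([]string, 0, len(keySet))
		for k := range keySet {
			header = append(header, k)
		}
		sort.Strings(header)
		w := csv.NewWriter(&buf)
		if format == "tsv" {
			w.Comma = '\t'
		}
		if err := w.Write(header); err != nil {
			return nil, err
		}
		for _, obj := range objects {
			row := make([]string, len(header)) // missing or null values stay empty
			for i, k := range header {
				if v, ok := obj[k]; ok && v != nil {
					row[i] = fmt.Sprint(v)
				}
			}
			if err := w.Write(row); err != nil {
				return nil, err
			}
		}
		w.Flush()
		if err := w.Error(); err != nil {
			return nil, err
		}
	default:
		return nil, fmt.Errorf("format %q is not supported in this sketch", format)
	}
	return buf.Bytes(), nil
}

func main() {
	objects := []map[string]interface{}{
		{"name": "a", "n": 1},
		{"name": "b", "n": 2},
	}
	for _, format := range []string{"json", "jsonl", "csv", "tsv"} {
		out, err := serialize(objects, format)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("--- %s ---\n%s", format, out)
	}
}
```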
For an interactive demo, see the railgun notebook on ObservableHQ. The notebook is a very large download, so only open it over WiFi.
CLI
You can use the command line tool to process data.
Usage: railgun -input_format INPUT_FORMAT -output_format OUTPUT_FORMAT [-input_uri INPUT_URI] [-input_compression [none|bzip2|gzip|snappy]] [-h HEADER] [-c COMMENT] [-object_path PATH] [-dfl_exp DFL_EXPRESSION] [-dfl_file DFL_FILE] [-output_uri OUTPUT_URI] [-max MAX_COUNT]
Options:
  -aws_access_key_id string
      Defaults to value of environment variable AWS_ACCESS_KEY_ID
  -aws_default_region string
      Defaults to value of environment variable AWS_DEFAULT_REGION.
  -aws_secret_access_key string
      Defaults to value of environment variable AWS_SECRET_ACCESS_KEY.
  -aws_session_token string
      Defaults to value of environment variable AWS_SESSION_TOKEN.
  -c string
      The input comment character, e.g., #. Commented lines are not sent to output.
  -dfl_exp string
      Process using dfl expression
  -dfl_file string
      Process using dfl file.
  -h string
      The input header if the stdin input has no header.
  -hdfs_name_node string
      Defaults to value of environment variable HDFS_DEFAULT_NAME_NODE.
  -help
      Print help.
  -input_compression string
      The input compression: none, bzip2, gzip, snappy (default "none")
  -input_format string
      The input format: bson, csv, tsv, hcl, hcl2, json, jsonl, properties, toml, yaml
  -input_reader_buffer_size int
      The input reader buffer size (default 4096)
  -input_uri string
      The input uri (default "stdin")
  -max int
      The maximum number of objects to output (default -1)
  -output_format string
      The output format: bson, csv, tsv, hcl, hcl2, json, jsonl, properties, toml, yaml
  -output_uri string
      The output uri (default "stdout")
  -version
      Prints version to stdout.
Railgun is currently in alpha. See releases at https://github.com/spatialcurrent/railgun/releases.
Search for Cuisine
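This example uses the osm tool from go-osm to convert the District of Columbia extract from Geofabrik into GeoJSON Lines, then pipes it into railgun, which applies the DFL in examples/mexican.dfl and writes the result to mexican.json:
~/go/src/github.com/spatialcurrent/go-osm/bin/osm_linux_amd64 \
  -input_uri 'http://download.geofabrik.de/north-america/us/district-of-columbia-latest.osm.bz2' \
  -ways_to_nodes \
  -output_format geojsonl \
  -filter_keys_keep amenity \
  -output_uri stdout |
railgun \
  -input_format jsonl \
  -output_format json \
  -dfl_file ~/go/src/github.com/spatialcurrent/railgun/examples/mexican.dfl \
  -output_uri mexican.json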
Tsunami Feed
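With the railgun JavaScript build loaded as railgun, the following lists the places of up to 10 tsunami-flagged earthquakes from the USGS feed of magnitude 2.5+ events over the past month, as YAML:
const pipeline = [
  "filter(@features, '(@properties?.tsunami != null) and (@properties.tsunami == 1)')", // keep tsunami-flagged events
  "sort(@, '@properties?.mag', true)", // sort by magnitude
  "map(@, '@properties?.place ?: \"\"')", // take the place name, or "" if it is missing
  "limit(@, 10)" // keep at most 10 results
];
// Top-level await requires an ES module or another async context.
const response = await fetch("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson");
const earthquakes = await response.json();
const result = railgun.process(earthquakes, {"dfl": pipeline, "output_format": "yaml"});
console.log(result);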
Encrypt as YAML / Decrypt as JSON
# Encrypt secrets.yml and output to secrets.yml.enc
read -s -p 'Password: ' password && echo && railgun_linux_amd64 -input_uri secrets.yml -output_uri secrets.yml.enc -output_passphrase "$password"
...
# Decrypt secrets.yml.enc and output to stdout as JSON
read -s -p 'Password: ' password && echo && railgun_linux_amd64 -input_uri secrets.yml.enc -input_passphrase "$password" -output_format json
Building
CLI
The scripts/build_cli.sh script is used to build executables for Linux and Windows.
JavaScript
You can compile Railgun to pure JavaScript with the scripts/build_javascript.sh script.
Changing Destination
The default destination for build artifacts is railgun/bin, but you can change the destination with a CLI argument. For building on a Chromebook, consider saving the artifacts in /usr/local/go/bin, for example:
bash scripts/build_cli.sh /usr/local/go/bin
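Terraform
The following commands run Terraform with AWS credentials from the default aws-vault profile.
mkdir -p /usr/local/terraform
# Download the AWS provider plugin
aws-vault exec default -- terraform init
# Cache the downloaded provider plugin outside the project directory
cp -R .terraform/plugins/linux_amd64/terraform-provider-aws_v1.43.2_x4 /usr/local/terraform
# Re-initialize using only the cached plugins, then preview the changes
aws-vault exec default -- terraform init -plugin-dir=/usr/local/terraform
aws-vault exec default -- terraform plan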
Spatial Current, Inc. is currently accepting pull requests for this repository. We'd love to have your contributions! Please see Contributing.md for how to get started.
This work is distributed under the MIT License. See the LICENSE file.