Generate and load Pandas data frames based on JSON Table Schema descriptors.
Version
v0.2 contains breaking changes:
- removed the Storage(prefix=) argument (it was a stub)
- renamed Storage(tables=) to Storage(dataframes=)
- renamed Storage.tables to Storage.buckets
- changed Storage.read to read into memory
- added Storage.iter to yield row by row
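The following minimal sketch shows the difference between the new read and iter methods. ToyStorage is a hypothetical class and is not part of jsontableschema_pandas. It keeps rows in plain Python lists instead of Pandas data frames and only mirrors the semantics described above: read returns every row at once, and iter yields rows one at a time.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence


class ToyStorage:
    """Hypothetical stand-in that mimics the v0.2 read/iter split.

    Not the jsontableschema_pandas implementation: rows are stored in
    plain lists rather than Pandas data frames.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, List[Sequence[Any]]] = {}

    @property
    def buckets(self) -> List[str]:
        # Sorted bucket names, analogous to Storage.buckets
        return sorted(self._rows)

    def create(self, bucket: str) -> None:
        # Start an empty bucket
        self._rows[bucket] = []

    def write(self, bucket: str, rows: Iterable[Sequence[Any]]) -> None:
        # Append rows after any rows that are already stored
        self._rows[bucket].extend(tuple(row) for row in rows)

    def read(self, bucket: str) -> List[List[Any]]:
        # Materialise every row in memory at once
        return [list(row) for row in self._rows[bucket]]

    def iter(self, bucket: str) -> Iterator[List[Any]]:
        # Yield one row at a time; nothing is copied up front
        for row in self._rows[bucket]:
            yield list(row)
```

Because iter is a generator, a caller can stop early without copying the whole bucket. read gives up that laziness in exchange for a single list that is convenient to index.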
$ pip install datapackage
$ pip install jsontableschema-pandas
You can load resources from a data package as Pandas data frames using the datapackage.push_datapackage function:
>>> import datapackage
>>> data_url = 'http://data.okfn.org/data/core/country-list/datapackage.json'
>>> storage = datapackage.push_datapackage(data_url, 'pandas')
>>> storage.buckets
['data___data']
>>> type(storage['data___data'])
<class 'pandas.core.frame.DataFrame'>
>>> storage['data___data'].head()
Name Code
0 Afghanistan AF
1 Åland Islands AX
2 Albania AL
3 Algeria DZ
4 American Samoa AS
It is also possible to pull existing data frames into a data package:
>>> datapackage.pull_datapackage('/tmp/datapackage.json', 'country_list', 'pandas', dataframes={
...     'data': storage['data___data'],
... })
Storage
The package implements the Tabular Storage interface.
You can create a storage instance like this:
>>> from jsontableschema_pandas import Storage
>>> storage = Storage()
Storage works as a container for Pandas data frames. You can define a new data frame inside the storage using the storage.create method:
>>> storage.create('data', {
...     'primaryKey': 'id',
...     'fields': [
...         {'name': 'id', 'type': 'integer'},
...         {'name': 'comment', 'type': 'string'},
...     ]
... })
>>> storage.buckets
['data']
>>> storage['data'].shape
(0, 0)
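The descriptor passed to storage.create is an ordinary JSON Table Schema. Its fields list gives the column names and types in order, and primaryKey names the key column or columns; the Table Schema specification allows primaryKey to be either a single field name or a list of names. The helper below is a hypothetical sketch, not part of jsontableschema_pandas, that normalises a descriptor into those three pieces and rejects a primary key that refers to a missing field.

```python
from typing import Any, Dict, List, Tuple


def describe_frame(descriptor: Dict[str, Any]) -> Tuple[List[str], Dict[str, str], List[str]]:
    """Hypothetical helper: summarise a JSON Table Schema descriptor.

    Returns the column names in field order, a name -> type mapping,
    and the primary key as a list of field names.
    """
    fields = descriptor.get('fields', [])
    columns = [field['name'] for field in fields]
    # In Table Schema a field without an explicit type defaults to "string"
    types = {field['name']: field.get('type', 'string') for field in fields}
    primary_key = descriptor.get('primaryKey', [])
    # primaryKey may be a single field name or a list of names
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    unknown = [name for name in primary_key if name not in columns]
    if unknown:
        raise ValueError('primaryKey refers to unknown fields: %s' % unknown)
    return columns, types, list(primary_key)
```

For the schema used above, this sketch yields the columns ['id', 'comment'] and the primary key ['id'].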
Use the storage.write method to populate the data frame with data:
>>> storage.write('data', [(1, 'a'), (2, 'b')])
>>> storage['data']
id comment
1 a
2 b
You can also use tabulator to populate the data frame from an external data file:
>>> import tabulator
>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
...     storage.write('data', stream)
>>> storage['data']
id comment
1 a
2 b
1 good
As you can see, each subsequent write appends new rows after the existing ones.
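Values read from a CSV file arrive as strings, so they need to be cast to the declared field types before they match rows written earlier. The stdlib-only sketch below approximates the tabulator step: it skips the header row (as headers=1 does), casts each value according to the schema used above, and appends the result to an existing list of rows. The CASTERS table and the append_csv function are hypothetical, and whether Storage.write casts values in exactly this way is an assumption. Matching the output above, a repeated id value is simply appended rather than merged.

```python
import csv
import io
from typing import Any, Callable, Dict, List

# Hypothetical casters for the two field types used in this README;
# a full Table Schema implementation supports many more types.
CASTERS: Dict[str, Callable[[str], Any]] = {
    'integer': int,
    'string': str,
}


def append_csv(rows: List[List[Any]], text: str, fields: List[Dict[str, str]]) -> None:
    """Hypothetical helper: cast CSV data rows by field type and append them."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row on the first line
    for raw in reader:
        # Pair each raw string with its field and cast it to the declared type
        rows.append([CASTERS[field['type']](value) for field, value in zip(fields, raw)])
```

Starting from [[1, 'a'], [2, 'b']] and a CSV containing the single data row 1,good, the list becomes [[1, 'a'], [2, 'b'], [1, 'good']].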
https://github.com/frictionlessdata/jsontableschema-py#snapshot
Please read the contribution guideline:
Thanks!