# Xml2db

`xml2db` is a Python package for loading XML data into a relational database. It is designed to handle complex
schemas which cannot be easily denormalized to a flat table, without any custom code.

It builds a data model (i.e. a set of database tables linked by foreign key relationships) from an XSD schema. It
can then parse XML files, load them into the database, and convert them back to XML if needed.

It is as simple as:

```python
from xml2db import DataModel

# Create a data model of tables with relations based on the XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)
# Parse an XML file based on this XSD
document = data_model.parse_xml(
    xml_file="path/to/file.xml"
)
# Insert the document content into the database
document.insert_into_target_tables()
```

The data model adheres closely to the XSD schema, but `xml2db` applies simplifications to limit the complexity of
the resulting data model and its storage footprint.

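To make the idea of "tables linked by foreign keys" more concrete, here is a small hand-written illustration using only the standard library. It is not how `xml2db` works internally, and it does not use the package at all. The element names (`orders`, `order`, `item`), the column names (`pk_order`, `fk_order`, ...) and the `flatten_orders` helper are all hypothetical. It only shows how nested XML elements can end up as rows in a parent table and a child table.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are made up for illustration.
SAMPLE_XML = """
<orders>
  <order id="A1">
    <customer>Alice</customer>
    <item sku="X" qty="2"/>
    <item sku="Y" qty="1"/>
  </order>
  <order id="B2">
    <customer>Bob</customer>
    <item sku="Z" qty="5"/>
  </order>
</orders>
"""


def flatten_orders(xml_text):
    """Split nested <order>/<item> elements into two lists of rows.

    Each item row carries `fk_order`, the primary key of its parent order
    row, which plays the role of a foreign key between the two tables.
    """
    root = ET.fromstring(xml_text)
    order_rows, item_rows = [], []
    for pk_order, order in enumerate(root.findall("order"), start=1):
        order_rows.append(
            {
                "pk_order": pk_order,
                "id": order.get("id"),
                "customer": order.findtext("customer"),
            }
        )
        for item in order.findall("item"):
            item_rows.append(
                {
                    "pk_item": len(item_rows) + 1,
                    "fk_order": pk_order,
                    "sku": item.get("sku"),
                    "qty": int(item.get("qty")),
                }
            )
    return order_rows, item_rows


# Two "tables": one row per order, one row per item pointing back to its order.
order_rows, item_rows = flatten_orders(SAMPLE_XML)
```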
The raw data loaded into the database can then be processed using [DBT](https://www.getdbt.com/), SQL views or
other tools aimed at extracting, correcting and formatting the data into more user-friendly tables.

`xml2db` is developed and used at the [French energy regulation authority (CRE)](https://www.cre.fr/) to process XML
data.

This package uses `sqlalchemy` to interact with the database, so it should work with different database backends. It has
been tested against PostgreSQL and MS SQL Server. It currently does not work with SQLite. You may have to install
additional packages to connect to your database (e.g. `pyodbc`, which is the default connector for MS SQL Server, or
`psycopg2` for PostgreSQL).

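As a rough sketch of what connection strings for the two tested backends might look like, the helper below assembles a SQLAlchemy-style database URL with the standard library only. The `build_connection_string` function, the credentials, the host, the ports and the ODBC driver name are placeholders chosen for illustration; the driver name in particular depends on what is installed on your machine. Credentials are URL-encoded because characters such as `@` or `/` in a password would otherwise be misread as URL delimiters.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(driver, user, password, host, port, database, query=None):
    """Assemble a SQLAlchemy-style database URL, escaping the credentials."""
    url = (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    if query:
        # Extra driver options are passed as URL query parameters.
        url += "?" + urlencode(query)
    return url


# Placeholder credentials and hosts, for illustration only.
postgres_url = build_connection_string(
    "postgresql+psycopg2", "testuser", "p@ss/word", "localhost", 5432, "testdb"
)

# The ODBC driver name below is an assumption; use the one installed locally.
mssql_url = build_connection_string(
    "mssql+pyodbc",
    "testuser",
    "p@ss/word",
    "localhost",
    1433,
    "testdb",
    query={"driver": "ODBC Driver 17 for SQL Server"},
)
```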
**Please read the [package documentation website](https://cre-dev.github.io/xml2db) for all the details!**

## Installation

The package can be installed, preferably in a virtual environment, using `pip`:

```bash
pip install xml2db
```

## Testing

Running the tests requires installing additional development dependencies. After cloning the repo, install them with:

```bash
pip install -e ".[tests,docs]"
```

Run all tests with the following command:

```bash
python -m pytest
```

Integration tests require write access to a PostgreSQL or MS SQL Server database; the connection string is read from
the `DB_STRING` environment variable. To run only the conversion tests, which do not require a database, use:

```bash
pytest -m "not dbtest"
```

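The choice between the two test commands can also be made automatically. The sketch below is one possible way to do it and is not part of the project: the `pytest_args` helper is hypothetical, and only the `DB_STRING` variable and the `dbtest` marker come from the text above.

```python
import os


def pytest_args(environ=os.environ):
    """Choose pytest arguments depending on whether DB_STRING is set."""
    if environ.get("DB_STRING"):
        # A connection string is available: run everything.
        return []
    # No database configured: deselect tests marked `dbtest`.
    return ["-m", "not dbtest"]


# Command line that could be passed to subprocess.run, for example.
command = ["python", "-m", "pytest", *pytest_args()]
```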
## Contributing

Contributions are more than welcome, as are bug reports. The project's
[issue page](https://github.com/cre-dev/xml2db/issues) is the place to start.