Commit 14ebafd

Initial commit
0 parents  commit 14ebafd

56 files changed, +8257 -0 lines changed
@@ -0,0 +1,45 @@
name: MSSQL integration tests

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    services:
      mssql:
        image: mcr.microsoft.com/mssql/server:2019-latest
        env:
          SA_PASSWORD: MyTestPassword1
          ACCEPT_EULA: 'Y'
        ports:
          - 1433:1433

    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Set up Python 3.11
        uses: actions/setup-python@v3
        with:
          python-version: 3.11

      - name: Install dependencies
        run: |
          sleep 20
          curl https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
          curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
          sudo apt-get update
          sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
          python -m pip install --upgrade pip
          python -m pip install flake8 .[tests] pyodbc

      - name: Test with pytest
        run: |
          pytest tests
        env:
          DB_STRING: mssql+pyodbc://sa:MyTestPassword1@localhost:1433/master?driver=ODBC+Driver+17+for+SQL+Server
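
Note on the `sleep 20` in the install step above: it presumably gives the SQL Server service container time to start before the tests connect. A minimal sketch of a polling alternative follows, assuming that an open TCP port is an acceptable readiness signal (it only shows that something is listening, not that SQL Server already accepts logins). `wait_for_port` is a hypothetical helper and is not part of this commit.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as the port accepts a connection,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In this workflow the call would be `wait_for_port("localhost", 1433)`, matching the `1433:1433` port mapping declared under `services`.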
@@ -0,0 +1,46 @@
name: PostgreSQL integration tests

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    container: python:3.11-bookworm
    services:
      postgres:
        image: postgres
        env:
          POSTGRES_PASSWORD: postgres
          TZ: 'Europe/Paris'
          PGTZ: 'Europe/Paris'
        # Set health checks to wait until postgres has started
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Set up Python 3.11
        uses: actions/setup-python@v3
        with:
          python-version: 3.11

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install flake8 .[tests] psycopg2

      - name: Test with pytest
        run: |
          pytest tests
        env:
          DB_STRING: postgresql+psycopg2://postgres:postgres@postgres:5432/postgres
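
Note on `DB_STRING`: both integration workflows above pass the database location to the tests as a SQLAlchemy-style URL of the form `dialect+driver://user:password@host:port/database?options`. The sketch below takes such a URL apart using only the standard library, which can help when checking what a workflow actually points at. `describe_db_string` is a hypothetical helper, not something defined in this commit, and it deliberately leaves the password out of its result.

```python
from urllib.parse import parse_qs, urlsplit


def describe_db_string(db_string: str) -> dict:
    """Split a SQLAlchemy-style URL into its parts (hypothetical helper).

    The password is intentionally omitted so the result is safe to log.
    """
    parts = urlsplit(db_string)
    # The scheme carries "dialect+driver", e.g. "postgresql+psycopg2"
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
        # parse_qs decodes "+" as a space and returns lists; keep the first value
        "query": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }
```

Applied to the MSSQL value, the `driver=ODBC+Driver+17+for+SQL+Server` query parameter decodes to `ODBC Driver 17 for SQL Server`, the ODBC driver that the `msodbcsql17` package installs.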

.github/workflows/python-package.yml

+39
@@ -0,0 +1,39 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python package

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.10", "3.11"]

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v3
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 .[tests]
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 src --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 src --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest tests -m "not dbtest"
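
Note on the test selection above: this workflow has no database service, so it deselects the tests carrying the `dbtest` marker. A different way to get a similar effect, shown here only as a sketch and not as what this commit does, is to skip database tests whenever `DB_STRING` is unset. The example uses the standard library's `unittest.skipUnless` instead of a pytest marker (pytest also collects `unittest.TestCase` classes); the class and test names are hypothetical.

```python
import os
import unittest

# Same environment variable the integration workflows set; may be absent locally
DB_STRING = os.environ.get("DB_STRING")


@unittest.skipUnless(DB_STRING, "DB_STRING is not set; skipping database test")
class DatabaseAvailabilityTest(unittest.TestCase):
    """Hypothetical placeholder for a test that needs a live database."""

    def test_connection_string_is_available(self):
        # A real test would open a connection here; this only checks the precondition
        self.assertTrue(DB_STRING)
```

The trade-off is that a skip driven by the environment cannot be forgotten on the command line, whereas the marker approach used here keeps the selection explicit in the workflow file.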

.gitignore

+10
@@ -0,0 +1,10 @@
__pycache__
.pytest_cache
build
wheels
dist
*.egg-info
venv
.spyproject
.idea
site

LICENSE

+19
@@ -0,0 +1,19 @@
Copyright (c) 2024 Commission de régulation de l'énergie

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+75
@@ -0,0 +1,75 @@
# Xml2db

`xml2db` is a Python package which allows loading XML data into a relational database. It is designed to handle complex
schemas which cannot be easily denormalized to a flat table, without any custom code.

It builds a data model (i.e. a set of database tables linked with foreign key relationships) based on an XSD schema and
allows parsing and loading XML files into the database, and converting them back to XML if needed.

It is as simple as:

```python
from xml2db import DataModel

# Create a data model of tables with relations based on the XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)
# Parse an XML file based on this XSD
document = data_model.parse_xml(
    xml_file="path/to/file.xml"
)
# Insert the document content into the database
document.insert_into_target_tables()
```

The data model will adhere closely to the XSD schema, but `xml2db` will perform simplifications aimed at limiting the
complexity of the resulting data model and the storage footprint.

The raw data loaded into the database can then be processed using [DBT](https://www.getdbt.com/), SQL views or
other tools aimed at extracting, correcting and formatting the data into more user-friendly tables.

`xml2db` is developed and used at the [French energy regulation authority (CRE)](https://www.cre.fr/) to process XML
data.

This package uses `sqlalchemy` to interact with the database, so it should work with different database backends. It has
been tested against PostgreSQL and MS SQL Server. It currently does not work with SQLite. You may have to install
additional packages to connect to your database (e.g. `pyodbc`, which is the default connector for MS SQL Server, or
`psycopg2` for PostgreSQL).

**Please read the [package documentation website](https://cre-dev.github.io/xml2db) for all the details!**

## Installation

The package can be installed, preferably in a virtual environment, using `pip`:

```bash
pip install xml2db
```

## Testing

Running the tests requires cloning the repo and then installing the additional development dependencies with:

```bash
pip install -e .[tests,docs]
```

Run all tests with the following command:

```bash
python -m pytest
```

Integration tests require write access to a PostgreSQL or MS SQL Server database; the connection string is provided as an
environment variable `DB_STRING`. If you want to run only the conversion tests, which do not require a database, you can run:

```bash
pytest -m "not dbtest"
```

## Contributing

Contributions and bug reports are more than welcome; the project's
[issue page](https://github.com/cre-dev/xml2db/issues) is the place to start.
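
Note on the README usage example: it loads a single file through `parse_xml` and `insert_into_target_tables`. A minimal sketch of the same two calls applied to several files is given below. It assumes that one data model object can be reused for multiple documents, which the README does not state, and `load_xml_files` is a hypothetical helper rather than part of `xml2db`. The helper accepts any object exposing `parse_xml(xml_file=...)`, so it can be exercised without a database.

```python
from typing import Iterable


def load_xml_files(data_model, xml_files: Iterable[str]) -> int:
    """Parse and insert each XML file in turn; return how many were loaded.

    Hypothetical helper. `data_model` is expected to behave like the README's
    `DataModel`: `parse_xml(xml_file=...)` returns a document object that
    offers `insert_into_target_tables()`.
    """
    loaded = 0
    for xml_file in xml_files:
        document = data_model.parse_xml(xml_file=str(xml_file))
        document.insert_into_target_tables()
        loaded += 1
    return loaded


# Usage with the objects from the README (requires xml2db and a database):
#   from xml2db import DataModel
#   data_model = DataModel(xsd_file="path/to/file.xsd", connection_string="...")
#   load_xml_files(data_model, ["path/to/a.xml", "path/to/b.xml"])
```

Stopping at the first failing file, as this loop does, keeps the sketch simple; whether a failed insert should abort the whole batch is a design choice left open here.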

docs/api/data_model.md

+3
@@ -0,0 +1,3 @@
# DataModel

::: xml2db.model.DataModel

docs/api/document.md

+3
@@ -0,0 +1,3 @@
# Document

::: xml2db.document.Document

docs/api/xml_converter.md

+3
@@ -0,0 +1,3 @@
# XMLConverter

::: xml2db.xml_converter.XMLConverter
