@@ -16,9 +16,62 @@ The column types can also be configured to override the default type mapping, us
16
16
diagram (see the [Getting started](getting_started.md) page for directions on how to visualize data models) and
17
17
then adapt the configuration if need be.
18
18
19
- Configuration options are described below.
19
+ Configuration options are described below. Some options can be set at the model level, others at the table level and
20
+ others at the field level. The general structure of the configuration dict is the following:
21
+
22
+ ``` py title="Model config general structure" linenums="1"
+ {
+     "document_tree_hook": None,
+     "document_tree_node_hook": None,
+     "row_numbers": False,
+     "as_columnstore": False,
+     "metadata_columns": None,
+     "tables": {
+         "table1": {
+             "reuse": True,
+             "choice_transform": False,
+             "as_columnstore": False,
+             "fields": {
+                 "my_column": {
+                     "type": None  # default type
+                 }
+             },
+             "extra_args": [],
+         }
+     }
+ }
+ ```
44
+
45
+ ## Model configuration
20
46
21
- ## Field level config
47
+ The following options can be passed as top-level keys of the model configuration `dict`:
48
+
49
+ * ` document_tree_hook ` (` Callable ` ): sets a hook function which can modify the data extracted from the XML. It gives direct
50
+ access to the underlying tree data structure just before it is extracted to be loaded to the database. This can be used,
51
+ for instance, to prune or modify some parts of the document tree before loading it into the database. The document tree
52
+ should of course stay compatible with the data model.
53
+ * ` document_tree_node_hook ` (` Callable ` ): sets a hook function which can modify the data extracted from the XML. It is
54
+ similar to `document_tree_hook`, but it is called as soon as a node is completed, without waiting for the entire parsing to
55
+ finish. It is especially useful if you intend to filter out some nodes and reduce memory footprint while parsing.
56
+ * `row_numbers` (`bool`): adds `xml2db_row_number` columns either to `n-n` relationship tables, or directly to data tables when
57
+ deduplication of rows is disabled. This allows recording the original order of elements in the source XML, which is not
58
+ always respected otherwise. It was implemented primarily for round-trip tests, but could serve other purposes. The
59
+ default value is ` False ` (disabled).
60
+ * `as_columnstore` (`bool`): for MS SQL Server, creates clustered columnstore indexes on all tables. This can also be set at
61
+ the table level for each table. However, for `n-n` relationship tables, this option is the only way to configure the
62
+ clustered columnstore indexes. The default value is ` False ` (disabled).
63
+ * ` metadata_columns ` (` list ` ): a list of extra columns that you want to add to the root table of your model. This is
64
+ useful for instance to add the name of the file which has been parsed, or a timestamp, etc. Columns should be specified
65
+ as dicts; the only required keys are `name` and `type` (a SQLAlchemy type object); other keys will be passed directly
66
+ as keyword arguments to ` sqlalchemy.Column ` . Actual values need to be passed to
67
+ [`Document.insert_into_target_tables`](api/document.md#xml2db.document.Document.insert_into_target_tables) for each
68
+ parsed document, as a `dict`, using the `metadata` argument.
69
+ * `record_hash_column_name`: the name of the column used to store record hashes (defaults to `xml2db_record_hash`).
70
+ * ` record_hash_constructor ` : a function used to build a hash, with a signature similar to ` hashlib ` constructor
71
+ functions (defaults to ` hashlib.sha1 ` ).
72
+ * `record_hash_size`: the size in bytes of the record hash (defaults to 20, which is the size of a SHA-1 hash).
73
+
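A minimal sketch of keeping the three record hash options consistent with each other. It assumes that `record_hash_size` should equal the digest size of the chosen constructor, as the SHA-1 default of 20 bytes suggests. The helper name `hash_options` is hypothetical and not part of `xml2db`; only `hashlib` calls from the standard library are used.

```python
import hashlib


def hash_options(constructor=hashlib.sha1, column_name="xml2db_record_hash"):
    """Build the record hash keys of a model config (hypothetical helper)."""
    return {
        "record_hash_column_name": column_name,
        "record_hash_constructor": constructor,
        # Derive the size from the constructor instead of hard-coding it,
        # so that the two options cannot drift apart.
        "record_hash_size": constructor().digest_size,
    }


# Example: switch to SHA-256 while keeping the default column name.
model_config = hash_options(hashlib.sha256)
```

The resulting dict could then be merged with the other top-level keys of the model configuration.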
74
+ ## Fields configuration
22
75
23
76
These configuration options are defined for a specific field of a specific table. A "field" refers to a column in the
24
77
table, or a child table.
@@ -140,7 +193,7 @@ timeInterval_end[1, 1]: string
140
193
}
141
194
```
142
195
143
- ## Table level config
196
+ ## Tables configuration
144
197
145
198
### Simplify "choice groups"
146
199
@@ -226,20 +279,22 @@ With MS SQL Server database backend, `xml2db` can create
226
279
on tables. However, for `n-n` relationship tables, this option needs to be set at the model level (see "Model configuration" above). The default value
227
280
is ` False ` (disabled).
228
281
229
- Configuration: ` "as_columnstore": ` ` False ` (default) or ` True `
282
+ ### Extra arguments
230
283
231
- ## Global options
284
+ Extra arguments can be passed to ` sqlalchemy.Table ` constructors, for instance if you want to customize indexes. These
285
+ can be passed in an iterable (e.g. ` tuple ` or ` list ` ) which will be simply unpacked into the ` sqlalchemy.Table `
286
+ constructor when building the table.
232
287
233
- These options can be passed as a top-level keys of the model configuration ` dict ` :
288
+ Configuration: ` "extra_args": [] ` (default)
234
289
235
- * ` document_tree_hook ` ( ` Callable ` ): sets a hook function which can modify the data extracted from the XML. It gives direct
236
- access to the underlying tree data structure just before it is extracted to be loaded to the database. This can be used,
237
- for instance, to prune or modify some parts of the document tree before loading it into the database. The document tree
238
- should of course stay compatible with the data model.
239
- * ` row_numbers ` ( ` bool ` ): adds ` xml2db_row_number ` columns either to ` n-n ` relationships tables, or directly to data tables when
240
- deduplication of rows is opted out. This allows recording the original order of elements in the source XML, which is not
241
- always respected otherwise. It was implemented primarily for round-trip tests, but could serve other purposes. The
242
- default value is ` False ` (disabled).
243
- * ` as_columnstore ` ( ` bool ` ): for MS SQL Server, create clustered columnstore indexes on all tables. This can be also set up at
244
- the table level for each table. However, for ` n-n ` relationships tables, this option is the only way to configure the
245
- clustered columnstore indexes. The default value is ` False ` (disabled).
290
+ !!! example
+     Adding an index on specific columns:
+     ``` python
+     import sqlalchemy
+
+     model_config = {
+         "tables": {
+             "my_table": {
+                 # extra_args must be an iterable, so wrap the index in a list
+                 "extra_args": [sqlalchemy.Index("my_index", "my_column1", "my_column2")],
+             }
+         }
+     }
+     ```
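To illustrate what "unpacked into the `sqlalchemy.Table` constructor" means, here is a rough sketch that builds a table by hand with plain SQLAlchemy. It is not how `xml2db` builds its tables internally; the table and column definitions are made up for illustration, and only the `extra_args` value mirrors the configuration key.

```python
import sqlalchemy

# The value one would put under "extra_args" in the table config.
extra_args = [sqlalchemy.Index("my_index", "my_column1", "my_column2")]

metadata = sqlalchemy.MetaData()

# Hypothetical table: the columns are assumptions for this sketch.
table = sqlalchemy.Table(
    "my_table",
    metadata,
    sqlalchemy.Column("my_column1", sqlalchemy.String(50)),
    sqlalchemy.Column("my_column2", sqlalchemy.Integer),
    *extra_args,  # each item becomes a positional argument of Table
)

# The index is now attached to the table.
index_names = sorted(index.name for index in table.indexes)
```

Because the items are unpacked positionally, a single object such as an `Index` has to be wrapped in a list or tuple first.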