
Commit c859475

merged of changes from remote
2 parents 7de014c + 615c56b commit c859475

15 files changed, +711 -114 lines changed

CHANGELOG.md

+14-5
@@ -3,23 +3,32 @@
 ## Change History
 All notable changes to the Databricks Labs Data Generator will be documented in this file.
 
+### Version 0.3.4
+
+#### Changed
+* Modified option to allow for range when specifying `numFeatures` with `structType='array'` to allow generation
+of varying number of columns
+* When generating multi-column or array valued columns, compute random seed with different name for each column
+* Additional build ordering enhancements to reduce circumstances where explicit base column must be specified
+
+#### Added
+* Scripting of data generation code from schema (Experimental)
+* Scripting of data generation code from dataframe (Experimental)
+* Added top level `random` attribute to data generator specification constructor
+
+
 ### Version 0.3.3post2
 
 #### Changed
 * Fixed use of logger in _version.py and in spark_singleton.py
 * Fixed template issues
 * Document reformatting and updates, related code comment changes
-* Modified option to allow for range when specifying `numFeatures` with `structType='array'` to allow generation
-of varying number of columns
-* When generating multi-column or array valued columns, compute random seed with different name for each column
 
 ### Fixed
 * Apply pandas optimizations when generating multiple columns using same `withColumn` or `withColumnSpec`
 
 ### Added
 * Added use of prospector to build process to validate common code issues
-* Added top level `random` attribute to data generator specification constructor
-
 
 
 ### Version 0.3.2
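The two 0.3.4 "Changed" entries about array-valued columns can be illustrated with a small pure-Python sketch. It uses neither Spark nor dbldatagen and is not the library's implementation; the helper names `column_seed` and `generate_array_column` are hypothetical. It assumes that a `(min, max)` value for `numFeatures` means an inclusive range of array lengths, and it derives each column's seed by hashing the column name together with a base seed. SHA-256 is used here because Python's built-in `hash()` of a string is randomized per process and so would not give repeatable seeds.

```python
import hashlib
import random


def column_seed(base_seed: int, column_name: str) -> int:
    """Derive a per-column seed from a base seed and the column name (hypothetical helper)."""
    digest = hashlib.sha256(f"{base_seed}:{column_name}".encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


def generate_array_column(base_seed, column_name, num_rows, num_features):
    """Return num_rows lists of floats whose lengths vary within num_features.

    num_features is either an int (fixed length) or a (min, max) tuple,
    assumed here to be an inclusive range of lengths.
    """
    if isinstance(num_features, int):
        min_len = max_len = num_features
    else:
        min_len, max_len = num_features
    if not 0 <= min_len <= max_len:
        raise ValueError("num_features must satisfy 0 <= min <= max")

    # One independent random stream per column name.
    rng = random.Random(column_seed(base_seed, column_name))
    rows = []
    for _ in range(num_rows):
        length = rng.randint(min_len, max_len)  # randint includes both endpoints
        rows.append([rng.random() for _ in range(length)])
    return rows
```

With the same base seed, `generate_array_column(42, "features", 5, (1, 4))` and `generate_array_column(42, "other", 5, (1, 4))` draw from different random streams. As a result, two array columns generated together do not simply repeat each other's values, while each column stays reproducible for a fixed seed.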

README.md

+2-1
@@ -51,6 +51,7 @@ used in other computations
 * use of SQL expressions in synthetic data generation
 * plugin mechanism to allow use of 3rd party libraries such as Faker
 * Use within a Databricks Delta Live Tables pipeline as a synthetic data generation source
+* Generate synthetic data generation code from existing schema or data (experimental)
 
 Details of these features can be found in the online documentation -
 [online documentation](https://databrickslabs.github.io/dbldatagen/public_docs/index.html).
@@ -62,7 +63,7 @@ details of use and many examples.
 
 Release notes and details of the latest changes for this specific release
 can be found in the GitHub repository
-[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.3.3post2/CHANGELOG.md)
+[here](https://github.com/databrickslabs/dbldatagen/blob/release/v0.3.4/CHANGELOG.md)
 
 # Installation
 
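The README bullet about generating code from an existing schema can be pictured with a hypothetical sketch. `script_from_schema` is not part of dbldatagen, and the library's experimental scripting may produce quite different output. The sketch only shows the idea of walking `(name, type)` pairs and emitting one `withColumn` call per field. The emitted text targets `dbldatagen.DataGenerator` and its `withColumn` method, and it assumes a `spark` session variable exists wherever the generated code is run.

```python
def script_from_schema(fields, name="synthetic_data", rows=1000, partitions=4):
    """Build dbldatagen-style generation code as text (hypothetical helper).

    fields is a list of (column_name, type_name) pairs, e.g. [("id", "long")].
    """
    lines = [
        "import dbldatagen as dg",
        "",
        "generation_spec = (",
        f"    dg.DataGenerator(sparkSession=spark, name={name!r}, rows={rows}, partitions={partitions})",
    ]
    # One withColumn call per schema field; type names are passed through as strings.
    for column_name, type_name in fields:
        lines.append(f"    .withColumn({column_name!r}, {type_name!r})")
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(script_from_schema([("id", "long"), ("name", "string")], rows=100))
```

Generating source text rather than building the specification directly lets a user review and hand-edit the result, for example to add value ranges or distributions that a bare schema cannot express.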

dbldatagen/__init__.py

+1-1
@@ -27,7 +27,7 @@
 from .datagen_constants import DEFAULT_RANDOM_SEED, RANDOM_SEED_RANDOM, RANDOM_SEED_FIXED, \
 RANDOM_SEED_HASH_FIELD_NAME, MIN_PYTHON_VERSION, MIN_SPARK_VERSION
 from .utils import ensure, topologicalSort, mkBoundsList, coalesce_values, \
-deprecated, parse_time_interval, DataGenError, split_list_matching_condition
+deprecated, parse_time_interval, DataGenError, split_list_matching_condition, strip_margins
 from ._version import __version__
 from .column_generation_spec import ColumnGenerationSpec
 from .column_spec_options import ColumnSpecOptions
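This diff only shows that `strip_margins` is now exported from `dbldatagen.utils`; its signature and behaviour are not shown here. The name suggests something like Scala's `stripMargin`, which is handy for writing indented multi-line templates (such as generated code) inside source files. The sketch below is a hypothetical stand-in named `strip_margin_sketch` with an assumed `margin_char` parameter; it is not the library function.

```python
def strip_margin_sketch(text: str, margin_char: str = "|") -> str:
    """Scala-style stripMargin (hypothetical stand-in, not dbldatagen's strip_margins).

    For each line, leading whitespace followed by margin_char is removed;
    lines without the margin character are returned unchanged.
    """
    result = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(margin_char):
            result.append(stripped[len(margin_char):])
        else:
            result.append(line)
    return "\n".join(result)


template = """
    |import dbldatagen as dg
    |spec = dg.DataGenerator(sparkSession=spark, rows=100)
"""
print(strip_margin_sketch(template))
```

The margin character marks exactly where each line's real content begins, so the template can follow the indentation of the surrounding Python code without that indentation leaking into the output.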

dbldatagen/_version.py

+1-1
@@ -33,7 +33,7 @@ def get_version(version):
 return version_info
 
 
-__version__ = "0.3.3post2" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
+__version__ = "0.3.4" # DO NOT EDIT THIS DIRECTLY! It is managed by bumpversion
 __version_info__ = get_version(__version__)
 
 
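The hunk shows that `__version_info__` is derived from `__version__` by `get_version(version)`, but the body of that function is outside the diff. As a hypothetical sketch only, a parser that handles both the old `0.3.3post2` and the new `0.3.4` strings could look like this; `VersionInfo`, its field names, and the `"final"` default are assumptions, not the module's actual structure.

```python
import re
from collections import namedtuple

# Hypothetical structure; the real get_version() body is not shown in this diff.
VersionInfo = namedtuple("VersionInfo", ["major", "minor", "patch", "release"])

# major.minor.patch followed by an optional alphanumeric suffix such as "post2".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([A-Za-z][A-Za-z0-9]*)?$")


def parse_version_sketch(version: str) -> VersionInfo:
    """Split strings such as '0.3.4' or '0.3.3post2' into numeric parts plus a suffix."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, suffix = match.groups()
    return VersionInfo(int(major), int(minor), int(patch), suffix or "final")


if __name__ == "__main__":
    print(parse_version_sketch("0.3.4"))
    print(parse_version_sketch("0.3.3post2"))
```

Comparing these tuples is only meaningful when the numeric fields differ; ordering of suffixes such as `post2` versus `rc1` is not handled by this sketch.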