Releases: StanfordBioinformatics/encode_utils
2.11.0
What's Changed
- EU-7 handle DCC urls with or without https schemes by @paul-sud in #25
- EU-23 Fix bugs affecting prop removal by @jenjou in #38
- EU-24-gc-upload by @jenjou in #39
- EU-25-update-remove-duplicate-associations-call by @yunhailuo in #40
- EU-27-eu-register-file-upload by @paul-sud in #43
- Pass 'dry_run' cli argument value on Connection init when 'dcc_mode' … by @emi80 in #45
- EU-28-add-more-attachment-props by @jenjou in #46
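As an illustration of the EU-7 item, here is a minimal sketch of accepting a DCC host with or without a scheme. The function name normalize_dcc_url and its default_scheme parameter are hypothetical and are not taken from encode_utils; the actual change in #25 may be implemented differently.

```python
def normalize_dcc_url(host: str, default_scheme: str = "https") -> str:
    """Return `host` with a scheme, adding `default_scheme` when none is given.

    Hypothetical helper for illustration only; not the encode_utils API.
    """
    host = host.strip().rstrip("/")
    if "://" in host:
        # A scheme is already present (http or https), so leave it alone.
        return host
    return f"{default_scheme}://{host}"


print(normalize_dcc_url("www.encodeproject.org"))
print(normalize_dcc_url("https://www.encodeproject.org/"))
```

Both calls yield the same base URL, so downstream code can join paths onto it without caring how the user typed the host.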
Full Changelog: 2.10.0...2.11.0
2.10.0
Updates
- Default requests timeout has been increased to 60s
- autoSql attachments are now supported
Novelties
- Added an option to disable file logging
- Added an option to truncate long request payloads
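A rough sketch of what truncating a long request payload for logging could look like. The helper name truncate_payload, the max_chars parameter, and the suffix text are assumptions made for this example; the release notes do not say how the option is exposed in encode_utils.

```python
import json


def truncate_payload(payload: dict, max_chars: int = 200) -> str:
    """Serialize `payload` and shorten it for log output.

    Hypothetical helper; the real option in encode_utils may behave differently.
    """
    text = json.dumps(payload, sort_keys=True)
    if len(text) <= max_chars:
        return text
    # Keep the head of the payload and mark that the rest was dropped.
    return text[:max_chars] + "... [truncated]"


print(truncate_payload({"description": "x" * 500}, max_chars=50))
```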
Bugfixes
- Fixed incorrect singularization of publication_data to match portal
2.9.0
Updates
- Posted aliases are now deduplicated
- profiles.Profile has been removed and the rest of the profiles module has been refactored to allow for connection.Connection to pull schemas from URLs other than encodeproject.org
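Deduplicating posted aliases can be sketched as an order-preserving pass over the list. The function name dedupe_aliases and the sample alias values are hypothetical; this shows the general technique, not the code used in encode_utils.

```python
def dedupe_aliases(aliases):
    """Drop repeated aliases while keeping the first occurrence of each.

    Hypothetical helper for illustration; dict keys preserve insertion order.
    """
    return list(dict.fromkeys(aliases))


print(dedupe_aliases(["lab:sample-1", "lab:sample-2", "lab:sample-1"]))
```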
Novelties
- eu_register.py now exposes connection.Connection.remove_and_patch() to allow for removing and patching properties in one request using the --rm-patch flag
- File upload is now optional when posting file objects with connection.Connection.post when specifying upload_file=False.
- connection.Connection.post can now optionally return the original response code of the POST in addition to the usual JSON by specifying return_original_status_code=True.
Bugfixes
- Fixed a bug in connection.Connection.extend_array_values that would cause spurious failures when extending with empty arrays
- Fixed a typo in the documentation
2.8.0
Updates
- Updated connection.Connection.patch() such that when the extend_array_values parameter is set to True, arrays with string or dictionary elements are extended and duplicates are removed.
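The extend-and-deduplicate behavior described above can be approximated as follows. This is a sketch under assumptions: the helper name extend_unique is hypothetical, and comparing dictionaries by their sorted JSON form is one possible way to treat equal dictionaries as duplicates, not necessarily what encode_utils does.

```python
import json


def extend_unique(existing, new):
    """Append `new` items to `existing`, skipping duplicates.

    Works for string and dictionary elements. Dictionaries are unhashable,
    so each item is keyed by its JSON form with sorted keys (an assumption
    made for this sketch).
    """
    seen = set()
    merged = []
    for item in list(existing) + list(new):
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged


print(extend_unique(["a", "b"], ["b", "c"]))
print(extend_unique([{"x": 1, "y": 2}], [{"y": 2, "x": 1}, {"x": 3}]))
```

Extending with an empty list simply returns the existing values, which is the case the 2.9.0 bugfix mentions.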
Novelties
- Added a new method connection.Connection.remove_and_patch() that allows for removing and patching properties in one request. Please see the documentation for more details.
Bugfixes
- Fixed a bug in eu_register.check_valid_json() that would allow attempted submission of JSON arrays of mixed types.
2.7.0
Updates:
- Updated connection.Connection.before_post_file() such that it will calculate file_size now (in addition to md5sum), and that whenever the md5sum needs to be set, the file_size will also be set.
Novelties:
- Added connection.Connection.get_biosample_type() to aid in searching for a BiosampleType with a given classification and term_id or term_name.
- Added function utils.orient_jpg() which fixes misoriented images, rotating them as necessary. connection.Connection.set_attachment() now calls this if the input file is a JPEG or TIFF.
- Added aws_storage.py that includes two classes: S3Upload, which simplifies the process of uploading files to a bucket in a specific location with the specified acl, and S3Object, which represents an object in an S3 bucket and is used internally for calculating the md5sum and file size when submitting S3 objects to the ENCODE Portal.
- Added utils.url_join(), which is now used for properly joining URL paths, rather than incorrectly using os.path.join, since that doesn't construct the right paths on Windows systems.
- Added new function in profiles.py called remove_duplicate_associations(). This is called when patching a record so that EU can detect and remove duplicates in array values (when extending arrays).
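To show why os.path.join is the wrong tool for URLs, the sketch below uses ntpath (the Windows flavor of os.path, importable on any platform) next to a simple slash-based join. The url_join shown here is a stand-in written for this example and is not the implementation of utils.url_join(); the accession is a made-up placeholder.

```python
import ntpath


def url_join(base, *parts):
    """Join URL path segments with forward slashes on every platform.

    Illustrative stand-in; not the body of encode_utils' utils.url_join().
    """
    segments = [base.rstrip("/")]
    # Strip stray slashes from each part and skip parts that are empty.
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


# On Windows, os.path.join behaves like ntpath.join and inserts a backslash.
print(ntpath.join("https://www.encodeproject.org", "biosamples"))
print(url_join("https://www.encodeproject.org/", "/biosamples/", "ENCBS000AAA"))
```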
Bug fixes
- Added patch to profiles.Profile._set_profile_id() to include exceptional cases, such as antibody_lot records using @id values like '/antibodies/ENCAB719MQZ' instead of the expected '/antibody_lots/ENCAB719MQZ'.
2.6.0
Updates:
- Updated documentation in transfer_to_gcp.py module.
- Updated utils.calculate_md5sum() such that it works with an S3 URI in addition to a local file path.
- Updated utils.calculate_md5sum() so that it is more memory efficient with large files by breaking it up into chunks.
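Chunked hashing, as mentioned in the last item, generally looks like the sketch below. The function name md5_of_stream and the chunk_size default are assumptions for this example, and it reads from any binary stream rather than a path or S3 URI, so it is not a copy of utils.calculate_md5sum().

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Compute an MD5 hex digest by reading `stream` in fixed-size chunks.

    Only one chunk is held in memory at a time. Hypothetical helper,
    not the encode_utils implementation.
    """
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


data = b"ENCODE" * 1000
print(md5_of_stream(io.BytesIO(data), chunk_size=256))
```

The digest is identical to hashing the whole buffer at once; only peak memory use changes.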
Novelties:
- Added new script eu_patch_property.py, which is useful for patching a given property with the same value across multiple records, potentially records of different profiles. For example, I once used this to patch the award property of many different object types, as I had originally used the wrong award when submitting the records.
- New module replicate.py that makes working with replicates a whole lot easier. For example, you can search for a replicate by providing the biosample accession and the library accession. You can also let the module suggest which biosample_replicate_number and technical_replicate_number to use when submitting a new replicate object. Please see the well-documented source code for more details on how this works.
- Added method connection.Connection.get_experiments_with_biosample().
- Added class method profiles.Profile.profiles_with_property(), which returns a list of all profiles containing the given property name.
2.5.0
Bug fixes in October:
- Updated typecast() function in eu_register.py so that it can handle JSON Schema numbers. Moreover, when an attribute is declared to be a number, this function will convert from string to either int or float based on the given string representation. Thanks to khine for reporting.
- Fixed bug reported by weiwei where in profile.py the Profile.required_properties() method always expected the given profile to contain a top-level 'required' key, which isn't always the case. For example, the biosample profile has it, whereas the file profile has it in the anyOf subschema. For this latter case, the method now returns the empty array as there isn't at present any attempt to figure out what is conditionally required. This would have affected attempts to remove a property from such profiles since the behavior is to first prevent the user from removing required properties by popping those out of the payload.
- Fixed bug reported by Jennifer Jou where the profile.Profile._set_profile_id() method didn't properly singularize the profile ID in all cases. Fixed this by using the inflection module's singularize function.
- Fixed bug where eu_register.py's typecast() function didn't check for booleans to typecast to. That meant that the registration script didn't always handle boolean fields properly. Thanks again to jjou for reporting.
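The typecasting behavior described in the first and last items can be sketched like this. The name cast_value, the accepted boolean spellings, and the exact fallback order are assumptions for illustration; the real typecast() in eu_register.py may differ.

```python
def cast_value(value: str, schema_type: str):
    """Convert a string from a submission sheet to the JSON Schema type.

    Hypothetical helper; not the typecast() function from eu_register.py.
    """
    if schema_type == "integer":
        return int(value)
    if schema_type == "number":
        # Prefer int when the text is a whole number, otherwise use float.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if schema_type == "boolean":
        lowered = value.strip().lower()
        if lowered in ("true", "1"):
            return True
        if lowered in ("false", "0"):
            return False
        raise ValueError(f"Cannot interpret {value!r} as a boolean")
    # Strings and unknown types are passed through unchanged.
    return value


print(cast_value("3", "number"), cast_value("3.5", "number"), cast_value("TRUE", "boolean"))
```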
New in Master:
10/23/2018
- Added script eu_get_accessions.py. Given an input list of record aliases, retrieves the DCC accession for each.
- Added script eu_create_gcp_url_list.py.
Updates in Master:
- Documentation in transfer_to_gcp.py has been updated.
- Renamed utils.clean_alias_name() to utils.clean_aliases. This function now takes a list of aliases and either removes or replaces non-permitted characters, such as "/" and "#". This function is called in the pre-submit hook before_submit_alias.
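Cleaning a list of aliases by removing or replacing non-permitted characters can be sketched as below. The name clean_alias_list, the replacement parameter, and the restriction to just "/" and "#" are assumptions drawn from the two characters named above; utils.clean_aliases may handle a wider set.

```python
import re


def clean_alias_list(aliases, replacement=""):
    """Remove (or replace) "/" and "#" in each alias.

    Hypothetical helper for illustration; not the encode_utils function.
    """
    return [re.sub(r"[/#]", replacement, alias) for alias in aliases]


print(clean_alias_list(["lab:rep/1#a"]))
print(clean_alias_list(["lab:rep/1#a"], replacement="_"))
```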
2.4.0
New:
Added script eu_generate_upload_creds.py.
Added support for creating a URL list file in transfer_to_gcp.py and at connection.Connection.gcp_transfer_urllist. This file can be used in GCP to copy files from ENCODE AWS buckets to GCP.
Updates:
Added option in transfer_to_gcp.py when transferring files to GCP to allow overwrites.
Fixed bug in connection.Connection.gcp_transfer method so that it correctly finds the s3 object path.
2.3.1
Known Bugs in this release
- The s3 to gcp transfer mechanism won't work since the s3 bucket name is hard-coded to the test value of pulsar-encode-assets. This is fixed in master.
Updates in this release
- Added missing reference to dependency jsonschema in the setup.py file. No need to upgrade to this version if you already have release 2.3.0 and jsonschema in your Python packages list. Thanks @yunhailuo for adding the dependency fix.
S3 to GCP file transfer support
New:
- Added support to copy ENCODE files in AWS S3 to GCP; see the RTD documentation.
- Added script
eu_search_results_json.py
, which accepts an ENCODE Portal search URL and saves the results to a JSON file.
Updates:
- eu_register.py will now accept JSON in addition to tsv, thanks to the contribution from @yunhailuo.
Bug fixes:
- See notes in 2.0.0-pre pre-release.