Change package name to bulkboto3
iamirmasoud committed Apr 7, 2022
1 parent 5bc6b9f commit d7f4f4a
Showing 12 changed files with 100 additions and 38 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,7 @@
# Change Log:
**v1.1.0:**
- Change package name to `bulkboto3`

**v1.0.3:**
- Add use case of transferring arbitrary files to S3

41 changes: 23 additions & 18 deletions README.md
@@ -2,31 +2,36 @@
<!-- PROJECT LOGO -->
<br />
<div align="center">
<a href="https://github.com/iamirmasoud/bulkboto">
<img src="https://raw.githubusercontent.com/iamirmasoud/bulkboto/main/imgs/logo.jpg" alt="Logo" width="100" height="100">
<a href="https://github.com/iamirmasoud/bulkboto3">
<img src="https://raw.githubusercontent.com/iamirmasoud/bulkboto3/main/imgs/logo.png" alt="Logo" width="100" height="100">
</a>

<h3 align="center">Bulk Boto (bulkboto)</h3>
<h3 align="center">Bulk Boto3 (bulkboto3)</h3>

<p align="center">
Python package for fast and parallel transferring a bulk of files to S3 based on boto3!
<br />
<!--
<a href="https://github.com/iamirmasoud/bulkboto"><strong>Explore the docs »</strong></a>
<a href="https://github.com/iamirmasoud/bulkboto3"><strong>Explore the docs »</strong></a>
<br />
-->
<a href="https://github.com/iamirmasoud/bulkboto/blob/main/examples.py">View Examples</a>
<a href="https://github.com/iamirmasoud/bulkboto3/blob/main/examples.py">View Examples</a>
·
<a href="https://github.com/iamirmasoud/bulkboto/issues">Report Bug/Request Feature</a>
</p>
<a href="https://github.com/iamirmasoud/bulkboto3/issues">Report Bug/Request Feature</a>

![Python](https://img.shields.io/pypi/pyversions/bulkboto3.svg?style=flat&https://pypi.python.org/pypi/bulkboto3/)
![Version](http://img.shields.io/pypi/v/bulkboto3.svg?style=flat&https://pypi.python.org/pypi/bulkboto3/)
![License](http://img.shields.io/pypi/l/bulkboto3.svg?style=flat&https://github.com/boto/bulkboto3/blob/develop/LICENSE)

</p>
</div>

<!-- TABLE OF CONTENTS -->
<details>
<summary>Table of Contents</summary>
<ol>
<li>
<a href="#about-bulk-boto">About Bulk Boto</a>
<a href="#about-bulkboto3">About bulkboto3</a>
</li>
<li>
<a href="#getting-started">Getting Started</a>
@@ -43,7 +48,7 @@
</ol>
</details>

## About Bulk Boto
## About bulkboto3
[Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) is the official Python SDK
for accessing and managing all AWS resources such as Amazon Simple Storage Service (S3).
Generally, it's pretty ok to transfer a small number of files using Boto3. However, transferring a large number of
@@ -52,10 +57,10 @@ it can take up to hours to transfer hundreds of thousands, or millions, of files
Moreover, because Amazon S3 does not have folders/directories, managing the hierarchy of directories and files
manually can be a bit tedious especially if there are many files located in different folders.

The Bulk Boto package solves these issues. It speeds up transferring of many small files to Amazon AWS S3 by
The `bulkboto3` package solves these issues. It speeds up transferring of many small files to Amazon AWS S3 by
executing multiple download/upload operations in parallel by leveraging the Python multiprocessing module.
Depending on the number of cores of your machine, Bulk Boto can make S3 transfers even 100X faster than sequential
mode using traditional Boto3! Furthermore, Bulk Boto can keep the original folder structure of files and
Depending on the number of cores of your machine, Bulk Boto3 can make S3 transfers even 100X faster than sequential
mode using traditional Boto3! Furthermore, Bulk Boto3 can keep the original folder structure of files and
directories when transferring them. There are also some other features as follows.

### Main Functionalities
@@ -71,23 +76,23 @@ directories when transferring them. There are also some other features as follow
* [pip](https://pip.pypa.io/en/stable/)

### Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install `bulkboto`.
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install `bulkboto3`.

```bash
pip install bulkboto
pip install bulkboto3
```

## Usage
You can find the following scripts in [examples.py](https://github.com/iamirmasoud/bulkboto/blob/main/examples.py).

#### Import and instantiate a `BulkBoto` object with your credentials
#### Import and instantiate a `BulkBoto3` object with your credentials
```python
from bulkboto import BulkBoto
from bulkboto3 import BulkBoto3
TARGET_BUCKET = "test-bucket"
NUM_TRANSFER_THREADS = 50
TRANSFER_VERBOSITY = True

bulkboto_agent = BulkBoto(
bulkboto_agent = BulkBoto3(
resource_type="s3",
endpoint_url="<Your storage endpoint>",
aws_access_key_id="<Your access key>",
@@ -244,7 +249,7 @@ Uploaded 88800 small files (totally about 7GB) with 100 threads in 505 seconds t

## Contributing
Any contributions you make are **greatly appreciated**. If you have a suggestion that would make this better, please fork the repo and create a pull request.
You can also simply open an issue with the tag "enhancement". To contribute to `bulkboto`, follow these steps:
You can also simply open an issue with the tag "enhancement". To contribute to `bulkboto3`, follow these steps:

1. Fork this repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
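Taken together, the renamed API reads as below — a minimal sketch assuming an S3-compatible endpoint and the constructor/method keywords visible in this diff (the directory-transfer keywords are inferred from the log messages in bulkboto3.py and should be treated as assumptions where the signatures are collapsed):

```python
from bulkboto3 import BulkBoto3

# Sketch only: placeholder endpoint and credentials, as in the README snippet above.
# max_pool_connections and verbose are assumed constructor keywords.
bulkboto_agent = BulkBoto3(
    resource_type="s3",
    endpoint_url="<Your storage endpoint>",
    aws_access_key_id="<Your access key>",
    aws_secret_access_key="<Your secret key>",
    max_pool_connections=300,
    verbose=True,
)

# Upload a local directory in parallel, keeping its folder structure on the bucket.
# Keyword names follow the upload/download log messages; treat them as assumptions.
bulkboto_agent.upload_dir_to_storage(
    bucket_name="test-bucket",
    local_dir="local_folder",
    storage_dir="my_storage_dir",
    n_threads=50,
)

# Download it back the same way.
bulkboto_agent.download_dir_from_storage(
    bucket_name="test-bucket",
    storage_dir="my_storage_dir",
    local_dir="downloaded_folder",
    n_threads=50,
)
```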
2 changes: 0 additions & 2 deletions bulkboto/__init__.py

This file was deleted.

5 changes: 5 additions & 0 deletions bulkboto3/__init__.py
@@ -0,0 +1,5 @@
from .bulkboto3 import BulkBoto3
from .transfer_path import StorageTransferPath

__author__ = "Amir Masoud Sefidian"
__version__ = "1.1.0"
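
A quick sanity check of the new package layout (a sketch; assumes `bulkboto3` is installed as shown in the README diff above):

```python
# The renamed top-level package re-exports the public classes and carries the version.
import bulkboto3
from bulkboto3 import BulkBoto3, StorageTransferPath

print(bulkboto3.__version__)  # "1.1.0" per the __init__.py above
```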
22 changes: 16 additions & 6 deletions bulkboto/bulkboto.py → bulkboto3/bulkboto3.py
100755 → 100644
@@ -28,7 +28,7 @@ def single_download(input_tuple):
bucket.download_file(download_path.storage_path, download_path.local_path)


class BulkBoto:
class BulkBoto3:
def __init__(
self,
endpoint_url: str,
@@ -53,7 +53,8 @@ def __init__(
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key,
config=Config(
signature_version="s3v4", max_pool_connections=max_pool_connections
signature_version="s3v4",
max_pool_connections=max_pool_connections,
),
)
except Exception as e:
@@ -160,14 +161,19 @@ def check_object_exists(
logger.exception("Something else has gone wrong.")
raise

def list_objects(self, bucket_name: str, storage_dir: str = "") -> List[str]:
def list_objects(
self, bucket_name: str, storage_dir: str = ""
) -> List[str]:
"""
Get the list of all objects in a specific directory on the object storage.
:param bucket_name: Name of the bucket.
:param storage_dir: Base directory on the object storage to get list of objects.
"""
bucket = self._get_bucket(bucket_name)
return [_object.key for _object in bucket.objects.filter(Prefix=storage_dir)]
return [
_object.key
for _object in bucket.objects.filter(Prefix=storage_dir)
]

def upload_dir_to_storage(
self,
@@ -256,7 +262,9 @@ def download_dir_from_storage(
logger.info(
f"Start downloading from '{storage_dir}' on storage to local '{local_dir}' with {n_threads} threads."
)
objects = self.list_objects(bucket_name=bucket_name, storage_dir=storage_dir)
objects = self.list_objects(
bucket_name=bucket_name, storage_dir=storage_dir
)

# create the directories structure in local
unique_dirs = {os.path.dirname(path) for path in objects}
@@ -292,7 +300,9 @@ def download_dir_from_storage(
)
)
else:
self.download(bucket_name=bucket_name, download_paths=download_paths)
self.download(
bucket_name=bucket_name, download_paths=download_paths
)

logger.info(
f"Successfully downloaded {len(download_paths)} files from bucket: '{bucket_name}' "
File renamed without changes.
File renamed without changes.
14 changes: 9 additions & 5 deletions examples.py
@@ -1,6 +1,6 @@
import logging

from bulkboto import BulkBoto, StorageTransferPath
from bulkboto3 import BulkBoto3, StorageTransferPath

logging.basicConfig(
level="INFO",
@@ -13,8 +13,8 @@
NUM_TRANSFER_THREADS = 50
TRANSFER_VERBOSITY = True

# instantiate a BulkBoto object
bulkboto_agent = BulkBoto(
# instantiate a BulkBoto3 object
bulkboto_agent = BulkBoto3(
resource_type="s3",
endpoint_url="<Your storage endpoint>",
aws_access_key_id="<Your access key>",
@@ -66,7 +66,9 @@
local_path="f5",
),
]
bulkboto_agent.download(bucket_name=TARGET_BUCKET, download_paths=download_paths)
bulkboto_agent.download(
bucket_name=TARGET_BUCKET, download_paths=download_paths
)

# check if a file exists in a bucket
print(
@@ -83,7 +85,9 @@

# get list of objects in a bucket (with prefix)
print(
bulkboto_agent.list_objects(bucket_name=TARGET_BUCKET, storage_dir="my_storage_dir")
bulkboto_agent.list_objects(
bucket_name=TARGET_BUCKET, storage_dir="my_storage_dir"
)
)
print(
bulkboto_agent.list_objects(
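examples.py also exercises per-file transfers via `StorageTransferPath`; a condensed sketch of that flow (object keys and local paths are placeholders, and the keyword names mirror the `storage_path`/`local_path` attributes used in bulkboto3.py):

```python
from bulkboto3 import StorageTransferPath

# Reuses the bulkboto_agent instantiated earlier in examples.py.
# Each StorageTransferPath pairs an object key on storage with a local file path.
download_paths = [
    StorageTransferPath(storage_path="my_storage_dir/f5", local_path="f5"),
    StorageTransferPath(storage_path="my_storage_dir/f6", local_path="f6"),
]
bulkboto_agent.download(bucket_name="test-bucket", download_paths=download_paths)
```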
Binary file removed imgs/logo.jpg
Binary file added imgs/logo.png
14 changes: 14 additions & 0 deletions pyproject.toml
@@ -0,0 +1,14 @@
[tool.pytest.ini_options]
markers = [
"slow: marks tests as slow",
]

[tool.isort]
profile = "black"
line_length = 79
honor_noqa = true
src_paths = ["boto3", "tests"]

[tool.black]
line-length = 79
skip_string_normalization = true
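
The new pyproject.toml registers a `slow` pytest marker and aligns isort/black on a 79-character line length, which explains the reflowed call sites elsewhere in this diff. A hypothetical test showing how that marker would be applied:

```python
import pytest


@pytest.mark.slow  # marker registered in [tool.pytest.ini_options] above
def test_parallel_upload_of_large_directory():
    """Placeholder for a long-running transfer test.

    Deselect such tests with: pytest -m "not slow"
    """
```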
37 changes: 30 additions & 7 deletions setup.py
@@ -1,28 +1,42 @@
import io
import os
import pathlib
import re

from setuptools import find_namespace_packages, setup

# The directory containing this file
HERE = pathlib.Path(__file__).parent

# Package meta-data.
NAME = "bulkboto"
VERSION = "1.0.3"
NAME = "bulkboto3"
DESCRIPTION = "Python package for fast and parallel transferring a bulk of files to S3 based on boto3"
URL = "https://github.com/iamirmasoud/bulkboto"
URL = "https://github.com/iamirmasoud/bulkboto3"
AUTHOR = "Amir Masoud Sefidian"
AUTHOR_EMAIL = "amir.masoud.sefidian@gmail.com"
REQUIRES_PYTHON = ">=3.3.0"

REQUIRES_PYTHON = ">=3.6.0"
# What packages are required for this module to be executed?
REQUIRED = [
"boto3==1.21.26",
"tqdm",
]

# What packages are optional?
EXTRAS = {"dev": ["isort", "black"]}


def get_version():
init = open(os.path.join(HERE, NAME, "__init__.py")).read()
return (
re.compile(r"""__version__ = ['"]([0-9.]+)['"]""")
.search(init)
.group(1)
)


VERSION = get_version()


try:
with io.open((HERE / "README.md"), encoding="utf-8") as f:
LONG_DESCRIPTION = "\n" + f.read()
@@ -40,7 +54,7 @@
url=URL,
license="MIT",
python_requires=REQUIRES_PYTHON,
packages=find_namespace_packages(include=["bulkboto"]),
packages=find_namespace_packages(include=["bulkboto3"]),
install_requires=REQUIRED,
extras_require=EXTRAS,
include_package_data=True,
@@ -52,14 +66,23 @@
"Bulk",
"Boto",
"Bulk Boto",
"Bulk Boto3",
"Simple Storage Service",
"Minio",
"Amazon AWS S3",
"Python",
],
classifiers=[
"Development Status :: 3 - Alpha",
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"Natural Language :: English",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Operating System :: OS Independent",
"License :: OSI Approved :: MIT License",
],
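setup.py now derives VERSION from bulkboto3/__init__.py via get_version(), so the package version lives in one place. A standalone check of the regular expression it relies on:

```python
import re

# Same pattern as get_version() in setup.py above.
version_re = re.compile(r"""__version__ = ['"]([0-9.]+)['"]""")
sample = '__version__ = "1.1.0"'  # mirrors bulkboto3/__init__.py in this commit
print(version_re.search(sample).group(1))  # -> 1.1.0
```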
