Commit 69f96a3

dbogunowicz, bogunowicz@arrival.com, and jeanniefinks authored
[Cherry Pick] Internal dev README (potentially user-facing README) (#239)
* Internal dev README (potentially user-facing README) (#205)
* initial commit
* merge readmes
* just need grammar and consistency review
* Apply suggestions from code review
* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update README.md

Co-authored-by: bogunowicz@arrival.com <bogunowicz@arrival.com>
Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>
1 parent 36ecbba commit 69f96a3

1 file changed, +239 -15 lines changed

README.md

@@ -83,35 +83,259 @@ pip install sparsezoo
8383

8484
## Quick Tour
8585

86-
### Python APIs
86+
The SparseZoo Python API enables you to search and download sparsified models. Code examples are given below.
87+
We encourage users to load SparseZoo models by copying a stub directly from a [model page](https://sparsezoo.neuralmagic.com/).
8788

88-
The Python APIs respect this format enabling you to search and download models. Some code examples are given below.
89-
The [SparseZoo UI](https://sparsezoo.neuralmagic.com/) also enables users to load models by copying
90-
a stub directly from a model page.
89+
### Introduction to Model Class Object
9190

91+
The `Model` class is the fundamental object that serves as the main interface to the SparseZoo library.
92+
It represents a SparseZoo model, together with all its directories and files.
9293

93-
#### Loading from a Stub
94+
#### Creating a Model Class Object From SparseZoo Stub
95+
```python
96+
from sparsezoo import Model
97+
98+
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none"
99+
100+
model = Model(stub)
101+
print(str(model))
102+
103+
>> Model(stub=zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none)
104+
```
105+
106+
#### Creating a Model Class Object From Local Model Directory
107+
```python
108+
from sparsezoo import Model
109+
110+
directory = ".../.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0"
111+
112+
model = Model(directory)
113+
print(str(model))
114+
115+
>> Model(directory=.../.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0)
116+
```
117+
118+
#### Manually Specifying the Model Download Path
119+
120+
Unless specified otherwise, the model created from the SparseZoo stub is saved to the local sparsezoo cache directory.
121+
This can be overridden by passing the optional `download_path` argument to the constructor:
122+
123+
```python
124+
from sparsezoo import Model
125+
126+
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none"
127+
download_directory = "./model_download_directory"
128+
129+
model = Model(stub, download_path=download_directory)
130+
```
131+
#### Downloading the Model Files
132+
Once the model is initialized from a stub, it may be downloaded either by calling the `download()` method or by accessing the `path` property. Both approaches work for every file in SparseZoo. Accessing the `path` property triggers a download unless the file has already been downloaded.
133+
134+
```python
135+
# method 1
136+
model.download()
137+
138+
# method 2
139+
model_path = model.path
140+
```
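
The download-on-access behavior of `path` described above can be pictured with a small stand-alone sketch. It does not use SparseZoo: `LazyFile` and its `download_count` counter are hypothetical names made up for illustration, and the "download" only writes an empty local file.

```python
import tempfile
from pathlib import Path


class LazyFile:
    """Hypothetical stand-in for a remote file (not the sparsezoo implementation)."""

    def __init__(self, name: str, local_dir: Path):
        self.name = name
        self._local_path = local_dir / name
        self.download_count = 0  # how many times a download was triggered

    def download(self) -> None:
        # Placeholder for a network fetch: an empty file is written instead.
        self._local_path.parent.mkdir(parents=True, exist_ok=True)
        self._local_path.write_bytes(b"")
        self.download_count += 1

    @property
    def path(self) -> Path:
        # Trigger a download only if the file is not on disk yet.
        if not self._local_path.exists():
            self.download()
        return self._local_path


with tempfile.TemporaryDirectory() as tmp:
    lazy_file = LazyFile("model.onnx", Path(tmp))
    first = lazy_file.path   # first access triggers the download
    second = lazy_file.path  # second access reuses the local copy
    downloads_after_two_reads = lazy_file.download_count
```

Under these assumptions, reading `path` twice results in a single download, which mirrors the caching behavior the README describes.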
141+
142+
#### Inspecting the Contents of the SparseZoo Model
143+
144+
Use the `available_files` attribute to inspect which files are present in the SparseZoo model, then select a file by accessing the corresponding attribute:
145+
146+
```python
147+
model.available_files
148+
149+
>> {'training': Directory(name=training),
150+
>> 'deployment': Directory(name=deployment),
151+
>> 'sample_inputs': Directory(name=sample_inputs.tar.gz),
152+
>> 'sample_outputs': {'framework': Directory(name=sample_outputs.tar.gz)},
153+
>> 'sample_labels': Directory(name=sample_labels.tar.gz),
154+
>> 'model_card': File(name=model.md),
155+
>> 'recipes': Directory(name=recipe),
156+
>> 'onnx_model': File(name=model.onnx)}
157+
```
158+
Then, we might take a closer look at the contents of the SparseZoo model:
159+
```python
160+
model_card = model.model_card
161+
print(model_card)
162+
163+
>> File(name=model.md)
164+
```
165+
```python
166+
model_card_path = model.model_card.path
167+
print(model_card_path)
168+
169+
>> .../.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0/model.md
170+
```
171+
172+
173+
### Model, Directory, and File
174+
175+
In general, every file in the SparseZoo model shares a set of attributes: `name`, `path`, `url`, and `parent_directory`:
176+
- `name` serves as an identifier of the file/directory
177+
- `path` points to the location of the file/directory
178+
- `url` specifies the server address of the file/directory in question
179+
- `parent_directory` points to the location of the parent directory of the file/directory in question
180+
181+
A directory is a unique type of file that contains other files. For that reason, it has an additional `files` attribute.
182+
183+
```python
184+
print(model.onnx_model)
185+
186+
>> File(name=model.onnx)
187+
188+
print(f"File name: {model.onnx_model.name}\n"
189+
f"File path: {model.onnx_model.path}\n"
190+
f"File URL: {model.onnx_model.url}\n"
191+
f"Parent directory: {model.onnx_model.parent_directory}")
192+
193+
>> File name: model.onnx
194+
>> File path: .../.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0/model.onnx
195+
>> File URL: https://models.neuralmagic.com/cv-classification/...
196+
>> Parent directory: .../.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0
197+
```
198+
199+
```python
200+
print(model.recipes)
201+
202+
>> Directory(name=recipe)
203+
204+
print(f"File name: {model.recipes.name}\n"
205+
f"Contains: {[file.name for file in model.recipes.files]}\n"
206+
f"File path: {model.recipes.path}\n"
207+
f"File URL: {model.recipes.url}\n"
208+
f"Parent directory: {model.recipes.parent_directory}")
209+
210+
>> File name: recipe
211+
>> Contains: ['recipe_original.md', 'recipe_transfer-classification.md']
212+
>> File path: /home/user/.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0/recipe
213+
>> File URL: None
214+
>> Parent directory: /home/user/.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0
215+
```
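
To make the relationship between files and directories concrete, here is a minimal sketch of the attribute layout described in this section. `FileSketch` and `DirectorySketch` are hypothetical dataclasses written for illustration only; they are not the classes that SparseZoo ships.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileSketch:
    """Hypothetical model of the shared attributes: name, path, url, parent_directory."""

    name: str
    path: Optional[str] = None
    url: Optional[str] = None
    parent_directory: Optional[str] = None


@dataclass
class DirectorySketch(FileSketch):
    """A directory is a file that additionally holds other files."""

    files: List[FileSketch] = field(default_factory=list)


recipes = DirectorySketch(
    name="recipe",
    files=[
        FileSketch(name="recipe_original.md"),
        FileSketch(name="recipe_transfer-classification.md"),
    ],
)
contained = [f.name for f in recipes.files]
```

Modeling the directory as a subclass keeps the four shared attributes in one place, so code that only needs `name` or `path` can treat files and directories alike.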
216+
217+
### Selecting Checkpoint-Specific Data
218+
219+
A SparseZoo model may contain several checkpoints. For example, it may contain a checkpoint saved before the model was quantized, which would be used for transfer learning. Another checkpoint might have been saved after the quantization step; that one is usually used directly for inference.
220+
221+
The recipes may also vary depending on the use case. We may want to access a recipe that was used to sparsify the dense model (`recipe_original`) or the one that enables us to sparse transfer learn from the already sparsified model (`recipe_transfer`).
222+
223+
There are two ways to access those specific files.
224+
225+
#### Accessing Recipes (Through Python API)
226+
```python
227+
available_recipes = model.recipes.available
228+
print(available_recipes)
229+
230+
>> ['original', 'transfer-classification']
231+
232+
transfer_recipe = model.recipes["transfer-classification"]
233+
print(transfer_recipe)
234+
235+
>> File(name=recipe_transfer-classification.md)
236+
237+
original_recipe = model.recipes.default # recipe defaults to `original`
238+
original_recipe_path = original_recipe.path # downloads the recipe and returns its path
239+
print(original_recipe_path)
240+
241+
>> .../.cache/sparsezoo/eb977dae-2454-471b-9870-4cf38074acf0/recipe/recipe_original.md
242+
```
243+
244+
#### Accessing Checkpoints (Through Python API)
245+
In general, we expect the following checkpoints to be included in the model:
246+
247+
- `checkpoint_prepruning`
248+
- `checkpoint_postpruning`
249+
- `checkpoint_preqat`
250+
- `checkpoint_postqat`
251+
252+
The checkpoint that the model defaults to is the `preqat` state (just before the quantization step).
94253

95254
```python
96255
from sparsezoo import Model
97256

98-
# copied from https://sparsezoo.neuralmagic.com/
99-
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned90_quant-none"
257+
stub = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant_3layers-aggressive_84"
258+
100259
model = Model(stub)
101-
print(model)
260+
available_checkpoints = model.training.available
261+
print(available_checkpoints)
262+
263+
>> ['preqat']
264+
265+
preqat_checkpoint = model.training.default # checkpoint defaults to `preqat`
266+
preqat_checkpoint_path = preqat_checkpoint.path # downloads the checkpoint and returns its path
267+
print(preqat_checkpoint_path)
268+
269+
>> .../.cache/sparsezoo/0857c6f2-13c1-43c9-8db8-8f89a548dccd/training
270+
271+
[print(file.name) for file in preqat_checkpoint.files]
272+
273+
>> vocab.txt
274+
>> special_tokens_map.json
275+
>> pytorch_model.bin
276+
>> config.json
277+
>> training_args.bin
278+
>> tokenizer_config.json
279+
>> trainer_state.json
280+
>> tokenizer.json
102281
```
103282

104-
#### Searching the Zoo
283+
284+
#### Accessing Recipes (Through Stub String Arguments)
285+
286+
You can also directly request a specific recipe/checkpoint type by appending the appropriate URL query arguments to the stub:
287+
```python
288+
from sparsezoo import Model
289+
290+
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none?recipe=transfer"
291+
292+
model = Model(stub)
293+
294+
# Inspect which files are present.
295+
# Note that the available recipes are restricted
296+
# according to the specified URL query arguments
297+
print(model.recipes.available)
298+
299+
>> ['transfer-classification']
300+
301+
transfer_recipe = model.recipes.default # Now the recipes default to the one selected by the stub string arguments
302+
print(transfer_recipe)
303+
304+
>> File(name=recipe_transfer-classification.md)
305+
```
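
The stub with query arguments has the same shape as a path followed by a URL query string. The sketch below shows one way to split such a string using only the standard library. `split_stub` is a hypothetical helper written for illustration, and this is not necessarily how SparseZoo parses stubs internally.

```python
from urllib.parse import parse_qs


def split_stub(stub: str):
    """Split a stub into its path part and a dict of query arguments (illustrative only)."""
    # Drop the leading "zoo:" scheme if present.
    body = stub[len("zoo:"):] if stub.startswith("zoo:") else stub
    # Everything after the first "?" is treated as the query string.
    path, _, query = body.partition("?")
    # parse_qs returns lists of values; keep the first value for each key.
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return path, params


stub_path, stub_params = split_stub(
    "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none?recipe=transfer"
)
print(stub_path)
print(stub_params)
```

A stub without query arguments yields an empty dictionary, so the same helper covers both forms.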
306+
307+
### Accessing Sample Data
308+
309+
The user may easily request a sample batch of data that represents the inputs and outputs of the model.
310+
311+
```python
312+
sample_data = model.sample_batch(batch_size=10)
313+
314+
print(sample_data['sample_inputs'][0].shape)
315+
>> (10, 3, 224, 224) # (batch_size, num_channels, image_dim, image_dim)
316+
317+
print(sample_data['sample_outputs'][0].shape)
318+
>> (10, 1000) # (batch_size, num_classes)
319+
```
320+
321+
### Model Search
322+
The `search_models` function enables the user to quickly filter the contents of the SparseZoo repository to find the stubs of interest:
105323

106324
```python
107325
from sparsezoo import search_models
108326

109-
models = search_models(
110-
domain="cv",
111-
sub_domain="classification",
112-
return_stubs=True,
113-
)
114-
print(models)
327+
args = {
328+
"domain": "cv",
329+
"sub_domain": "segmentation",
330+
"architecture": "yolact",
331+
}
332+
333+
models = search_models(**args)
334+
[print(model) for model in models]
335+
336+
>> Model(stub=zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none)
337+
>> Model(stub=zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned90-none)
338+
>> Model(stub=zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/base-none)
115339
```
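
As an offline illustration of the filtering idea, the sketch below applies the same three criteria to a hard-coded list of stub strings taken from this README. `filter_stubs` is a hypothetical helper, not part of SparseZoo, and it assumes that the first three path segments of a stub correspond to the domain, the sub-domain, and an architecture name that starts with the requested value.

```python
from typing import List, Optional

STUBS = [
    "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none",
    "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned90-none",
    "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/base-none",
    "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none",
]


def filter_stubs(
    stubs: List[str],
    domain: Optional[str] = None,
    sub_domain: Optional[str] = None,
    architecture: Optional[str] = None,
) -> List[str]:
    """Keep the stubs whose leading path segments match the given criteria."""
    matches = []
    for stub in stubs:
        # Strip the "zoo:" scheme, then split the remaining path into segments.
        parts = stub.split(":", 1)[-1].split("/")
        if domain is not None and parts[0] != domain:
            continue
        if sub_domain is not None and parts[1] != sub_domain:
            continue
        # Assumption: the architecture segment may carry a suffix, e.g. "yolact-darknet53".
        if architecture is not None and not parts[2].startswith(architecture):
            continue
        matches.append(stub)
    return matches


selected = filter_stubs(STUBS, domain="cv", sub_domain="segmentation", architecture="yolact")
for stub in selected:
    print(stub)
```

The real `search_models` queries the SparseZoo repository; this sketch only mimics the selection logic on local strings.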
116340

117341
### Environmental Variables
