how to build a dataset #5
Hey, great question!
|
OK, I will create a new issue about 3. Thank you for your reply. |
Will close when I've written the custom dataset creation notebook 😉 |
you are so nice !!! |
Hi Alex, great work, congrats! I want to try the network on my own data, which are raster images.

svg = SVG.load_svg("some_char.svg").normalize().zoom(0.9).canonicalize().simplify_heuristic()

Here are my questions: Thanks |
It would be amazing to learn how to train from scratch, i.e. on a bunch of folders with SVGs. |
@alexandre01 you mentioned that you have already added the "custom dataset creation notebook" but I am not sure which one it is. Am I missing something? |
Hello, Alex. Great work and thank you for this library 👍 To anyone interested:

from concurrent import futures
import os
from argparse import ArgumentParser
import logging
from tqdm import tqdm
import glob
import pickle
import sys

sys.path.append('..')

from deepsvg.svglib.svg import SVG


def convert_svg(svg_file, output_folder):
    filename = os.path.splitext(os.path.basename(svg_file))[0]
    svg = SVG.load_svg(svg_file)
    tensor_data = svg.to_tensor()
    with open(os.path.join(output_folder, f"{filename}.pkl"), "wb") as f:
        dict_data = {
            "tensors": [[tensor_data]],
            "fillings": [0]
        }
        pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)


def main(args):
    with futures.ThreadPoolExecutor(max_workers=args.workers) as executor:
        svg_files = glob.glob(os.path.join(args.input_folder, "*.svg"))
        with tqdm(total=len(svg_files)) as pbar:
            preprocess_requests = [executor.submit(convert_svg, svg_file, args.output_folder)
                                   for svg_file in svg_files]
            for _ in futures.as_completed(preprocess_requests):
                pbar.update(1)
    logging.info("SVG files' conversion to tensors complete.")


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    parser = ArgumentParser()
    parser.add_argument("--input_folder")
    parser.add_argument("--output_folder")
    parser.add_argument("--workers", default=4, type=int)
    args = parser.parse_args()
    if not os.path.exists(args.output_folder):
        os.makedirs(args.output_folder)
    main(args)

All the best. |
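To sanity-check pickles produced this way, one can round-trip a file and verify the dict structure the dataset loader expects. This is a stdlib-only sketch: the nested list stands in for a real tensor, and `check.pkl` is a made-up filename.

```python
import os
import pickle
import tempfile

# A dict in the same shape the script above writes: nested "tensors"
# groups plus a parallel list of filling flags.
dict_data = {
    "tensors": [[[0.0, 1.0, 2.0]]],  # placeholder for svg.to_tensor()
    "fillings": [0],
}

path = os.path.join(tempfile.gettempdir(), "check.pkl")
with open(path, "wb") as f:
    pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)

# Reload and verify the keys and parallel-list invariant.
with open(path, "rb") as f:
    loaded = pickle.load(f)

assert set(loaded) == {"tensors", "fillings"}
assert len(loaded["tensors"]) == len(loaded["fillings"])
print("keys:", sorted(loaded))  # → keys: ['fillings', 'tensors']
```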
Hi @alexandre01 Thank you for sharing this repo! Very interesting work! I'm also trying to train DeepSVG on a custom dataset, but I'm unsure how the data should be structured. I've tried to train and ran into an indexing issue I don't fully understand:

Traceback (most recent call last):
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 150, in <module>
train(cfg, model_name, experiment_name, log_dir=args.log_dir, debug=args.debug, resume=args.resume)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 26, in train
dataset = dataset_load_function(cfg)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 242, in load_dataset
cfg.filter_uni, cfg.filter_platform, cfg.filter_category, cfg.train_ratio)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 57, in __init__
loaded_tensor = self._load_tensor(self.idx_to_id(0))
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 111, in idx_to_id
return self.df.iloc[idx].id
File "C:\Users\george.profenza\Downloads\gp\deepsvg-env\lib\site-packages\pandas\core\indexing.py", line 931, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Users\george.profenza\Downloads\gp\deepsvg-env\lib\site-packages\pandas\core\indexing.py", line 1566, in _getitem_axis
self._validate_integer(key, axis)
File "C:\Users\george.profenza\Downloads\gp\deepsvg-env\lib\site-packages\pandas\core\indexing.py", line 1500, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

I've tried using the preprocess script and noticed it augments the SVGs, but it wasn't saving the pickle files. I've tried just using:

tensor_data = svg.to_tensor()
with open(os.path.join(output_folder, f"{filename}.pkl"), "wb") as f:
    dict_data = {
        "tensors": [[tensor_data]],
        "fillings": [0]
    }
    pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)

a variation of the above (spotted in the svglib notebook), and also using SVGTensor:

tensor_data = svg.copy().numericalize().to_tensor()
tensor_data = SVGTensor.from_data(tensor_data)

I'm not sure what the correct method of converting the processed SVG to pickle is so I can train. Printing the pandas object from the loaded fonts dataset, I do see relevant data:
However, when loading my converted dataset (either using SVGTensor (larger pickle file) or just
For reference, here's a raw svg: <?xml version="1.0"?>
<!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.0//EN'
'http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd'>
<svg xmlns:xlink="http://www.w3.org/1999/xlink" style="fill-opacity:1; color-rendering:auto; color-interpolation:auto; text-rendering:auto; stroke:black; stroke-linecap:square; stroke-miterlimit:10; shape-rendering:auto; stroke-opacity:1; fill:black; stroke-dasharray:none; font-weight:normal; stroke-width:1; font-family:'Dialog'; font-style:normal; stroke-linejoin:miter; font-size:12px; stroke-dashoffset:0; image-rendering:auto;" width="500" height="500" xmlns="http://www.w3.org/2000/svg"
><!--Generated by the Batik Graphics2D SVG Generator--><defs id="genericDefs"
/><g
><g style="stroke-linecap:round;"
><line y2="324.067" style="fill:none;" x1="236.4454" x2="109.2297" y1="204.986"
/></g
><g style="stroke-linecap:round;"
><line y2="422.2296" style="fill:none;" x1="109.2297" x2="263.5546" y1="324.067"
/><line y2="303.1487" style="fill:none;" x1="263.5546" x2="390.7703" y1="422.2296"
/><line y2="204.986" style="fill:none;" x1="390.7703" x2="236.4454" y1="303.1487"
/><line y2="77.7704" style="fill:none;" x1="109.2297" x2="236.4454" y1="196.8513"
/><line y2="175.9331" style="fill:none;" x1="236.4454" x2="390.7703" y1="77.7704"
/><line y2="295.014" style="fill:none;" x1="390.7703" x2="263.5546" y1="175.9331"
/><line y2="196.8513" style="fill:none;" x1="263.5546" x2="109.2297" y1="295.014"
/><line y2="422.2296" style="fill:none;" x1="390.7703" x2="263.5546" y1="303.1487"
/><line y2="295.014" style="fill:none;" x1="263.5546" x2="263.5546" y1="422.2296"
/><line y2="175.9331" style="fill:none;" x1="263.5546" x2="390.7703" y1="295.014"
/><line y2="303.1487" style="fill:none;" x1="390.7703" x2="390.7703" y1="175.9331"
/><line y2="204.986" style="fill:none;" x1="109.2297" x2="236.4454" y1="324.067"
/><line y2="77.7704" style="fill:none;" x1="236.4454" x2="236.4454" y1="204.986"
/><line y2="196.8513" style="fill:none;" x1="236.4454" x2="109.2297" y1="77.7704"
/><line y2="324.067" style="fill:none;" x1="109.2297" x2="109.2297" y1="196.8513"
/><line y2="175.9331" style="fill:none;" x1="390.7703" x2="390.7703" y1="303.1487"
/><line y2="77.7704" style="fill:none;" x1="390.7703" x2="236.4454" y1="175.9331"
/><line y2="204.986" style="fill:none;" x1="236.4454" x2="236.4454" y1="77.7704"
/><line y2="303.1487" style="fill:none;" x1="236.4454" x2="390.7703" y1="204.986"
/><line y2="196.8513" style="fill:none;" x1="109.2297" x2="109.2297" y1="324.067"
/><line y2="295.014" style="fill:none;" x1="109.2297" x2="263.5546" y1="196.8513"
/><line y2="422.2296" style="fill:none;" x1="263.5546" x2="263.5546" y1="295.014"
/><line y2="324.067" style="fill:none;" x1="263.5546" x2="109.2297" y1="422.2296"
/></g
></g
></svg
> I've uploaded a few converted .pkl files as well (1, 2, 3). Can you please advise on how I might train DeepSVG on my own dataset? Thank you so much for your time, |
Update: I've managed to get past the empty data frame issue by hackily commenting out this section in svgtensor_dataset.py:

# df = df[(df.nb_groups <= max_num_groups) & (df.max_len_group <= max_seq_len)]
# if max_total_len is not None:
#     df = df[df.total_len <= max_total_len]

However, this landed me right at this error:

Traceback (most recent call last):
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 150, in <module>
train(cfg, model_name, experiment_name, log_dir=args.log_dir, debug=args.debug, resume=args.resume)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 51, in train
cfg.set_train_vars(train_vars, dataloader)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\configs\deepsvg\default_icons.py", line 77, in set_train_vars
for idx in random.sample(range(len(dataloader.dataset)), k=10)]
File "C:\Users\george.profenza\Downloads\gp\deepsvg\configs\deepsvg\default_icons.py", line 77, in <listcomp>
for idx in random.sample(range(len(dataloader.dataset)), k=10)]
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 177, in get
return self.get_data(t_sep, fillings, model_args=model_args, label=label)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 208, in get_data
res[arg] = torch.stack([t.cmds() for t in t_list])
RuntimeError: stack expects each tensor to be equal size, but got [66] at entry 0 and [32] at entry 1

I suspect it's related, but currently I don't fully understand how the data should be structured. Any hints/tips on how I may train on a custom dataset are highly appreciated. Thank you so much, |
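The stack error above happens because torch.stack requires every command tensor in the batch to have the same length, which suggests the sequences need to be padded to a fixed maximum length before batching. Here is a stdlib-only sketch of the padding idea; the PAD value is illustrative, not DeepSVG's actual token encoding.

```python
# Two command sequences of different lengths, mirroring the [66] and
# [32] tensors from the traceback.
PAD = -1  # illustrative pad value, not the library's real token
seqs = [list(range(66)), list(range(32))]

max_seq_len = max(len(s) for s in seqs)

# Pad every sequence to the same length so they form a rectangular
# batch that a stack operation can accept.
padded = [s + [PAD] * (max_seq_len - len(s)) for s in seqs]

assert all(len(s) == max_seq_len for s in padded)
print([len(s) for s in padded])  # → [66, 66]
```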
Many months ago, I retrained DeepSVG from scratch and developed a new library for preprocessing SVGs. Please ping me (here or on Twitter: @wichmaennchen) if the problems persist. I may be able to invest some time and help out. |
I had another shot and spotted the default model parameters that act as filters for the metadata data frames. However, I'm still stuck on the same error. @pwichmann If you have a version of DeepSVG, I'd like to give that a go. Thank you so much for offering to support. |
Hi, I had the exact same problem as you. Do you have a solution now? |
This problem is due to the fact that the number of commands in a path of your SVG file is greater than the limit. So, the following code is used to select the SVGs that meet the requirement:

df = df[(df.nb_groups <= max_num_groups) & (df.max_len_group <= max_seq_len)]
if max_total_len is not None:
    df = df[df.total_len <= max_total_len]

Besides, if you want to construct your own dataset, you have to run |
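The same filter can be expressed without pandas, which may make the three limits easier to see. A stdlib-only sketch with illustrative rows and limits (the real values come from the model configuration, and the column names are the ones used in the pandas code above):

```python
# Metadata rows like those in meta.csv: one entry per SVG.
rows = [
    {"id": "a", "nb_groups": 3, "max_len_group": 30, "total_len": 60},
    {"id": "b", "nb_groups": 9, "max_len_group": 30, "total_len": 60},    # too many path groups
    {"id": "c", "nb_groups": 3, "max_len_group": 120, "total_len": 200},  # one path too long
]

# Illustrative limits; DeepSVG reads these from its config.
max_num_groups, max_seq_len, max_total_len = 8, 50, 100

# Keep only SVGs within all three limits, matching the pandas filter.
kept = [
    r for r in rows
    if r["nb_groups"] <= max_num_groups
    and r["max_len_group"] <= max_seq_len
    and (max_total_len is None or r["total_len"] <= max_total_len)
]

print([r["id"] for r in kept])  # → ['a']
```

If the filter leaves zero rows, the dataset appears empty, which matches the out-of-bounds IndexError reported earlier in this thread.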
I agree with the previous statement, but I don't understand the operation of dropping Z. The Z command moves the pen back to the beginning of the path so that the path is closed. It's important, and Z is one of the 7 encoded command types, so I cannot understand the operation of removing it; doing so makes the command types seem nonsensical. |
Well, I agree that But |
This doesn't generate a meta.csv, am I right? It's necessary when using the SVGDataloader included in the library. |
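If a meta.csv is needed, the per-icon statistics could presumably be reconstructed from the tensors saved in each pickle. A stdlib-only sketch, assuming a sequence's length equals its number of commands; the column names come from the filter code quoted above, while the icon IDs and lengths are illustrative:

```python
import csv
import io

# Per-icon tensor groups; plain lists stand in for real command tensors.
dataset = {
    "icon_a": [[1, 2, 3], [4, 5]],      # 2 groups, lengths 3 and 2
    "icon_b": [[1, 2, 3, 4, 5, 6, 7]],  # 1 group, length 7
}

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["id", "nb_groups", "max_len_group", "total_len"]
)
writer.writeheader()
for icon_id, groups in dataset.items():
    lengths = [len(g) for g in groups]
    writer.writerow({
        "id": icon_id,
        "nb_groups": len(groups),       # number of path groups
        "max_len_group": max(lengths),  # longest single path
        "total_len": sum(lengths),      # commands across all paths
    })

print(buf.getvalue().splitlines()[1])  # → icon_a,2,3,5
```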
I want to train with my own dataset, which is like the FONT-SVG dataset,
but the original data format is .ttf. So my question is: how do I build a dataset like yours (.csv and *.pth)?
Maybe I can export some_char from the *.ttf to some_char.svg, but if you know how to batch export, please tell me.
How is some_char.svg converted to *.pth?
My guess:
I tried replacing svg_pred with other SVG.load_data() output, but this error occurred: