How to run inference? #17

Open
dniku opened this issue Jun 22, 2019 · 19 comments
@dniku

dniku commented Jun 22, 2019

I am trying to get human → anime conversion to run. My current code is as follows:

import argparse
from pathlib import Path

import numpy as np
import skimage.io
from tensorpack import SaverRestore, PredictConfig, OfflinePredictor

from model import Model

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', help='path to trained model', required=True)
parser.add_argument('--input_image_path', help='path to load input image from', required=True, type=Path)
parser.add_argument('--output_image_path', help='path to save output images', required=True, type=Path)
args = parser.parse_args()

if __name__ == '__main__':
    pred_config = PredictConfig(
        model=Model(),
        session_init=SaverRestore(args.model_path),
        input_names=['inputB'],
        output_names=['gen/A/deconv3/output:0'],
    )
    predictor = OfflinePredictor(pred_config)

    image = skimage.io.imread(args.input_image_path)
    image = image.astype(np.float32) / 255

    inputB = image.copy()[np.newaxis, ...]

    outputA, = predictor(inputB)
    outputA = (outputA[0].transpose((1, 2, 0)) * 255).astype(np.uint8)

    args.output_image_path.mkdir(exist_ok=True, parents=True)
    skimage.io.imsave(args.output_image_path / 'a.png', outputA)

As input, I am using this Brad Pitt photo. However, the output I am getting with the getchu_anime/JNet_dilsc_rsep_sl_r0316-084543/model-140000.index model is:

image

which looks like I am missing some normalization, or maybe reading the wrong output tensor.

Same code with good_anime/model-260000.index:

image

With good_anime/JNet_dilsc_rsep_sl0304-120055/model-180000.index:

image

What am I doing wrong?

@dniku changed the title from "Running inference" to "How to run inference?" on Jun 22, 2019
@Skylion007
Contributor

The graph normalizes images internally and expects input in the 0-255 range as uint8. The output tensor you are using, however, is not normalized in the graph and is scaled between 0 and 1, so that part of your rescaling is correct.
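
Concretely, a minimal sketch of the intended pre/post-processing, reusing the predictor and arguments from your script (keep the output scaling, drop the input division; this is a sketch rather than a tested snippet):

image = skimage.io.imread(args.input_image_path)
# Feed the raw uint8 image (0-255); the graph does its own normalization.
assert image.dtype == np.uint8

inputB = image[np.newaxis, ...]  # add a batch dimension

# The generator output is in [0, 1], so scaling by 255 is still needed.
outputA, = predictor(inputB)
outputA = outputA[0].transpose((1, 2, 0))  # CHW -> HWC
outputA = np.clip(outputA * 255, 0, 255).astype(np.uint8)
skimage.io.imsave(args.output_image_path / 'a.png', outputA)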

@jamestompkin
Contributor

jamestompkin commented Jun 23, 2019 via email

@dniku
Author

dniku commented Jun 24, 2019

Outputs without the image = image.astype(np.float32) / 255 line:

image

Still doesn't look like the expected output.

@Skylion007
Contributor

Okay, so based on the fact that you are getting 3 separate images as output, it looks like you are using one of the tensors under /viz? If you are, the first image should be the input, the second the translation, and the third the reconstruction. Are you still using Brad Pitt as input? If so, the first image should look like Brad Pitt, but it looks like an anime character.
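
If you do want one of those visualization tensors, something along these lines should work. This is only a sketch: the tensor names are taken from the sample/training code in this repo (I haven't checked them against every checkpoint), and it assumes the viz output is three 128-pixel-wide panels stacked horizontally.

pred_config = PredictConfig(
    model=Model(),
    session_init=SaverRestore(args.model_path),
    input_names=['inputA', 'inputB'],
    output_names=['viz_B_recon'],  # input | translation | reconstruction
)
predictor = OfflinePredictor(pred_config)

image_batch = image[np.newaxis, ...]  # uint8, 128x128x3
# Both inputs are required for the viz tensors; feed the same batch to each
# if you only care about one translation direction.
viz, = predictor(image_batch, image_batch)
# Split the side-by-side panels back into three separate images.
original, translation, reconstruction = np.split(viz[0], 3, axis=1)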

@dniku
Author

dniku commented Jun 24, 2019

I am getting 3 images as output because I am testing with 3 separate pretrained models, as I mentioned in the first post:

  1. getchu_anime/JNet_dilsc_rsep_sl_r0316-084543/model-140000.index
  2. good_anime/model-260000.index
  3. good_anime/JNet_dilsc_rsep_sl0304-120055/model-180000.index

I have downloaded the models from the link in the README. I am using the code above to produce these images by passing the appropriate --model_path.

I'm not sure what you mean by "Tensors in /viz".

@Skylion007
Contributor

So I ran the code locally and found the issue. The Brad Pitt photo is just too zoomed out. I tested your script on the CelebA dataset and on pictures from my webcam, and it works fine when you zoom in further. You can recreate this either by resizing the shortest edge of a CelebA photo to 128 * 1.12 and then center cropping, or by using OpenCV's face detector and expanding the bounding box a little, like so:

y = self.y - 0.5 * self.h - 0.1 * self.h
x = self.x - 0.5 * self.w
w = self.w * 2.0
h = self.h * 2.0
x,y,w,h = int(x), int(y), int(w), int(h)
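
For a single still image rather than the webcam loop, that crop looks roughly like this (a sketch only; the cascade file name matches the webcam script further down in this thread, and the input path is just an example):

import cv2

frame = cv2.imread('photo.jpg')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier('lbpcascade_frontalface_improved.xml')
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(faces) > 0:
    x, y, w, h = faces[0]
    # Expand the detected box: shift up/left by half a box and double its size.
    x = int(x - 0.5 * w)
    y = int(y - 0.5 * h - 0.1 * h)
    w, h = int(2.0 * w), int(2.0 * h)
    crop = frame[max(y, 0):y + h, max(x, 0):x + w]
    crop = cv2.resize(crop, (128, 128))
    cv2.imwrite('face_128.png', crop)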

Hope this helps!

@dniku
Author

dniku commented Jun 24, 2019

Could you post an example image for which I should expect reasonable results?

@Skylion007
Contributor

Sorry, do you still need an example? The CelebA images are all cropped to the appropriate size if you need an example.

@dniku
Author

dniku commented Jul 9, 2019

I wouldn't mind one.

I took a random image from CelebA (130822.jpg):

image

Resized and cropped it with ImageMagick:

convert 130822.jpg -resize 143x143^ -gravity Center -extent 128x128 130822_128.jpg

image

I ran my script on it and got this:

image

What am I missing?

@Skylion007
Contributor

Thank you for your patience.

To clarify, did you try it on any other images in the dataset? Does your script work with any of the other model weights such as human2cat or human2doll?

I haven't had a chance to run the script posted above yet.

@dniku
Author

dniku commented Jul 23, 2019

I tried it on maybe 3 images, and it never worked. I haven't checked whether human2cat or human2doll work.

@Skylion007
Contributor

Which version of your script are you using? Did you ensure that the values are normalized correctly?

@dniku
Author

dniku commented Aug 11, 2019

Apologies for the late reply. Here is the script:

import argparse
from pathlib import Path

import numpy as np
import skimage.io
from tensorpack import SaverRestore, PredictConfig, OfflinePredictor

from model import Model

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', help='path to trained model', required=True)
parser.add_argument('--input_image_path', help='path to load input image from', required=True, type=Path)
parser.add_argument('--output_image_path', help='path to save output images', required=True, type=Path)
args = parser.parse_args()

if __name__ == '__main__':
    pred_config = PredictConfig(
        model=Model(),
        session_init=SaverRestore(args.model_path),
        input_names=['inputB'],
        output_names=['gen/A/deconv3/output:0'],
    )
    predictor = OfflinePredictor(pred_config)

    image = skimage.io.imread(args.input_image_path)
    assert image.dtype == np.uint8

    inputB = image.copy()[np.newaxis, ...]

    outputA, = predictor(inputB)
    outputA = (outputA[0].transpose((1, 2, 0)) * 255).astype(np.uint8)

    args.output_image_path.mkdir(exist_ok=True, parents=True)
    skimage.io.imsave(args.output_image_path / 'a.png', outputA)

Input file:

image

3 models I used:

  1. getchu_anime/JNet_dilsc_rsep_sl_r0316-084543/model-140000.index
  2. good_anime/model-260000.index
  3. good_anime/JNet_dilsc_rsep_sl0304-120055/model-180000.index

Corresponding outputs:

image
image
image

CLI command:

python run.py --model_path /path/to/models/<one of 3 models> --input_image_path /path/to/pitt.png --output_image_path /any/directory

@muxgt

muxgt commented Sep 16, 2019

@dniku hi! It seems like I'm facing a similar problem to yours. Did you manage to solve the issue?

@dniku
Author

dniku commented Sep 16, 2019

@muxgt no.

@terminalh2t3

@Skylion007 Do you have any ideas? Can you share code that works with any of the pre-trained checkpoints?

@andreaferretti

I am also curious about how to run inference with these models. Any chance you could add a test script?

@iszotic

iszotic commented Oct 8, 2019

The problem is apparently the pretrained models. I trained one myself (in CycleGAN mode, with a ratio of 0.8 instead of 0.33), and the output is fine with the code given by @dniku.

@Skylion007
Contributor

This code should run the model with a webcam:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# File: GANimorph.py
# Author: Aaron Gokaslan (agokasla@cs.brown.edu)

import cv2
import os, sys
import argparse
import numpy as np
from six.moves import map, zip
from glob import glob

from model import Model
from tensorpack import *
from tensorpack.dataflow import DataFlow, RNGDataFlow
from tensorpack.utils.viz import *
import tensorpack.tfutils.symbolic_functions as symbf
from tensorpack.tfutils.summary import add_moving_summary
import tensorflow as tf
from tensorflow.python.training import moving_averages
from utils import *
from GAN import GANTrainer, MultiGPUGANTrainer, SeparateGANTrainer, GANModelDesc

from tensorpack.predict.dataset import DatasetPredictorBase

from scipy.misc import imsave
import time
"""
The official code for Improved Shape Deformation in Unsupervised Image to Image
Translation.

Requires Tensorpack and related dependencies.

author: Aaron Gokaslan agokasla@cs.brown.edu
"""

#
# class MultiProcessQueuePredictWorker_SideChannel(MultiProcessPredictWorker):
#     """
#     An offline predictor worker that takes input and produces output by queue.
#     Each process will exit when they see :class:`DIE`.
#     """
#
#     def __init__(self, idx, inqueue, sidechannelqueue, outqueue, config):
#         """
#         Args:
#             idx, config: same as in :class:`MultiProcessPredictWorker`.
#             inqueue (multiprocessing.Queue): input queue to get data point. elements are (task_id, dp)
#             outqueue (multiprocessing.Queue): output queue to put result. elements are (task_id, output)
#         """
#         super(MultiProcessQueuePredictWorker_SideChannel, self).__init__(idx, config)
#         self.inqueue = inqueue
#         self.sidechannelqueue = sidechannelqueue
#         self.outqueue = outqueue
#         assert isinstance(self.inqueue, multiprocessing.queues.Queue)
#         assert isinstance(self.outqueue, multiprocessing.queues.Queue)
#
#     def run(self):
#         self._init_runtime()
#         while True:
#             tid, dp = self.inqueue.get()
#             if tid == DIE:
#                 self.outqueue.put((DIE, None))
#                 return
#             else:
#                 self.outqueue.put((tid, self.predictor(dp)+[self.sidechannelqueue.get()]))
#
# class MultiProcessDatasetPredictor_SideChannel(DatasetPredictorBase):
#     """
#     Run prediction in multiprocesses, on either CPU or GPU.
#     Each process fetch datapoints as tasks and run predictions independently.
#     """
#     # TODO allow unordered
#
#     def __init__(self, config, dataset, nr_proc, use_gpu=True, ordered=True):
#         """
#         Args:
#             config: same as in :class:`DatasetPredictorBase`.
#             dataset: same as in :class:`DatasetPredictorBase`.
#             nr_proc (int): number of processes to use
#             use_gpu (bool): use GPU or CPU.
#                 If GPU, then ``nr_proc`` cannot be more than what's in
#                 CUDA_VISIBLE_DEVICES.
#             ordered (bool): produce outputs in the original order of the
#                 datapoints. This will be a bit slower. Otherwise, :meth:`get_result` will produce
#                 outputs in any order.
#         """
#         if config.return_input:
#             logger.warn("Using the option `return_input` in MultiProcessDatasetPredictor might be slow")
#         assert nr_proc >= 1, nr_proc
#         super(MultiProcessDatasetPredictor_SideChannel, self).__init__(config, dataset)
#
#         self.nr_proc = nr_proc
#         self.ordered = ordered
#
#         self.inqueue, self.inqueue_proc = dump_dataflow_to_process_queue(
#             self.dataset, nr_proc * 2, self.nr_proc)    # put (idx, dp) to inqueue
#
#         if use_gpu:
#             try:
#                 gpus = os.environ['CUDA_VISIBLE_DEVICES'].split(',')
#                 assert len(gpus) >= self.nr_proc, \
#                     "nr_proc={} while only {} gpus available".format(
#                     self.nr_proc, len(gpus))
#             except KeyError:
#                 # TODO number of GPUs not checked
#                 gpus = list(range(self.nr_proc))
#         else:
#             gpus = ['-1'] * self.nr_proc
#         # worker produces (idx, result) to outqueue
#         self.outqueue = multiprocessing.Queue()
#         self.workers = [MultiProcessQueuePredictWorker_SideChannel(
#             i, self.inqueue, self.sidechannelqueue, self.outqueue, self.config)
#             for i in range(self.nr_proc)]
#
#         # start inqueue and workers
#         self.inqueue_proc.start()
#         for p, gpuid in zip(self.workers, gpus):
#             if gpuid == '-1':
#                 logger.info("Worker {} uses CPU".format(p.idx))
#             else:
#                 logger.info("Worker {} uses GPU {}".format(p.idx, gpuid))
#             with change_gpu(gpuid):
#                 p.start()
#
#         if ordered:
#             self.result_queue = OrderedResultGatherProc(
#                 self.outqueue, nr_producer=self.nr_proc)
#             self.result_queue.start()
#             ensure_proc_terminate(self.result_queue)
#         else:
#             self.result_queue = self.outqueue
#         ensure_proc_terminate(self.workers + [self.inqueue_proc])
#
#     def get_result(self):
#         try:
#             sz = self.dataset.size()
#         except NotImplementedError:
#             sz = 0
#         with get_tqdm(total=sz, disable=(sz == 0)) as pbar:
#             die_cnt = 0
#             while True:
#                 res = self.result_queue.get()
#                 pbar.update()
#                 if res[0] != DIE:
#                     yield res[1]
#                 else:
#                     die_cnt += 1
#                     if die_cnt == self.nr_proc:
#                         break
#         self.inqueue_proc.join()
#         self.inqueue_proc.terminate()
#         if self.ordered:    # if ordered, than result_queue is a Process
#             self.result_queue.join()
#             self.result_queue.terminate()
#         for p in self.workers:
#             p.join()
#             p.terminate()


class Webcam(DataFlow):

    def __init__(self):
        self.video = cv2.VideoCapture(0)
        self.faceCascade = cv2.CascadeClassifier('lbpcascade_frontalface_improved.xml')
        self.detectFaces = True
        # Face bounding box size
        self.x = 0
        self.y = 0
        self.w = 640
        self.h = 480
        # Complementary filter weight
        self.complementaryWeight = 0.95
        self.border = 100

    #def extract_face(self):

    def get_data(self):
        video = self.video
        while True:
            _, frame = self.video.read()
            frame = frame[...,::-1]

            if self.detectFaces:
                frameGray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                #frameGray = cv2.copyMakeBorder(frameGray, self.border, self.border, self.border, self.border, cv2.BORDER_CONSTANT, 128 )
                faces = self.faceCascade.detectMultiScale(frameGray,scaleFactor=1.1,minNeighbors=5)

                # Val contains x,y,w,h of crop region
                val = []
                if len(faces) > 0:
                    val = faces[0]
                else: # No face; no crop
                    val = [0,0,frameGray.shape[1],frameGray.shape[0]]
                print(val)

                # Crop face and resize to size of frame2
                # faces[0] is the first in the list
                # faces[0][0] is x
                self.x = self.x * self.complementaryWeight + val[0] * (1-self.complementaryWeight)
                # faces[0][1] is y
                self.y = self.y * self.complementaryWeight + val[1] * (1-self.complementaryWeight)
                # faces[0][2] is width
                self.w = self.w * self.complementaryWeight + val[2] * (1-self.complementaryWeight)
                # faces[0][3] is height
                self.h = self.h * self.complementaryWeight + val[3] * (1-self.complementaryWeight)
                # Expand view a little
                y = self.y - 0.5 * self.h - 0.1 * self.h
                x = self.x - 0.5 * self.w
                w = self.w * 2.0
                h = self.h * 2.0
                x,y,w,h = int(x), int(y), int(w), int(h)
                # Here, we should really pad with gray or some such
                # So that the image doesn't change aspect ratio
                cropFrame = frame[max(y,0):min(y+h, frameGray.shape[0]), max(x,0):min(x+w, frameGray.shape[1]), 0:3]
                frame = cv2.resize( cropFrame, (128,128) )
            else:
                frame = cv2.resize( frame, (128,128) )

            frame2 = frame.copy()
            yield [frame, frame2]

def get_data_sample(isTrain=False, get_b=False):
    if isTrain:
        resize_range = (0.9, 1.1)
        augs = [
            imgaug.Flip(horiz=True),
            imgaug.ResizeShortestEdge(int(SHAPE * 1.12)),
            imgaug.Rotation(30),
            imgaug.RandomCrop(int(SHAPE * 1.12)),
            imgaug.RandomResize(resize_range, resize_range,
                aspect_ratio_thres=0),
            imgaug.RandomCrop(SHAPE),
        ]
    else:
        augs = []
        #augs = [imgaug.ResizeShortestEdge(int(SHAPE * 1.12)),
        #    imgaug.CenterCrop(SHAPE)
        #]

    def get_image_pairs():
        def get_df():
            df = Webcam()
            return AugmentImageComponents(df, augs)
        return get_df()
    df = get_image_pairs()#*[os.path.join(datadir, n) for n in names])
    df = BatchData(df, BATCH if isTrain else TEST_BATCH,
            remainder=not isTrain)
    return df

def sample(model, model_path, get_b=False, save_path='test_images'):
    '''
    Receives image from webcam, sends it through the A model, and displays the
    B domain result in a window ("Capturing") in batches of 16 frames.
    @param model: the model
    '''
    pred = PredictConfig(
        session_init=get_model_loader(model_path),
        model=model,
        input_names=['inputA', 'inputB'],
        output_names=['viz_A_recon', 'viz_B_recon'])
    ds = get_data_sample(get_b=get_b)
    pred = MultiProcessDatasetPredictor(pred, ds, 1, ordered=True)
    #pred = SimpleDatasetPredictor(pred, ds)

    for batch_num, o in enumerate(pred.get_result()):

        for i in range(len(o[0])):
            img = o[int(get_b)][i]
            if np.any(img == None):
                print('No img')
                continue
            result = img[...,::-1]

            result = cv2.resize(result, (384*3, 384))
            cv2.imshow("Capturing", result[:,1:768,:] )

            # Exit when the Q key is pressed
            key = cv2.waitKey(1)
            if key == ord('q'):
                break

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--load', help='load model')
    parser.add_argument('-b', action='store_true', default=False)
    parser.add_argument('--output', help='where to save images', default='test_images/')
    args = parser.parse_args()
    sample(Model(), args.load, get_b=args.b, save_path=args.output)
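
Assuming the file is saved next to model.py in this repo (and that the LBP cascade XML is in the working directory), it would be run along the lines of:

python GANimorph.py --load /path/to/checkpoint -b

where the optional -b flag switches which of the two viz outputs (viz_A_recon vs. viz_B_recon) is displayed.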

@iszotic Is that so? I'll need to double-check the pretrained models and maybe retrain them.
