Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added cropping of capture/background #153

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions app/background.cc
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

#include <lib/libbackscrub.h>
jjsarton marked this conversation as resolved.
Show resolved Hide resolved

// Internal state of background processing
struct background_t {
int debug;
Expand Down Expand Up @@ -183,11 +185,14 @@ int grab_background(std::shared_ptr<background_t> pbkd, int width, int height, c
if (pbkd->video) {
// grab frame & frame no. under mutex
std::unique_lock<std::mutex> hold(pbkd->rawmux);
cv::resize(pbkd->raw, out, cv::Size(width, height));
cv::Rect crop = calcCropping(pbkd->raw.cols, pbkd->raw.rows, width, height);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could lift this calculation out of the processing loop to a structure variable, as sizes will not change during operation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Size will not change, the frame content will change. Mofifying this implies:
if ! rect_calculated
crop = calc_cropping()
else
crop = get_stored_crop_value()
...
I don't think that the calculation will take a lot ot time more than filling the cv::Rect from already calculated variables.

cv::resize(pbkd->raw(crop), out, cv::Size(width, height));
frm = pbkd->frame;
} else {
// resize still image as requested into out
cv::resize(pbkd->raw, out, cv::Size(width, height));
cv::Rect crop = calcCropping(pbkd->raw.cols, pbkd->raw.rows, width, height);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..as above, could be removed in favour of a state variable, calculated just once.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must think about that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must review this part.

Copy link
Author

@jjsarton jjsarton Sep 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have put the variable bg_stored into the structure background_t.
The code is now:

   } else {
   	if (!pbkd->bg_stored) {
   		// resize still image as requested into out
   		cv::Rect crop = bs_calc_cropping(pbkd->raw.cols, pbkd->raw.rows, width, height);
   		// Under some circumstances we must do the job in two steps!
   		// Otherwise this resize(pbkd->raw(crop), pbkd->raw, ...) may fail.
   		pbkd->raw(crop).copyTo(pbkd->raw);
   		cv::resize(pbkd->raw, pbkd->raw, cv::Size(width, height));
   		pbkd->bg_stored = true;
   	}
   	out = pbkd->raw;
   	frm = 1;
   }
   return frm;

For the video processing the code remain to the old code, we will probably every time read a new picture from the video source.

Please note that the changes are not effective now, the tests for video or still image don't work as required.
I propose to detect according to the value cnt which has the value -1 for still image and a greater value for video.
I have also changed the corresponding code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry too much about this, it's a minor performance enhancement to avoid a couple of floating point ops per frame, when we are running a whole inference model too.. I'd rather the code was readable than maximum speed 😄

cv::resize(pbkd->raw(crop), out, cv::Size(width, height));
out = pbkd->raw;
frm = 1;
}
return frm;
Expand Down
57 changes: 48 additions & 9 deletions app/deepseg.cc
Original file line number Diff line number Diff line change
Expand Up @@ -372,6 +372,7 @@ int main(int argc, char* argv[]) try {
bool flipVertical = false;
int fourcc = 0;
size_t blur_strength = 0;
cv::Rect crop_region(0, 0, 0, 0);

const char* modelname = "selfiesegmentation_mlkit-256x256-2021_01_19-v1215.f16.tflite";

Expand Down Expand Up @@ -568,6 +569,14 @@ int main(int argc, char* argv[]) try {
if (expWidth != vidGeo.value().first) {
fprintf(stderr, "Warning: virtual camera aspect ratio does not match capture device.\n");
}
// calculate crop region, only if result always smaller
if (expWidth != vidGeo.value().first &&
vidGeo.value().first <= capGeo.value().first &&
vidGeo.value().second <= capGeo.value().second) {
crop_region = calcCropping(
capGeo.value().first, capGeo.value().second,
vidGeo.value().first, vidGeo.value().second);
}
jjsarton marked this conversation as resolved.
Show resolved Hide resolved

// dump settings..
printf("debug: %d\n", debug);
Expand Down Expand Up @@ -600,7 +609,11 @@ int main(int argc, char* argv[]) try {
}
}
// default green screen background (at capture true geometry)
cv::Mat bg = cv::Mat(capGeo.value().second, capGeo.value().first, CV_8UC3, cv::Scalar(0, 255, 0));
std::optional<std::pair<size_t, size_t>> bg_dim = capGeo;
if ( crop_region.height == 0) {
bg_dim = {crop_region.width, crop_region.height};
}
cv::Mat bg(bg_dim.value().second, bg_dim.value().first, CV_8UC3, cv::Scalar(0, 255, 0));
jjsarton marked this conversation as resolved.
Show resolved Hide resolved

// Virtual camera (at specified geometry)
int lbfd = loopback_init(s_vcam, vidGeo.value().first, vidGeo.value().second, debug);
Expand All @@ -613,11 +626,24 @@ int main(int argc, char* argv[]) try {
loopback_free(lbfd);
});


// Processing components, all at capture true geometry
cv::Mat mask(capGeo.value().second, capGeo.value().first, CV_8U);
std::optional<std::pair<size_t, size_t>> mask_dim = capGeo;
if ( crop_region.height ) {
mask_dim = {crop_region.width, crop_region.height};
}
cv::Mat mask(mask_dim.value().second, mask_dim.value().first, CV_8U);
jjsarton marked this conversation as resolved.
Show resolved Hide resolved

cv::Mat raw;
CalcMask ai(s_model.value(), threads, capGeo.value().first, capGeo.value().second);
int aiw,aih;
if ( crop_region.width == 0) {
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
aiw=capGeo.value().first;
aih=capGeo.value().second;
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
} else {
aiw=crop_region.width;
aih=crop_region.height;
}
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
CalcMask ai(s_model.value(), threads, aiw, aih);
jjsarton marked this conversation as resolved.
Show resolved Hide resolved

ti.lastns = timestamp();
printf("Startup: %ldns\n", diffnanosecs(ti.lastns,ti.bootns));

Expand All @@ -631,22 +657,35 @@ int main(int argc, char* argv[]) try {
// copy new frame to buffer
cap.retrieve(raw);
ti.retrns = timestamp();

if (raw.rows == 0 || raw.cols == 0) continue; // sanity check

if ( crop_region.height) {
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
raw(crop_region).copyTo(raw);
}
ai.set_input_frame(raw);
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
ti.copyns = timestamp();

if (raw.rows == 0 || raw.cols == 0) continue; // sanity check
// do background detection magic
ai.get_output_mask(mask);
ti.copyns = timestamp();

if (filterActive) {
// do background detection magic
ai.get_output_mask(mask);

// get background frame:
// - specified source if set
// - copy of input video if blur_strength != 0
// - default green (initial value)
bool canBlur = false;
if (pbk) {
if (grab_background(pbk, capGeo.value().first, capGeo.value().second, bg)<0)
int tw,th;
if ( crop_region.height ) {
tw = crop_region.width;
th = crop_region.height;
} else {
tw = capGeo.value().first;
th = capGeo.value().second;
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
}
jjsarton marked this conversation as resolved.
Show resolved Hide resolved
if (grab_background(pbk, tw, th, bg) < 0)
throw "Failed to read background frame";
canBlur = true;
} else if (blur_strength) {
Expand Down
28 changes: 27 additions & 1 deletion lib/libbackscrub.cc
Original file line number Diff line number Diff line change
Expand Up @@ -365,7 +365,11 @@ bool bs_maskgen_process(void *context, cv::Mat &frame, cv::Mat &mask) {

// scale up into full-sized mask
cv::Mat tmpbuf;
cv::resize(ctx.ofinal(ctx.in_roidim),tmpbuf,ctx.mroi.size());
// with body-pix-float-050-8.tflite the size of ctx.ofinal is 33x33
// and the wanted roi may be greater as 33x33 so we can crash with
// cv::resize(ctx.ofinal(ctx.in_roidim),tmpbuf,ctx.mroi.size());
ctx.ofinal.copyTo(tmpbuf);
cv::resize(tmpbuf, tmpbuf, ctx.mroi.size());
Copy link
Collaborator

@phlash phlash Sep 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is correct? Selecting an ROI from the final output is done because we may have centred the frame into the model earlier (line 289), this change removes the selection step. The calculations at line 237 onward should ensure that in_roidim cannot be larger than ofinal (unless the model itself has output dim < input dim, which I have never seen but I guess could occur? which is indeed the case for the bodypix model). Looks like some assumptions in this code need reviewing, and an additional model output roidim should be calculated to use here.

After going away and thinking about this - this is a separate bug that you do not need to fix in this PR. Let's raise another issue for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the original code because I had a crash within opencv.
I ha done this because the model used had an output of 33x33.
I have already performed a lot of tests and never shown an error here. I will check this again.TT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just tested this again. With the old code, using backscrub/models/body-pix-float-050-8.tflite I get again a crash, with my correction this work well.
the real size for ofinal is 33x33 and with the ctx.in_roidim we have a size of 256x256!!!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - it's because they have an output stride variable in the model which defaults to 32, so we get input dim/32 == 257/32 ~= 33.

I'm happy to leave the crash for now, you are officially "not making it worse" with this PR 😁, and we'll probably want to revisit the logic in multiple places to fix this properly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now covered by issue #156, and thus can be fixed by a separate PR that properly deals with the fact that we do not account for models that have different input and output sizes at all.


// blur at full size for maximum smoothness
cv::blur(tmpbuf,ctx.mroi,ctx.blur);
Expand All @@ -375,3 +379,25 @@ bool bs_maskgen_process(void *context, cv::Mat &frame, cv::Mat &mask) {
return true;
}

cv::Rect calcCropping(int cw, int ch, int vw, int vh) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I suggest:

  • use the same variable names as per the prototype in the header, they are more meaningful
  • stick to our library naming convention so bs_calc_cropping or similar.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am lazy, but you are right.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, not pushed

// if the input and output aspect ratio are not the same
// we can crop the source image. For example if the
// input image has a 16:9 (1280x720) ratio and the output is 4:3 (960x720)
// we will return the cropRegion set as x=160, width=960, y=0, height=720
// which is the centered part of the original image
cv::Rect cropRegion = {0, 0, 0, 0};
float sc = (float)vw / cw;
float st = (float)vh / ch;
sc = st > sc ? st : sc;

int sx = (int)(vw / sc) - cw;
cropRegion.x = (sx < 0 ? -sx : sx) / 2;

int sy = (int)(vh / sc) - ch;
cropRegion.y = (sy < 0 ? -sy : sy) / 2;

cropRegion.width = cw - cropRegion.x * 2;
cropRegion.height = ch - cropRegion.y * 2;

return cropRegion;
}
2 changes: 2 additions & 0 deletions lib/libbackscrub.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,4 +38,6 @@ extern void bs_maskgen_delete(void *context);
// Process a video frame into a mask
extern bool bs_maskgen_process(void *context, cv::Mat& frame, cv::Mat &mask);

extern cv::Rect calcCropping(int inWidth, int inHeight, int targetWidth, int targetHight);
jjsarton marked this conversation as resolved.
Show resolved Hide resolved

#endif