Virtual memory std:bad_alloc in 32bit mode #10
Comments
Older versions of JincResize suffer from the same memory issue, at least partially (with the same "Evaluate: Unhandled C++ Exception"): AviSynth/jinc-resize#2. Current idea for a partial solution that avoids too much redesign of the processing: split the single full-frame array of coefficients into an array of arrays, one per output line, with a separate memory allocation for each line. If everything goes as expected, this stays compatible with the current SIMD data loads from memory. The only change to the SIMD processing functions is to update the pointer to the coefficient array before each new output line starts. In theory this relaxes the requirement for a single contiguous buffer on 32-bit systems and lets the plugin use all the free memory available in a 32-bit environment. |
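The per-line split described above could look roughly like the following sketch. `RowCoeffTable`, `make_row_table`, and `sum_row` are hypothetical names that do not come from the plugin, and the sketch relies on the default `new[]` alignment; the real code would need whatever aligned allocation its SIMD loads require.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical container: one independently allocated coefficient block per output line.
struct RowCoeffTable {
    std::size_t coeffs_per_row = 0;               // dst_width * coeffs_per_pixel
    std::vector<std::unique_ptr<float[]>> rows;   // rows[y] holds the coefficients of line y
};

// Allocates dst_height separate zero-initialized blocks instead of one contiguous buffer.
// A failing row allocation still throws std::bad_alloc.
inline RowCoeffTable make_row_table(std::size_t dst_width, std::size_t dst_height,
                                    std::size_t coeffs_per_pixel) {
    RowCoeffTable table;
    table.coeffs_per_row = dst_width * coeffs_per_pixel;
    table.rows.reserve(dst_height);
    for (std::size_t y = 0; y < dst_height; ++y)
        table.rows.push_back(std::make_unique<float[]>(table.coeffs_per_row));
    return table;
}

// Stand-in for the resample kernel: the coefficient pointer is refreshed once per
// output line, then the inner loop reads it exactly as it would read one big array.
inline float sum_row(const RowCoeffTable& table, std::size_t y) {
    assert(y < table.rows.size());
    const float* coeff = table.rows[y].get();
    float acc = 0.0f;
    for (std::size_t i = 0; i < table.coeffs_per_row; ++i)
        acc += coeff[i];
    return acc;
}
```

Each row needs only a small contiguous region, so a fragmented 32-bit address space is less of an obstacle, although the total amount of memory requested stays the same.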
With |
@Asd-g, I agree that it wouldn't be worth the hassle to make the plugin "more 32-bit friendly", so to speak. |
"2000*2000*112*4 = ~1.67GB." Oh, I forgot about the 4-byte size of float32. The size of a table entry also depends heavily on the kernel size. JincResize() is equal to the smallest kernel, Jinc36Resize, and if the user wants the larger Jinc256Resize kernel the limit is hit much sooner (growing as the square of the kernel size?). One possible optimization that could roughly double the usable memory when preparing the coefs table: currently the coefs table is computed in tmp_array, defined at AviSynth-JincResize/src/JincResize.cpp Line 343 in 82a3f5e
At the end of the function this unaligned, growing array is simply copied to the aligned working array allocated at AviSynth-JincResize/src/JincResize.cpp Line 484 in 82a3f5e
So at the time of the copy the OS must hold two large memory arrays, and this can fail sooner in a 32-bit address space. Also, a possible allocation failure in the C function is not checked, and the pointer is used as if it were always valid. As an optimization, the coefs could be computed directly into the single aligned working buffer, allocated before the computation starts (its size can be computed before the loops begin). This is expected to roughly double the usable memory for larger combinations of output frame size and 2D kernel size. Not a huge gain, but it may sometimes help. It would also be good to throw a formatted error message to the AVS user describing the current memory limits for the requested output size and kernel size (and for that exact program start attempt). A plain crash or Unhandled C++ Exception looks like a bug in the software and does not describe the real issue. That way the software documents itself at runtime. The memory limits in 32-bit mode are very dynamic and depend on all previous allocations in the process address space: the AVS plugin is called after the calling application, the AVS core, and possibly many other plugins have initialized, so the remaining memory depends heavily on all those earlier allocations. |
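A minimal sketch of the "size first, allocate once, check the result" idea from this comment. All names here (`coeff_table_bytes`, `AlignedTable`, `alloc_coeff_table`) and the message wording are assumptions for illustration and are not taken from JincResize.cpp; the 64-byte alignment is likewise an assumed value.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Bytes for a full-frame table of float coefficients, or 0 if the frame is empty
// or the size (plus alignment slack) does not fit in std::size_t.
inline std::size_t coeff_table_bytes(std::uint32_t width, std::uint32_t height,
                                     std::uint32_t coeffs_per_pixel) {
    const std::uint64_t pixels = std::uint64_t{width} * height;
    const std::uint64_t per_pixel = std::uint64_t{coeffs_per_pixel} * sizeof(float);
    const std::uint64_t limit = std::numeric_limits<std::size_t>::max() - 64;
    if (pixels == 0 || per_pixel == 0 || pixels > limit / per_pixel) return 0;
    return static_cast<std::size_t>(pixels * per_pixel);
}

struct AlignedTable {
    std::unique_ptr<unsigned char[]> storage;  // owns the raw block
    float* data = nullptr;                     // 64-byte aligned start inside storage
    std::size_t bytes = 0;                     // usable size behind data
};

// Allocates the final aligned buffer once, before any coefficient is computed.
// Returns an empty string on success, otherwise a message for the script user.
inline std::string alloc_coeff_table(AlignedTable& table, std::uint32_t width,
                                     std::uint32_t height, std::uint32_t coeffs_per_pixel) {
    const std::string what = std::to_string(width) + "x" + std::to_string(height) +
                             " output with " + std::to_string(coeffs_per_pixel) +
                             " coefficients per pixel";
    const std::size_t bytes = coeff_table_bytes(width, height, coeffs_per_pixel);
    if (bytes == 0)
        return "JincResize: table size for " + what + " is empty or exceeds the address space";

    std::size_t space = bytes + 63;  // slack so the start can move to a 64-byte boundary
    std::unique_ptr<unsigned char[]> raw(new (std::nothrow) unsigned char[space]);
    if (!raw)
        return "JincResize: cannot allocate " + std::to_string(bytes >> 20) + " MiB for " + what;

    void* p = raw.get();
    table.data = static_cast<float*>(std::align(64, bytes, p, space));
    table.storage = std::move(raw);
    table.bytes = bytes;
    return {};
}
```

With this shape the coefficient loops would write straight into `table.data`, so no second large array exists at any point, and a failed allocation reaches the user as text instead of a crash.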
P.S.: |
There is a second branch of JincResize with a very small set of kernel coefs (a single kernel only, stored as 1/4 of the full 2D kernel because of its symmetry in both dimensions). It works with nearly unlimited output size even in 32-bit mode. It is also faster, because it is not limited by very slow host RAM: the kernel stays in the CPU caches, and it can reach about 50% of the peak FMA performance of the CPU's SIMD unit. But it supports only fixed scale ratios, and very few are implemented, because every combination of scale ratio, kernel size, and input/output bit depth requires a hand-written SIMD routine. It is not compiled as the 'main' release, but it could be built for 32-bit users. A fixed scale ratio is not a great limitation, because most of the benefit of 2D upscaling shows at 2x, and at larger factors the difference from 1D+1D 'classic image' resizers may be less visible. The same applies to natural 2x resizers like NNEDI, and users live with that. To get an arbitrary scale ratio, first resize to the nearest integer ratio with the 'best' resizer, then do a second pass to the required size with a 'standard' resizer that supports any scale ratio (such as Lanczos or any other resizer with separate 1D+1D H+V passes). Maybe the second branch could be compiled and released as a second version for users to experiment with? But users must install (use) only one of the .dlls, because the function names are identical; the alternative kernel generation and resample engine is called only for the supported combinations of scale ratio, kernel size, and bit depth. |
Actually, the 32-bit limitation comes mainly from the environment (OS memory limits), not from the plugin. Blaming the plugin for hitting the OS limits and crashing is not correct, imo.
This was already tested (avoiding the use of temporary memory) and it doesn't bring any significant change, because the main memory allocation is the problem (it doesn't matter whether the vector is temporary or global/permanent). Currently the maximum memory that can be used is allocated, and then only the needed memory is initialized. |
Btw with avslibplacebo (GPU) you have Jinc too - ( |
@filler56789, try the attached 32-bit version. |
The new 32-bit DLL is a nicely-done job thus far :-) |
On Win7 it cannot load: LoadPlugin: unable to load "JincResize.dll", Module not found. Install missing library? Maybe some redistributables are required? |
Probably. You can use Dependencies to see if something is missing. |
Hmmm, I use Windows 8.1, so this problem would not happen to me. |
Installing the latest redistributable (version 14.42.34433.0) from https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170 fixes the missing .dll. But the test script ColorBarsHD() JincResize(2000,2000) makes AVSmeter exit without an error during the script pre-scan, and VirtualDub disappears without an error (crash). Will try attaching a debugger. The debugger catches the crash: Exception thrown at 0x053A37A9 (JincResize.dll) in VirtualDub.exe: 0xC0000005: Access violation reading location 0xFFFFFFFF. |
I pushed the changes. You can try to build it with msvc because I used clang-cl for that test build. |
The previous comment is for DTL.
You can build it with MSYS2+GCC but it will not work. The plugin must be changed to use the C API instead of the C++ API, and then you can use any compiler. |
VARIOUS reasons :-)
These are the reasons why I don't want to use MSVC. |
Downloaded the sources from main and built with MSVC 2019. With the test script on Win10 and 32-bit VirtualDub, LoadPlugin("JincResize.dll") works from JincResize up to Jinc144Resize, and Jinc256Resize displays an error message about insufficient memory, as expected. Will try to test on Win7 later. |
Very strange indeed. I just tested your new DLL, and VirtualDub32, as expected, opened normally a JincResize256(4096,2304). Tested also with your favorite numbers :-) (2000,2000), and no problem at all as well. |
UPDATE 1: UPDATE 2: The problem is not in ColorBarsHD() itself or its default colorspace YV24, it's the "preferred resizing dimensions" chosen by DTL (2000x2000) — for some obscure reason, JincResize does not like them sometimes. |
Try the attached version with the latest changes. |
I remember one significant memory usage optimization. For 4:4:4 formats the plugin creates 3 full-size tables, but they are identical, so it uses 3 times more RAM. At
https://github.com/Asd-g/AviSynth-JincResize/blob/830236ba12416401e1d872d4baafe772b952a5a9/src/JincResize.cpp#L630
it fills out[i] 3 times with identical tables. It is enough to create 1 table and point 3 pointers at it. Plugin startup would also be about 3 times faster.
Some instability between runs is normal, because it depends on all previous RAM allocations in the AVS environment and in the script before the JincResize call. So the more functions a script uses, the less memory is left.
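The shared-table idea for 4:4:4 input can be sketched as below. This assumes, as the comment states, that planes with the same source and destination sizes get identical coefficients; `PlaneGeometry`, `CoeffTable`, and `build_plane_tables` are hypothetical names and do not mirror the plugin's actual `out[i]` structure.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using CoeffTable = std::vector<float>;

// Hypothetical description of one plane's resize; equal geometry is assumed
// to mean equal coefficients.
struct PlaneGeometry {
    int src_w, src_h, dst_w, dst_h;
    bool operator==(const PlaneGeometry& o) const {
        return src_w == o.src_w && src_h == o.src_h && dst_w == o.dst_w && dst_h == o.dst_h;
    }
};

// Builds one table per distinct geometry; later planes with the same geometry
// reuse the pointer, so 4:4:4 and RGB input ends up with a single table.
template <typename Builder>
std::vector<std::shared_ptr<const CoeffTable>>
build_plane_tables(const std::vector<PlaneGeometry>& planes, Builder build) {
    std::vector<std::shared_ptr<const CoeffTable>> out;
    out.reserve(planes.size());
    for (std::size_t i = 0; i < planes.size(); ++i) {
        std::shared_ptr<const CoeffTable> table;
        for (std::size_t j = 0; j < i && !table; ++j)
            if (planes[j] == planes[i]) table = out[j];   // reuse an earlier identical table
        if (!table) table = std::make_shared<const CoeffTable>(build(planes[i]));
        out.push_back(std::move(table));
    }
    return out;
}
```

For subsampled formats the same loop would build two tables, one for luma and one shared by both chroma planes, provided the two chroma planes really use identical coefficients.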
|
SCRIPT: TESTS AND RESULTS Jinc256Resize(2048,1152) ###OK
|
Tested on my old machine with an E7500 CPU, 3 GB of RAM usable from the motherboard, and Win7 x64. It somehow works even with Jinc256Resize(3200,2000), and AVSmeter reports only 1800 MiB of RAM used. It looks like the self-growing table takes much less RAM than the earlier estimate? Need to check in the debugger some time. Startup takes a really long time, around 10..15 seconds, and a full AVSmeter run up to the fps measurement takes about 1 minute. This build works on Win7 after installing the latest C++ redistributables. There are still no crashes (though because of the very long startup time, testing many sizes and many kernel sizes takes a lot of time; will try later at work with a much faster CPU). |
Here is an x86 test build with a single coefs table for 4:4:4 and RGB formats (including YV24 from ColorBarsHD()). It is also expected to process the alpha plane of RGBA/YUVA formats (not tested). For Jinc256Resize(3200,2000) it uses only about 600 MiB of RAM according to AVSmeter; old versions use about 1800 MiB. Plugin init is also visibly faster on an old CPU, though it looks like MSVC 2019 builds are significantly slower on old SSE 4.1 CPUs? A pull request with these changes has been submitted to the main branch. JincResize_x86_msvc2019_200125.zip For the UV-subsampled formats 4:2:0 and 4:2:2 some optimization is also possible, but it is less beneficial for RAM usage (a single coefs table for the smaller UV planes, and it requires more redesign). |
With the latest changes the plugin can be built with any compiler (including MinGW GCC). |
When compiling a 32-bit binary and using
JincResize(2000,2000):
For a 2000x2000 output frame size: const int coeff_per_pixel = 112 (bytes)
and a std::bad_alloc crash (C++ runtime exception) occurs at
AviSynth-JincResize/src/JincResize.cpp
Line 349 in 82a3f5e
where the attempt to allocate and reserve 2000x2000x112 = about 448 MB in the 32-bit process fails. It looks like not enough contiguous virtual address space is left in the 32-bit process's total address space (and the Windows memory manager is too lazy to defragment it, if that is even possible).
This issue looks very rare in 64-bit builds, but it is still possible (in theory). Is there any way to fix it (without a significant rewrite to use smaller memory blocks/objects)?
No idea how to work around it easily. Maybe at least catch the C++ bad_alloc exception inside the plugin and throw a formatted error message saying that the current kernel size and output frame size are too large for the memory model in use?
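The "catch bad_alloc and report it" suggestion could be sketched like this. `build_or_explain` is a hypothetical helper and the message text is only illustrative; in the plugin the returned string would be handed to whatever error-reporting call the host API provides.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Runs a table-building callable and turns allocation failures into a readable message.
// Returns an empty string on success.
template <typename BuildFn>
std::string build_or_explain(BuildFn&& build, int dst_width, int dst_height,
                             int coeffs_per_pixel) {
    const std::string request = std::to_string(dst_width) + "x" + std::to_string(dst_height) +
                                " with " + std::to_string(coeffs_per_pixel) +
                                " coefficients per pixel";
    try {
        build();
        return {};
    } catch (const std::bad_alloc&) {
        return "JincResize: not enough memory for " + request +
               "; use a smaller kernel or output size, or a 64-bit build";
    } catch (const std::length_error&) {
        // std::vector::reserve/resize reports requests above max_size() this way.
        return "JincResize: coefficient table for " + request +
               " exceeds the maximum container size";
    }
}
```

The handlers only translate the failure into text; they do not free anything themselves, so the callable has to release partially built tables through its own destructors.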