Linux Render Node - Memory Error during Process2dPostEffects

Started by WAS, November 13, 2019, 07:54:06 pm


WAS

There is a memory exception happening during Process2dPostEffects (or one of the next processes in line) on the 44440 node. I can't say much more because there is no debugging output. It just throws "free(): invalid next size (fast)", which I guess means something is trying to free memory that is no longer allocated (or never was).
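For readers unfamiliar with this message: glibc prints it when free() inspects the size field of the chunk sitting right after the one being freed and finds an impossible value. That field is usually damaged by an earlier write past the end of a neighbouring buffer, so the crash appears at a later, possibly unrelated, free(). Below is a toy model in Python, not glibc's real chunk layout or checks; HEADER, MIN_CHUNK, ToyHeap and HeapCorruption are made-up names used only to illustrate that delay.

```python
from __future__ import annotations

import struct

HEADER = 8      # bytes of size metadata before each chunk (toy value)
MIN_CHUNK = 16  # smallest legal chunk, header included (toy value)


class HeapCorruption(Exception):
    """Stands in for glibc printing a message and calling abort()."""


class ToyHeap:
    """Bump allocator that stores each chunk's size just before its data,
    loosely imitating the boundary tags a real malloc keeps."""

    def __init__(self, size: int = 256) -> None:
        self.arena = bytearray(size)
        self.top = 0             # offset where the next chunk will start
        self._set_size(0, size)  # the untouched remainder is one big chunk

    def _set_size(self, offset: int, size: int) -> None:
        struct.pack_into("<Q", self.arena, offset, size)

    def _get_size(self, offset: int) -> int:
        return struct.unpack_from("<Q", self.arena, offset)[0]

    def malloc(self, n: int) -> int:
        chunk = max(MIN_CHUNK, (HEADER + n + 7) & ~7)  # round up to 8 bytes
        if self.top + chunk + HEADER > len(self.arena):
            raise MemoryError("toy arena exhausted")
        self._set_size(self.top, chunk)
        ptr = self.top + HEADER
        self.top += chunk
        self._set_size(self.top, len(self.arena) - self.top)  # remainder chunk
        return ptr

    def write(self, ptr: int, data: bytes) -> None:
        # Like memcpy() in C: nothing checks the write against the chunk size.
        if ptr + len(data) > len(self.arena):
            raise IndexError("write past the end of the whole arena")
        self.arena[ptr:ptr + len(data)] = data

    def free(self, ptr: int) -> None:
        start = ptr - HEADER
        next_size = self._get_size(start + self._get_size(start))
        # Sanity check in the spirit of glibc's "invalid next size".
        if next_size < MIN_CHUNK or next_size > len(self.arena):
            raise HeapCorruption("free(): invalid next size")
        # A real allocator would now put the chunk on a free list.
```

Allocating two 24-byte buffers and writing 32 bytes into the first completes without complaint; the error is only raised when that first buffer is passed to free(), however much unrelated work runs in between.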

Matt

Just because milk is white doesn't mean that clouds are made of milk.


WAS

I'm on Ubuntu 18.04 (Bionic Beaver). I can set up a KVM for you to test on if you don't have access to a similar distro.

WAS



Matt

On my CentOS 6.8 install it works correctly.

Does it finish correctly if you disable bloom?
Does it happen every time?
How much RAM do you have?
How many threads are you rendering with?
Does it work at lower resolutions?

WAS

  • I'll try disabling bloom to see if that's the issue.
  • Yes.
  • 32 GB. I'd hope that's not the issue, since it renders fine even on 8 GB of RAM with 6.4 GB available to it. Object counts and cloud work are low, as is MPD.
  • 12c / 24t
  • Haven't tried. I don't want lower resolutions, and the benchmark is already at a low resolution.

Do the post effects rely on anything that might be silenced?

CentOS is a very different distro from Debian/Ubuntu.

Does the Linux node really have no debugging output or log?

WAS

So it's the same result at 800x450 resolution with no bloom.

RAM was never an issue. I checked periodically and didn't see anything higher than 16 GB; it fluctuated between about 14.5 GB and 16 GB.

I do notice that core usage is not consistent between the two Xeons: one of them shows low usage. Multi-CPU support seems iffy, and rendering seems to suffer for it. The dual-CPU setup is very comparable to my current CPU, and the benchmark at normal resolution only takes about 13-14 minutes on my home system. In fact, at full resolution the dual Xeon is performing a little slower than the A10-5800K at 4c/4t.

<<<### APP RUN STARTED ###>>>
Terragen 4 build 4.4.44
Licensed to Jordan Thompson

Receiving maintenance until 2020-08-13
Maintenance days remaining: 270

No EDD key

License key file: Set by an administrator

Found license for Professional Edition
Found 24 processor cores
Using 24 processor cores
Loading plugins in: /home/was/Terragen/TG-44440-Node/
No files matching *.tgp in this directory
Loaded 0 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/
No files matching *.cpp in this directory
Loaded 0 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/plugins/
Loaded 3 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/plugins/
No files matching *.cpp in this directory
Loaded 0 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/../../tgdplugins/
No files matching *.tgp in this directory
Loaded 0 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/../../tgdplugins/
No files matching *.cpp in this directory
Loaded 0 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/../../tgdplugins/linux_intel/
No files matching *.tgp in this directory
Loaded 0 modules in this directory
Loading plugins in: /home/was/Terragen/TG-44440-Node/../../tgdplugins/linux_intel/
No files matching *.cpp in this directory
Loaded 0 modules in this directory
Loaded a total of 3 plugin modules
ReadXML: Attempting to read file: "/home/was/Terragen/Projects/TG4Bench/terragen-4-benchmark_v1.0-nobloom.tgd"
Content path for children of "/Billboard_Dune.tgo" set to "Project_Assets/"
trImage attempting to read file Project_Assets/grijshout-small.jpg
trImage attempting to read file Project_Assets/Dunes-Valley.jpg
trImage attempting to read file Project_Assets/Dunes-Valley.jpg
Content path for children of "/Pop tundraheather-2.tgo/tundraheather-2.tgo" set to "Project_Assets/"
trImage attempting to read file Project_Assets/heidetakje-bloei-small.png
trImage attempting to read file Project_Assets/heidetakje-bloei-small.png
trImage attempting to read file Project_Assets/heidetakje-3+alpha.png
trImage attempting to read file Project_Assets/heidetakje-3+alpha.png
trImage attempting to read file Project_Assets/heidetakje-2+alpha.png
trImage attempting to read file Project_Assets/heidetakje-2+alpha.png
trImage attempting to read file Project_Assets/heidetakje-bloei-small.png
trImage attempting to read file Project_Assets/heidetakje-bloei-small.png
Content path for children of "/Pop bush_5m--v1.tgo/bush_5m--v1.tgo" set to "Project_Assets/"
trImage attempting to read file Project_Assets/lijsterbes-blad-herfst.png
trImage attempting to read file Project_Assets/lijsterbes-blad-herfst.png
trImage attempting to read file Project_Assets/bushbark.jpg
trImage attempting to read file Project_Assets/lijsterbesblad-groen.png
trImage attempting to read file Project_Assets/lijsterbesblad-groen.png
Content path for children of "/Pop pijpestro_27-08-13_v2.tgo/pijpestro_27-08-13_v2.tgo" set to "Project_Assets/"
trImage attempting to read file Project_Assets/pijpestro-aar-opp1.png
trImage attempting to read file Project_Assets/pijpestro-aar-opp1.png
trImage attempting to read file Project_Assets/pijpestro-aar_opp.png
trImage attempting to read file Project_Assets/pijpestro-aar_opp.png
ReadXML: done
Content path for children of "Project" set to "/home/was/Terragen/Projects/TG4Bench/"
Starting render...
Output filename (-o) = ./800-400_no-bloom.tif
No -f specified, so rendering the project's current frame
Preparing to render frame 1
Number of buckets:  12 x 7 between 24 threads
Largest bucket size: 107 x 103
TGOReader: Attempting to open file: /home/was/Terragen/Projects/TG4Bench/Project_Assets/Billboard_Dune.tgo
Billboard_Dune.tgo: Loaded 444 triangles, 0 particles
TGOReader: Attempting to open file: /home/was/Terragen/Projects/TG4Bench/Project_Assets/tundraheather-2.tgo
tundraheather-2.tgo: Loaded 51744 triangles, 0 particles
33507 objects loaded from instance cache for "Pop tundraheather-2.tgo"
33507 objects (1.73 billion triangles) inserted by Populator "Pop tundraheather-2.tgo"
TGOReader: Attempting to open file: /home/was/Terragen/Projects/TG4Bench/Project_Assets/bush_5m--v1.tgo
bush_5m--v1.tgo: Loaded 121247 triangles, 0 particles
2134 objects loaded from instance cache for "Pop bush_5m--v1.tgo"
2134 objects (0.259 billion triangles) inserted by Populator "Pop bush_5m--v1.tgo"
TGOReader: Attempting to open file: /home/was/Terragen/Projects/TG4Bench/Project_Assets/pijpestro_27-08-13_v2.tgo
pijpestro_27-08-13_v2.tgo: Loaded 31080 triangles, 0 particles
17947 objects loaded from instance cache for "Pop pijpestro_27-08-13_v2.tgo"
17947 objects (0.558 billion triangles) inserted by Populator "Pop pijpestro_27-08-13_v2.tgo"
Starting pre pass
Rendered 100% of pre pass
Starting final pass
Number of buckets:  12 x 7 between 24 threads
Largest bucket size: 107 x 103
Rendering final pass... 0:00:30s, 0% of final pass, 0 micro-triangles
Rendering final pass... 0:01:00s, 0% of final pass, 0 micro-triangles
Rendering final pass... 0:01:30s, 0% of final pass, 0 micro-triangles
Rendering final pass... 0:02:00s, 0% of final pass, 0 micro-triangles
Rendering final pass... 0:02:30s, 7% of final pass, 97512 micro-triangles
Rendering final pass... 0:03:00s, 9% of final pass, 129659 micro-triangles
Rendering final pass... 0:03:30s, 13% of final pass, 177791 micro-triangles
Rendering final pass... 0:04:00s, 14% of final pass, 193995 micro-triangles
Rendering final pass... 0:04:30s, 14% of final pass, 193995 micro-triangles
Rendering final pass... 0:05:00s, 14% of final pass, 193995 micro-triangles
Rendering final pass... 0:05:30s, 14% of final pass, 193995 micro-triangles
Rendering final pass... 0:06:00s, 15% of final pass, 214094 micro-triangles
Rendering final pass... 0:06:30s, 17% of final pass, 258162 micro-triangles
Rendering final pass... 0:07:00s, 22% of final pass, 404830 micro-triangles
Rendering final pass... 0:07:30s, 26% of final pass, 546236 micro-triangles
Rendering final pass... 0:08:00s, 32% of final pass, 688827 micro-triangles
Rendering final pass... 0:08:30s, 32% of final pass, 688827 micro-triangles
Rendering final pass... 0:09:00s, 33% of final pass, 776360 micro-triangles
Rendering final pass... 0:09:30s, 33% of final pass, 776360 micro-triangles
Rendering final pass... 0:10:00s, 38% of final pass, 1002786 micro-triangles
Rendering final pass... 0:10:30s, 40% of final pass, 1148230 micro-triangles
Rendering final pass... 0:11:00s, 44% of final pass, 1360359 micro-triangles
Rendering final pass... 0:11:30s, 46% of final pass, 1427838 micro-triangles
Rendering final pass... 0:12:00s, 48% of final pass, 1492021 micro-triangles
Rendering final pass... 0:12:30s, 52% of final pass, 1606176 micro-triangles
Rendering final pass... 0:13:00s, 55% of final pass, 1716406 micro-triangles
Rendering final pass... 0:13:30s, 58% of final pass, 1785318 micro-triangles
Rendering final pass... 0:14:00s, 59% of final pass, 1817142 micro-triangles
Rendering final pass... 0:14:30s, 61% of final pass, 1892890 micro-triangles
Rendering final pass... 0:15:00s, 64% of final pass, 1945688 micro-triangles
Rendering final pass... 0:15:30s, 69% of final pass, 2066424 micro-triangles
Rendering final pass... 0:16:00s, 70% of final pass, 2100663 micro-triangles
Rendering final pass... 0:16:30s, 72% of final pass, 2156434 micro-triangles
Rendering final pass... 0:17:00s, 75% of final pass, 2211587 micro-triangles
Rendering final pass... 0:17:30s, 84% of final pass, 2421976 micro-triangles
Rendering final pass... 0:18:00s, 85% of final pass, 2448288 micro-triangles
Rendering final pass... 0:18:30s, 91% of final pass, 2565410 micro-triangles
Rendering final pass... 0:19:00s, 95% of final pass, 2633242 micro-triangles
Rendering final pass... 0:19:30s, 98% of final pass, 2712385 micro-triangles
Rendered 100% of final pass
Process2dPostEffects...
free(): invalid next size (fast)
Aborted (core dumped)
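"Aborted (core dumped)" means the process was killed by SIGABRT, which glibc raises after a failed heap check like the free() message above. Since the node prints little beyond this console output, one option is to capture every run to a file and record how it ended. This is a hypothetical Python wrapper (run_logged is not part of Terragen) that relies only on the standard subprocess and signal modules:

```python
from __future__ import annotations

import signal
import subprocess


def run_logged(cmd: list[str], log_path: str) -> str:
    """Run cmd with stdout and stderr both written to log_path,
    then report whether it exited normally or was killed by a signal."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    code = proc.returncode
    if code >= 0:
        return f"exited with status {code}"
    # On POSIX, subprocess reports "killed by signal N" as returncode -N.
    try:
        name = signal.Signals(-code).name
    except ValueError:
        name = f"signal {-code}"
    return f"killed by {name}"
```

The command passed in would be whatever launches the node; it is not shown here. One caveat: when output goes to a file instead of a terminal, a program using C stdio may buffer it, and abort() is not guaranteed to flush those buffers, so the last lines before a crash can go missing. Prefixing the command with GNU coreutils' stdbuf -oL often helps in that case.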

Have you looked into the error, and into how the C code handles memory in that step?

Matt

I've looked into the error. I don't think I'm double-freeing, because that should also show up on CentOS, Windows and Mac. But there may be some memory corruption at an earlier stage, and different runtime environments may handle it differently.

I've looked at my code. There is just one function call between two calls to print "Process2dPostEffects..." and "Process2dPostEffects: done". But when both bloom and starburst are disabled, that function does literally nothing. So this is very puzzling.
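One concrete way runtime environments differ here is the C library itself. CentOS 6 ships glibc 2.12 and Ubuntu 18.04 ships glibc 2.27, and glibc 2.26 added a per-thread cache (tcache) in front of malloc's older free lists, so the same stray write can be caught at a different moment, or not at all. A small Python sketch for reporting the glibc version on a render machine, assuming the node links against the system glibc; describe_libc and version_tuple are illustrative names:

```python
from __future__ import annotations

import platform

# glibc 2.26 added a per-thread cache ("tcache") in front of malloc's bins.
TCACHE_SINCE = (2, 26)


def version_tuple(version: str) -> tuple[int, ...]:
    """'2.27' -> (2, 27); non-numeric parts are ignored."""
    return tuple(int(p) for p in version.split(".") if p.isdigit())


def describe_libc() -> str:
    lib, version = platform.libc_ver()
    if lib != "glibc":
        return f"C library not identified as glibc ({lib or 'unknown'})"
    tcache = "with" if version_tuple(version) >= TCACHE_SINCE else "without"
    return f"glibc {version} ({tcache} tcache)"


if __name__ == "__main__":
    print(describe_libc())
```

Running it on both machines gives a quick, if partial, picture of how far apart their allocators are.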

Can you email me that exact .TGD file, no matter how simple it is?

If it only occurs on Ubuntu I'll have to look at that after the 4.4 launch.

WAS

Why would you wait until after release to fix a program-breaking error that blocks me, and probably anyone using Ubuntu 18.x? This puts everything on hold, since my PC is used for working on the next steps and I can only do one or the other in a timely manner. That doesn't even make sense from a development standpoint. You discovered it before launch, so why would you wait until after launch? This takes time away from my active "maintenance" days.

It's also a little disconcerting that this is Terragen's own benchmark failing, yet you seem to think I've done something to it (besides the edits you suggested) rather than that this is a fault in TG.

The nobloom TGD was resaved in 4.4.44, so it may give warnings in earlier versions.

Also, if you don't give the node a valid project file, instead of telling the user that no project file was found, it just renders the default project under the name of the file that doesn't exist. A little confusing... And the default blank project renders fine, so it's settings-related, or something in the scene like the populators.
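Until the node reports a missing project itself, a launch script can refuse to start when the path is wrong. A minimal Python sketch; require_project is a hypothetical helper, and it only checks that the file exists and has a .tgd extension, not that it is a valid project:

```python
from __future__ import annotations

import os


def require_project(path: str) -> str:
    """Return the absolute path of an existing .tgd file, or raise.

    Guards against a missing project silently falling back to the
    default scene when the node is launched.
    """
    if not path.lower().endswith(".tgd"):
        raise ValueError(f"not a .tgd project file: {path}")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"project file not found: {path}")
    return os.path.abspath(path)
```

A wrapper script would call require_project() on the project path first and only start the node if it returns.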

Matt

Thanks for the TGD file.

When debugging something, the variables are numerous. It's important to eliminate as many as possible, and having the exact TGD file is a small but important part of that, no matter how obvious the changes seem to the person reporting the bug. Debugging is such a time-consuming process that I have to eliminate as many variables as possible.

I'll dig into it ASAP this week, but in the meantime can you help by testing older builds? If it broke sometime between 4.3 and 4.4, there's a good chance we can find the cause and fix it quickly. Do you have the 4.3 release build? Does that work?
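Testing older builds to find where the crash first appears is a bisection: with the builds in release order, each test render halves the range that can contain the first bad build, so n builds need about log2(n) renders instead of n. A Python sketch, assuming the bug persists once it is introduced; first_bad_build and is_bad are hypothetical names, with is_bad standing in for "render the TGD with this build and see whether it aborts":

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_build(builds: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest build for which is_bad(build) is True.

    Assumes builds are in release order, builds[0] is known good,
    builds[-1] is known bad, and the bug persists once introduced.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]
```

With ten builds where the oldest is known good and the newest is known bad, this needs at most four test renders.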

I'd love to get all our builds working on every Linux distro, but we can't guarantee that given the variety of Linux distros out there. I know Ubuntu is a big one, but it's also not close to CentOS on the family tree, as far as I can tell. I'll do my best to fix it with the resources we have. I suspect it will turn out to be a simple fix, but the unpredictable part is how long it takes to find the cause.

WAS

Quote from: Matt on November 18, 2019, 04:00:15 pm
Thanks for the TGD file.

Ubuntu is based on Debian, whereas CentOS is a free flavor of RHEL, meant for running enterprise-level software developed for RHEL platforms. In the information age, RHEL's monopoly on Linux has been seen through: it's just a proprietary distro, which could add a level of insecurity to using both bleeding-edge and past versions. CentOS is community driven and open source.

I'd think that for something like Terragen, focusing on CentOS may not be the best option. It's not really geared towards multimedia. I'm pretty sure its repos for things like libpng, libjpeg, etc. are outdated compared to Debian/Ubuntu. In fact, Ubuntu and Debian have started dropping old versions from their repos; for example, in Bionic Beaver you would have to manually install any PHP version below 7.2.

Can you or Oshyan send me links to the stable and later versions leading up to 4440? I'll test them all and see if I can narrow down which version this first shows up in.

I'm also still very concerned that TG underutilizes the first CPU on the board (it's a 2x6c, 24t system). The first CPU barely gets above 60-75% usage, while the second CPU hits 100% across its cores. According to CPU benchmarks and specs, I should be getting almost exactly the same times as my home desktop.
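To put numbers on the per-socket imbalance, Linux exposes cumulative per-CPU time counters in /proc/stat, and utilization is the share of non-idle time between two readings. A Python sketch of that calculation; the function names are illustrative, and it counts iowait as idle and ignores the guest columns, which the kernel already includes in user time:

```python
from __future__ import annotations

import time


def parse_proc_stat(text: str) -> dict[str, tuple[int, int]]:
    """Map each per-CPU line ("cpu0", "cpu1", ...) to (busy, total) jiffies."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...).
        if not fields or fields[0] == "cpu" or not fields[0].startswith("cpu"):
            continue
        # user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        total = sum(values)
        counters[fields[0]] = (total - idle, total)
    return counters


def cpu_usage(before: dict[str, tuple[int, int]],
              after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percent busy for each CPU between two snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def sample(interval: float = 5.0) -> dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    return cpu_usage(first, second)


def usage_by_socket(usage: dict[str, float]) -> dict[str, float]:
    """Average usage per physical package, using sysfs topology (Linux only)."""
    groups: dict[str, list[float]] = {}
    for cpu, pct in usage.items():
        path = f"/sys/devices/system/cpu/{cpu}/topology/physical_package_id"
        with open(path) as f:
            groups.setdefault(f.read().strip(), []).append(pct)
    return {sock: sum(v) / len(v) for sock, v in groups.items()}
```

Calling usage_by_socket(sample()) during a render would give one average per Xeon; with hyper-threading, each physical core shows up as two logical CPUs in both files.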

Matt

I've sent you a PM with links to previous Linux builds.

WAS