We're running Quadlet-based rootless Python/Django containers on separate test and production servers on Ubuntu 22.04, AMD64, with Podman and its dependencies built from source or downloaded as release binaries from GitHub, as applicable. Today, upon doing a CI deploy to the test server, the job failed with this error:
Error: writing to file "/run/user/1001/containers/auth.json": open /run/user/1001/containers/.tmp-auth.json3577526995: no space left on device
The part of the CI job that failed was a container registry login. I went to check on the server and saw that the tmpfs at /run/user/1001 (the UID under which the rootless containers run) was about half full, at roughly 340 MB. I don't know why the error reported that space had run out when the filesystem was only half full; I don't know the inner workings of tmpfs, but possibly something other than raw bytes (inodes, for example) was exhausted. Normal usage there should be in the kilobytes, not hundreds of megabytes, so something was clearly wrong.
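For anyone seeing the same symptom, a quick way to check whether it was bytes or inodes that ran out (I'm only guessing at inodes here; the paths assume UID 1001 as on our servers):

df -h /run/user/1001                       # byte usage of the user runtime tmpfs
df -i /run/user/1001                       # inode usage; tens of thousands of tiny files can exhaust inodes before bytes
du -sh /run/user/1001/libpod/tmp/persist   # how much of the usage is the persist directory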
Looking closer, the directory /run/user/1001/libpod/tmp/persist contained tens of thousands of directories with 64-character hexadecimal names, corresponding to current or past container IDs of our application user. Nearly all of them contained a single 1-byte file called exit, holding the character 0, and nothing else. Stopping the containers, deleting the directories, and starting the containers back up again worked, and the CI job was retried successfully.
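For reference, the manual cleanup amounted to roughly the following; the unit name is a placeholder for our actual Quadlet-generated services, and deleting under persist is only safe while no containers are running:

systemctl --user stop mycontainer.service        # repeated for each running container unit
ls /run/user/1001/libpod/tmp/persist | wc -l     # tens of thousands of leaked directories
rm -r /run/user/1001/libpod/tmp/persist/*
systemctl --user start mycontainer.service       # repeated for each unit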
We normally run ten containers on the server 24/7. Upon starting up, all these containers gained a directory there corresponding to their ID. None of them had exit files, which made sense as none had yet exited. When I stopped a container, the exit file would appear, and the directory would stick around. Nothing seemed to be cleaning it up. No error was printed in the journalctl output of the container service regarding the inability to clean up the directory.
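The lifecycle is easy to observe on a single container; <container-id> below stands in for a real 64-character ID, and mycontainer.service for one of our Quadlet units:

ls /run/user/1001/libpod/tmp/persist/<container-id>          # exists but is empty while the container runs
systemctl --user stop mycontainer.service
ls /run/user/1001/libpod/tmp/persist/<container-id>          # now holds the 1-byte exit file
cat /run/user/1001/libpod/tmp/persist/<container-id>/exit    # prints 0; the directory is never removed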
Worryingly, more directories and exit files kept being created at a constant rate without me restarting any containers.
Then I remembered that this server is running several containers that are started up via systemd timers to act as cron jobs. They all run successfully to completion based on their journalctl --user -u output, corresponding to 0 in the exit files. And it's these that are really filling up the tmpfs, as they get run hundreds of times every day.
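One way to confirm the correlation is to watch the directory count alongside the user timers; the count ticks up with every timer-driven run and never goes back down:

systemctl --user list-timers                                  # the cron-style container jobs
watch -n 60 'ls /run/user/1001/libpod/tmp/persist | wc -l'    # grows steadily, never shrinks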
This is only happening on the test server, not production, despite both running the same containers, including systemd-timed containers, the same OS version and an application user configured the same way.
The salient difference is that the test server runs more up-to-date versions of Podman and its dependencies, whereas production has older versions. So it seems some regression was introduced in one of the components since Podman 4.8.3 was current.
Test server, exhibiting the issue:
Podman 5.2.1
conmon 2.1.12
netavark 1.12.2
aardvark-dns 1.12.1
crun 1.16.1
Production server, NOT exhibiting the issue:
Podman 4.8.3
conmon 2.1.10
netavark 1.9.0
aardvark-dns 1.9.0
crun 1.12
Steps to reproduce the issue
I don't know if this is universally reproducible outside our environment, but:
Run a rootless Quadlet-based container as an unprivileged user with user-mode systemd on an Ubuntu 22.04 AMD64 server with Podman 5.2.1.
Watch as directories and files accumulate under /run/user/[uid]/libpod/tmp/persist, corresponding to the containers' IDs even after they have exited, eventually filling the tmpfs (a minimal command-level sketch follows below).
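A minimal command-level version, untested beyond our environment and assuming the leak happens on every container run rather than only for Quadlet-managed containers (the image choice is arbitrary):

podman run --name leak-check docker.io/library/alpine:latest true
podman rm leak-check
ls /run/user/$(id -u)/libpod/tmp/persist     # on an affected Podman, a leftover 64-character directory remains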
Describe the results you received
Containers have their /run/user/[uid]/libpod/tmp/persist/[container ID] tmpfs dirs left over after exiting successfully (exit code 0).
Describe the results you expected
Any directories and files created under /run/user/[uid]/libpod/tmp/persist would get cleaned up as containers exit.
podman info output
Note: this is from the test server. The production server didn't seem to have relevant differences outside component versions.

host:
  arch: amd64
  buildahVersion: 1.37.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.12, commit: unknown'
  cpuUtilization:
    idlePercent: 92.85
    systemPercent: 1.15
    userPercent: 6
  cpus: 4
  databaseBackend: boltdb
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  freeLocks: 2037
  hostname: <redacted>-staging
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.15.0-130-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 1377988608
  memTotal: 8322985984
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: Unknown
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: Unknown
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/local/bin/pasta
    package: Unknown
    version: |
      pasta unknown version
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1001/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 312365056
  swapTotal: 536866816
  uptime: 1187h 1m 45.00s (Approximately 49.46 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io
    PullFromMirror: ""
  search:
  - docker.io
store:
  configFile: /home/appuser/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 10
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/appuser/.local/share/containers/storage
  graphRootAllocated: 168488570880
  graphRootUsed: 37740863488
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 19
  runRoot: /tmp/containers-user-1001/containers
  transientStore: false
  volumePath: /home/appuser/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.1
  Built: 1724236924
  BuiltTime: Wed Aug 21 13:42:04 2024
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.1
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Linode VPS.
Additional information
No response
This discussion was about a similar, though not identical, issue in 2023 with no resolution, but just last week the user @gee456 posted in the comments that they had hit this exact problem of the same directory filling up.
It looks like these are the exit directories from Conmon, which for some reason aren't being cleaned up. We should probably be removing them when we clean up the OCI runtime.
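Until a fixed Podman is available, a heavily caveated stop-gap (my own sketch, not anything from the Podman docs) is to periodically prune only those persist directories that already contain an exit file, i.e. whose containers have exited; note that this could interfere with podman's own cleanup if it runs at exactly the wrong moment:

for d in /run/user/1001/libpod/tmp/persist/*/; do
    # only touch directories whose container has already written its exit file
    [ -f "${d}exit" ] && rm -r -- "$d"
done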
This seems to have been added as part of the cleanup of our
handling of OOM files, but code was never added to remove it, so
we leaked a single directory with an exit file and OOM file per
container run. Apparently have been doing this for a while - I'd
guess since March of '23 - so I'm surprised more people didn't
notice.
Fixes containers#25291
Signed-off-by: Matt Heon <mheon@redhat.com>
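Once a build containing this fix is deployed (I don't know which release it will first ship in), the directory count should stay flat across the timer-driven runs:

ls /run/user/$(id -u)/libpod/tmp/persist | wc -l    # should no longer grow after each container run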