<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Docker &#8211; Noise</title>
	<atom:link href="https://noise.getoto.net/tag/docker/feed/" rel="self" type="application/rss+xml" />
	<link>https://noise.getoto.net</link>
	<description>The collective thoughts of the interwebz</description>
	<lastBuildDate>Fri, 03 Oct 2025 15:01:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>Reduce Docker image build time on AWS CodeBuild using Amazon ECR as a remote cache</title>
		<link>https://noise.getoto.net/2025/10/03/reduce-docker-image-build-time-on-aws-codebuild-using-amazon-ecr-as-a-remote-cache/</link>
		
		<dc:creator><![CDATA[Kirubakaran Sundaramoorthy]]></dc:creator>
		<pubDate>Fri, 03 Oct 2025 15:01:53 +0000</pubDate>
				<category><![CDATA[Amazon ECR]]></category>
		<category><![CDATA[AWS CodeBuild]]></category>
		<category><![CDATA[codebuild]]></category>
		<category><![CDATA[Developer Tools]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=c6159f4e04ee2ba0bc7a815a8abb11d2</guid>

					<description><![CDATA[In modern software development, containerization with Docker has revolutionized how we build and deploy applications. While Docker enables packaging applications into portable containers, the continuous need to update these images can be resource intensive. AWS CodeBuild addresses this challenge by providing a managed build service that eliminates infrastructure maintenance overhead. In this blog post, we’ll […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Deploying Zabbix Components with Docker and Docker Compose</title>
		<link>https://noise.getoto.net/2025/04/08/deploying-zabbix-components-with-docker-and-docker-compose/</link>
		
		<dc:creator><![CDATA[Janis Eidaks]]></dc:creator>
		<pubDate>Tue, 08 Apr 2025 12:36:04 +0000</pubDate>
				<category><![CDATA[Docker]]></category>
		<category><![CDATA[Handy Tips]]></category>
		<category><![CDATA[How-to]]></category>
		<category><![CDATA[Zabbix]]></category>
		<guid isPermaLink="false">https://blog.zabbix.com/?p=30025</guid>

					<description><![CDATA[<p>Installing Zabbix from packages can feel overwhelming, due to the availability of different configuration options. The detailed and comprehensive…</p>
<p>The post <a href="https://blog.zabbix.com/deploying-zabbix-components-with-docker-and-docker-compose/30025/">Deploying Zabbix Components with Docker and Docker Compose</a> appeared first on <a href="https://blog.zabbix.com/">Zabbix Blog</a>.</p>]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Accelerate Serverless Streamlit App Deployment with Terraform</title>
		<link>https://noise.getoto.net/2024/10/09/accelerate-serverless-streamlit-app-deployment-with-terraform/</link>
		
		<dc:creator><![CDATA[Kevon Mayers]]></dc:creator>
		<pubDate>Wed, 09 Oct 2024 05:07:23 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[Best practices]]></category>
		<category><![CDATA[CI/CD]]></category>
		<category><![CDATA[Customer Solutions]]></category>
		<category><![CDATA[Developer Tools]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Infrastructure as Code]]></category>
		<category><![CDATA[Integration & Automation]]></category>
		<category><![CDATA[Pipelines]]></category>
		<category><![CDATA[Provisioning and orchestration]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Terraform]]></category>
		<category><![CDATA[Top Posts*]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=7814239910884dfa4f82b1e631c8256f</guid>

					<description><![CDATA[Graphic created by Kevon Mayers.

Introduction

As customers increasingly seek to harness the power of generative AI (GenAI) and machine learning to deliver cutting-edge applications, the need for a flexible, intuitive, and scalable development platform has never been greater. In this landscape, Streamlit has emerged as a standout tool, making it easy for developers to […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Evolution of Catwalk: Model serving platform at Grab</title>
		<link>https://noise.getoto.net/2024/10/01/evolution-of-catwalk-model-serving-platform-at-grab/</link>
		
		<dc:creator><![CDATA[Grab Tech]]></dc:creator>
		<pubDate>Tue, 01 Oct 2024 00:00:50 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Models]]></category>
		<category><![CDATA[TensorFlow]]></category>
		<guid isPermaLink="false">https://engineering.grab.com/catwalk-evolution</guid>

					<description><![CDATA[Introduction

As Southeast Asia’s leading super app, Grab serves millions of users across multiple countries every day. Our services range from ride-hailing and food delivery to digital payments and much more. The backbone of our operations? Machine Le...]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Docker Raises Prices Up to 80 Percent and More</title>
		<link>https://noise.getoto.net/2024/09/13/docker-raises-prices-up-to-80-percent-and-more/</link>
		
		<dc:creator><![CDATA[Cliff Robinson]]></dc:creator>
		<pubDate>Fri, 13 Sep 2024 00:03:16 +0000</pubDate>
				<category><![CDATA[Docker]]></category>
		<category><![CDATA[Server Applications]]></category>
		<guid isPermaLink="false">https://www.servethehome.com/?p=80935</guid>

					<description><![CDATA[Docker Pro annual pricing is increasing by 80 percent while some of the benefits are being adjusted to the subscription level.
The post Docker Raises Prices Up to 80 Percent and More appeared first on ServeTheHome.
]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Debugging a FUSE deadlock in the Linux kernel</title>
		<link>https://noise.getoto.net/2023/05/19/debugging-a-fuse-deadlock-in-the-linux-kernel/</link>
		
		<dc:creator><![CDATA[Netflix Technology Blog]]></dc:creator>
		<pubDate>Fri, 19 May 2023 19:21:03 +0000</pubDate>
				<category><![CDATA[Deadlock]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[fuse]]></category>
		<category><![CDATA[linux]]></category>
		<guid isPermaLink="false">https://medium.com/p/c75cd7989b6d</guid>

					<description><![CDATA[<p><a href="https://tycho.pizza/">Tycho Andersen</a></p><p>The Compute team at Netflix is charged with managing all AWS and containerized workloads at Netflix, including autoscaling, deployment of containers, issue remediation, etc. As part of this team, I work on fixing strange things that users report.</p><p>This particular issue involved a custom internal <a href="https://www.kernel.org/doc/html/latest/filesystems/fuse.html">FUSE filesystem</a>: <a href="https://netflixtechblog.com/netflix-drive-a607538c3055">ndrive</a>. It had been festering for some time, but needed someone to sit down and look at it in anger. This blog post describes how I poked at /proc to get a sense of what was going on, before posting the issue to the kernel mailing list and getting schooled on how the kernel’s wait code actually works!</p><h3>Symptom: Stuck Docker Kill &#38; A Zombie Process</h3><p>We had a stuck docker API call:</p><pre>goroutine 146 [select, 8817 minutes]:<br>net/http.(*persistConn).roundTrip(0xc000658fc0, 0xc0003fc080, 0x0, 0x0, 0x0)<br>        /usr/local/go/src/net/http/transport.go:2610 +0x765<br>net/http.(*Transport).roundTrip(0xc000420140, 0xc000966200, 0x30, 0x1366f20, 0x162)<br>        /usr/local/go/src/net/http/transport.go:592 +0xacb<br>net/http.(*Transport).RoundTrip(0xc000420140, 0xc000966200, 0xc000420140, 0x0, 0x0)<br>        /usr/local/go/src/net/http/roundtrip.go:17 +0x35<br>net/http.send(0xc000966200, 0x161eba0, 0xc000420140, 0x0, 0x0, 0x0, 0xc00000e050, 0x3, 0x1, 0x0)<br>        /usr/local/go/src/net/http/client.go:251 +0x454<br>net/http.(*Client).send(0xc000438480, 0xc000966200, 0x0, 0x0, 0x0, 0xc00000e050, 0x0, 0x1, 0x10000168e)<br>        /usr/local/go/src/net/http/client.go:175 +0xff<br>net/http.(*Client).do(0xc000438480, 0xc000966200, 0x0, 0x0, 0x0)<br>        /usr/local/go/src/net/http/client.go:717 +0x45f<br>net/http.(*Client).Do(...)<br>        /usr/local/go/src/net/http/client.go:585<br>golang.org/x/net/context/ctxhttp.Do(0x163bd48, 0xc000044090, 0xc000438480, 0xc000966100, 0x0, 0x0, 0x0)<br>        /go/pkg/mod/golang.org/x/net@v0.0.0-20211209124913-491a49abca63/context/ctxhttp/ctxhttp.go:27 +0x10f<br>github.com/docker/docker/client.(*Client).doRequest(0xc0001a8200, 0x163bd48, 0xc000044090, 0xc000966100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)<br>        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/request.go:132 +0xbe<br>github.com/docker/docker/client.(*Client).sendRequest(0xc0001a8200, 0x163bd48, 0xc000044090, 0x13d8643, 0x3, 0xc00079a720, 0x51, 0x0, 0x0, 0x0, ...)<br>        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/request.go:122 +0x156<br>github.com/docker/docker/client.(*Client).get(...)<br>        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/request.go:37<br>github.com/docker/docker/client.(*Client).ContainerInspect(0xc0001a8200, 0x163bd48, 0xc000044090, 0xc0006a01c0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, ...)<br>        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/container_inspect.go:18 +0x128<br>github.com/Netflix/titus-executor/executor/runtime/docker.(*DockerRuntime).Kill(0xc000215180, 0x163bdb8, 0xc000938600, 0x1, 0x0, 0x0)<br>        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runtime/docker/docker.go:2835 +0x310<br>github.com/Netflix/titus-executor/executor/runner.(*Runner).doShutdown(0xc000432dc0, 0x163bd10, 0xc000938390, 0x1, 0xc000b821e0, 0x1d, 0xc0005e4710)<br>        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runner/runner.go:326 +0x4f4<br>github.com/Netflix/titus-executor/executor/runner.(*Runner).startRunner(0xc000432dc0, 0x163bdb8, 0xc00071e0c0, 0xc0a502e28c08b488, 0x24572b8, 0x1df5980)<br>        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runner/runner.go:122 +0x391<br>created by github.com/Netflix/titus-executor/executor/runner.StartTaskWithRuntime<br>        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runner/runner.go:81 +0x411</pre>
<p>Here, our management engine has made an HTTP call to the Docker API’s unix socket asking it to kill a container. Our containers are configured to be killed via SIGKILL. But this is strange. kill(SIGKILL) should be relatively fatal, so what is the container doing?</p><pre>$ docker exec -it 6643cd073492 bash<br>OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1: unknown</pre><p>Hmm. Seems like it’s alive, but setns(2) fails. Why would that be? If we look at the process tree via ps awwfux, we see:</p><pre>\_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/6643cd073492ba9166100ed30dbe389ff1caef0dc3d35<br>&#124;  \_ [docker-init]<br>&#124;      \_ [ndrive] &#60;defunct&#62;</pre><p>Ok, so the container’s init process is still alive, but it has one zombie child. What could the container’s init process possibly be doing?</p><pre># cat /proc/1528591/stack<br>[&#60;0&#62;] do_wait+0x156/0x2f0<br>[&#60;0&#62;] kernel_wait4+0x8d/0x140<br>[&#60;0&#62;] zap_pid_ns_processes+0x104/0x180<br>[&#60;0&#62;] do_exit+0xa41/0xb80<br>[&#60;0&#62;] do_group_exit+0x3a/0xa0<br>[&#60;0&#62;] __x64_sys_exit_group+0x14/0x20<br>[&#60;0&#62;] do_syscall_64+0x37/0xb0<br>[&#60;0&#62;] entry_SYSCALL_64_after_hwframe+0x44/0xae</pre><p>It is in the process of exiting, but it seems stuck. The only child is the ndrive process in Z (i.e. “zombie”) state, though. Zombies are processes that have successfully exited, and are waiting to be reaped by a corresponding wait() syscall from their parents. 
So how could the kernel be stuck waiting on a zombie?</p><pre># ls /proc/1544450/task<br>1544450  1544574</pre><p>Ah ha, there are two threads in the thread group. One of them is a zombie, maybe the other one isn’t:</p><pre># cat /proc/1544574/stack<br>[&#60;0&#62;] request_wait_answer+0x12f/0x210<br>[&#60;0&#62;] fuse_simple_request+0x109/0x2c0<br>[&#60;0&#62;] fuse_flush+0x16f/0x1b0<br>[&#60;0&#62;] filp_close+0x27/0x70<br>[&#60;0&#62;] put_files_struct+0x6b/0xc0<br>[&#60;0&#62;] do_exit+0x360/0xb80<br>[&#60;0&#62;] do_group_exit+0x3a/0xa0<br>[&#60;0&#62;] get_signal+0x140/0x870<br>[&#60;0&#62;] arch_do_signal_or_restart+0xae/0x7c0<br>[&#60;0&#62;] exit_to_user_mode_prepare+0x10f/0x1c0<br>[&#60;0&#62;] syscall_exit_to_user_mode+0x26/0x40<br>[&#60;0&#62;] do_syscall_64+0x46/0xb0<br>[&#60;0&#62;] entry_SYSCALL_64_after_hwframe+0x44/0xae</pre><p>Indeed it is not a zombie. It is trying to become one as hard as it can, but it’s blocking inside FUSE for some reason. To find out why, let’s look at some kernel code. If we look at <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/pid_namespace.c?h=v5.19#n166">zap_pid_ns_processes()</a>, it does:</p><pre>/*<br> * Reap the EXIT_ZOMBIE children we had before we ignored SIGCHLD.<br> * kernel_wait4() will also block until our children traced from the<br> * parent namespace are detached and become EXIT_DEAD.<br> */<br>do {<br>        clear_thread_flag(TIF_SIGPENDING);<br>        rc = kernel_wait4(-1, NULL, __WALL, NULL);<br>} while (rc != -ECHILD);</pre><p>which is where we are stuck, but before that, it has done:</p><pre>/* Don't allow any more processes into the pid namespace */<br>disable_pid_allocation(pid_ns);</pre><p>which is why docker can’t setns() — the <em>namespace</em> is a zombie. Ok, so we can’t setns(2), but why are we stuck in kernel_wait4()? 
To understand why, let’s look at what the other thread was doing in FUSE’s <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/fuse/dev.c?h=v5.19#n407">request_wait_answer()</a>:</p><pre>/*<br> * Either request is already in userspace, or it was forced.<br> * Wait it out.<br> */<br>wait_event(req-&#62;waitq, test_bit(FR_FINISHED, &#38;req-&#62;flags));</pre><p>Ok, so we’re waiting for an event (in this case, that userspace has replied to the FUSE flush request). But zap_pid_ns_processes() sent a SIGKILL! SIGKILL should be very fatal to a process. If we look at the process, we can indeed see that there’s a pending SIGKILL:</p><pre># grep Pnd /proc/1544574/status<br>SigPnd: 0000000000000000<br>ShdPnd: 0000000000000100</pre><p>Viewing process status this way, you can see 0x100 under ShdPnd, the set of signals pending for the process as a whole: the 9th bit is set, and signal number 9 is SIGKILL. Pending signals are signals that have been generated by the kernel, but have not yet been delivered to userspace. Signals are only delivered at certain times, for example when entering or leaving a syscall, or when waiting on events. If the kernel is currently doing something on behalf of the task, the signal may be pending. Signals can also be blocked by a task, so that they are not delivered until the task unblocks them. Blocked signals will show up in their respective pending sets as well. However, man 7 signal says: “The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.” But here the kernel is telling us that we have a pending SIGKILL, that is, it is effectively being ignored even while the task is waiting!</p><h3>Red Herring: How do Signals Work?</h3><p>Well, that is weird. The wait code (i.e. include/linux/wait.h) is used everywhere in the kernel: semaphores, wait queues, completions, etc. Surely it knows to look for SIGKILLs. So what does wait_event() actually do? 
Digging through the macro expansions and wrappers, the meat of it is:</p><pre>#define ___wait_event(wq_head, condition, state, exclusive, ret, cmd)           \<br>({                                                                              \<br>        __label__ __out;                                                        \<br>        struct wait_queue_entry __wq_entry;                                     \<br>        long __ret = ret;       /* explicit shadow */                           \<br>                                                                                \<br>        init_wait_entry(&#38;__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0);        \<br>        for (;;) {                                                              \<br>                long __int = prepare_to_wait_event(&#38;wq_head, &#38;__wq_entry, state);\<br>                                                                                \<br>                if (condition)                                                  \<br>                        break;                                                  \<br>                                                                                \<br>                if (___wait_is_interruptible(state) &#38;&#38; __int) {                 \<br>                        __ret = __int;                                          \<br>                        goto __out;                                             \<br>                }                                                               \<br>                                                                                \<br>                cmd;                                                            \<br>        }                                                                       \<br>        finish_wait(&#38;wq_head, &#38;__wq_entry);                                     \<br>__out:  __ret;                                                                  \<br>})</pre><p>So it loops 
forever, doing prepare_to_wait_event(), checking the condition, then checking to see if we need to interrupt. Then it does cmd, which in this case is schedule(), i.e. “do something else for a while”. prepare_to_wait_event() looks like:</p><pre>long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)<br>{<br>        unsigned long flags;<br>        long ret = 0;<br><br>        spin_lock_irqsave(&#38;wq_head-&#62;lock, flags);<br>        if (signal_pending_state(state, current)) {<br>                /*<br>                 * Exclusive waiter must not fail if it was selected by wakeup,<br>                 * it should "consume" the condition we were waiting for.<br>                 *<br>                 * The caller will recheck the condition and return success if<br>                 * we were already woken up, we can not miss the event because<br>                 * wakeup locks/unlocks the same wq_head-&#62;lock.<br>                 *<br>                 * But we need to ensure that set-condition + wakeup after that<br>                 * can't see us, it should wake up another exclusive waiter if<br>                 * we fail.<br>                 */<br>                list_del_init(&#38;wq_entry-&#62;entry);<br>                ret = -ERESTARTSYS;<br>        } else {<br>                if (list_empty(&#38;wq_entry-&#62;entry)) {<br>                        if (wq_entry-&#62;flags &#38; WQ_FLAG_EXCLUSIVE)<br>                                __add_wait_queue_entry_tail(wq_head, wq_entry);<br>                        else<br>                                __add_wait_queue(wq_head, wq_entry);<br>                }<br>                set_current_state(state);<br>        }<br>        spin_unlock_irqrestore(&#38;wq_head-&#62;lock, flags);<br><br>        return ret;<br>}<br>EXPORT_SYMBOL(prepare_to_wait_event);</pre><p>It looks like the only way we can break out of this with a non-zero exit code is if signal_pending_state() is true. 
Since our call site was just wait_event(), we know that state here is TASK_UNINTERRUPTIBLE; the definition of signal_pending_state() looks like:</p><pre>static inline int signal_pending_state(unsigned int state, struct task_struct *p)<br>{<br>        if (!(state &#38; (TASK_INTERRUPTIBLE &#124; TASK_WAKEKILL)))<br>                return 0;<br>        if (!signal_pending(p))<br>                return 0;<br><br>        return (state &#38; TASK_INTERRUPTIBLE) &#124;&#124; __fatal_signal_pending(p);<br>}</pre><p>Our task is not interruptible, so the first if fails. Our task should have a signal pending, though, right?</p><pre>static inline int signal_pending(struct task_struct *p)<br>{<br>        /*<br>         * TIF_NOTIFY_SIGNAL isn't really a signal, but it requires the same<br>         * behavior in terms of ensuring that we break out of wait loops<br>         * so that notify signal callbacks can be processed.<br>         */<br>        if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))<br>                return 1;<br>        return task_sigpending(p);<br>}</pre><p>As the comment notes, TIF_NOTIFY_SIGNAL isn’t relevant here, in spite of its name, but let’s look at task_sigpending():</p><pre>static inline int task_sigpending(struct task_struct *p)<br>{<br>        return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));<br>}</pre><p>Hmm. Seems like we should have that flag set, right? To figure that out, let’s look at how signal delivery works. When we’re shutting down the pid namespace in zap_pid_ns_processes(), it does:</p><pre>group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);</pre><p>which eventually gets to __send_signal_locked(), which has:</p><pre>pending = (type != PIDTYPE_PID) ? 
&#38;t-&#62;signal-&#62;shared_pending : &#38;t-&#62;pending;<br>...<br>sigaddset(&#38;pending-&#62;signal, sig);<br>...<br>complete_signal(sig, t, type);</pre><p>Using PIDTYPE_MAX here as the type is a little weird, but it roughly indicates “this is very privileged kernel stuff sending this signal, you should definitely deliver it”. There is a bit of unintended consequence here, though, in that __send_signal_locked() ends up sending the SIGKILL to the shared set, instead of the individual task’s set. If we look at the __fatal_signal_pending() code, we see:</p><pre>static inline int __fatal_signal_pending(struct task_struct *p)<br>{<br>        return unlikely(sigismember(&#38;p-&#62;pending.signal, SIGKILL));<br>}</pre><p>But it turns out this is a bit of a red herring (<a href="https://lore.kernel.org/all/YuGUyayVWDB7R89i@tycho.pizza/">although</a> <a href="https://lore.kernel.org/all/20220728091220.GA11207@redhat.com/">it</a> <a href="https://lore.kernel.org/all/871qu6bjp3.fsf@email.froward.int.ebiederm.org/">took</a> <a href="https://lore.kernel.org/all/8735elhy4u.fsf@email.froward.int.ebiederm.org/">a</a> <a href="https://lore.kernel.org/all/87pmhofr1q.fsf@email.froward.int.ebiederm.org/">while</a> for me to understand that).</p><h3>How Signals Actually Get Delivered To a Process</h3><p>To understand what’s really going on here, we need to look at complete_signal(), since it unconditionally adds a SIGKILL to the task’s pending set:</p><pre>sigaddset(&#38;t-&#62;pending.signal, SIGKILL);</pre><p>but why doesn’t it work? 
At the top of the function we have:</p><pre>/*<br> * Now find a thread we can wake up to take the signal off the queue.<br> *<br> * If the main thread wants the signal, it gets first crack.<br> * Probably the least surprising to the average bear.<br> */<br>if (wants_signal(sig, p))<br>        t = p;<br>else if ((type == PIDTYPE_PID) &#124;&#124; thread_group_empty(p))<br>        /*<br>         * There is just one thread and it does not need to be woken.<br>         * It will dequeue unblocked signals before it runs again.<br>         */<br>        return;</pre><p>but as <a href="https://lore.kernel.org/all/877d4jbabb.fsf@email.froward.int.ebiederm.org/">Eric Biederman described</a>, basically every thread can handle a SIGKILL at any time. Here’s wants_signal():</p><pre>static inline bool wants_signal(int sig, struct task_struct *p)<br>{<br>        if (sigismember(&#38;p-&#62;blocked, sig))<br>                return false;<br><br>        if (p-&#62;flags &#38; PF_EXITING)<br>                return false;<br><br>        if (sig == SIGKILL)<br>                return true;<br><br>        if (task_is_stopped_or_traced(p))<br>                return false;<br><br>        return task_curr(p) &#124;&#124; !task_sigpending(p);<br>}</pre><p>So… if a thread is already exiting (i.e. it has PF_EXITING), it doesn’t want a signal. Consider the following sequence of events:</p><p>1. A task opens a FUSE file, doesn’t close it, and then exits. During that exit, the kernel dutifully calls do_exit(), which does the following:</p><pre>exit_signals(tsk); /* sets PF_EXITING */</pre><p>2. do_exit() continues on to exit_files(tsk), which flushes all files that are still open, resulting in the stack trace above.</p><p>3. The pid namespace exits: it enters zap_pid_ns_processes(), sends a SIGKILL (which it expects to be fatal) to every task, and then waits for them all to exit.</p><p>4. This kills the FUSE daemon in the pid namespace, so it can never respond to the flush.</p><p>5. 
complete_signal() for the FUSE task that was already exiting ignores the signal, since it has PF_EXITING.</p><p>6. Deadlock. Unless someone manually aborts the FUSE connection, things will hang forever.</p><h3>Solution: don’t wait!</h3><p>It doesn’t really make sense to wait for flushes in this case: the task is dying, so there’s nobody to tell the return code of flush() to. It also turns out that this bug can happen with several filesystems (anything that calls the kernel’s wait code in flush(), i.e. basically anything that talks to something outside the local kernel).</p><p>For now, each affected filesystem needs to be patched individually; for example, the fix for FUSE is <a href="https://github.com/torvalds/linux/commit/14feceeeb012faf9def7d313d37f5d4f85e6572b">here</a>, and it was released on April 23 in Linux 6.3.</p><p>While this blog post addresses FUSE deadlocks, there are definitely issues in the NFS code and elsewhere, which we have not hit in production yet, but almost certainly will. You can also see this hang as a <a href="https://lore.kernel.org/all/20230512225414.GE3223426@dread.disaster.area/">symptom of other filesystem bugs</a>. It is something to look out for if you have a pid namespace that won’t exit.</p><p>This is just a small taste of the variety of strange issues we encounter running containers at scale at Netflix. Our team is hiring, so please reach out if you also love red herrings and kernel deadlocks!</p><hr><p><a href="https://netflixtechblog.com/debugging-a-fuse-deadlock-in-the-linux-kernel-c75cd7989b6d">Debugging a FUSE deadlock in the Linux kernel</a> was originally published in <a href="https://netflixtechblog.com/">Netflix TechBlog</a> on Medium.</p>]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>How to run AWS CloudHSM workloads in container environments</title>
		<link>https://noise.getoto.net/2023/01/25/how-to-run-aws-cloudhsm-workloads-in-container-environments/</link>
		
		<dc:creator><![CDATA[Derek Tumulak]]></dc:creator>
		<pubDate>Wed, 25 Jan 2023 21:59:27 +0000</pubDate>
				<category><![CDATA[AWS CloudHSM]]></category>
		<category><![CDATA[Containers]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[PKCS#11]]></category>
		<category><![CDATA[Security Blog]]></category>
		<category><![CDATA[Security, Identity & Compliance]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=100c0e2e943d243b5b7506639bdbed50</guid>

					<description><![CDATA[January 25, 2023: We updated this post to reflect the fact that CloudHSM SDK3 does not support serverless environments and we strongly recommend deploying SDK5. AWS CloudHSM provides hardware security modules (HSMs) in the AWS Cloud. With CloudHSM, you can generate and use your own encryption keys in the AWS Cloud, and manage your keys […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Monitoring Kubernetes with Zabbix</title>
		<link>https://noise.getoto.net/2023/01/24/monitoring-kubernetes-with-zabbix/</link>
		
		<dc:creator><![CDATA[Michaela DeForest]]></dc:creator>
		<pubDate>Tue, 24 Jan 2023 13:00:09 +0000</pubDate>
				<category><![CDATA[Containers]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[How-to]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[zabbix 6.0]]></category>
		<guid isPermaLink="false">https://blog.zabbix.com/?p=25055</guid>

					<description><![CDATA[There are many options available for monitoring Kubernetes and cloud-native applications. In this multi-part blog series, we’ll explore how…]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Docker Container Monitoring With Zabbix</title>
		<link>https://noise.getoto.net/2022/04/19/docker-container-monitoring-with-zabbix/</link>
		
		<dc:creator><![CDATA[Dmitry Lambert]]></dc:creator>
		<pubDate>Tue, 19 Apr 2022 10:41:32 +0000</pubDate>
				<category><![CDATA[community]]></category>
		<category><![CDATA[Containers]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[How-to]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[templates]]></category>
		<guid isPermaLink="false">https://blog.zabbix.com/?p=20175</guid>

					<description><![CDATA[<p>In this blog post, I will cover Docker container monitoring with Zabbix. We will use the official Docker by…</p>
<p>The post <a rel="nofollow" href="https://blog.zabbix.com/docker-container-monitoring-with-zabbix/20175/">Docker Container Monitoring With Zabbix</a> appeared first on <a rel="nofollow" href="https://blog.zabbix.com/">Zabbix Blog</a>.</p>]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Deploy and Manage Gitlab Runners on Amazon EC2</title>
		<link>https://noise.getoto.net/2022/01/26/deploy-and-manage-gitlab-runners-on-amazon-ec2/</link>
		
		<dc:creator><![CDATA[Sylvia Qi]]></dc:creator>
		<pubDate>Wed, 26 Jan 2022 00:51:07 +0000</pubDate>
				<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[AWS CloudFormation]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Expert (400)]]></category>
		<category><![CDATA[GitLab]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=a5475e549e7f0d69c7bc280753cd0629</guid>

					<description><![CDATA[GitLab CI is a tool used by many enterprises to automate their continuous integration, continuous delivery, and deployment (CI/CD) processes. A GitLab CI/CD pipeline consists of two major components: a .gitlab-ci.yml file describing the pipeline’s jobs, and a GitLab Runner, an application that executes those jobs. Setting up the GitLab Runner is a time-consuming […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Handy Tips #21: Deploying Zabbix Server with Docker containers</title>
		<link>https://noise.getoto.net/2022/01/19/handy-tips-21-deploying-zabbix-server-with-docker-containers/</link>
		
		<dc:creator><![CDATA[Arturs Lontons]]></dc:creator>
		<pubDate>Wed, 19 Jan 2022 08:57:12 +0000</pubDate>
				<category><![CDATA[Docker]]></category>
		<category><![CDATA[Handy Tips]]></category>
		<category><![CDATA[zabbix server]]></category>
		<guid isPermaLink="false">https://blog.zabbix.com/?p=18972</guid>

					<description><![CDATA[<p>Deploy Zabbix components in Docker containers for advanced automation, scalability, and maintenance. In the past few years, containers have…</p>
<p>The post <a rel="nofollow" href="https://blog.zabbix.com/handy-tips-21-deploying-zabbix-server-with-docker-containers/18972/">Handy Tips #21: Deploying Zabbix Server with Docker containers</a> appeared first on <a rel="nofollow" href="https://blog.zabbix.com/">Zabbix Blog</a>.</p>]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Using AWS CodePipeline for deploying container images to AWS Lambda Functions</title>
		<link>https://noise.getoto.net/2021/08/20/using-aws-codepipeline-for-deploying-container-images-to-aws-lambda-functions/</link>
		
		<dc:creator><![CDATA[Kirankumar Chandrashekar]]></dc:creator>
		<pubDate>Fri, 20 Aug 2021 18:25:51 +0000</pubDate>
				<category><![CDATA[Amazon CodeBuild]]></category>
		<category><![CDATA[Amazon CodePipeline]]></category>
		<category><![CDATA[Amazon ECR]]></category>
		<category><![CDATA[AWS CodeBuild]]></category>
		<category><![CDATA[AWS CodePipeline]]></category>
		<category><![CDATA[AWS Lambda]]></category>
		<category><![CDATA[Compute]]></category>
		<category><![CDATA[Containers]]></category>
		<category><![CDATA[Developer Tools]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=6d0fc67c77700d575d8ea09e134afd18</guid>

					<description><![CDATA[AWS Lambda launched support for packaging and deploying functions as container images at re:Invent 2020. In the post working with Lambda layers and extensions in container images, we demonstrated packaging Lambda functions with layers while using container images. This post shows you how to use AWS CodePipeline to deploy Docker images for a microservices architecture involving […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Build and deploy .NET web applications to ARM-powered AWS Graviton 2 Amazon ECS Clusters using AWS CDK</title>
		<link>https://noise.getoto.net/2021/06/10/build-and-deploy-net-web-applications-to-arm-powered-aws-graviton-2-amazon-ecs-clusters-using-aws-cdk/</link>
		
		<dc:creator><![CDATA[Matt Laver]]></dc:creator>
		<pubDate>Thu, 10 Jun 2021 02:02:46 +0000</pubDate>
				<category><![CDATA[.net]]></category>
		<category><![CDATA[Amazon ECS]]></category>
		<category><![CDATA[AWS .NET Development]]></category>
		<category><![CDATA[AWS CDK]]></category>
		<category><![CDATA[Containers]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Graviton]]></category>
		<category><![CDATA[graviton2]]></category>
		<category><![CDATA[Infrastructure & Automation]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=c9ef6bff0f3c726a72d4554a36e62254</guid>

					<description><![CDATA[With .NET providing first-class support for ARM architecture, running .NET applications on an AWS Graviton processor provides you with more choices to help optimize performance and cost. We have already written about .NET 5 with Graviton benchmarks; in this post, we explore how C#/.NET developers can take advantage of Graviton processors and obtain this performance […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Build and Deploy Docker Images to AWS using EC2 Image Builder</title>
		<link>https://noise.getoto.net/2021/05/22/build-and-deploy-docker-images-to-aws-using-ec2-image-builder/</link>
		
		<dc:creator><![CDATA[Joseph Keating]]></dc:creator>
		<pubDate>Sat, 22 May 2021 01:11:42 +0000</pubDate>
				<category><![CDATA[Amazon ECR]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Pipelines]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=bc048e48db856ec582230ce3d1f6b8bd</guid>

					<description><![CDATA[The NFL, an AWS Professional Services partner, is collaborating with NFL’s Player Health and Safety team to build the Digital Athlete Program. The Digital Athlete Program is working to drive progress in the prevention, diagnosis, and treatment of injuries; enhance medical protocols; and further improve the way football is taught and played. The NFL, in […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Scaling Zabbix with containers</title>
		<link>https://noise.getoto.net/2021/02/10/scaling-zabbix-with-containers/</link>
		
		<dc:creator><![CDATA[Robert Silva]]></dc:creator>
		<pubDate>Wed, 10 Feb 2021 14:36:26 +0000</pubDate>
				<category><![CDATA[community]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[Containers]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[How-to]]></category>
		<guid isPermaLink="false">https://blog.zabbix.com/?p=13155</guid>

					<description><![CDATA[In this post, a new approach to running Zabbix in High Availability is explained, and the challenges encountered when…]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Evolving Container Security With Linux User Namespaces</title>
		<link>https://noise.getoto.net/2020/12/23/evolving-container-security-with-linux-user-namespaces/</link>
		
		<dc:creator><![CDATA[Netflix Technology Blog]]></dc:creator>
		<pubDate>Wed, 23 Dec 2020 16:02:47 +0000</pubDate>
				<category><![CDATA[Containers]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[security]]></category>
		<guid isPermaLink="false">https://medium.com/p/afbe3308c082</guid>

					<description><![CDATA[By Fabio Kung, Sargun Dhillon, Andrew Spyker, Kyle, Rob Gulewich, Nabil Schear, Andrew Leung, Daniel Muino, and Manas Alekar. As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of...]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Reducing Docker image build time on AWS CodeBuild using an external cache</title>
		<link>https://noise.getoto.net/2020/09/04/reducing-docker-image-build-time-on-aws-codebuild-using-an-external-cache/</link>
		
		<dc:creator><![CDATA[Camillo Anania]]></dc:creator>
		<pubDate>Fri, 04 Sep 2020 00:20:34 +0000</pubDate>
				<category><![CDATA[Amazon EC2 Container Registry]]></category>
		<category><![CDATA[Amazon ECR]]></category>
		<category><![CDATA[AWS CodeBuild]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Docker]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=b4096ccce6106828871d2ee7f73a2426</guid>

					<description><![CDATA[With the proliferation of containerized solutions to simplify creating, deploying, and running applications, coupled with the use of automated CI/CD pipelines that continuously rebuild, test, and deploy such applications when new changes are committed, it’s important that your CI/CD pipelines run as quickly as possible, enabling you to get early feedback and allowing for faster […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
	</channel>
</rss>