<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jupyter Notebook &#8211; Noise</title>
	<atom:link href="https://noise.getoto.net/tag/jupyter-notebook/feed/" rel="self" type="application/rss+xml" />
	<link>https://noise.getoto.net</link>
	<description>The collective thoughts of the interwebz</description>
	<lastBuildDate>Mon, 14 Oct 2024 20:02:47 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>Investigation of a Workbench UI Latency Issue</title>
		<link>https://noise.getoto.net/2024/10/14/investigation-of-a-workbench-ui-latency-issue/</link>
		
		<dc:creator><![CDATA[Netflix Technology Blog]]></dc:creator>
		<pubDate>Mon, 14 Oct 2024 20:02:47 +0000</pubDate>
				<category><![CDATA[CPU]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[Jupyter Notebook]]></category>
		<category><![CDATA[Performance]]></category>
		<guid isPermaLink="false">https://medium.com/p/faa017b4653d</guid>

					<description><![CDATA[<p>By: <a href="https://www.linkedin.com/in/hechaoli/">Hechao Li</a> and <a href="https://www.linkedin.com/in/mayworm/">Marcelo Mayworm</a></p><p>With special thanks to our stunning colleagues <a href="https://www.linkedin.com/in/amer-ather-9071181/">Amer Ather</a>, <a href="https://www.linkedin.com/in/itaydafna">Itay Dafna</a>, <a href="https://www.linkedin.com/in/lucaepozzi/">Luca Pozzi</a>, <a href="https://www.linkedin.com/in/matheusdeoleao/">Matheus Leão</a>, and <a href="https://www.linkedin.com/in/yeji682/">Ye Ji</a>.</p><h3>Overview</h3><p>At Netflix, the Analytics and Developer Experience organization, part of the Data Platform, offers a product called Workbench. Workbench is a remote development workspace based on<a href="https://netflixtechblog.com/titus-the-netflix-container-management-platform-is-now-open-source-f868c9fb5436"> Titus</a> that allows data practitioners to work with big data and machine learning use cases at scale. A common use case for Workbench is running<a href="https://jupyterlab.readthedocs.io/en/latest/"> JupyterLab</a> Notebooks.</p><p>Recently, several users reported that their JupyterLab UI becomes slow and unresponsive when running certain notebooks. This document details the intriguing process of debugging this issue, all the way from the UI down to the Linux kernel.</p><h3>Symptom</h3><p>Machine Learning engineer <a href="https://www.linkedin.com/in/lucaepozzi/">Luca Pozzi</a> reported to our Data Platform team that their <strong>JupyterLab UI on their workbench becomes slow and unresponsive when running some of their Notebooks.</strong> Restarting the <em>ipykernel</em> process, which runs the Notebook, might temporarily alleviate the problem, but the frustration persists as more notebooks are run.</p><h3>Quantify the Slowness</h3><p>While we observed the issue firsthand, the term “UI being slow” is subjective and difficult to measure. 
To investigate this issue, <strong>we needed a quantitative analysis of the slowness</strong>.</p><p><a href="https://www.linkedin.com/in/itaydafna">Itay Dafna</a> devised an effective and simple method to quantify the UI slowness. Specifically, we opened a terminal via JupyterLab and held down a key (e.g., “j”) for 15 seconds while running the user’s notebook. The input to stdin is sent to the backend (i.e., JupyterLab) via a WebSocket, and the output to stdout is sent back from the backend and displayed on the UI. We then exported the <em>.har </em>file recording all communications from the browser and loaded it into a Notebook for analysis.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ltV3CYtNjLCzolXD"></figure><p>Using this approach, we observed latencies ranging from 1 to 10 seconds, averaging 7.4 seconds.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/0*H7KW62J0jZKPTjQH"></figure><h3>Blame The Notebook</h3><p>Now that we have an objective metric for the slowness, let’s officially start our investigation. If you have read the symptom carefully, you must have noticed that the slowness only occurs when the user runs <strong>certain</strong> notebooks but not others.</p><p>Therefore, the first step is scrutinizing the specific Notebook experiencing the issue. Why does the UI always slow down after running this particular Notebook? Naturally, you would think that there must be something wrong with the code running in it.</p><p>Upon closely examining the user’s Notebook, we noticed a library called <em>pystan</em> , which provides Python bindings to a native C++ library called stan, looked suspicious. Specifically, <em>pystan</em> uses <em>asyncio</em>. 
However, <strong>because there is already an existing <em>asyncio</em> event loop running in the Notebook process and <em>asyncio</em> cannot be nested by design, in order for <em>pystan</em> to work, the authors of <em>pystan</em> </strong><a href="https://pystan.readthedocs.io/en/latest/faq.html#how-can-i-use-pystan-with-jupyter-notebook-or-jupyterlab"><strong>recommend</strong></a><strong> injecting <em>pystan</em> into the existing event loop by using a package called </strong><a href="https://pypi.org/project/nest-asyncio/"><strong><em>nest_asyncio</em></strong></a>, a library that became unmaintained because <a href="https://github.com/erdewit/ib_insync/commit/ef5ea29e44e0c40bbadbc16c2281b3ac58aa4a40">the author unfortunately passed away</a>.</p><p>Given this seemingly hacky usage, we naturally suspected that the events injected by <em>pystan</em> into the event loop were blocking the handling of the WebSocket messages used to communicate with the JupyterLab UI. This reasoning sounds very plausible. However, <strong>the user claimed that there were cases when a Notebook not using <em>pystan</em> runs, the UI also became slow</strong>.</p><p>Moreover, after several rounds of discussion with ChatGPT, we learned more about the architecture and realized that, in theory, <strong>the usage of <em>pystan</em> and <em>nest_asyncio</em> should not cause the slowness in handling the UI WebSocket</strong> for the following reasons:</p><p>Even though <em>pystan</em> uses <em>nest_asyncio</em> to inject itself into the main event loop, <strong>the Notebook runs on a child process (i.e.</strong>,<strong> the <em>ipykernel</em> process) of the <em>jupyter-lab</em> server process</strong>, which means the main event loop being injected by <em>pystan</em> is that of the <em>ipykernel</em> process, not the <em>jupyter-server</em> process. 
Therefore, even if <em>pystan</em> blocks the event loop, it shouldn’t impact the <em>jupyter-lab</em> main event loop that is used for UI websocket communication. See the diagram below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/0*DsQuZV5qnRXp5mVw"></figure><p>In other words, <strong><em>pystan</em> events are injected to the event loop B in this diagram instead of event loop A</strong>. So, it shouldn’t block the UI WebSocket events.</p><p>You might also think that because event loop A handles both the WebSocket events from the UI and the ZeroMQ socket events from the <em>ipykernel</em> process, a high volume of ZeroMQ events generated by the notebook could block the WebSocket. However, <strong>when we captured packets on the ZeroMQ socket while reproducing the issue, we didn’t observe heavy traffic on this socket that could cause such blocking</strong>.</p><p>A stronger piece of evidence to rule out <em>pystan</em> was that we were ultimately able to reproduce the issue even without it, which I’ll dive into later.</p><h3>Blame Noisy Neighbors</h3><p>The Workbench instance runs as a <a href="https://netflixtechblog.com/titus-the-netflix-container-management-platform-is-now-open-source-f868c9fb5436">Titus container</a>. To efficiently utilize our compute resources, <strong>Titus employs a CPU oversubscription feature</strong>, meaning the combined virtual CPUs allocated to containers exceed the number of available physical CPUs on a Titus agent. <strong>If a container is unfortunate enough to be scheduled alongside other “noisy” containers — those that consume a lot of CPU resources — it could suffer from CPU deficiency.</strong></p><p>However, after examining the CPU utilization of neighboring containers on the same Titus agent as the Workbench instance, as well as the overall CPU utilization of the Titus agent, we quickly ruled out this hypothesis. 
Using the top command on the Workbench, we observed that when running the Notebook, <strong>the Workbench instance uses only 4 out of the 64 CPUs allocated to it</strong>. Simply put, <strong>this workload is not CPU-bound.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/892/0*YXsntKLiontnkNhf"></figure><h3>Blame The Network</h3><p>The next theory was that the network between the web browser UI (on the laptop) and the JupyterLab server was slow. To investigate, we <strong>captured all the packets between the laptop and the server</strong> while running the Notebook and continuously pressing ‘j’ in the terminal.</p><p>When the UI experienced delays, we observed a 5-second pause in packet transmission from server port 8888 to the laptop. Meanwhile,<strong> traffic from other ports, such as port 22 for SSH, remained unaffected</strong>. This led us to conclude that the pause was caused by the application running on port 8888 (i.e., the JupyterLab process) rather than the network.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*c660xBwF4XuCA8KN"></figure><h3>The Minimal Reproduction</h3><p>As previously mentioned, another strong piece of evidence proving the innocence of pystan was that we could reproduce the issue without it. 
By gradually stripping down the “bad” Notebook, we eventually arrived at a minimal snippet of code that reproduces the issue without any third-party dependencies or complex logic:</p><pre>import time<br>import os<br>from multiprocessing import Process<br><br>N = os.cpu_count()<br><br>def launch_worker(worker_id):<br>  time.sleep(60)<br><br>if __name__ == '__main__':<br>  with open('/root/2GB_file', 'r') as file:<br>    data = file.read()<br>    processes = []<br>    for i in range(N):<br>      p = Process(target=launch_worker, args=(i,))<br>      processes.append(p)<br>      p.start()<br> <br>    for p in processes:<br>      p.join()</pre><p>The code does only two things:</p><ol><li>Read a 2GB file into memory (the Workbench instance has 480GB of memory in total, so this memory usage is almost negligible).</li><li>Start N processes, where N is the number of CPUs. The N processes do nothing but sleep.</li></ol><p>There is no doubt that this is the silliest piece of code I’ve ever written. It is neither CPU bound nor memory bound. Yet <strong>it can cause the JupyterLab UI to stall for as long as 10 seconds!</strong></p><h3>Questions</h3><p>There are a couple of interesting observations that raise several questions:</p><ul><li>We noticed that <strong>both steps are required in order to reproduce the issue</strong>. If you don’t read the 2GB file (which is not even used!), the issue is not reproducible. <strong>Why could using 2GB out of 480GB of memory impact performance?</strong></li><li><strong>When the UI delay occurs, the <em>jupyter-lab</em> process CPU utilization spikes to 100%</strong>, hinting at contention on the single-threaded event loop in this process (event loop A in the earlier diagram). 
<strong>What does the <em>jupyter-lab</em> process need the CPU for, given that it is not the process that runs the Notebook?</strong></li><li>The code runs in a Notebook, which means it runs in the <em>ipykernel</em> process, which is a child process of the <em>jupyter-lab</em> process. <strong>How can anything that happens in a child process cause the parent process to have CPU contention?</strong></li><li>The workbench has 64 CPUs. But when we printed <em>os.cpu_count()</em>, the output was 96. That means <strong>the code starts more processes than the number of CPUs</strong>. <strong>Why is that?</strong></li></ul><p>Let’s answer the last question first. In fact, if you run the <em>lscpu</em> and <em>nproc</em> commands inside a Titus container, you will also see different results: the former gives you 96, which is the number of physical CPUs on the Titus agent, whereas the latter gives you 64, which is the number of virtual CPUs allocated to the container. This discrepancy is due to the lack of a “CPU namespace” in the Linux kernel, causing the number of physical CPUs to be leaked to the container when calling certain functions to get the CPU count. The assumption here is that Python <strong><em>os.cpu_count()</em> uses the same function as the <em>lscpu</em> command, causing it to get the CPU count of the host instead of the container</strong>. Python 3.13 has <a href="https://docs.python.org/3.13/library/os.html#os.process_cpu_count">a new call that can be used to get the accurate CPU count</a>, but it’s not GA’ed yet.</p><p>It will be proven later that this inaccurate number of CPUs can be a contributing factor to the slowness.</p><h3>More Clues</h3><p>Next, we used <em>py-spy</em> to profile the <em>jupyter-lab</em> process. Note that we profiled the parent <em>jupyter-lab</em> process, <strong>not</strong> the <em>ipykernel</em> child process that runs the reproduction code. 
The profiling result is as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ho2C4015Disa8aFv"></figure><p>As one can see, <strong>a lot of CPU time (89%!!) is spent on a function called <em>_parse_smaps_rollup</em></strong>. In comparison, the terminal handler used only 0.47% CPU time. From the stack trace, we see that <strong>this function is inside the event loop A</strong>,<strong> so it can definitely cause the UI WebSocket events to be delayed</strong>.</p><p>The stack trace also shows that this function is ultimately called by a function used by a JupyterLab extension called <em>jupyter_resource_usage</em>. <strong>We then disabled this extension and restarted the <em>jupyter-lab</em> process. As you may have guessed, we could no longer reproduce the slowness!</strong></p><p>But our puzzle is not solved yet. Why does this extension cause the UI to slow down? Let’s keep digging.</p><h3>Root Cause Analysis</h3><p>From the name of the extension and the names of the other functions it calls, we can infer that this extension is used to get resource information such as CPU and memory usage. Examining the code, we see that this function call stack is triggered when an API endpoint <em>/metrics/v1</em> is called from the UI. <strong>The UI apparently calls this endpoint periodically</strong>, according to the network traffic tab in Chrome’s Developer Tools.</p><p>Now let’s look at the implementation starting from the call <em>get(jupyter_resource_usage/api.py:42)</em>. 
The full code is <a href="https://github.com/jupyter-server/jupyter-resource-usage/blob/6f15ef91d5c7e50853516b90b5e53b3913d2ed34/jupyter_resource_usage/api.py#L28">here</a> and the key lines are shown below:</p><pre>cur_process = psutil.Process()<br>all_processes = [cur_process] + cur_process.children(recursive=True)<br><br>for p in all_processes:<br>  info = p.memory_full_info()</pre><p>Basically, it gets all child processes of the <em>jupyter-lab</em> process recursively, including both the <em>ipykernel</em> Notebook process and all processes created by the Notebook. Obviously, <strong>the cost of this function is linear in the total number of descendant processes</strong>. In the reproduction code, we create 96 processes. So here we will have at least 96 (sleep processes) + 1 (<em>ipykernel</em> process) + 1 (<em>jupyter-lab</em> process) = 98 processes when it should actually be 64 (allocated CPUs) + 1 (<em>ipykernel</em> process) + 1 (<em>jupyter-lab</em> process) = 66 processes, because the number of CPUs allocated to the container is, in fact, 64.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/971/0*sHTjycVMUk1yVAsk"></figure><p>This is truly ironic. <strong>The more CPUs we have, the slower we are!</strong></p><p>At this point, we have answered one question: <strong>Why does starting many grandchild processes in the child process cause the parent process to be slow? </strong>Because the parent process runs a function whose cost is linear in the number of all its descendant processes, gathered recursively.</p><p>However, this solves only half of the puzzle. If you remember the previous analysis, <strong>starting many child processes ALONE doesn’t reproduce the issue</strong>. 
If we don’t read the 2GB file, even if we create twice as many processes, we can’t reproduce the slowness.</p><p>So now we must answer the next question: <strong>Why does reading a 2GB file in the child process affect the parent process performance, </strong>especially when the workbench has as much as 480GB memory in total?</p><p>To answer this question, let’s look closely at the function <em>_parse_smaps_rollup</em>. As the name implies, <a href="https://github.com/giampaolo/psutil/blob/c034e6692cf736b5e87d14418a8153bb03f6cf42/psutil/_pslinux.py#L1978">this function</a> parses the file <em>/proc/&#60;pid&#62;/smaps_rollup</em>.</p><pre>def _parse_smaps_rollup(self):<br>  uss = pss = swap = 0<br>  with open_binary("{}/{}/smaps_rollup".format(self._procfs_path, self.pid)) as f:<br>    for line in f:<br>      if line.startswith(b"Private_"):<br>        # Private_Clean, Private_Dirty, Private_Hugetlb<br>        uss += int(line.split()[1]) * 1024<br>      elif line.startswith(b"Pss:"):<br>        pss = int(line.split()[1]) * 1024<br>      elif line.startswith(b"Swap:"):<br>        swap = int(line.split()[1]) * 1024<br>  return (uss, pss, swap)</pre><p>Naturally, you might think that when memory usage increases, this file becomes larger in size, causing the function to take longer to parse. Unfortunately, this is not the answer because:</p><ul><li>First, <a href="https://www.kernel.org/doc/Documentation/ABI/testing/procfs-smaps_rollup"><strong>the number of lines in this file is constant</strong></a><strong> for all processes</strong>.</li><li>Second, <strong>this is a special file in the /proc filesystem, which should be seen as a kernel interface</strong> instead of a regular file on disk. 
In other words, <strong>I/O operations of this file are handled by the kernel rather than disk</strong>.</li></ul><p>This file was introduced in <a href="https://github.com/torvalds/linux/commit/493b0e9d945fa9dfe96be93ae41b4ca4b6fdb317#diff-cb79e2d6ea6f9627ff68d1342a219f800e04ff6c6fa7b90c7e66bb391b2dd3ee">this commit</a> in 2017, with the purpose of improving the performance of user programs that determine aggregate memory statistics. Let’s first focus on <a href="https://elixir.bootlin.com/linux/v6.5.13/source/fs/proc/task_mmu.c#L1025">the handler of <em>open</em> syscall</a> on this <em>/proc/&#60;pid&#62;/smaps_rollup</em>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/995/0*vGOD79Tleii7X22B"></figure><p>Following through the <em>single_open</em> <a href="https://elixir.bootlin.com/linux/v6.5.13/source/fs/seq_file.c#L582">function</a>, we will find that it uses the function <em>show_smaps_rollup</em> for the show operation, which can translate to the <em>read</em> system call on the file. Next, we look at the <em>show_smaps_rollup</em> <a href="https://elixir.bootlin.com/linux/v6.5.13/source/fs/proc/task_mmu.c#L916">implementation</a>. You will notice <strong>a do-while loop that is linear to the virtual memory area</strong>.</p><pre>static int show_smaps_rollup(struct seq_file *m, void *v) {<br>  …<br>  vma_start = vma-&#62;vm_start;<br>  do {<br>    smap_gather_stats(vma, &#38;mss, 0);<br>    last_vma_end = vma-&#62;vm_end;<br>    …<br>  } for_each_vma(vmi, vma);<br>  …<br>}</pre><p>This perfectly <strong>explains why the function gets slower when a 2GB file is read into memory</strong>. <strong>Because the handler of reading the <em>smaps_rollup</em> file now takes longer to run the while loop</strong>. 
Basically, even though <strong><em>smaps_rollup</em></strong> already improved the performance of getting memory information compared to the old method of parsing the <em>/proc/&#60;pid&#62;/smaps</em> file, <strong>it is still linear in the amount of virtual memory used</strong>.</p><h3>More Quantitative Analysis</h3><p>Even though at this point the puzzle is solved, let’s conduct a more quantitative analysis. How much is the time difference when reading the <em>smaps_rollup</em> file with small versus large virtual memory utilization? Let’s write some simple benchmark code like the following:</p><pre>import os<br><br>def read_smaps_rollup(pid):<br>  with open("/proc/{}/smaps_rollup".format(pid), "rb") as f:<br>    for line in f:<br>      pass<br><br>if __name__ == "__main__":<br>  pid = os.getpid()<br>  <br>  read_smaps_rollup(pid)<br><br>  with open("/root/2G_file", "rb") as f:<br>    data = f.read()<br><br>  read_smaps_rollup(pid)</pre><p>This program performs the following steps:</p><ol><li>Reads the <em>smaps_rollup</em> file of the current process.</li><li>Reads a 2GB file into memory.</li><li>Repeats step 1.</li></ol><p>We then use <em>strace</em> to find the accurate time of reading the <em>smaps_rollup</em> file.</p><pre>$ sudo strace -T -e trace=openat,read python3 benchmark.py 2&#62;&#38;1 &#124; grep "smaps_rollup" -A 1<br><br>openat(AT_FDCWD, "/proc/3107492/smaps_rollup", O_RDONLY&#124;O_CLOEXEC) = 3 &#60;0.000023&#62;<br>read(3, "560b42ed4000-7ffdadcef000 ---p 0"..., 1024) = 670 &#60;0.000259&#62;<br>...<br>openat(AT_FDCWD, "/proc/3107492/smaps_rollup", O_RDONLY&#124;O_CLOEXEC) = 3 &#60;0.000029&#62;<br>read(3, "560b42ed4000-7ffdadcef000 ---p 0"..., 1024) = 670 &#60;0.027698&#62;</pre><p>As you can see, both times, the <em>read</em> syscall returned 670, meaning the file size remained the same at 670 bytes. 
However, <strong>the time it took the second time (i.e.</strong>,<strong> 0.027698 seconds) is 100x the time it took the first time (i.e.</strong>,<strong> 0.000259 seconds)</strong>! This means that if there are 98 processes, the time spent on reading this file alone will be 98 * 0.027698 = 2.7 seconds! Such a delay can significantly affect the UI experience.</p><h3>Solution</h3><p>This extension is used to display the CPU and memory usage of the notebook process on the bar at the bottom of the Notebook:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/524/0*bNYMYTc5QQAxLyya"></figure><p>We confirmed with the user that disabling the <em>jupyter-resource-usage</em> extension meets their requirements for UI responsiveness, and that this extension is not critical to their use case. Therefore, we provided a way for them to disable the extension.</p><h3>Summary</h3><p>This was such a challenging issue that required debugging from the UI all the way down to the Linux kernel. It is fascinating that the problem is linear to both the number of CPUs and the virtual memory size — two dimensions that are generally viewed separately.</p><p>Overall, we hope you enjoyed the irony of:</p><ol><li>The extension used to monitor CPU usage causing CPU contention.</li><li>An interesting case where the more CPUs you have, the slower you get!</li></ol><p>If you’re excited by tackling such technical challenges and have the opportunity to solve complex technical challenges and drive innovation, consider joining our <a href="https://explore.jobs.netflix.net/careers?query=Data%20Platform&#38;pid=790298020581&#38;domain=netflix.com&#38;sort_by=relevance">Data Platform team</a>s. Be part of shaping the future of Data Security and Infrastructure, Data Developer Experience, Analytics Infrastructure and Enablement, and more. 
Explore the impact you can make with us!</p><img src="https://medium.com/_/stat?event=post.clientViewed&#38;referrerSource=full_rss&#38;postId=faa017b4653d" width="1" height="1" alt=""><hr><p><a href="https://netflixtechblog.com/investigation-of-a-workbench-ui-latency-issue-faa017b4653d">Investigation of a Workbench UI Latency Issue</a> was originally published in <a href="https://netflixtechblog.com/">Netflix TechBlog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Accelerate your data exploration and experimentation with the AWS Analytics Reference Architecture library</title>
		<link>https://noise.getoto.net/2023/01/05/accelerate-your-data-exploration-and-experimentation-with-the-aws-analytics-reference-architecture-library/</link>
		
		<dc:creator><![CDATA[Lotfi Mouhib]]></dc:creator>
		<pubDate>Thu, 05 Jan 2023 16:20:24 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[AWS CDK]]></category>
		<category><![CDATA[Best practices]]></category>
		<category><![CDATA[data exploration]]></category>
		<category><![CDATA[data preparation]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[EKS]]></category>
		<category><![CDATA[Jupyter]]></category>
		<category><![CDATA[Jupyter Notebook]]></category>
		<category><![CDATA[jupyternotebook]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=e3a68cc4d6d88b9a75870839e8fd5e81</guid>

					<description><![CDATA[Organizations use their data to solve complex problems by starting small, running iterative experiments, and refining the solution. Although the power of experiments can’t be ignored, organizations have to be cautious about the cost-effectiveness of such experiments. If time is spent creating the underlying infrastructure for enabling experiments, it further adds to the cost. Developers […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
		<item>
		<title>Get started with data integration from Amazon S3 to Amazon Redshift using AWS Glue interactive sessions</title>
		<link>https://noise.getoto.net/2022/11/21/get-started-with-data-integration-from-amazon-s3-to-amazon-redshift-using-aws-glue-interactive-sessions/</link>
		
		<dc:creator><![CDATA[Vikas Omer]]></dc:creator>
		<pubDate>Mon, 21 Nov 2022 20:39:18 +0000</pubDate>
				<category><![CDATA[Amazon Redshift]]></category>
		<category><![CDATA[Amazon Simple Storage Service (S3)]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[AWS Big Data]]></category>
		<category><![CDATA[AWS Glue]]></category>
		<category><![CDATA[Data Integrations]]></category>
		<category><![CDATA[Interactive Job Development]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Jupyter Notebook]]></category>
		<category><![CDATA[serverless]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=e7036a67c84a9fc392ccd64321936607</guid>

					<description><![CDATA[Organizations are placing a high priority on data integration, especially to support analytics, machine learning (ML), business intelligence (BI), and application development initiatives. Data is growing exponentially and is generated by increasingly diverse data sources. Data integration becomes challenging when processing data at scale and the inherent heavy lifting associated with infrastructure required to manage […]]]></description>
		
		
		<enclosure url="" length="0" type="" />

			</item>
	</channel>
</rss>
