Post Syndicated from Geographics original https://www.youtube.com/watch?v=ntg7yHprX90
Yearly Archives: 2024
world backup day 2024
Post Syndicated from turnoff.us original http://turnoff.us/geek/world-backup-day-2024/

Comic for 2024.03.31 – Jesus On The Cross
Post Syndicated from Explosm.net original https://explosm.net/comics/jesus-on-the-cross
New Cyanide and Happiness Comic
Wrath
Post Syndicated from Oglaf! -- Comics. Often dirty. original https://www.oglaf.com/wrath/
tar.gz
Post Syndicated from turnoff.us original http://turnoff.us/geek/xz-backdoor/

STH Q1 2024 Letter from the Editor New Truck and Our New Project
Post Syndicated from Patrick Kennedy original https://www.servethehome.com/sth-q1-2024-letter-from-the-editor-our-new-project-tesla-cybertruck-axautik/
In our STH Q1 2024 Letter from the Editor, we discuss the new Cybertruck seen in the STH studio, the new analyst firm, and more
The post STH Q1 2024 Letter from the Editor New Truck and Our New Project appeared first on ServeTheHome.
Comic for 2024.03.30 – Moths
Post Syndicated from Explosm.net original https://explosm.net/comics/moths
New Cyanide and Happiness Comic
A few relevant quotes
Post Syndicated from corbet original https://lwn.net/Articles/967420/
I’m on a holiday and only happened to look at my emails and it
seems to be a major mess.
The reality that we are struggling with is that the free software
infrastructure on which much of computing runs is massively and
painfully underfunded by society as a whole, and is almost entirely
dependent on random people maintaining things in their free time
because they find it fun, many of whom are close to burnout. This
is, in many ways, the true root cause of this entire event.
Incredible work from Andres. The attackers made a serious
strategic mistake: they made PostgreSQL slightly slower.
There is no way to discuss this in public without turning a single
malicious entity into 10 000 malicious entities once the
information is widely known.
Making sure the impact and mitigations are known before posting
this publicly so that everyone knows what to do before the 10 000
malicious entities start attacking is just common sense.
Again the FOSS world has proven to be vigilant and proactive in
finding bugs and backdoors, IMHO. The level of transparency is
stellar, especially compared to proprietary software
companies. What the FOSS world has accomplished in 24 hours after
detection of the backdoor code in #xz deserves a moment of
humbleness. Instead we have flamewars and armchair experts shouting
that we must change everything NOW. Which would introduce even more
risks. Progress is made iteratively. Learn, adapt, repeat.
Deadphones – Sony’s late lamented 9.1 Surround Sound Headphones
Post Syndicated from Techmoan original https://www.youtube.com/watch?v=unfJahrvtuU
pid 1
Post Syndicated from turnoff.us original http://turnoff.us/geek/pid1/

The Week (25–30 March)
Post Syndicated from original https://www.toest.bg/sedmitsata-25-30-mart/
The potter’s secret is that he starts from the beginning every time. The same goes for our hero, Hirayama. Every day he starts anew. He does not think about how he did it yesterday or how he will do it tomorrow. He always does it in the moment. This is another of the secrets of the potter, who gives a different beauty to each repetition.
Wim Wenders

On the last weekend of Sofia Film Fest you still have the chance to see “Perfect Days”, the latest film by director Wim Wenders, a deeply moving and poetic story about finding beauty in everyday life. Daily life can be burdensome and full of stress and worries, yet it always hides unsuspected fragments of beauty. We try to help you find them even when that seems very hard.
As far as political news is concerned, the days of the past week have not exactly been perfect. It is proving impossible to find a new caretaker prime minister, and the situation is so challenging, and the proposals so far so absurd and unsuccessful, that a website called “Am I a Minister?” was even created to answer the question of whether we have accidentally ended up on the list of ministerial nominations… There is much talk that the state will fall into a constitutional crisis. But Emilia Milcheva believes these are mere insinuations, and in her weekly analysis “Bulgarian Political Animals” she reflects on how high the potential really is of the home-grown political figures who have led society to its sixth parliamentary elections in the last three years.
On the occasion of the 20th anniversary of Bulgaria’s NATO membership, President Rumen Radev took stock and concluded that all member states are obliged to pool potential, efforts, and resources so that they can develop military science, high technology, space, and artificial intelligence. And while fine speeches about international peace, security, and justice are being delivered, the Ministry of Interior is busy looking for reasons to increase the police presence in Sofia. Do armed police officers patrolling the streets make you feel safer? Svetla Encheva asks whether these operations, of the “Armed Vigilance at the Oddest Hour” kind, are driven by concern for citizens or are rather a sham presence aimed at scoring pre-election points for showy activity.
In an attempt to shift the focus away from these problems, we once again lift the curtain on another not especially pleasant topic: the quality of medical care in Bulgaria. Under the law, when a patient goes to a medical facility, they have the right to expect, and the medical staff are obliged to provide, medical care that is established in both medical theory and practice. Alas, reality often diverges from what the law prescribes. In “Error 404: Healthcare with Incorrectly Set Parameters”, Nadezhda Tsekulova raises important questions in an interview with the lawyer Maria Sharkova, creator of the website “Lekarska greshka” (“Medical Error”), which collects statistics on the court cases known in law as “medical torts”.
After a thorough look at the week’s pressing issues, one suddenly feels like quietly slipping out of the world of news. Here our regular column “From Word to Word” comes to the rescue. This week it offers a detailed explanation of every possible way of slinking off, that is, of making an elegant exit, although “It’s Always the Others Who Slip Away (on Tiptoe)”, as Ekaterina Petrova has titled her text this time.
Almost last, but not least, here is a splendid initiative you should not miss: the most popular language reference, “Kak se pishe?” (“How Is It Spelled?”), which has earned its reputation over the years, has joined forces with the team of The Huts Group to create a digital study aid for graduating twelfth-graders. Under the title “Life Has No ‘aUtocorrect’”, from today until the day of the matriculation exam in Bulgarian language and literature (17 May), the “Kak se pishe?” profile on Threads will publish test questions fully aligned with the format of the state matriculation exam. How often? Every day. Because “How is it spelled?” is no longer just an important question but an urgent one.
It is the end of the month, and we will not leave you without a poem… even if it reminds us that “We Too Were in the Hell with No Exit…”. Ioan Es. Pop is one of the living classics of contemporary Romanian poetry, and the wonderful translation is by Lora Nenkovska. Fortunately, poetry continues to exist to offer us an alternative to harsh reality… or at least a different view of it, in an attempt to make better sense of it.
Zornitsa Hristova understands this need perfectly and has quite fittingly chosen to present, in the column “By the Letters”, one of the greatest contemporary English poets, Philip Larkin, with his last collection, “High Windows”. What is missing in the days? Something that is neither a mirage, nor clumsy self-deception, nor a lie…
In Toest’s weekly analyses, it seems nothing is missing.
Enjoy reading!
Unique Inventec P8000IG6 8x NVIDIA B100 System Shown at GTC 2024
Post Syndicated from Cliff Robinson original https://www.servethehome.com/unique-inventec-p8000ig6-8x-nvidia-b100-system-shown-at-gtc-2024/
We found a really unique NVIDIA HGX design on the NVIDIA GTC 2024 show floor with the Inventec P8000IG6 server
The post Unique Inventec P8000IG6 8x NVIDIA B100 System Shown at GTC 2024 appeared first on ServeTheHome.
Pioneering roboticist & bestselling author on the importance of understanding how things work
Post Syndicated from Talks at Google original https://www.youtube.com/watch?v=SSlCISE4iic
Friday Squid Blogging: The Geopolitics of Eating Squid
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/03/68676.html
New York Times op-ed on the Chinese dominance of the squid industry:
China’s domination in seafood has raised deep concerns among American fishermen, policymakers and human rights activists. They warn that China is expanding its maritime reach in ways that are putting domestic fishermen around the world at a competitive disadvantage, eroding international law governing sea borders and undermining food security, especially in poorer countries that rely heavily on fish for protein. In some parts of the world, frequent illegal incursions by Chinese ships into other nations’ waters are heightening military tensions. American lawmakers are concerned because the United States, locked in a trade war with China, is the world’s largest importer of seafood.
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
Read my blog posting guidelines here.
Amazon GuardDuty EC2 Runtime Monitoring is now generally available
Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/amazon-guardduty-ec2-runtime-monitoring-is-now-generally-available/
Amazon GuardDuty is a machine learning (ML)-based security monitoring and intelligent threat detection service that analyzes and processes various AWS data sources, continuously monitors your AWS accounts and workloads for malicious activity, and delivers detailed security findings for visibility and remediation.

I love the feature of GuardDuty Runtime Monitoring that analyzes operating system (OS)-level, network, and file events to detect potential runtime threats for specific AWS workloads in your environment. I first introduced the general availability of this feature for Amazon Elastic Kubernetes Service (Amazon EKS) resources in March 2023. Seb wrote about the expansion of the Runtime Monitoring feature to provide threat detection for Amazon Elastic Container Service (Amazon ECS) and AWS Fargate as well as the preview for Amazon Elastic Compute Cloud (Amazon EC2) workloads in Nov 2023.
Today, we are announcing the general availability of Amazon GuardDuty EC2 Runtime Monitoring to expand threat detection coverage for EC2 instances at runtime and complement the anomaly detection that GuardDuty already provides by continuously monitoring VPC Flow Logs, DNS query logs, and AWS CloudTrail management events. You now have visibility into on-host, OS-level activities and container-level context into detected threats.
With GuardDuty EC2 Runtime Monitoring, you can identify and respond to potential threats that might target the compute resources within your EC2 workloads. Threats to EC2 workloads often involve remote code execution that leads to the download and execution of malware. This could include instances or self-managed containers in your AWS environment that are connecting to IP addresses associated with cryptocurrency-related activity or to malware command-and-control related IP addresses.
GuardDuty Runtime Monitoring provides visibility into suspicious commands that involve malicious file downloads and execution across each step, which can help you discover threats during initial compromise and before they become business-impacting events. You can also centrally enable runtime threat detection coverage for accounts and workloads across the organization using AWS Organizations to simplify your security coverage.
Configure EC2 Runtime Monitoring in GuardDuty
With a few clicks, you can enable GuardDuty EC2 Runtime Monitoring in the GuardDuty console. For your first use, you need to enable Runtime Monitoring.

Any customers that are new to the EC2 Runtime Monitoring feature can try it for free for 30 days and gain access to all features and detection findings. The GuardDuty console shows how many days are left in the free trial.
Now, you can set up the GuardDuty security agent for the individual EC2 instances for which you want to monitor the runtime behavior. You can choose to deploy the GuardDuty security agent either automatically or manually. At GA, you can enable Automated agent configuration, which is a preferred option for most customers as it allows GuardDuty to manage the security agent on their behalf.

The agent will be deployed on EC2 instances with AWS Systems Manager and uses an Amazon Virtual Private Cloud (Amazon VPC) endpoint to receive the runtime events associated with your resource. If you want to manage the GuardDuty security agent manually, visit Managing the security agent Amazon EC2 instance manually in the AWS documentation. In multiple-account environments, delegated GuardDuty administrator accounts manage their member accounts using AWS Organizations. For more information, visit Managing multiple accounts in the AWS documentation.
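If you prefer scripting to the console, the same setting can be expressed as the `Features` payload of the GuardDuty `UpdateDetector` API. The following is a minimal sketch that only builds that payload. The helper name is hypothetical, and the feature name `RUNTIME_MONITORING` and the additional configuration name `EC2_AGENT_MANAGEMENT` are assumptions that should be verified against the current GuardDuty API reference before use.

```python
from typing import Any


def build_runtime_monitoring_features(automated_ec2_agent: bool = True) -> list[dict[str, Any]]:
    """Build a Features list that turns on Runtime Monitoring.

    Assumption: the names below match the GuardDuty UpdateDetector API;
    verify them against the API reference before relying on this.
    """
    agent_status = "ENABLED" if automated_ec2_agent else "DISABLED"
    return [
        {
            "Name": "RUNTIME_MONITORING",
            "Status": "ENABLED",
            "AdditionalConfiguration": [
                # Lets GuardDuty manage the security agent on EC2 instances.
                {"Name": "EC2_AGENT_MANAGEMENT", "Status": agent_status},
            ],
        }
    ]


features = build_runtime_monitoring_features()
# With boto3 installed and credentials configured, the payload could be sent as:
#   boto3.client("guardduty").update_detector(
#       DetectorId="<your-detector-id>", Features=features)
```

Keeping the payload construction separate from the API call makes it easy to review or unit test the configuration before it touches a real detector.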
When you enable EC2 Runtime Monitoring, the EC2 instance runtime coverage tab lists the covered EC2 instances with their account ID and coverage status, and shows whether the agent is able to receive runtime events from the corresponding resource.

Even when the coverage status is Unhealthy, meaning it is not currently able to receive runtime findings, you still have defense in depth for your EC2 instance. GuardDuty continues to provide threat detection to the EC2 instance by monitoring CloudTrail, VPC flow, and DNS logs associated with it.
Check out GuardDuty EC2 Runtime security findings
When GuardDuty detects a potential threat and generates security findings, you can view the details of those findings.
Choose Findings in the left pane if you want to find security findings specific to Amazon EC2 resources. You can use the filter bar to filter the findings table by specific criteria, such as a Resource type of Instance. The severity and details of the findings differ based on the resource role, which indicates whether the EC2 resource was the target of suspicious activity or the actor performing the activity.

With today’s launch, we support over 30 runtime security findings for EC2 instances, such as detecting abused domains, backdoors, cryptocurrency-related activity, and unauthorized communications. For the full list, visit Runtime Monitoring finding types in the AWS documentation.
Resolve your EC2 security findings
Choose each EC2 security finding to know more details. You can find all the information associated with the finding and examine the resource in question to determine if it is behaving in an expected manner.

If the activity is authorized, you can use suppression rules or trusted IP lists to prevent false positive notifications for that resource. If the activity is unexpected, the security best practice is to assume the instance has been compromised and take the actions detailed in Remediating a potentially compromised Amazon EC2 instance in the AWS documentation.
You can integrate GuardDuty EC2 Runtime Monitoring with other AWS security services, such as AWS Security Hub or Amazon Detective. Or you can use Amazon EventBridge, allowing you to use integrations with security event management or workflow systems, such as Splunk, Jira, and ServiceNow, or trigger automated and semi-automated responses such as isolating a workload for investigation.
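To make the EventBridge option more concrete, here is a minimal sketch of the decision logic an automated response might apply to a finding event. The event pattern uses the `aws.guardduty` source and the `GuardDuty Finding` detail type. The function name, the severity threshold of 7.0, and the trimmed sample event are assumptions for illustration; a real finding carries many more fields, and the field names used here should be checked against the GuardDuty finding format.

```python
from typing import Any, Optional

# EventBridge event pattern that matches GuardDuty findings.
GUARDDUTY_FINDING_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
}


def instance_to_isolate(event: dict[str, Any], min_severity: float = 7.0) -> Optional[str]:
    """Return the EC2 instance ID to isolate, or None if no action is needed."""
    if event.get("source") != "aws.guardduty":
        return None
    detail = event.get("detail", {})
    resource = detail.get("resource", {})
    # Only act on findings about EC2 instances.
    if resource.get("resourceType") != "Instance":
        return None
    # Ignore findings below the chosen severity threshold.
    if float(detail.get("severity", 0)) < min_severity:
        return None
    return resource.get("instanceDetails", {}).get("instanceId")


# Hypothetical, heavily trimmed event for illustration only.
sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "severity": 8.0,
        "resource": {
            "resourceType": "Instance",
            "instanceDetails": {"instanceId": "i-0123456789abcdef0"},
        },
    },
}

print(instance_to_isolate(sample_event))  # i-0123456789abcdef0
```

The function only decides which instance to act on. The isolation step itself, such as swapping security groups, is left to whatever workflow system receives the event.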
When you choose Investigate with Detective, you can find Detective-created visualizations for AWS resources to quickly and easily investigate security issues. To learn more, visit Integration with Amazon Detective in the AWS documentation.

Things to know
GuardDuty EC2 Runtime Monitoring support is now available for EC2 instances running Amazon Linux 2 or Amazon Linux 2023. You have the option to configure maximum CPU and memory limits for the agent. To learn more and for future updates, visit Prerequisites for Amazon EC2 instance support in the AWS documentation.
To estimate the daily average usage costs for GuardDuty, choose Usage in the left pane. During the 30-day free trial period, you can estimate what your costs will be after the trial period. At the end of the trial period, we charge you per vCPU hours tracked monthly for the monitoring agents. To learn more, visit the Amazon GuardDuty pricing page.

Enabling EC2 Runtime Monitoring also allows for a cost-saving opportunity on your GuardDuty cost. When the feature is enabled, you won’t be charged for GuardDuty foundational protection VPC Flow Logs sourced from the EC2 instances running the security agent. This is due to similar, but more contextual, network data available from the security agent. Additionally, GuardDuty would still process VPC Flow Logs and generate relevant findings so you will continue to get network-level security coverage even if the agent experiences downtime.
Now available
Amazon GuardDuty EC2 Runtime Monitoring is now available in all AWS Regions where GuardDuty is available, excluding AWS GovCloud (US) Regions and AWS China Regions. For a full list of Regions where EC2 Runtime Monitoring is available, visit Region-specific feature availability.
Give GuardDuty EC2 Runtime Monitoring a try in the GuardDuty console. For more information, visit the Amazon GuardDuty User Guide and send feedback to AWS re:Post for Amazon GuardDuty or through your usual AWS support contacts.
— Channy
Metasploit Weekly Wrap-Up 03/29/2024
Post Syndicated from Brendan Watters original https://blog.rapid7.com/2024/03/29/metasploit-weekly-wrap-up-03-29-2024/
PHP code execution and Overshare[point]

Here in the Northern Hemisphere, Spring is in the air: flowers, bees, pollen… a new Metasploit 6.4 release, and now, fresh on the heels of this new release is a bountiful crop of exploits, features, and bug-fixes. Leading the pack is a pair of 2024 PHP code execution vulnerabilities in Artica Proxy and the Bricks Builder WordPress theme, and not to be outshone is a pair of Sharepoint vulnerabilities chained to give unauthenticated code execution as administrator.
New module content (3)
Artica Proxy Unauthenticated PHP Deserialization Vulnerability
Authors: Jaggar Henry of KoreLogic Inc. and h00die-gr3y
Type: Exploit
Pull request: #18967 contributed by h00die-gr3y
Path: linux/http/artica_proxy_unauth_rce_cve_2024_2054
AttackerKB reference: CVE-2024-2054
Description: This PR adds a module targeting CVE-2024-2054, a command injection vulnerability in Artica Proxy appliance versions 4.50 and 4.40. The exploit allows remote unauthenticated attackers to run arbitrary commands as the www-data user.
Unauthenticated RCE in Bricks Builder Theme
Authors: Calvin Alkan and Valentin Lobstein
Type: Exploit
Pull request: #18891 contributed by Chocapikk
Path: multi/http/wp_bricks_builder_rce
AttackerKB reference: CVE-2024-25600
Description: This PR adds an exploit module that targets a known vulnerability, CVE-2024-25600, in the WordPress Bricks Builder Theme, versions prior to 1.9.6.
Sharepoint Dynamic Proxy Generator Unauth RCE
Authors: Jang and jheysel-r7
Type: Exploit
Pull request: #18721 contributed by jheysel-r7
Path: windows/http/sharepoint_dynamic_proxy_generator_auth_bypass_rce
AttackerKB reference: CVE-2023-24955
Description: This PR adds a module that allows unauthenticated remote code execution as Administrator on Sharepoint 2019 hosts. It performs this by exploiting two vulnerabilities in Sharepoint 2019. First, it uses CVE-2023-29357, an auth bypass patched in June of 2023 to impersonate the Administrator user, then it uses CVE-2023-24955, an RCE patched in May of 2023 to execute commands as Administrator.
Enhancements and features (4)
- #18925 from sjanusz-r7 – Updates RPC API to include Auxiliary and Exploit modules in session.compatible_modules response.
- #18982 from ekalinichev-r7 – Adds RPC methods session.interactive_read and session.interactive_write that support interaction with SQL, SMB, and Meterpreter sessions via RPC API.
- #19016 from zgoldman-r7 – Updates the MSSQL modules to support the GUID column type. This also improves error logging.
- #19017 from zgoldman-r7 – Improves the auxiliary/admin/mssql/mssql_exec and auxiliary/admin/mssql/mssql_sql modules to have improved error logging.
Bugs fixed (6)
- #18985 from cgranleese-r7 – Fixes store_valid_credential conditional logic for unix/webapp/wp_admin_shell_upload module.
- #18992 from adfoster-r7 – Fixes a crash within the postgres version module.
- #19006 from cgranleese-r7 – This fixes an issue where WMAP plugin module loading was causing failures.
- #19009 from sjanusz-r7 – Updates modules/exploits/osx/local/persistence to no longer be marked as a compatible module for Windows targets.
- #19012 from zeroSteiner – This fixes an issue that was reported where msfconsole will fail to start if the user’s /etc/hosts file contained a host name ending in a . or containing _ characters.
- #19015 from zeroSteiner – Previously, we fixed an issue where Metasploit would crash while parsing the hosts file if it ended in unexpected values like . or _. This fixes the same kind of issue in DNS names that enter the hostnames data through a different path by removing any trailing . so they can be used for DNS resolution.
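The idea behind the two hosts-file fixes above can be sketched in a few lines. This is an illustration in Python, not Metasploit's actual Ruby code; the function names are hypothetical, and the parsing rules are a simplified assumption about the hosts file format.

```python
from typing import Optional


def normalize_hostname(name: str) -> str:
    """Strip surrounding whitespace and any trailing dots from a DNS name."""
    return name.strip().rstrip(".")


def parse_hosts_line(line: str) -> Optional[tuple[str, list[str]]]:
    """Parse one hosts-file line into (address, hostnames).

    Returns None for blank lines and comment-only lines.
    """
    content = line.split("#", 1)[0].strip()  # drop trailing comments
    if not content:
        return None
    address, *names = content.split()
    # Underscores are kept as-is; only trailing dots are removed.
    hostnames = [n for n in (normalize_hostname(name) for name in names) if n]
    return address, hostnames


print(parse_hosts_line("127.0.0.1 localhost. my_host  # local names"))
# ('127.0.0.1', ['localhost', 'my_host'])
```

Tolerating unusual names rather than rejecting them keeps a single odd entry from preventing startup.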
Documentation added (1)
- #18961 from zgoldman-r7 – This adds documentation for the new SQL and SMB session types.
You can always find more documentation on our docsite at docs.metasploit.com.
Get it
As always, you can update to the latest Metasploit Framework with msfupdate
and you can get more details on the changes since the last blog post from
GitHub:
If you are a git user, you can clone the Metasploit Framework repo (master branch) for the latest.
To install fresh without using git, you can use the open-source-only Nightly Installers or the
commercial edition Metasploit Pro
A backdoor in xz
Post Syndicated from corbet original https://lwn.net/Articles/967180/
Andres Freund has posted a
detailed investigation into a backdoor that was shipped with versions
5.6.0 and 5.6.1 of the xz compression utility. It appears that the
malicious code may be aimed at allowing SSH authentication to be bypassed.
I have not yet analyzed precisely what is being checked for in the
injected code, to allow unauthorized access. Since this is running
in a pre-authentication context, it seems likely to allow some form
of access or other form of remote code execution.
The affected versions are not yet widely shipped, but checking systems for
the bad version would be a good idea.
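A quick check for the affected versions could look like the following sketch. It assumes that `xz --version` prints version strings in the form `xz (XZ Utils) 5.6.1`; the function name is hypothetical. A version match is only a first indicator, since distributions may ship patched or reverted packages under other version numbers, so distribution advisories remain the authoritative source.

```python
import re

# Versions named in the report as shipping the backdoor.
AFFECTED_XZ_VERSIONS = {"5.6.0", "5.6.1"}


def find_affected_versions(version_output: str) -> set[str]:
    """Return the affected version strings found in `xz --version` style output."""
    found = set(re.findall(r"\b(\d+\.\d+\.\d+)\b", version_output))
    return found & AFFECTED_XZ_VERSIONS


# On a real system the input could come from:
#   subprocess.run(["xz", "--version"], capture_output=True, text=True).stdout
sample = "xz (XZ Utils) 5.6.1\nliblzma 5.6.1"
print(find_affected_versions(sample))  # {'5.6.1'}
```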
Nexthink scales to trillions of events per day with Amazon MSK
Post Syndicated from Moe Haidar original https://aws.amazon.com/blogs/big-data/nexthink-scales-to-trillions-of-events-per-day-with-amazon-msk/
Real-time data streaming and event processing present scalability and management challenges. AWS offers a broad selection of managed real-time data streaming services to effortlessly run these workloads at any scale.
In this post, Nexthink shares how Amazon Managed Streaming for Apache Kafka (Amazon MSK) empowered them to achieve massive scale in event processing. Experiencing business hyper-growth, Nexthink migrated to AWS to overcome the scaling limitations of on-premises solutions. With Amazon MSK, Nexthink now seamlessly processes trillions of events per day, reaching over 5 GB per second of aggregated throughput.
In the following sections, Nexthink introduces their product and the need for scalability. They then highlight the challenges of their legacy on-premises application and present their transition to a cloud-centered software as a service (SaaS) architecture powered by Amazon MSK. Finally, Nexthink details the benefits achieved by adopting Amazon MSK.
Nexthink’s need to scale
Nexthink is the leader in digital employee experience (DeX). The company is shaping the future of work by providing IT leaders and C-levels with insights into employees’ daily technology experiences at the device and application level. This allows IT to evolve from reactive problem-solving to proactive optimization.
The Nexthink Infinity platform combines analytics, monitoring, automation, and more to manage the employee digital experience. By collecting device and application events, processing them in real time, and storing them, our platform analyzes data to solve problems and boost experiences for over 15 million employees across five continents.
In just 3 years, Nexthink’s business grew tenfold, and with the introduction of more real-time data our application had to scale from processing 200 MB per second to 5 GB per second and trillions of events daily. To enable this growth, we modernized our application from an on-premises single-tenant monolith to a cloud-based scalable SaaS solution powered by Amazon MSK.
The next sections detail our modernization journey, including the challenges we faced and the benefits we realized with our new cloud-centered, AWS-based architecture.
The on-premises solution and its challenges
Let’s first explore our previous on-premises solution, Nexthink V6, before examining how Amazon MSK addressed its challenges. The following diagram illustrates its architecture.
V6 was made up of two monolithic, single-tenant Java and C++ applications that were tightly coupled. The portal was a backend-for-frontend Java application, and the core engine was an in-house C++ in-memory database application that was also handling device connections, data ingestion, aggregation, and querying. By bundling all these functions together, the engine became difficult to manage and improve.
V6 also lacked scalability. It initially supported 10,000 devices, but some new tenants had over 300,000 devices. We reacted by deploying multiple V6 engines per tenant, which increased complexity and cost, hampered user experience, and delayed time to market. This also led to longer proof of concept and onboarding cycles, which hurt the business.
Furthermore, the absence of a streaming platform like Kafka created dependencies between teams through tight HTTP/gRPC coupling. Additionally, teams couldn’t access real-time events before ingestion into the database, limiting feature development. We also lacked a data buffer, risking potential data loss during outages. Such constraints impeded innovation and increased risks.
In summary, although the V6 system served its initial purpose, reinventing it with cloud-centered technologies became imperative to enhance scalability and reliability and to foster innovation by our engineering and product teams.
Transitioning to a cloud-centered architecture with Amazon MSK
To achieve our modernization goals, after thorough research and iterations, we implemented an event-driven microservices design on Amazon Elastic Kubernetes Service (Amazon EKS), using Kafka on Amazon MSK for distributed event storage and streaming.
Our transition from the V6 on-premises solution to the cloud-centered platform was phased over four iterations:
- Phase 1 – We lifted and shifted from on premises to virtual machines in the cloud, reducing operational complexities and accelerating proof of concept cycles while transparently migrating customers.
- Phase 2 – We extended the cloud architecture by implementing new product features with microservices and self-managed Kafka on Kubernetes. However, operating Kafka clusters ourselves proved overly difficult, leading us to Phase 3.
- Phase 3 – We switched from self-managed Kafka to Amazon MSK, improving stability and reducing operational costs. We realized that managing Kafka wasn’t our core competency or differentiator, and the overhead was high. Amazon MSK enabled us to focus on our core application, freeing us from the burden of undifferentiated Kafka management.
- Phase 4 – Finally, we eliminated all legacy components, completing the transition to a fully cloud-centered SaaS platform. This journey of learning and transformation took 3 years.
Today, after our successful transition, we use Amazon MSK for two key functions:
- Real-time data ingestion and processing of trillions of daily events from over 15 million devices worldwide, as illustrated in the following figure.
- Enabling an event-driven system that decouples data producers and consumers, as depicted in the following figure.
To further enhance our scalability and resilience, we adopted a cell-based architecture using the wide availability of Amazon MSK across AWS Regions. We currently operate over 10 cells, each representing an independent regional deployment of our SaaS solution. This cell-based approach minimizes the area of impact in case of issues, addresses data residency requirements, and enables horizontal scaling across AWS Regions, as illustrated in the following figure.
Benefits of Amazon MSK
Amazon MSK has been critical in enabling our event-driven design. In this section, we outline the main benefits we gained from its adoption.
Improved data resilience
In our new architecture, data from devices is pushed directly to Kafka topics in Amazon MSK, which provides high availability and resilience. This makes sure that events can be safely received and stored at any time. Our services consuming this data inherit the same resilience from Amazon MSK. If our backend ingestion services face disruptions, no event is lost, because Kafka retains all published messages. When our services resume, they seamlessly continue processing from where they left off, thanks to Kafka’s delivery semantics, which allow processing messages exactly once, at least once, or at most once based on application needs.
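The resume behavior described above can be illustrated with a toy model. The sketch below is not the Kafka client API: `ToyLog` is a hypothetical in-memory stand-in for a single retained partition, and the consumer group name is made up. It shows why committing an offset only after processing lets a restarted consumer pick up where it left off without losing events.

```python
class ToyLog:
    """Append-only log that keeps every record, like a retained partition."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.committed: dict[str, int] = {}  # consumer group -> next offset to read

    def publish(self, record: str) -> None:
        self.records.append(record)

    def poll(self, group: str, max_records: int = 10) -> list[tuple[int, str]]:
        """Return (offset, record) pairs starting at the group's committed offset."""
        start = self.committed.get(group, 0)
        return list(enumerate(self.records[start:start + max_records], start=start))

    def commit(self, group: str, offset: int) -> None:
        """Record that everything up to and including `offset` is processed."""
        self.committed[group] = offset + 1


log = ToyLog()
for event in ["e0", "e1", "e2"]:
    log.publish(event)

# The consumer processes e0, commits, then "crashes" before handling the rest.
first_offset, _ = log.poll("ingestion")[0]
log.commit("ingestion", first_offset)

log.publish("e3")  # producers keep writing during the outage

# After restart, the consumer resumes at the committed offset: nothing is lost.
resumed = [record for _, record in log.poll("ingestion")]
print(resumed)  # ['e1', 'e2', 'e3']
```

Committing after processing gives at-least-once behavior: if a crash happens between processing and commit, the record is delivered again on restart, so handlers need to tolerate duplicates.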
Amazon MSK enables us to tailor the data retention duration to our specific requirements, ranging from seconds to unlimited duration. This flexibility grants uninterrupted data availability to our application, which wasn’t possible with our previous architecture. Furthermore, to safeguard data integrity in the event of processing errors or corruption, Kafka enabled us to implement a data replay mechanism, ensuring data consistency and reliability.
Organizational scaling
By adopting an event-driven architecture with Amazon MSK, we decomposed our monolithic application into loosely coupled, stateless microservices communicating asynchronously via Kafka topics. This approach enabled our engineering organization to scale rapidly from just 4–5 teams in 2019 to over 40 teams and approximately 350 engineers today.
The loose coupling between event publishers and subscribers empowered teams to focus on distinct domains, such as data ingestion, identification services, and data lakes. Teams could develop solutions independently within their domains, communicating through Kafka topics without tight coupling. This architecture accelerated feature development by minimizing the risk of new features impacting existing ones. Teams could efficiently consume events published by others, offering new capabilities more rapidly while reducing cross-team dependencies.
The following figure illustrates the seamless workflow of adding new domains to our system.
Furthermore, the event-driven design allowed teams to build stateless services that could seamlessly auto scale based on MSK metrics like messages per second. This event-driven scalability eliminated the need for extensive capacity planning and manual scaling efforts, freeing up development time.
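As a rough illustration of metric-driven scaling, the sketch below derives a replica count from an observed message rate. The function name, the per-replica capacity, and the replica bounds are all hypothetical values chosen for the example, not figures from our platform. In practice the upper bound would sensibly be capped by the topic's partition count, because consumers in a group beyond that number sit idle.

```python
import math


def desired_replicas(messages_per_second: float, per_replica_capacity: float,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Return how many stateless consumers are needed for the observed rate."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    needed = math.ceil(messages_per_second / per_replica_capacity)
    # Clamp to the configured bounds so scaling never drops to zero or runs away.
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(12_000, 5_000))  # 3
```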
By using an event-driven microservices architecture on Amazon MSK, we achieved organizational agility, enhanced scalability, and accelerated innovation while minimizing operational overhead.
Seamless infrastructure scaling
Nexthink’s business grew tenfold in 3 years, and many new capabilities were added to the product, leading to a substantial increase in traffic from 200 MB per second to 5 GB per second. This exponential data growth was enabled by the robust scalability of Amazon MSK. Achieving such scale with an on-premises solution would have been challenging and expensive, if not infeasible.
Attempting to self-manage Kafka imposed unnecessary operational overhead without providing business value. Running it with just 5% of today’s traffic was already complex and required two engineers. At today’s volumes, we estimated needing 6–10 dedicated staff, increasing costs and diverting resources away from core priorities.
Real-time capabilities
By channeling all our data through Amazon MSK, we enabled real-time processing of events. This unlocked capabilities like real-time alerts, event-driven triggers, and webhooks that were previously unattainable. As such, Amazon MSK was instrumental in facilitating our event-driven architecture and powering impactful innovations.
Secure data access
Transitioning to our new architecture, we met our security and data integrity goals. With Kafka ACLs, we enforced strict access controls, allowing consumers and producers to only interact with authorized topics. We based these granular data access controls on criteria like data type, domain, and team.
To securely scale decentralized management of topics, we introduced proprietary Kubernetes Custom Resource Definitions (CRDs). These CRDs enabled teams to independently manage their own topics, settings, and ACLs without compromising security.
Amazon MSK encryption made sure that the data remained encrypted at rest and in transit. We also introduced a Bring Your Own Key (BYOK) option, allowing application-level encryption with customer keys for all single-tenant and multi-tenant topics.
Enhanced observability
Amazon MSK gave us great visibility into our data flows. The out-of-the-box Amazon CloudWatch metrics let us see the amount and types of data flowing through each topic and cluster. This helped us quantify the usage of our product features by tracking data volumes at the topic level. The Amazon MSK operational metrics enabled effortless monitoring and right-sizing of clusters and brokers. Overall, the rich observability of Amazon MSK facilitated data-driven decisions about architecture and product features.
Conclusion
Nexthink’s journey from an on-premises monolith to a cloud SaaS was streamlined by using Amazon MSK, a fully managed Kafka service. Amazon MSK allowed us to scale seamlessly while benefiting from enterprise-grade reliability and security. By offloading Kafka management to AWS, we could stay focused on our core business and innovate faster.
Going forward, we plan to further improve performance, costs, and scalability by adopting Amazon MSK capabilities such as tiered storage and AWS Graviton-based EC2 instance types.
We are also working closely with the Amazon MSK team to prepare for upcoming service features. Rapidly adopting new capabilities will help us remain at the forefront of innovation while continuing to grow our business.
To learn more about how Nexthink uses AWS to serve its global customer base, explore the Nexthink on AWS case study. Additionally, discover other customer success stories with Amazon MSK by visiting the Amazon MSK blog category.
About the Authors
2024 Ultimate Robot Vacuum and Mop Comparison || Dreametech, eufy, Roborock, Narwal, and Ecovacs
Post Syndicated from The Hook Up original https://www.youtube.com/watch?v=31hueQqN7Wk
Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight
Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/enhance-monitoring-and-debugging-for-aws-glue-jobs-using-new-job-observability-metrics-part-3-visualization-and-trend-analysis-using-amazon-quicksight/
In Part 2 of this series, we discussed how to enable AWS Glue job observability metrics and integrate them with Grafana for real-time monitoring. Grafana provides powerful customizable dashboards to view pipeline health. However, to analyze trends over time, aggregate from different dimensions, and share insights across the organization, a purpose-built business intelligence (BI) tool like Amazon QuickSight may be more effective for your business. QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports.
In this post, we explore how to connect QuickSight to Amazon CloudWatch metrics and build graphs to uncover trends in AWS Glue job observability metrics. Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. We walk through ingesting CloudWatch metrics into QuickSight using a CloudWatch metric stream and QuickSight SPICE. With this integration, you can use line charts, bar charts, and other graph types to uncover daily, weekly, and monthly patterns. QuickSight lets you perform aggregate calculations on metrics for deeper analysis. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient.
Solution overview
The following architecture diagram illustrates the workflow to implement the solution.

The workflow includes the following steps:
- AWS Glue jobs emit observability metrics to CloudWatch metrics.
- CloudWatch streams metric data through a metric stream into Amazon Data Firehose.
- Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket.
- An AWS Glue crawler scans data in the S3 bucket and populates table metadata in the AWS Glue Data Catalog.
- QuickSight periodically runs Amazon Athena queries, loads the query results into SPICE, and visualizes the latest metric data.
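The transformation step in this workflow can be sketched as a small Lambda handler. The following is a minimal illustration, not the function shipped in the sample template. It assumes the metric stream uses the JSON output format, where each Firehose record carries newline-delimited metric objects with fields such as `metric_name`, `dimensions`, and `value`. The helper name `flatten_metric` and the `dim_` column prefix are hypothetical.

```python
import base64
import json


def flatten_metric(metric: dict) -> dict:
    """Flatten one metric object into a single-level row (hypothetical layout)."""
    value = metric.get("value", {})
    row = {
        "account_id": metric.get("account_id"),
        "region": metric.get("region"),
        "namespace": metric.get("namespace"),
        "metric_name": metric.get("metric_name"),
        "timestamp": metric.get("timestamp"),
        "unit": metric.get("unit"),
        "max": value.get("max"),
        "min": value.get("min"),
        "sum": value.get("sum"),
        "count": value.get("count"),
    }
    # Promote each dimension (for example, the job name) to its own column.
    for name, dim_value in metric.get("dimensions", {}).items():
        row[f"dim_{name}"] = dim_value
    return row


def lambda_handler(event, context):
    """Firehose data transformation: decode, flatten, and re-encode each record."""
    output = []
    for record in event["records"]:
        try:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            rows = [flatten_metric(json.loads(line))
                    for line in payload.splitlines() if line.strip()]
            # Emit newline-delimited JSON so that each row is one table record.
            data = "".join(json.dumps(row) + "\n" for row in rows)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, KeyError):
            # Mark malformed records as failed instead of aborting the batch.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record.get("data", ""),
            })
    return {"records": output}
```

Flattening the nested `value` and `dimensions` objects keeps the resulting table simple to query from Athena, at the cost of one column per dimension name.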
All of the resources are defined in a sample AWS Cloud Development Kit (AWS CDK) template. You can deploy the end-to-end solution to visualize and analyze trends of the observability metrics.
Sample AWS CDK template
This post provides a sample AWS CDK template for a dashboard using AWS Glue observability metrics.
Typically, you have multiple accounts to manage and run resources for your data pipeline.
In this template, we assume the following accounts:
- Monitoring account – This hosts the central S3 bucket, central Data Catalog, and QuickSight-related resources
- Source account – This hosts individual data pipeline resources on AWS Glue and the resources to send metrics to the monitoring account
The template works even when the monitoring account and source account are the same.
This sample template consists of four stacks:
- Amazon S3 stack – This provisions the S3 bucket
- Data Catalog stack – This provisions the AWS Glue database, table, and crawler
- QuickSight stack – This provisions the QuickSight data source, dataset, and analysis
- Metrics sender stack – This provisions the CloudWatch metric stream, Firehose delivery stream, and Lambda function for transformation
Prerequisites
You should have the following prerequisites:
- Python 3.9 or later
- AWS accounts for the monitoring account and source account
- An AWS named profile for the monitoring account and source account
- The AWS CDK Toolkit 2.87.0 or later
Initialize the CDK project
To initialize the project, complete the following steps:
- Clone the AWS CDK template to your workspace:
- Create a Python virtual environment specific to the project on the client machine:
We use a virtual environment in order to isolate the Python environment for this project and not install software globally.
- Activate the virtual environment according to your OS:
- On macOS and Linux, use the following code:
- On a Windows platform, use the following code:
After this step, the subsequent steps run within the bounds of the virtual environment on the client machine and interact with the AWS account as needed.
- Install the required dependencies described in requirements.txt to the virtual environment:
- Edit the configuration file default-config.yaml based on your environments (replace each account ID with your own).
Bootstrap your AWS environments
Run the following commands to bootstrap your AWS environments:
- In the monitoring account, provide your monitoring account number, AWS Region, and monitoring profile:
- In the source account, provide your source account number, Region, and source profile:
When you use only one account for all environments, you can just run the cdk bootstrap command one time.
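As a rough sketch of this step, the following Python helper assembles one `cdk bootstrap aws://ACCOUNT/REGION --profile PROFILE` command per distinct account and Region, so a shared account is bootstrapped only once. The helper names, account IDs, Region, and profile names are placeholders chosen for illustration. The script only prints the commands and does not run them.

```python
import shlex


def bootstrap_command(account_id: str, region: str, profile: str) -> list[str]:
    """Build the cdk bootstrap command for one environment."""
    return ["cdk", "bootstrap", f"aws://{account_id}/{region}", "--profile", profile]


def bootstrap_commands(environments):
    """Return one command per distinct (account, Region) pair, in order."""
    seen = set()
    commands = []
    for account_id, region, profile in environments:
        if (account_id, region) in seen:
            # Same account and Region: bootstrapping once is enough.
            continue
        seen.add((account_id, region))
        commands.append(bootstrap_command(account_id, region, profile))
    return commands


if __name__ == "__main__":
    # Placeholder values; replace them with your own accounts and profiles.
    envs = [
        ("111111111111", "us-east-1", "monitoring"),
        ("222222222222", "us-east-1", "source"),
    ]
    for cmd in bootstrap_commands(envs):
        print(shlex.join(cmd))
```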
Deploy your AWS resources
Run the following commands to deploy your AWS resources:
- Run the following command using the monitoring account to deploy resources defined in the AWS CDK template:
- Run the following command using the source account to deploy resources defined in the AWS CDK template:
Configure QuickSight permissions
Initially, the new QuickSight resources created by the AWS CDK template, including the dataset and analysis, are not visible to you because no QuickSight permissions are configured yet.
To make the dataset and analysis visible to you, complete the following steps:
- On the QuickSight console, navigate to the user menu and choose Manage QuickSight.
- In the navigation pane, choose Manage assets.
- Under Browse assets, choose Analysis.
- Search for GlueObservabilityAnalysis, and select it.
- Choose SHARE.
- For User or Group, select your user, then choose SHARE (1).
- Wait for the share to be complete, then choose DONE.
- On the Manage assets page, choose Datasets.
- Search for observability_demo.metrics_data, and select it.
- Choose SHARE.
- For User or Group, select your user, then choose SHARE (1).
- Wait for the share to be complete, then choose DONE.
Explore the default QuickSight analysis
Now your QuickSight analysis and dataset are visible to you. You can return to the QuickSight console and choose GlueObservabilityAnalysis under Analysis. The following screenshot shows your dashboard.

The sample analysis has two tabs: Monitoring and Insights. By default, the Monitoring tab has the following charts:
- [Reliability] Job Run Errors Breakdown
- [Reliability] Job Run Errors (Total)
- [Performance] Skewness Job
- [Performance] Skewness Job per Job

- [Resource Utilization] Worker Utilization
- [Resource Utilization] Worker Utilization per Job
- [Throughput] BytesRead, RecordsRead, FilesRead, PartitionRead (Avg)
- [Throughput] BytesWritten, RecordsWritten, FilesWritten (Avg)

- [Resource Utilization] Disk Available GB (Min)
- [Resource Utilization] Max Disk Used % (Max)

- [Driver OOM] OOM Error Count
- [Driver OOM] Max Heap Memory Used % (Max)
- [Executor OOM] OOM Error Count
- [Executor OOM] Max Heap Memory Used % (Max)

By default, the Insights tab has the following insights:
- Bottom Ranked Worker Utilization
- Top Ranked Skewness Job

- Forecast Worker Utilization
- Top Mover readBytes

You can add any new graph charts or insights using the observability metrics based on your requirements.
Publish the QuickSight dashboard
When the analysis is ready, complete the following steps to publish the dashboard:
- Choose PUBLISH.
- Select Publish new dashboard as, and enter GlueObservabilityDashboard.
- Choose Publish dashboard.

Then you can view and share the dashboard.

Visualize and analyze with AWS Glue job observability metrics
Let’s use the dashboard to make AWS Glue usage more performant.
Looking at the Skewness Job per Job visualization, there was a spike on November 1, 2023. The skewness metric for the job multistage-demo was 9.53, which is significantly higher than the others.

Let’s drill down into details. You can choose Controls, and change filter conditions based on date time, Region, AWS account ID, AWS Glue job name, job run ID, and the source and sink of the data stores. For now, let’s filter with the job name multistage-demo.

The filtered Worker Utilization per Job visualization shows 0.5, and its minimum value was 0.16. It seems that there is room for improvement in resource utilization. This observation suggests enabling auto scaling for this job to increase worker utilization.
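The same reasoning can be expressed as a simple rule over exported metric values. The following is a minimal sketch under stated assumptions: the `JobMetrics` class, the `flag_jobs` function, and both thresholds are hypothetical and should be tuned to your own workloads. Only the multistage-demo values come from the dashboard above.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    job_name: str
    skewness: float            # value of the job skewness metric
    worker_utilization: float  # fraction between 0 and 1


def flag_jobs(metrics, skewness_threshold=5.0, utilization_threshold=0.6):
    """Return a mapping of job name to the reasons it deserves a closer look.

    Both thresholds are illustrative assumptions, not AWS recommendations.
    """
    findings = {}
    for m in metrics:
        reasons = []
        if m.skewness > skewness_threshold:
            reasons.append("high skewness")
        if m.worker_utilization < utilization_threshold:
            reasons.append("low worker utilization")
        if reasons:
            findings[m.job_name] = reasons
    return findings


if __name__ == "__main__":
    sample = [
        JobMetrics("multistage-demo", skewness=9.53, worker_utilization=0.5),
        JobMetrics("steady-job", skewness=1.2, worker_utilization=0.9),  # made-up job
    ]
    for name, reasons in flag_jobs(sample).items():
        print(name, "->", ", ".join(reasons))
```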

Clean up
Run the following commands to clean up your AWS resources:
- Run the following command using the monitoring account to clean up resources:
- Run the following command using the source account to clean up resources:
Considerations
QuickSight integration is designed for analysis and better flexibility. You can aggregate metrics based on any fields. When dealing with many jobs at once, QuickSight insights help you identify problematic jobs.
QuickSight integration requires additional resources in your environments. The monitoring account needs an AWS Glue database, table, crawler, and S3 bucket, and the ability to run Athena queries to visualize metrics in QuickSight. Each source account needs one metric stream and one Firehose delivery stream. This can incur additional costs.
All the required resources are templatized in AWS CDK.
Conclusion
In this post, we explored how to visualize and analyze AWS Glue job observability metrics on QuickSight using CloudWatch metric streams and SPICE. By connecting the new observability metrics to interactive QuickSight dashboards, you can uncover daily, weekly, and monthly patterns to optimize AWS Glue job usage. The rich visualization capabilities of QuickSight allow you to analyze trends in metrics like worker utilization, error categories, throughput, and more. Aggregating metrics and slicing data by different dimensions such as job name can provide deeper insights.
The sample dashboard showed metrics over time, top errors, and comparative job analytics. These visualizations and reports can be securely shared with teams across the organization. With data-driven insights from the AWS Glue observability metrics, you can better understand performance bottlenecks, common errors, and more.
About the Authors
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.
Chuhan Liu is a Software Development Engineer on the AWS Glue team. He is passionate about building scalable distributed systems for big data processing, analytics, and management. In his spare time, he enjoys playing tennis.
XiaoRun Yu is a Software Development Engineer on the AWS Glue team. He is working on building new features for AWS Glue to help customers. Outside of work, Xiaorun enjoys exploring new places in the Bay Area.
Sean Ma is a Principal Product Manager on the AWS Glue team. He has a track record of more than 18 years innovating and delivering enterprise products that unlock the power of data for users. Outside of work, Sean enjoys scuba diving and college football.
Mohit Saxena is a Senior Software Development Manager on the AWS Glue team. His team focuses on building distributed systems to enable customers with interactive and simple to use interfaces to efficiently manage and transform petabytes of data seamlessly across data lakes on Amazon S3, databases and data-warehouses on cloud.








