Patriotic, with a red thread, pro-Western. What kind of party is Ivan Geshev building?

Post Syndicated from Емилия Милчева original https://www.toest.bg/patriotichna-s-cherven-konets-prozapadna-kakva-partiya-stroi-ivan-geshev/

What would we get if we mixed "Republicans for Bulgaria" and "There Is Such a People"? Something conservative, Bulgarian-macho, Euro-Atlantic, anti-Putin; in three words: a Geshev party. The Rozhen flagpole in a tie patterned with little elephants and a red thread against the evil eye. And we shall rise as a nation to our full height, as the former prosecutor No. 1 promised in a YouTube speech before stepping into politics, instead of heading to an office at the Supreme Cassation Prosecutor's Office next door to former Prosecutor General Sotir Tsatsarov and the current acting Prosecutor General Borislav Sarafov.

In fact he stepped from politics into politics, because, by his own admission, politics never left the prosecution service, and he never managed to convince the public that he was an effective magistrate or that the Bulgarian prosecution service is independent and a guarantor of the rule of law. In his own defence he said that "no prosecution service can defeat corruption and theft when corruption and theft have been turned into state policy".

And this from the most untouchable man in the country, which is what Ivan Geshev was during his nearly three and a half years as Prosecutor General and, before that, as head of the Specialised Prosecutor's Office: a magistrate who had at hand an institution, levers and laws with which to fight the mafia. Above him there truly was only God, but behind him stood the puppet-master politicians.

An image for political use

Bulgaria's most recent history shows that politicians usually have an image springboard before they launch their political careers. The catapult for Boyko Borissov was the legend of the fearless chief secretary of the Interior Ministry, built from media adoration, a black leather coat and the manners of a thug from the Transition years.

For others the way into politics is their father's name: Sergei Stanishev entered BSP as the son of Dimitar Stanishev, secretary for international relations of the Central Committee of the Bulgarian Communist Party. Still others are pushed onto the political stage by historical circumstances: revolutions, protests and strikes, in which some lose their lives and others win a place bathed in glory and honours. There are also politicians like Kostadin Kostadinov, leader of Vazrazhdane, who apprentice in other parties, polish themselves in elections, and then their turn comes.

Ivan Geshev's reputation is not particularly convincing, either as a mafia fighter or as a defender of the prosecution service, "the last fortress of the rule of law and democracy" (from his speech of 24 February 2023). No oligarchs have been convicted, and no one has been convicted for the draining and bankruptcy of KTB, the case he worked on for two years as a specialised prosecutor and from which the oligarch and most powerful man in DPS, Delyan Peevski, is absent, while other cases such as "Barcelonagate", which directly concern Boyko Borissov, sat up on blocks. Geshev took them down as payback for Borissov sacrificing him to save his own skin. As an analysis in Sega found, the cases the prosecution likes to boast about either never reach court or collapse there. That is why the courage that seized the former Prosecutor General on his way out of the prosecution service looks comical.

Unlike Borissov, who toured the country driving his chief secretary's jeep to catch criminals (!), Geshev spent his last months as Prosecutor General travelling to console relatives of road accident victims, to hand out icons and to talk with school students. His most significant cause was the fight against hate speech, for which Jewish organisations backed him. He also held meetings with Roma leaders, helped in this by Yakov Djerassi, one of the active representatives of the Jewish community in Bulgaria and an associate of Saxe-Coburg-Gotha as well.

It is hard to pump up a rating from events like these. Had he done his job as a magistrate, they would have polished the image of the combative prosecutor with a compassionate soul, responsive to human pain and to the problems of segregated communities. Outside that context, however, his trips and meetings were already prompting comments that Geshev was preparing for a political career.

A place on the field

The field on which Geshev too will fight is quite muddy. The left-right divide does not apply to Bulgarian politics, which has been taken over by liberals vs. conservatives. It has conservatives to spare: Putinophile conservatives, patriotic conservatives, Russophobe conservatives, Trumpist conservatives… Conservatives can be found in every coalition and political force in parliament, and this is the common multiple: Democrats for a Strong Bulgaria are the conservatives within PP–DB, conservatives dominate GERB, and for There Is Such a People and Vazrazhdane conservatism is part of their political fabric. BSP long ago lost whatever was left-wing about it, if it ever had any, and has put on the same conservative mask.

As a Bulgarian, a man and a Christian, Ivan Geshev is best suited to stepping into the ranks of the social conservatives, supporting the traditional family and declaring himself for social stability and moderate nationalism (he cannot measure up to Kostadinov, who is a historian, and it won't do to keep talking about Levski as if in a school essay), with moderate Euroscepticism (after all, he would have arrested Putin if he came to Bulgaria) and a tolerable dose of religious fervour (restrained, so as not to recall evangelical discourse on politics too closely). According to a Gallup International Association (GIA) survey of religious attitudes in 61 countries from March this year, 53% of Bulgarians describe themselves as religious, and 58% say they believe in God. The data place Bulgaria slightly below the middle of the list of countries by declared religiosity, which means that a red thread on the wrist does no harm either.

Positioning Ivan Geshev and his future political force on this field also follows global political trends, which show a shift towards the conservative right. A Gallup survey reported by CNN showed that social conservatism in the US is at its highest level in a decade, against the backdrop of new state-level proposals on transgender issues, abortion, drug use and teaching about gender and sexuality in schools. So marches such as those in defence of the traditional family in Bulgaria will become more frequent, and it would be no surprise to see Geshev in the front ranks.

He has yet to find out how vulnerable he is as a politician. While he was (chief) prosecutor, the institution protected him. But now he is a politician whose enemy is the acting Prosecutor General.

Cover image: Screenshot from Ivan Geshev's speech, published on his YouTube channel on 19 June 2023.

The slow democratic awakening

Post Syndicated from Светла Енчева original https://www.toest.bg/bavnoto-demokratichno-proglezhdane/

Two recent events demonstrated the appalling scale of anti-European and anti-democratic propaganda in Bulgaria. Both are linked to the Vazrazhdane party and took place metres apart.

The two drops that reached the brim of the glass

On 10 June supporters of Kostadin Kostadinov's party disrupted a screening of the Belgian film Close at the Odeon cinema while the police looked on impassively. The reason? The film, winner of dozens of international awards, was included in the programme of the Sofia Pride film festival, which is why those protesting against it declared it "paedophilic".

The same day men in Vazrazhdane T-shirts (perhaps some of the protesters outside the Odeon) harassed a shop assistant in a small craft beer shop on Graf Ignatiev Street because flags of various countries were reportedly displayed in it. Upset, the shop's owner put a sign in the window the next day saying that Vazrazhdane supporters would not be served. The revenge: on 12 June the shop woke up to the word Jude and a six-pointed star on its window. That is how businesses owned by Jews were marked in Hitler's time. Hitler's regime was responsible for the deaths of more than six million Jews (as well as many homosexuals, Roma, people with mental and physical disabilities, and anyone else who did not fit the ideal of a "pure Aryan race").

The reactions and the precedents

No party or coalition represented in parliament issued an official comment on the display of antisemitism in the centre of the capital. Antisemitism and neo-Nazism, for that matter, are largely not recognised as a problem in Bulgaria. Another example is the sluggish response of the Bulgarian Book Association and the prosecution service to the presence of a stand of the neo-Nazi publisher Edelweiss at the Book Fair. Nor was there any reaction when Tatyana Doncheva of The Left took the liberty of spreading antisemitism on national air.

PP and DB condemned the action outside the Odeon in a joint declaration, but without mentioning that it was provoked by homophobia. This year, however, for the first time not only the Green Movement but also Yes, Bulgaria came out in support of Sofia Pride. Yes, Bulgaria's declaration describes the rights of LGBTI+ people as human rights and their protection as an act of patriotism. The Democratic Bulgaria group in the Sofia Municipal Council, for its part, issued a position in support of Pride, presented by the actor and municipal councillor Veselin Kalanovski. Several DB politicians attended Pride itself, as did the BSP MEP Petar Vitanov, who disagrees with Korneliya Ninova's homophobic policy.

Among the precedents is also an event initiated by a private individual. On 15 June a citizen named Kristina Pekova organised a march in defence of the EU and democracy under the slogan "Fascism is not patriotism". The aim of the march was to express disagreement with the positions and actions of President Rumen Radev and Vazrazhdane, which aim to restrict democracy, distance Bulgaria from the EU and NATO, instil hatred, lies and disinformation, and sow division. The specific occasion was the incident outside the Odeon.

What is new?

The significant change is not so much that DB is increasingly coming round to the idea that protecting the rights of LGBTI+ people matters for Bulgaria's Euro-Atlantic orientation. If they carry on at the same pace, many more years and parliaments may pass before they fully come round, especially if we also wait for PP to follow their example.

What really matters is that Euro-Atlantic orientation and opposition to fascism have at last been placed on the same plane. Until recently the link between the two was not at all obvious. Quite the opposite.

At the annual protests against the neo-Nazi Lukov March, besides the usual handful of liberals, there are anarchists, left-wing activists and people close to BSP, and placards against NATO can be seen. Many of those who identify as right-wing democrats, meanwhile, idealise political life in Bulgaria before 1944, play down the sending of more than 11,300 Jews from Bulgaria's newly annexed territories to their deaths, and are ready to argue endlessly over the question "Was there fascism in Bulgaria?". That argument displaces the substance of the problem, because although Bulgaria may not have had fascism as a totalitarian system, the country was an ally of Hitler's Germany and introduced important elements of Nazi legislation into its own.

Eastern European peculiarities and their consequences

After the Second World War, Western Europe managed to build itself as a shared and peaceful space on the basic agreement that fascism/Nazism is an evil that must never be allowed again. Modern liberal democracy rests on this. The Eastern Bloc countries, by contrast, came to regard fascism as something their systems had already defeated, and lumped it together with capitalism, that is, with the way of life in Western countries.

That is why, after the fall of the Iron Curtain, the former socialist countries have a problem with combating fascism. People in them tend either to idealise the times before the Eastern Bloc was created, or to believe that only socialism fought fascism, because what they call capitalism is, in their view, fascism itself.

That is why placing Euro-Atlantic integration and opposition to neo-Nazi tendencies on the same plane is cause for hope that genuine European integration may be possible in Bulgaria after all, rather than one limited to free movement and the absorption of EU funds.

Russia's war against Ukraine has shown how far a state can go when it dislikes liberal democracy yet has spent more than 70 years "resting on its laurels" as a leader in defeating fascism. This war is destroying the very foundation of Europe, trying to drag it back to the taboo point, to "never again". For since the Second World War the Old Continent has experienced nothing that resembles Hitler's aggression more than Putin's aggression does.

So it is no surprise that the pro-Russian Volen Siderov wrote antisemitic books, and that Vazrazhdane supporters draw six-pointed stars on shop windows.

It is too early to say whether the latest reactions in defence of human rights and democracy, provoked by Vazrazhdane and President Rumen Radev, are isolated incidents or part of a broader trend. It is important, however, that the pro-European parties do not forget what lies at the foundation of European identity.

Meaningful judicial reform can happen only when there is a consensus that the value of the human being lies at the foundation of the law. Of any human being, including LGBTI+ people, Jews, Roma, Macedonians, those whose opinions we dislike, criminals… And even Geshev and Peevski.

Multi-tenancy Apache Kafka clusters in Amazon MSK with IAM access control and Kafka Quotas – Part 1

Post Syndicated from Vikas Bajaj original https://aws.amazon.com/blogs/big-data/multi-tenancy-apache-kafka-clusters-in-amazon-msk-with-iam-access-control-and-kafka-quotas-part-1/

With Amazon Managed Streaming for Apache Kafka (Amazon MSK), you can build and run applications that use Apache Kafka to process streaming data. To process streaming data, organizations either use multiple Kafka clusters based on their application groupings, usage scenarios, compliance requirements, and other factors, or a dedicated Kafka cluster for the entire organization. Regardless of the pattern used, Kafka clusters are typically multi-tenant, allowing multiple producer and consumer applications to produce and consume streaming data simultaneously.

With multi-tenant Kafka clusters, however, one of the challenges is to make sure that data consumer and producer applications don’t overuse cluster resources. There is a possibility that a few poorly behaved applications may overuse cluster resources, affecting the well-behaved applications as a result. Therefore, teams who manage multi-tenant Kafka clusters need a mechanism to prevent applications from overconsuming cluster resources in order to avoid issues. This is where Kafka quotas come into play. Kafka quotas control the amount of resources client applications can use within a Kafka cluster.

In Part 1 of this two-part series, we explain the concepts of how to enforce Kafka quotas in MSK multi-tenant Kafka clusters while using AWS Identity and Access Management (IAM) access control for authentication and authorization. In Part 2, we cover detailed implementation steps along with sample Kafka client applications.

Brief introduction to Kafka quotas

Kafka quotas control the amount of resources client applications can use within a Kafka cluster. A multi-tenant Kafka cluster can experience performance degradation or a complete outage due to resource constraints if one or more client applications produce or consume large volumes of data or generate requests at a very high rate for a continuous period of time, monopolizing the Kafka cluster’s resources.

To prevent applications from overwhelming the cluster, Apache Kafka allows configuring quotas that determine how much traffic each client application can produce and consume per Kafka broker in a cluster. Kafka brokers throttle the client applications’ requests in accordance with their allocated quotas. Kafka quotas can be configured for specific users, specific client IDs, or both. The client ID is a logical name defined in the application code that Kafka brokers use to identify which application sent messages. The user represents the authenticated user principal of a client application in a secure Kafka cluster with authentication enabled.

There are two types of quotas supported in Kafka:

  • Network bandwidth quotas – These byte-rate thresholds define how much data client applications can produce to and consume from each individual broker in a Kafka cluster, measured in bytes per second.
  • Request rate quotas – These limit the percentage of time each individual broker spends processing client applications’ requests.

Depending on the business requirements, you can use either of these quota configurations. However, the use of network bandwidth quotas is common because it allows organizations to cap platform resources consumption according to the amount of data produced and consumed by applications per second.

Because this post uses an MSK cluster with IAM access control, we specifically discuss configuring network bandwidth quotas based on the applications’ client IDs and authenticated user principals.

Considerations for Kafka quotas

Keep the following in mind when working with Kafka quotas:

  • Enforcement level – Quotas are enforced at the broker level rather than at the cluster level. Suppose there are six brokers in a Kafka cluster and you specify a 12 MB/sec produce quota for a client ID and user. The producer application using the client ID and user can produce a maximum of 12 MB/sec on each broker at the same time, for a total maximum of 72 MB/sec across all six brokers. However, if leadership for every partition of a topic resides on one broker, the same producer application can only produce a maximum of 12 MB/sec. Because throttling occurs per broker, it’s essential to maintain an even balance of topics’ partition leadership across all the brokers.
  • Throttling – When an application reaches its quota, it is throttled, not failed: the broker doesn’t throw an exception or return an error but slows the client down instead. Brokers calculate the delay needed to bring the client back under its quota and delay responses accordingly. As a result, quota violations are transparent to clients, and clients don’t have to implement any special backoff or retry policies. However, when an asynchronous producer sends messages faster than the broker accepts them because of a quota, the messages are first queued in the client application’s memory. If the send rate continues to exceed the accept rate, the client eventually runs out of buffer space, causing the next Producer.send() call to block. Producer.send() eventually throws a TimeoutException if the timeout isn’t long enough for the broker to catch up with the producer application.
  • Shared quotas – If more than one client application has the same client ID and user, the quota configured for the client ID and user will be shared among all those applications. Suppose you configure a produce quota of 5 MB/sec for the combination of client-id="marketing-producer-client" and user="marketing-app-user". In this case, all producer applications that have marketing-producer-client as a client ID and marketing-app-user as an authenticated user principal will share the 5 MB/sec produce quota, impacting each other’s throughput.
  • Produce throttling – The produce throttling behavior is exposed to producer clients via client metrics such as produce-throttle-time-avg and produce-throttle-time-max. If these are non-zero, it indicates that the destination brokers are slowing the producer down and the quotas configuration should be reviewed.
  • Consume throttling – The consume throttling behavior is exposed to consumer clients via client metrics such as fetch-throttle-time-avg and fetch-throttle-time-max. If these are non-zero, it indicates that the origin brokers are slowing the consumer down and the quotas configuration should be reviewed.

Note that client metrics are metrics exposed by clients connecting to Kafka clusters.

  • Quota configuration – It’s possible to configure Kafka quotas either statically through the Kafka configuration file or dynamically through kafka-configs.sh or the Kafka Admin API. The dynamic configuration mechanism is much more convenient and manageable because it allows quotas for new producer and consumer applications to be configured at any time without having to restart brokers. Dynamic configuration changes take effect in real time, even while application clients are producing or consuming data.
  • Configuration keys – With the kafka-configs.sh command-line tool, you can set dynamic consume, produce, and request quotas using the following three configuration keys, respectively: consumer_byte_rate, producer_byte_rate, and request_percentage.

For more information about Kafka quotas, refer to Kafka documentation.
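The enforcement-level and throttling considerations above come down to simple arithmetic, sketched here in Java. This is an illustration under stated assumptions, not the broker's implementation: the class and method names (QuotaMath, maxAggregateBytesPerSec, throttleDelayMs) are hypothetical, 1 MB is taken to be 1024 × 1024 bytes, and the delay formula (observed − quota) / quota × window is a simplified model of how a broker could compute the delay needed to bring a client back under its quota over a measurement window.

```java
public class QuotaMath {
    // Assumption for this sketch: 1 MB = 1024 * 1024 bytes.
    public static final long MB = 1024L * 1024L;

    /**
     * Upper bound on a client's aggregate throughput when its quota is enforced
     * independently on each broker that leads partitions the client uses.
     */
    public static long maxAggregateBytesPerSec(long perBrokerQuotaBytesPerSec, int brokersLeadingPartitions) {
        if (perBrokerQuotaBytesPerSec < 0 || brokersLeadingPartitions < 0) {
            throw new IllegalArgumentException("quota and broker count must be non-negative");
        }
        return Math.multiplyExact(perBrokerQuotaBytesPerSec, (long) brokersLeadingPartitions);
    }

    /**
     * Simplified throttle model: delay (ms) that brings an observed rate back
     * down to the quota when both are measured over a window of windowMs.
     * Returns 0 when the client is within its quota.
     */
    public static long throttleDelayMs(double observedBytesPerSec, double quotaBytesPerSec, long windowMs) {
        if (quotaBytesPerSec <= 0) {
            throw new IllegalArgumentException("quota must be positive");
        }
        if (observedBytesPerSec <= quotaBytesPerSec) {
            return 0L;
        }
        return Math.round((observedBytesPerSec - quotaBytesPerSec) / quotaBytesPerSec * windowMs);
    }

    public static void main(String[] args) {
        // Six brokers, leadership evenly spread: 12 MB/sec on each broker.
        System.out.println(maxAggregateBytesPerSec(12 * MB, 6) / MB + " MB/sec across six brokers");
        // All partition leaders on one broker: the same client is capped at 12 MB/sec.
        System.out.println(maxAggregateBytesPerSec(12 * MB, 1) / MB + " MB/sec on a single broker");
        // A client observed at twice its 1024 bytes/sec quota over a 1-second window.
        System.out.println(throttleDelayMs(2048, 1024, 1000) + " ms delay");
    }
}
```

With the numbers from the enforcement-level example, the first two lines print 72 and 12 MB/sec, which is why evenly balanced partition leadership matters for a client's usable throughput.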

Enforce network bandwidth quotas with IAM access control

Following our understanding of Kafka quotas, let’s look at how to enforce them in an MSK cluster while using IAM access control for authentication and authorization. IAM access control in Amazon MSK eliminates the need for two separate mechanisms for authentication and authorization.

The following figure shows an MSK cluster that is configured to use IAM access control in the demo account. Each producer and consumer application has a quota that determines how much data it can produce or consume in bytes per second. For example, ProducerApp-1 has a produce quota of 1024 bytes/sec, and ConsumerApp-1 and ConsumerApp-2 have consume quotas of 5120 and 1024 bytes/sec, respectively. It’s important to note that Kafka quotas are set on the Kafka cluster rather than in the client applications.

The preceding figure illustrates how Kafka client applications (ProducerApp-1, ConsumerApp-1, and ConsumerApp-2) access Topic-B in the MSK cluster by assuming write and read IAM roles. The workflow is as follows:

  • P1 – ProducerApp-1 (via its ProducerApp-1-Role IAM role) assumes the Topic-B-Write-Role IAM role to send messages to Topic-B in the MSK cluster.
  • P2 – With the Topic-B-Write-Role IAM role assumed, ProducerApp-1 begins sending messages to Topic-B.
  • C1 – ConsumerApp-1 (via its ConsumerApp-1-Role IAM role) and ConsumerApp-2 (via its ConsumerApp-2-Role IAM role) assume the Topic-B-Read-Role IAM role to read messages from Topic-B in the MSK cluster.
  • C2 – With the Topic-B-Read-Role IAM role assumed, ConsumerApp-1 and ConsumerApp-2 start consuming messages from Topic-B.

ConsumerApp-1 and ConsumerApp-2 are two separate consumer applications. They do not belong to the same consumer group.

Configuring client IDs and understanding authenticated user principal

As explained earlier, Kafka quotas can be configured for specific users, specific client IDs, or both. Let’s explore client ID and user concepts and configurations required for Kafka quota allocation.

Client ID

A client ID representing an application’s logical name can be configured within an application’s code. In Java applications, for example, you can set the producer’s and consumer’s client IDs using ProducerConfig.CLIENT_ID_CONFIG and ConsumerConfig.CLIENT_ID_CONFIG configurations, respectively. The following code snippet illustrates how ProducerApp-1 sets the client ID to this-is-me-producerapp-1 using ProducerConfig.CLIENT_ID_CONFIG:

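import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.CLIENT_ID_CONFIG, "this-is-me-producerapp-1"); // logical name the brokers use for quota matching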
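The consumer side is analogous. The following is a minimal sketch that deliberately avoids a dependency on the Kafka client library: it uses the plain string key client.id, which is the value behind both ProducerConfig.CLIENT_ID_CONFIG and ConsumerConfig.CLIENT_ID_CONFIG. The class name ClientIdProps and the helper withClientId are hypothetical, introduced only for illustration.

```java
import java.util.Properties;

public class ClientIdProps {
    // String value of ProducerConfig.CLIENT_ID_CONFIG and ConsumerConfig.CLIENT_ID_CONFIG.
    public static final String CLIENT_ID_KEY = "client.id";

    /** Builds a Properties object carrying only the client ID (hypothetical helper). */
    public static Properties withClientId(String clientId) {
        if (clientId == null || clientId.trim().isEmpty()) {
            throw new IllegalArgumentException("clientId must not be blank");
        }
        Properties props = new Properties();
        props.put(CLIENT_ID_KEY, clientId);
        return props;
    }

    public static void main(String[] args) {
        // ConsumerApp-1 would add its remaining consumer settings to this object.
        Properties consumerProps = withClientId("this-is-me-consumerapp-1");
        System.out.println(consumerProps.getProperty(CLIENT_ID_KEY));
    }
}
```

Whatever value is set here must match the --entity-name given for the clients entity type when the quota is configured; otherwise the quota does not apply to the application.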

User

The user refers to an authenticated user principal of the client application in the Kafka cluster with authentication enabled. As shown in the solution architecture, producer and consumer applications assume the Topic-B-Write-Role and Topic-B-Read-Role IAM roles, respectively, to perform write and read operations on Topic-B. Therefore, their authenticated user principal will look like the following IAM identifier:

arn:aws:sts::<AWS Account Id>:assumed-role/<assumed Role Name>/<role session name>

For more information, refer to IAM identifiers.

The role session name is a string identifier that uniquely identifies a session when IAM principals, federated identities, or applications assume an IAM role. In our case, ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 applications assume an IAM role using the AWS Security Token Service (AWS STS) SDK, and provide a role session name in the AWS STS SDK call. For example, if ProducerApp-1 assumes the Topic-B-Write-Role IAM role and uses this-is-producerapp-1-role-session as its role session name, its authenticated user principal will be as follows:

arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Write-Role/this-is-producerapp-1-role-session
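Because the quota is keyed on this exact string, it can help to derive it in one place. The following sketch assembles the principal from its three parts using the identifier format shown above. The class name AssumedRolePrincipal and the account ID 123456789012 are placeholders for illustration only.

```java
public class AssumedRolePrincipal {
    /**
     * Formats the authenticated user principal for an assumed IAM role:
     * arn:aws:sts::<account id>:assumed-role/<role name>/<role session name>
     */
    public static String of(String accountId, String roleName, String roleSessionName) {
        if (accountId == null || roleName == null || roleSessionName == null) {
            throw new IllegalArgumentException("all three parts are required");
        }
        return String.format("arn:aws:sts::%s:assumed-role/%s/%s", accountId, roleName, roleSessionName);
    }

    public static void main(String[] args) {
        // 123456789012 is a placeholder account ID, not a real account.
        System.out.println(of("123456789012", "Topic-B-Write-Role", "this-is-producerapp-1-role-session"));
    }
}
```

Note that the role session name is part of the principal, so two applications that assume the same role with different session names are distinct users from the point of view of quota matching.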

The following is an example code snippet from the ProducerApp-1 application using this-is-producerapp-1-role-session as the role session name while assuming the Topic-B-Write-Role IAM role using the AWS STS SDK:

import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;

StsClient stsClient = StsClient.builder().region(region).build();
AssumeRoleRequest roleRequest = AssumeRoleRequest.builder()
        .roleArn("<Topic-B-Write-Role ARN>")
        .roleSessionName("this-is-producerapp-1-role-session") // role session name string literal
        .build();

Configure network bandwidth (produce and consume) quotas

The following commands configure the produce and consume quotas dynamically for client applications based on their client ID and authenticated user principal in the MSK cluster configured with IAM access control.

The following code configures the produce quota:

kafka-configs.sh --bootstrap-server <MSK cluster bootstrap servers IAM endpoint> \
--command-config config_iam.properties \
--alter --add-config "producer_byte_rate=<number of bytes per second>" \
--entity-type clients --entity-name <ProducerApp client Id> \
--entity-type users --entity-name <ProducerApp user principal>

The producer_byte_rate refers to the amount of data, in bytes, that a producer client identified by client ID and user is allowed to produce to a single broker per second. The option --command-config points to config_iam.properties, which contains the properties required for IAM access control.

The following code configures the consume quota:

kafka-configs.sh --bootstrap-server <MSK cluster bootstrap servers IAM endpoint> \
--command-config config_iam.properties \
--alter --add-config "consumer_byte_rate=<number of bytes per second>" \
--entity-type clients --entity-name <ConsumerApp client Id> \
--entity-type users --entity-name <ConsumerApp user principal>

The consumer_byte_rate refers to the amount of data, in bytes, that a consumer client identified by client ID and user is allowed to consume from a single broker per second.

Let’s look at some example quota configuration commands for ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 client applications:

  • ProducerApp-1 produce quota configuration – Let’s assume ProducerApp-1 has this-is-me-producerapp-1 configured as the client ID in the application code and uses this-is-producerapp-1-role-session as the role session name when assuming the Topic-B-Write-Role IAM role. The following command sets the produce quota for ProducerApp-1 to 1024 bytes per second:
kafka-configs.sh --bootstrap-server <MSK Cluster Bootstrap servers IAM endpoint> \
--command-config config_iam.properties \
--alter --add-config "producer_byte_rate=1024" \
--entity-type clients --entity-name this-is-me-producerapp-1 \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Write-Role/this-is-producerapp-1-role-session
  • ConsumerApp-1 consume quota configuration – Let’s assume ConsumerApp-1 has this-is-me-consumerapp-1 configured as the client ID in the application code and uses this-is-consumerapp-1-role-session as the role session name when assuming the Topic-B-Read-Role IAM role. The following command sets the consume quota for ConsumerApp-1 to 5120 bytes per second:
kafka-configs.sh --bootstrap-server <MSK Cluster Bootstrap servers IAM endpoint> \
--command-config config_iam.properties \
--alter --add-config "consumer_byte_rate=5120" \
--entity-type clients --entity-name this-is-me-consumerapp-1 \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Read-Role/this-is-consumerapp-1-role-session


  • ConsumerApp-2 consume quota configuration – Let’s assume ConsumerApp-2 has this-is-me-consumerapp-2 configured as the client ID in the application code and uses this-is-consumerapp-2-role-session as the role session name when assuming the Topic-B-Read-Role IAM role. The following command sets the consume quota for ConsumerApp-2 to 1024 bytes per second per broker:
kafka-configs.sh --bootstrap-server <MSK Cluster Bootstrap servers IAM endpoint> \
--command-config config_iam.properties \
--alter --add-config "consumer_byte_rate=1024" \
--entity-type clients --entity-name this-is-me-consumerapp-2 \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Read-Role/this-is-consumerapp-2-role-session

As a result of the preceding commands, the ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 client applications will be throttled by the MSK cluster using IAM access control if they exceed their assigned produce and consume quotas, respectively.
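The three commands above differ only in the quota key, the byte rate, the client ID, and the user principal. As a rough sketch of that pattern, the following Java helper assembles the same argument list for any application. The class name QuotaCommand, the method build, the bootstrap endpoint b-1.example.com:9098, and the account ID are hypothetical. The helper only builds and prints the command; it does not run it.

```java
import java.util.Arrays;
import java.util.List;

public class QuotaCommand {
    /** Builds the kafka-configs.sh argument list for a (client ID, user) bandwidth quota. */
    public static List<String> build(String bootstrapServers, String quotaKey, long bytesPerSec,
                                     String clientId, String userPrincipal) {
        // Only the two network bandwidth quota keys discussed in this post are accepted here.
        if (!"producer_byte_rate".equals(quotaKey) && !"consumer_byte_rate".equals(quotaKey)) {
            throw new IllegalArgumentException("unsupported quota key: " + quotaKey);
        }
        if (bytesPerSec <= 0) {
            throw new IllegalArgumentException("bytesPerSec must be positive");
        }
        return Arrays.asList(
                "kafka-configs.sh",
                "--bootstrap-server", bootstrapServers,
                "--command-config", "config_iam.properties",
                "--alter", "--add-config", quotaKey + "=" + bytesPerSec,
                "--entity-type", "clients", "--entity-name", clientId,
                "--entity-type", "users", "--entity-name", userPrincipal);
    }

    public static void main(String[] args) {
        // Placeholder endpoint and account ID; substitute the values for your cluster.
        List<String> cmd = build(
                "b-1.example.com:9098",
                "producer_byte_rate", 1024,
                "this-is-me-producerapp-1",
                "arn:aws:sts::123456789012:assumed-role/Topic-B-Write-Role/this-is-producerapp-1-role-session");
        System.out.println(String.join(" ", cmd));
    }
}
```

Generating the commands from one helper keeps the client ID and principal strings consistent between the application code and the quota configuration, which is where mismatches tend to creep in.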

Implement the solution

Part 2 of this series showcases the step-by-step detailed implementation of Kafka quotas configuration with IAM access control along with the sample producer and consumer client applications.

Conclusion

Kafka quotas offer teams the ability to set limits for producer and consumer applications. With Amazon MSK, Kafka quotas serve two important purposes: eliminating guesswork and preventing issues caused by poorly designed producer or consumer applications by limiting their quota, and allocating operational costs of a central streaming data platform across different cost centers and tenants (application and product teams).

In this post, we learned how to configure network bandwidth quotas within Amazon MSK while using IAM access control. We also covered some sample commands and code snippets to clarify how the client ID and authenticated principal are used when configuring quotas. Although we only demonstrated Kafka quotas using IAM access control, you can also configure them using other Amazon MSK-supported authentication mechanisms.

In Part 2 of this series, we demonstrate how to configure network bandwidth quotas with IAM access control in Amazon MSK and provide you with example producer and consumer applications so that you can see them in action.

About the Author

Vikas Bajaj is a Senior Manager, Solutions Architects, Financial Services at Amazon Web Services. Having worked with financial services organizations and digital native customers, he advises financial services customers in Australia on technology decisions, architectures, and product roadmaps.

Multi-tenancy Apache Kafka clusters in Amazon MSK with IAM access control and Kafka quotas – Part 2

Post Syndicated from Vikas Bajaj original https://aws.amazon.com/blogs/big-data/multi-tenancy-apache-kafka-clusters-in-amazon-msk-with-iam-access-control-and-kafka-quotas-part-2/

Kafka quotas are integral to multi-tenant Kafka clusters. They prevent Kafka cluster performance from being negatively affected by poorly behaved applications overconsuming cluster resources. Furthermore, they enable the central streaming data platform to be operated as a multi-tenant platform and used by downstream and upstream applications across multiple business lines. Kafka supports two types of quotas: network bandwidth quotas and request rate quotas. Network bandwidth quotas define byte-rate thresholds, that is, how much data client applications can produce to and consume from each individual broker in a Kafka cluster, measured in bytes per second. Request rate quotas limit the percentage of time each individual broker spends processing client applications’ requests. Depending on your configuration, Kafka quotas can be set for specific users, specific client IDs, or both.

In Part 1 of this two-part series, we discussed the concepts of how to enforce Kafka quotas in Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters while using AWS Identity and Access Management (IAM) access control.

In this post, we walk you through the step-by-step implementation of setting up Kafka quotas in an MSK cluster while using IAM access control and testing them through sample client applications.

Solution overview

The following figure, which we first introduced in Part 1, illustrates how Kafka client applications (ProducerApp-1, ConsumerApp-1, and ConsumerApp-2) access Topic-B in the MSK cluster by assuming write and read IAM roles. Each producer and consumer client application has a quota that determines how much data they can produce or consume in bytes/second. The ProducerApp-1 quota allows it to produce up to 1024 bytes/second per broker. Similarly, the ConsumerApp-1 and ConsumerApp-2 quotas allow them to consume 5120 and 1024 bytes/second per broker, respectively. The following is a brief explanation of the flow shown in the architecture diagram:

  • P1 – ProducerApp-1 (via its ProducerApp-1-Role IAM role) assumes the Topic-B-Write-Role IAM role to send messages to Topic-B
  • P2 – With the Topic-B-Write-Role IAM role assumed, ProducerApp-1 begins sending messages to Topic-B
  • C1 – ConsumerApp-1 (via its ConsumerApp-1-Role IAM role) and ConsumerApp-2 (via its ConsumerApp-2-Role IAM role) assume the Topic-B-Read-Role IAM role to read messages from Topic-B
  • C2 – With the Topic-B-Read-Role IAM role assumed, ConsumerApp-1 and ConsumerApp-2 start consuming messages from Topic-B
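Because each quota is enforced per broker, the most a client can move across the whole cluster is the per-broker quota multiplied by the number of brokers it talks to. The following is a minimal sketch of that arithmetic, assuming the three-broker cluster provisioned later in this post and an even spread of partition leaders across brokers; the function name is hypothetical.

```shell
#!/bin/sh
# Upper bound on a client's cluster-wide throughput when the same byte-rate
# quota is enforced independently by every broker.
# Arguments: <per-broker quota in bytes/second> <number of brokers>
cluster_wide_ceiling() {
  echo $(( $1 * $2 ))
}

# Quotas from the architecture diagram, assuming a three-broker cluster.
echo "ProducerApp-1: $(cluster_wide_ceiling 1024 3) bytes/second"
echo "ConsumerApp-1: $(cluster_wide_ceiling 5120 3) bytes/second"
echo "ConsumerApp-2: $(cluster_wide_ceiling 1024 3) bytes/second"
```

In practice a client may stay below this ceiling, because a broker that leads none of the partitions the client uses contributes nothing to its throughput.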

Note that this post uses the AWS Command Line Interface (AWS CLI), AWS CloudFormation templates, and the AWS Management Console for provisioning and modifying AWS resources, and resources provisioned will be billed to your AWS account.

The high-level steps are as follows:

  1. Provision an MSK cluster with IAM access control and Amazon Elastic Compute Cloud (Amazon EC2) instances for client applications.
  2. Create Topic-B on the MSK cluster.
  3. Create IAM roles for the client applications to access Topic-B.
  4. Run the producer and consumer applications without setting quotas.
  5. Configure the produce and consume quotas for the client applications.
  6. Rerun the applications after setting the quotas.

Prerequisites

It is recommended that you read Part 1 of this series before continuing. In order to get started, you need the following:

  • An AWS account, referred to as the demo account in this post, with an assumed account ID of 111111111111
  • Permissions to create, delete, and modify AWS resources in the demo account

Provision an MSK cluster with IAM access control and EC2 instances

This step involves provisioning an MSK cluster with IAM access control in a VPC in the demo account. Additionally, we create four EC2 instances to make configuration changes to the MSK cluster and host producer and consumer client applications.

Deploy CloudFormation stack

  1. Clone the GitHub repository to download the CloudFormation template files and sample client applications:
git clone https://github.com/aws-samples/amazon-msk-kafka-quotas.git
  2. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  3. Choose Create stack.
  4. For Prepare template, select Template is ready.
  5. For Template source, select Upload a template file.
  6. Upload the cfn-msk-stack-1.yaml file from the amazon-msk-kafka-quotas/cfn-templates directory, then choose Next.
  7. For Stack name, enter MSKStack.
  8. Leave the parameters as default and choose Next.
  9. Scroll to the bottom of the Configure stack options page and choose Next to continue.
  10. Scroll to the bottom of the Review page, select the check box I acknowledge that CloudFormation may create IAM resources, and choose Submit.

Stack creation takes approximately 30 minutes and provisions the following resources:

  • A VPC with three private subnets and one public subnet
  • An MSK cluster with three brokers with IAM access control enabled
  • An EC2 instance called MSKAdminInstance for modifying MSK cluster settings as well as creating and modifying AWS resources
  • EC2 instances for ProducerApp-1, ConsumerApp-1, and ConsumerApp-2, one for each client application
  • A separate IAM role for each EC2 instance that hosts the client application, as shown in the architecture diagram

After the stack is created, open its Outputs tab and note the MSKClusterArn value.

Create a topic on the MSK cluster

To create Topic-B on the MSK cluster, complete the following steps:

  1. On the Amazon EC2 console, navigate to your list of running EC2 instances.
  2. Select the MSKAdminInstance EC2 instance and choose Connect.
  3. On the Session Manager tab, choose Connect.
  4. Run the following commands on the new tab that opens in your browser:
sudo su - ec2-user

# Add Kafka binaries to the path
sed -i 's|HOME/bin|HOME/bin:~/kafka/bin|' .bash_profile

# Set your AWS region
aws configure set region <AWS Region>
  5. Set an environment variable that points to the MSK cluster brokers' IAM endpoint:
MSK_CLUSTER_ARN=<Use the value of MSKClusterArn that you noted earlier>
echo "export BOOTSTRAP_BROKERS_IAM=$(aws kafka get-bootstrap-brokers --cluster-arn $MSK_CLUSTER_ARN | jq -r .BootstrapBrokerStringSaslIam)" >> .bash_profile
source .bash_profile
echo $BOOTSTRAP_BROKERS_IAM
  6. Take note of the value of BOOTSTRAP_BROKERS_IAM.
  7. Run the following Kafka CLI command to create Topic-B on the MSK cluster:
kafka-topics.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--create --topic Topic-B \
--partitions 3 --replication-factor 3 \
--command-config config_iam.properties

Because the MSK cluster is provisioned with IAM access control, the --command-config option points to config_iam.properties, a file created by the MSKStack CloudFormation stack that contains the properties required for IAM access control.

The following warnings may appear when you run the Kafka CLI commands; you can ignore them:

The configuration 'sasl.jaas.config' was supplied but isn't a known config.
The configuration 'sasl.client.callback.handler.class' was supplied but isn't a known config.

  8. To verify that Topic-B has been created, list all the topics:
kafka-topics.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties --list
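
For reference, a client properties file for IAM access control usually carries the four settings below, two of which are the ones named in the warnings above. This is a sketch based on the standard Amazon MSK IAM client settings; the exact file that the MSKStack stack generates is not reproduced in this post, and the function name is hypothetical. Write it to a scratch path so that the stack's own config_iam.properties stays untouched.

```shell
#!/bin/sh
# Write a Kafka client properties file for IAM access control.
# Argument: <output path> (use a scratch file, not the stack's own copy).
write_iam_client_config() {
  cat > "$1" <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOF
}
```

For example, `write_iam_client_config /tmp/config_iam_example.properties` produces a file that you can compare against the one on MSKAdminInstance.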

Create IAM roles for client applications to access Topic-B

This step involves creating Topic-B-Write-Role and Topic-B-Read-Role as shown in the architecture diagram. Topic-B-Write-Role enables write operations on Topic-B and can be assumed by ProducerApp-1. Similarly, ConsumerApp-1 and ConsumerApp-2 can assume Topic-B-Read-Role to perform read operations on Topic-B. To perform read operations on Topic-B, ConsumerApp-1 and ConsumerApp-2 must also belong to the consumer groups specified during the MSKStack stack update in the subsequent step.

Create the roles with the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Select MSKStack and choose Update.
  3. For Prepare template, select Replace current template.
  4. For Template source, select Upload a template file.
  5. Upload the cfn-msk-stack-2.yaml file from the amazon-msk-kafka-quotas/cfn-templates directory, then choose Next.
  6. Provide the following additional stack parameters:
    • For Topic B ARN, enter the Topic-B ARN.

The ARN must be formatted as arn:aws:kafka:region:account-id:topic/msk-cluster-name/msk-cluster-uuid/Topic-B. Use the cluster name and cluster UUID from the MSK cluster ARN you noted earlier and provide your AWS Region. For more information, refer to IAM access control for Amazon MSK.

    • For ConsumerApp-1 Consumer Group name, enter the ConsumerApp-1 consumer group ARN.

It must be formatted as arn:aws:kafka:region:account-id:group/msk-cluster-name/msk-cluster-uuid/consumer-group-name. The commands later in this post use consumerapp-1-cg as the ConsumerApp-1 consumer group name.

    • For ConsumerApp-2 Consumer Group name, enter the ConsumerApp-2 consumer group ARN.

Use the same format as the previous ARN. The commands later in this post use consumerapp-2-cg as the ConsumerApp-2 consumer group name.

  7. Choose Next to continue.
  8. Scroll to the bottom of the Configure stack options page and choose Next to continue.
  9. Scroll to the bottom of the Review page, select the check box I acknowledge that CloudFormation may create IAM resources, and choose Update stack.

Updating the stack takes approximately 3 minutes and creates the following resources:

  • Topic-B-Write-Role – An IAM role with permission to perform write operations on Topic-B. Its trust policy allows the ProducerApp-1-Role IAM role to assume it.
  • Topic-B-Read-Role – An IAM role with permission to perform read operations on Topic-B. Its trust policy allows the ConsumerApp-1-Role and ConsumerApp-2-Role IAM roles to assume it. Furthermore, ConsumerApp-1 and ConsumerApp-2 must also belong to the consumer groups you specified when updating the stack to perform read operations on Topic-B.

After the stack is updated, open its Outputs tab and note the TopicBReadRoleARN and TopicBWriteRoleARN values.
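
The topic and consumer group ARNs requested as stack parameters in this section share their Region, account ID, cluster name, and cluster UUID with the MSK cluster ARN, so they can be derived from it mechanically. The following is a minimal sketch; the function name and the cluster ARN in the usage line are hypothetical.

```shell
#!/bin/sh
# Derive a topic or consumer group ARN from an MSK cluster ARN.
# Arguments: <cluster ARN> <topic|group> <topic or consumer group name>
msk_resource_arn() (
  cluster_arn="$1"; kind="$2"; name="$3"
  prefix="${cluster_arn%%:cluster/*}"   # arn:aws:kafka:<region>:<account-id>
  suffix="${cluster_arn#*:cluster/}"    # <cluster-name>/<cluster-uuid>
  printf '%s:%s/%s/%s\n' "$prefix" "$kind" "$suffix" "$name"
)

# Hypothetical cluster ARN, for illustration only.
CLUSTER_ARN="arn:aws:kafka:us-east-1:111111111111:cluster/demo-cluster/abcd1234-1"
msk_resource_arn "$CLUSTER_ARN" topic Topic-B
msk_resource_arn "$CLUSTER_ARN" group consumerapp-1-cg
msk_resource_arn "$CLUSTER_ARN" group consumerapp-2-cg
```

Pass the MSKClusterArn value you noted earlier in place of the hypothetical ARN to get the three parameter values.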

Run the producer and consumer applications without setting quotas

Here, we run ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 without setting their quotas. From the previous steps, you need the BOOTSTRAP_BROKERS_IAM value, the Topic-B-Write-Role ARN, and the Topic-B-Read-Role ARN. The source code of the client applications and their packaged versions are available in the GitHub repository.

Run the ConsumerApp-1 application

To run the ConsumerApp-1 application, complete the following steps:

  1. On the Amazon EC2 console, select the ConsumerApp-1 EC2 instance and choose Connect.
  2. On the Session Manager tab, choose Connect.
  3. Run the following commands on the new tab that opens in your browser:
sudo su - ec2-user

# Set your AWS region
aws configure set region <aws region>

# Set BOOTSTRAP_BROKERS_IAM variable to MSK cluster's IAM endpoint
BOOTSTRAP_BROKERS_IAM=<Use the value of BOOTSTRAP_BROKERS_IAM that you noted earlier> 

echo "export BOOTSTRAP_BROKERS_IAM=$(echo $BOOTSTRAP_BROKERS_IAM)" >> .bash_profile

# Clone GitHub repository containing source code for client applications
git clone https://github.com/aws-samples/amazon-msk-kafka-quotas.git

cd amazon-msk-kafka-quotas/uber-jars/
  4. Run the ConsumerApp-1 application to start consuming messages from Topic-B:
java -jar kafka-consumer.jar --bootstrap-servers $BOOTSTRAP_BROKERS_IAM \
--assume-role-arn <Topic-B-Read-Role-ARN> \
--topic-name <Topic-Name> \
--region <AWS Region> \
--consumer-group <ConsumerApp-1 consumer group name> \
--role-session-name <role session name for ConsumerApp-1 to use during STS assume role call> \
--client-id <ConsumerApp-1 client.id> \
--print-consumer-quota-metrics Y \
--cw-dimension-name <CloudWatch Metrics Dimension Name> \
--cw-dimension-value <CloudWatch Metrics Dimension Value> \
--cw-namespace <CloudWatch Metrics Namespace>

You can find the source code on GitHub for your reference. The command line parameter details are as follows:

  • --bootstrap-servers – MSK cluster bootstrap brokers IAM endpoint.
  • --assume-role-arn – Topic-B-Read-Role IAM role ARN. Assuming this role, ConsumerApp-1 will read messages from the topic.
  • --region – Region you’re using.
  • --topic-name – Topic name from which ConsumerApp-1 will read messages. The default is Topic-B.
  • --consumer-group – Consumer group name for ConsumerApp-1, as specified during the stack update.
  • --role-session-name – ConsumerApp-1 assumes the Topic-B-Read-Role using the AWS Security Token Service (AWS STS) SDK. ConsumerApp-1 will use this role session name when calling the assumeRole function.
  • --client-id – Client ID for ConsumerApp-1.
  • --print-consumer-quota-metrics – Flag indicating whether client metrics should be printed on the terminal by ConsumerApp-1.
  • --cw-dimension-name – Amazon CloudWatch dimension name that will be used to publish client throttling metrics from ConsumerApp-1.
  • --cw-dimension-value – CloudWatch dimension value that will be used to publish client throttling metrics from ConsumerApp-1.
  • --cw-namespace – Namespace where ConsumerApp-1 will publish CloudWatch metrics in order to monitor throttling.
  5. If you’re satisfied with the rest of the parameters, use the following command and change --assume-role-arn and --region as per your environment:
java -jar kafka-consumer.jar --bootstrap-servers $BOOTSTRAP_BROKERS_IAM \
--assume-role-arn arn:aws:iam::111111111111:role/MSKStack-TopicBReadRole-xxxxxxxxxxx \
--topic-name Topic-B \
--region <AWS Region> \
--consumer-group consumerapp-1-cg \
--role-session-name consumerapp-1-role-session \
--client-id consumerapp-1-client-id \
--print-consumer-quota-metrics Y \
--cw-dimension-name ConsumerApp \
--cw-dimension-value ConsumerApp-1 \
--cw-namespace ConsumerApps

The fetch-throttle-time-avg and fetch-throttle-time-max client metrics should display 0.0, indicating no throttling is occurring for ConsumerApp-1. Remember that we haven’t set the consume quota for ConsumerApp-1 yet. Let it run for a while.

Run the ConsumerApp-2 application

To run the ConsumerApp-2 application, complete the following steps:

  1. On the Amazon EC2 console, select the ConsumerApp-2 EC2 instance and choose Connect.
  2. On the Session Manager tab, choose Connect.
  3. Run the following commands on the new tab that opens in your browser:
sudo su - ec2-user

# Set your AWS region
aws configure set region <aws region>

# Set BOOTSTRAP_BROKERS_IAM variable to MSK cluster's IAM endpoint
BOOTSTRAP_BROKERS_IAM=<Use the value of BOOTSTRAP_BROKERS_IAM that you noted earlier> 

echo "export BOOTSTRAP_BROKERS_IAM=$(echo $BOOTSTRAP_BROKERS_IAM)" >> .bash_profile

# Clone GitHub repository containing source code for client applications
git clone https://github.com/aws-samples/amazon-msk-kafka-quotas.git

cd amazon-msk-kafka-quotas/uber-jars/
  4. Run the ConsumerApp-2 application to start consuming messages from Topic-B:
java -jar kafka-consumer.jar --bootstrap-servers $BOOTSTRAP_BROKERS_IAM \
--assume-role-arn <Topic-B-Read-Role-ARN> \
--topic-name <Topic-Name> \
--region <AWS Region> \
--consumer-group <ConsumerApp-2 consumer group name> \
--role-session-name <role session name for ConsumerApp-2 to use during STS assume role call> \
--client-id <ConsumerApp-2 client.id> \
--print-consumer-quota-metrics Y \
--cw-dimension-name <CloudWatch Metrics Dimension Name> \
--cw-dimension-value <CloudWatch Metrics Dimension Value> \
--cw-namespace <CloudWatch Metrics Namespace>

The command line parameters are the same as those described for ConsumerApp-1, except for the following:

  • --consumer-group – Consumer group name for ConsumerApp-2, as specified during the stack update.
  • --role-session-name – ConsumerApp-2 assumes the Topic-B-Read-Role using the AWS STS SDK. ConsumerApp-2 will use this role session name when calling the assumeRole function.
  • --client-id – Client ID for ConsumerApp-2.
  5. If you’re satisfied with the rest of the parameters, use the following command and change --assume-role-arn and --region as per your environment:
java -jar kafka-consumer.jar --bootstrap-servers $BOOTSTRAP_BROKERS_IAM \
--assume-role-arn arn:aws:iam::111111111111:role/MSKStack-TopicBReadRole-xxxxxxxxxxx \
--topic-name Topic-B \
--region <AWS Region> \
--consumer-group consumerapp-2-cg \
--role-session-name consumerapp-2-role-session \
--client-id consumerapp-2-client-id \
--print-consumer-quota-metrics Y \
--cw-dimension-name ConsumerApp \
--cw-dimension-value ConsumerApp-2 \
--cw-namespace ConsumerApps

The fetch-throttle-time-avg and fetch-throttle-time-max client metrics should display 0.0, indicating no throttling is occurring for ConsumerApp-2. Remember that we haven’t set the consume quota for ConsumerApp-2 yet. Let it run for a while.

Run the ProducerApp-1 application

To run the ProducerApp-1 application, complete the following steps:

  1. On the Amazon EC2 console, select the ProducerApp-1 EC2 instance and choose Connect.
  2. On the Session Manager tab, choose Connect.
  3. Run the following commands on the new tab that opens in your browser:
sudo su - ec2-user

# Set your AWS region
aws configure set region <aws region>

# Set BOOTSTRAP_BROKERS_IAM variable to MSK cluster's IAM endpoint
BOOTSTRAP_BROKERS_IAM=<Use the value of BOOTSTRAP_BROKERS_IAM that you noted earlier> 

echo "export BOOTSTRAP_BROKERS_IAM=$(echo $BOOTSTRAP_BROKERS_IAM)" >> .bash_profile

# Clone GitHub repository containing source code for client applications
git clone https://github.com/aws-samples/amazon-msk-kafka-quotas.git

cd amazon-msk-kafka-quotas/uber-jars/
  4. Run the ProducerApp-1 application to start sending messages to Topic-B:
java -jar kafka-producer.jar --bootstrap-servers $BOOTSTRAP_BROKERS_IAM \
--assume-role-arn <Topic-B-Write-Role-ARN> \
--topic-name <Topic-Name> \
--region <AWS Region> \
--num-messages <Number of events> \
--role-session-name <role session name for ProducerApp-1 to use during STS assume role call> \
--client-id <ProducerApp-1 client.id> \
--producer-type <Producer Type, options are sync or async> \
--print-producer-quota-metrics Y \
--cw-dimension-name <CloudWatch Metrics Dimension Name> \
--cw-dimension-value <CloudWatch Metrics Dimension Value> \
--cw-namespace <CloudWatch Metrics Namespace>

You can find the source code on GitHub for your reference. The command line parameter details are as follows:

  • --bootstrap-servers – MSK cluster bootstrap brokers IAM endpoint.
  • --assume-role-arn – Topic-B-Write-Role IAM role ARN. Assuming this role, ProducerApp-1 will write messages to the topic.
  • --topic-name – ProducerApp-1 will send messages to this topic. The default is Topic-B.
  • --region – AWS Region you’re using.
  • --num-messages – Number of messages the ProducerApp-1 application will send to the topic.
  • --role-session-name – ProducerApp-1 assumes the Topic-B-Write-Role using the AWS STS SDK. ProducerApp-1 will use this role session name when calling the assumeRole function.
  • --client-id – Client ID of ProducerApp-1.
  • --producer-type – ProducerApp-1 can be run either synchronously or asynchronously. Options are sync or async.
  • --print-producer-quota-metrics – Flag indicating whether the client metrics should be printed on the terminal by ProducerApp-1.
  • --cw-dimension-name – CloudWatch dimension name that will be used to publish client throttling metrics from ProducerApp-1.
  • --cw-dimension-value – CloudWatch dimension value that will be used to publish client throttling metrics from ProducerApp-1.
  • --cw-namespace – The namespace where ProducerApp-1 will publish CloudWatch metrics in order to monitor throttling.
  5. If you’re satisfied with the rest of the parameters, use the following command and change --assume-role-arn and --region as per your environment. The command uses the option --producer-type sync to run a synchronous Kafka producer:
java -jar kafka-producer.jar --bootstrap-servers $BOOTSTRAP_BROKERS_IAM \
--assume-role-arn arn:aws:iam::111111111111:role/MSKStack-TopicBWriteRole-xxxxxxxxxxxx \
--topic-name Topic-B \
--region <AWS Region> \
--num-messages 10000000 \
--role-session-name producerapp-1-role-session \
--client-id producerapp-1-client-id \
--producer-type sync \
--print-producer-quota-metrics Y \
--cw-dimension-name ProducerApp \
--cw-dimension-value ProducerApp-1 \
--cw-namespace ProducerApps

Alternatively, use --producer-type async to run an asynchronous producer. For more details, refer to Asynchronous send.

The produce-throttle-time-avg and produce-throttle-time-max client metrics should display 0.0, indicating no throttling is occurring for ProducerApp-1. Remember that we haven’t set the produce quota for ProducerApp-1 yet. Check that ConsumerApp-1 and ConsumerApp-2 can consume messages and notice they are not throttled. Stop the consumer and producer client applications by pressing Ctrl+C in their respective browser tabs.

Set produce and consume quotas for client applications

Now that we have run the producer and consumer applications without quotas, we set their quotas and rerun them.

Open the Session Manager terminal for the MSKAdminInstance EC2 instance as described earlier and run the following commands to find the default configuration of one of the brokers in the MSK cluster. MSK clusters are provisioned with the default Kafka quotas configuration.

# Describe Broker-1 default configurations
kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--entity-type brokers \
--entity-name 1 \
--all --describe > broker1_default_configurations.txt
grep quota.consumer.default broker1_default_configurations.txt
grep quota.producer.default broker1_default_configurations.txt

The following screenshot shows the Broker-1 default values for quota.consumer.default and quota.producer.default.

ProducerApp-1 quota configuration

Replace placeholders in all the commands in this section with values that correspond to your account.

According to the architecture diagram discussed earlier, set the ProducerApp-1 produce quota to 1024 bytes/second. For <ProducerApp-1 Client Id> and <ProducerApp-1 Role Session>, make sure you use the same values that you used while running ProducerApp-1 earlier (producerapp-1-client-id and producerapp-1-role-session, respectively):

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--alter --add-config 'producer_byte_rate=1024' \
--entity-type clients --entity-name <ProducerApp-1 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBWriteRole-xxxxxxxxxxx/<ProducerApp-1 Role Session>

Verify the ProducerApp-1 produce quota using the following command:

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--describe \
--entity-type clients --entity-name <ProducerApp-1 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBWriteRole-xxxxxxxxxxx/<ProducerApp-1 Role Session>

You can remove the ProducerApp-1 produce quota by using the following command, but don’t run the command as we’ll test the quotas next.

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--alter --delete-config producer_byte_rate \
--entity-type clients --entity-name <ProducerApp-1 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBWriteRole-xxxxxxxxxxx/<ProducerApp-1 Role Session>
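
The users entity name in the commands above is the AWS STS assumed-role principal, not the IAM role ARN shown in the stack outputs. The following is a minimal sketch of how to build one from the other; the function name is hypothetical. Note that an assumed-role ARN carries only the role name, so any IAM path in the role ARN is dropped.

```shell
#!/bin/sh
# Build the STS assumed-role principal that the brokers see for a client.
# Arguments: <IAM role ARN> <role session name>
assumed_role_principal() (
  role_arn="$1"; session_name="$2"
  partition="$(printf '%s' "$role_arn" | cut -d: -f2)"   # for example "aws"
  account="$(printf '%s' "$role_arn" | cut -d: -f5)"     # 12-digit account ID
  role_name="${role_arn##*/}"                            # drops "role/" and any IAM path
  printf 'arn:%s:sts::%s:assumed-role/%s/%s\n' \
    "$partition" "$account" "$role_name" "$session_name"
)

# Placeholder role ARN in the same shape as the TopicBWriteRoleARN stack output.
assumed_role_principal \
  "arn:aws:iam::111111111111:role/MSKStack-TopicBWriteRole-xxxxxxxxxxx" \
  "producerapp-1-role-session"
```

The printed value can be passed to kafka-configs.sh as the --entity-name for --entity-type users.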

ConsumerApp-1 quota configuration

Replace placeholders in all the commands in this section with values that correspond to your account.

Let’s set a consume quota of 5120 bytes/second for ConsumerApp-1. For <ConsumerApp-1 Client Id> and <ConsumerApp-1 Role Session>, make sure you use the same values that you used while running ConsumerApp-1 earlier (consumerapp-1-client-id and consumerapp-1-role-session, respectively):

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--alter --add-config 'consumer_byte_rate=5120' \
--entity-type clients --entity-name <ConsumerApp-1 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBReadRole-xxxxxxxxxxx/<ConsumerApp-1 Role Session>

Verify the ConsumerApp-1 consume quota using the following command:

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--describe \
--entity-type clients --entity-name <ConsumerApp-1 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBReadRole-xxxxxxxxxxx/<ConsumerApp-1 Role Session>

You can remove the ConsumerApp-1 consume quota by using the following command, but don’t run the command as we’ll test the quotas next.

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--alter --delete-config consumer_byte_rate \
--entity-type clients --entity-name <ConsumerApp-1 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBReadRole-xxxxxxxxxxx/<ConsumerApp-1 Role Session>

ConsumerApp-2 quota configuration

Replace placeholders in all the commands in this section with values that correspond to your account.

Let’s set a consume quota of 1024 bytes/second for ConsumerApp-2. For <ConsumerApp-2 Client Id> and <ConsumerApp-2 Role Session>, make sure you use the same values that you used while running ConsumerApp-2 earlier (consumerapp-2-client-id and consumerapp-2-role-session, respectively):

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--alter --add-config 'consumer_byte_rate=1024' \
--entity-type clients --entity-name <ConsumerApp-2 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBReadRole-xxxxxxxxxxx/<ConsumerApp-2 Role Session>

Verify the ConsumerApp-2 consume quota using the following command:

kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS_IAM \
--command-config config_iam.properties \
--describe \
--entity-type clients --entity-name <ConsumerApp-2 Client Id> \
--entity-type users --entity-name arn:aws:sts::<AWS Account Id>:assumed-role/MSKStack-TopicBReadRole-xxxxxxxxxxx/<ConsumerApp-2 Role Session>

As with ConsumerApp-1, you can remove the ConsumerApp-2 consume quota using the same command with ConsumerApp-2 client and user details.

Rerun the producer and consumer applications after setting quotas

Let’s rerun the applications to verify the effect of the quotas.

Rerun ProducerApp-1

Rerun ProducerApp-1 in synchronous mode with the same command that you used earlier. The following screenshot illustrates that when ProducerApp-1 reaches its quota on any of the brokers, the produce-throttle-time-avg and produce-throttle-time-max client metrics rise above 0.0, which indicates that ProducerApp-1 is being throttled. Allow ProducerApp-1 to run for a few seconds and then stop it by pressing Ctrl+C.

You can also test the effect of the produce quota by rerunning ProducerApp-1 in asynchronous mode (--producer-type async). As with the synchronous run, the following screenshot illustrates that when ProducerApp-1 reaches its quota on any of the brokers, the produce-throttle-time-avg and produce-throttle-time-max client metrics rise above 0.0, which indicates that ProducerApp-1 is being throttled. Allow the asynchronous ProducerApp-1 to run for a while.

You will eventually see a TimeoutException such as the following:

org.apache.kafka.common.errors.TimeoutException: Expiring xxxxx record(s) for Topic-B-2:xxxxxxx ms has passed since batch creation

When using an asynchronous producer and sending messages at a rate greater than the broker can accept due to the quota, the messages will be queued in the client application memory first. The client will eventually run out of buffer space if the rate of sending messages continues to exceed the rate of accepting messages, causing the next Producer.send() call to be blocked. Producer.send() will eventually throw a TimeoutException if the timeout delay is not sufficient to allow the broker to catch up to the producer application. Stop ProducerApp-1 by using Ctrl+C.
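
To get a feel for how long an asynchronous producer can run before its buffer fills, divide the buffer size by the difference between the send rate and the rate the brokers accept. The following is a rough sketch under stated assumptions: it uses the Kafka producer default buffer.memory of 33554432 bytes (the sample application's actual setting is not stated here), a hypothetical send rate, and a single accepted rate as a simplification of the per-broker quota. The function name is hypothetical.

```shell
#!/bin/sh
# Estimate how long the producer buffer lasts while sends outpace the quota.
# Arguments: <buffer size in bytes> <send rate in bytes/second>
#            <accepted rate in bytes/second>
seconds_until_buffer_full() {
  if [ "$2" -le "$3" ]; then
    echo "never"   # the brokers keep up, so the buffer does not grow
  else
    echo $(( $1 / ($2 - $3) ))
  fi
}

# Default buffer.memory, a hypothetical 33792 bytes/second send rate,
# and the 1024 bytes/second produce quota.
seconds_until_buffer_full 33554432 33792 1024
```

With these numbers the buffer fills after 1024 seconds, at which point further send calls start to block.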

Rerun ConsumerApp-1

Rerun ConsumerApp-1 with the same command that you used earlier. The following screenshot illustrates that when ConsumerApp-1 reaches its quota, the fetch-throttle-time-avg and fetch-throttle-time-max client metrics rise above 0.0, which indicates that ConsumerApp-1 is being throttled.

Allow ConsumerApp-1 to run for a few seconds and then stop it by using Ctrl+C.

Rerun ConsumerApp-2

Rerun ConsumerApp-2 with the same command that you used earlier. Similarly, when ConsumerApp-2 reaches its quota, the fetch-throttle-time-avg and fetch-throttle-time-max client metrics rise above 0.0, which indicates that ConsumerApp-2 is being throttled. Allow ConsumerApp-2 to run for a few seconds and then stop it by pressing Ctrl+C.

Client quota metrics in Amazon CloudWatch

In Part 1, we explained that client metrics are metrics exposed by clients connecting to Kafka clusters. Let’s examine the client metrics in CloudWatch.

  1. On the CloudWatch console, choose All metrics.
  2. Under Custom Namespaces, choose the namespace you provided while running the client applications.
  3. Choose the dimension name and select produce-throttle-time-max, produce-throttle-time-avg, fetch-throttle-time-max, and fetch-throttle-time-avg metrics for all the applications.

These metrics indicate throttling behavior for ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 applications tested with the quota configurations in the previous section. The following screenshots indicate the throttling of ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 based on network bandwidth quotas. ProducerApp-1, ConsumerApp-1, and ConsumerApp-2 applications feed their respective client metrics to CloudWatch. You can find the source code on GitHub for your reference.
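
The same metrics can also be read from the command line. The following is a sketch that wraps the AWS CLI get-metric-statistics command; it assumes the metrics are published with the single dimension you passed through --cw-dimension-name and --cw-dimension-value, and the function name is hypothetical.

```shell
#!/bin/sh
# Fetch per-minute maximums of a client throttling metric from CloudWatch.
# Arguments: <namespace> <metric name> <dimension name> <dimension value>
#            <start time, ISO 8601> <end time, ISO 8601>
throttle_metric_stats() {
  aws cloudwatch get-metric-statistics \
    --namespace "$1" \
    --metric-name "$2" \
    --dimensions "Name=$3,Value=$4" \
    --start-time "$5" \
    --end-time "$6" \
    --period 60 \
    --statistics Maximum
}

# Example call (timestamps are placeholders; requires AWS credentials):
# throttle_metric_stats ConsumerApps fetch-throttle-time-max \
#   ConsumerApp ConsumerApp-1 2023-01-01T00:00:00Z 2023-01-01T01:00:00Z
```

Data points above 0.0 in the output correspond to periods in which the client was throttled.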

Secure client ID and role session name

We discussed how to configure Kafka quotas using an application’s client ID and authenticated user principal. When a client application assumes an IAM role to access Kafka topics on an MSK cluster with IAM authentication enabled, its authenticated user principal is represented in the following format (for more information, refer to IAM identifiers):

arn:aws:sts::111111111111:assumed-role/Topic-B-Write-Role/producerapp-1-role-session

It contains the role session name (in this case, producerapp-1-role-session) used in the client application while assuming an IAM role through the AWS STS SDK. The client application source code is available for your reference. The client ID is a logical name string (for example, producerapp-1-client-id) that is configured in the application code by the application team. Therefore, an application can impersonate another application if it obtains the client ID and role session name of the other application, and if it has permission to assume the same IAM role.

As shown in the architecture diagram, ConsumerApp-1 and ConsumerApp-2 are two separate client applications with their respective quota allocations. Because both have permission to assume the same IAM role (Topic-B-Read-Role) in the demo account, they are allowed to consume messages from Topic-B. Thus, MSK cluster brokers distinguish them based on their client IDs and users (which contain their respective role session name values). If ConsumerApp-2 somehow obtains the ConsumerApp-1 role session name and client ID, it can impersonate ConsumerApp-1 by specifying the ConsumerApp-1 role session name and client ID in the application code.

Let’s assume ConsumerApp-1 uses consumerapp-1-client-id and consumerapp-1-role-session as its client ID and role session name, respectively. Therefore, ConsumerApp-1's authenticated user principal will appear as follows when it assumes the Topic-B-Read-Role IAM role:

arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Read-Role/consumerapp-1-role-session

Similarly, ConsumerApp-2 uses consumerapp-2-client-id and consumerapp-2-role-session as its client ID and role session name, respectively. Therefore, ConsumerApp-2's authenticated user principal will appear as follows when it assumes the Topic-B-Read-Role IAM role:

arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Read-Role/consumerapp-2-role-session

If ConsumerApp-2 obtains ConsumerApp-1's client ID and role session name and specifies them in its application code, MSK cluster brokers will treat it as ConsumerApp-1 and view its client ID as consumerapp-1-client-id, and the authenticated user principal as follows:

arn:aws:sts::<AWS Account Id>:assumed-role/Topic-B-Read-Role/consumerapp-1-role-session

This allows ConsumerApp-2 to consume data from the MSK cluster at a maximum rate of 5120 bytes per second rather than 1024 bytes per second as per its original quota allocation. Consequently, ConsumerApp-1's throughput will be negatively impacted if ConsumerApp-2 runs concurrently.
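To see why the brokers can't tell the two consumers apart, consider the following minimal Python sketch. It is not MSK code. The helpers `principal_arn` and `quota_for` are hypothetical, and the account ID is a placeholder. It only models the idea that the quota lookup depends on the user (role session name) and client ID pair, so identical inputs resolve to the same quota.

```python
# Illustrative model only: MSK brokers perform this matching internally.
ACCOUNT_ID = "111122223333"  # placeholder account ID (assumption)

# Consumer quotas described in this post, keyed by (role session name, client ID).
QUOTAS_BYTES_PER_SEC = {
    ("consumerapp-1-role-session", "consumerapp-1-client-id"): 5120,
    ("consumerapp-2-role-session", "consumerapp-2-client-id"): 1024,
}


def principal_arn(role_name: str, session_name: str) -> str:
    """Build the assumed-role principal that the broker sees as the user."""
    return f"arn:aws:sts::{ACCOUNT_ID}:assumed-role/{role_name}/{session_name}"


def quota_for(session_name: str, client_id: str) -> int:
    """Look up the consumer quota for a (session name, client ID) pair."""
    return QUOTAS_BYTES_PER_SEC[(session_name, client_id)]


# ConsumerApp-2 presenting ConsumerApp-1's values gets ConsumerApp-1's identity and quota.
impersonated_user = principal_arn("Topic-B-Read-Role", "consumerapp-1-role-session")
impersonated_quota = quota_for("consumerapp-1-role-session", "consumerapp-1-client-id")
```

Nothing in the pair identifies the process that supplied it, which is why the enhanced architecture focuses on keeping these two values secret.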

Enhanced architecture

You can introduce AWS Secrets Manager and AWS Key Management Service (AWS KMS) into the architecture to secure applications’ client IDs and role session names. To provide stronger governance, each application’s client ID and role session name must be stored as encrypted secrets in Secrets Manager. The IAM resource policies associated with the encrypted secrets and a KMS customer managed key (CMK) allow applications to access and decrypt only their own client ID and role session name. In this way, applications can’t access each other’s client ID and role session name and impersonate one another. The following image shows the enhanced architecture.

The updated flow has the following stages:

  • P1 – ProducerApp-1 retrieves its client-id and role-session-name secrets from Secrets Manager
  • P2 – ProducerApp-1 configures the secret client-id as CLIENT_ID_CONFIG in the application code, and assumes Topic-B-Write-Role (via its ProducerApp-1-Role IAM role) by passing the secret role-session-name to the AWS STS SDK assumeRole function call
  • P3 – With the Topic-B-Write-Role IAM role assumed, ProducerApp-1 begins sending messages to Topic-B
  • C1 – ConsumerApp-1 and ConsumerApp-2 retrieve their respective client-id and role-session-name secrets from Secrets Manager
  • C2 – ConsumerApp-1 and ConsumerApp-2 configure their respective secret client-id as CLIENT_ID_CONFIG in their application code, and assume Topic-B-Read-Role (via ConsumerApp-1-Role and ConsumerApp-2-Role IAM roles, respectively) by passing their secret role-session-name in the AWS STS SDK assumeRole function call
  • C3 – With the Topic-B-Read-Role IAM role assumed, ConsumerApp-1 and ConsumerApp-2 start consuming messages from Topic-B
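The retrieve-then-assume stages above can be sketched in Python as follows. This is a minimal illustration, not the sample application's code (the post's clients use the AWS SDK's assumeRole call). It assumes that each application's client ID and role session name are stored together in one JSON secret with the keys `client-id` and `role-session-name`. The helper names are hypothetical. The clients are passed in as arguments so the functions can run with `boto3.client("secretsmanager")` and `boto3.client("sts")`, or with stand-ins during testing.

```python
import json


def load_identity(secrets_client, secret_id):
    """Fetch the client ID and role session name from one JSON secret (assumed layout)."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["client-id"], secret["role-session-name"]


def assume_topic_role(sts_client, role_arn, session_name):
    """Assume the topic role, passing the secret value as the role session name."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    # Temporary credentials: AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    return response["Credentials"]
```

Because the secret's resource policy and the KMS key policy allow only the owning application's IAM role to read it, a call to `load_identity` from another application fails with an access error rather than returning the other application's values.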

Refer to the documentation for AWS Secrets Manager and AWS KMS to get a better understanding of how they fit into the architecture.

Clean up resources

Navigate to the CloudFormation console and delete the MSKStack stack. All resources created during this post will be deleted.

Conclusion

In this post, we covered detailed steps to configure Amazon MSK quotas and demonstrated their effect through sample client applications. In addition, we discussed how you can use client metrics to determine if a client application is throttled. We also highlighted a potential issue with plaintext client IDs and role session names. We recommend implementing Kafka quotas with Amazon MSK using Secrets Manager and AWS KMS as per the revised architecture diagram to ensure a zero-trust architecture.

If you have feedback or questions about this post, including the revised architecture, we’d be happy to hear from you. We hope you enjoyed reading this post.


About the Author

Vikas Bajaj is a Senior Manager, Solutions Architects, Financial Services at Amazon Web Services. With over two decades of experience in financial services and working with digital-native businesses, he advises customers on product design, technology roadmaps, and application architectures.

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

Post Syndicated from Kevin Fallis original https://aws.amazon.com/blogs/big-data/ingest-transform-and-deliver-events-published-by-amazon-security-lake-to-amazon-opensearch-service/

With the recent introduction of Amazon Security Lake, it has never been simpler to access all your security-related data in one place. Whether it’s findings from AWS Security Hub, DNS query data from Amazon Route 53, network events such as VPC Flow Logs, or third-party integrations provided by partners such as Barracuda Email Protection, Cisco Firepower Management Center, or Okta identity logs, you now have a centralized environment in which you can correlate events and findings using a broad range of tools in the AWS and partner ecosystem.

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization. You can also improve the protection of your workloads, applications, and data. Security Lake has adopted the Open Cybersecurity Schema Framework (OCSF), an open standard. With OCSF support, the service can normalize and combine security data from AWS and a broad range of enterprise security data sources.

When it comes to near-real-time analysis of data as it arrives in Security Lake and responding to security events your company cares about, Amazon OpenSearch Service provides the necessary tooling to help you make sense of the data found in Security Lake.

OpenSearch Service is a fully managed and scalable log analytics framework that is used by customers to ingest, store, and visualize data. Customers use OpenSearch Service for a diverse set of data workloads, including healthcare data, financial transactions information, application performance data, observability data, and much more. Additionally, customers use the managed service for its ingest performance, scalability, low query latency, and ability to analyze large datasets.

This post shows you how to ingest, transform, and deliver Security Lake data to OpenSearch Service for use by your SecOps teams. We also walk you through how to use a series of prebuilt visualizations to view events across multiple AWS data sources provided by Security Lake.

Understanding the event data found in Security Lake

Security Lake stores the normalized OCSF security events in Apache Parquet format—an optimized columnar data storage format with efficient data compression and enhanced performance to handle complex data in bulk. Parquet format is a foundational format in the Apache Hadoop ecosystem and is integrated into AWS services such as Amazon Redshift Spectrum, AWS Glue, Amazon Athena, and Amazon EMR. It’s a portable columnar format, future proofed to support additional encodings as technology develops, and it has library support across a broad set of languages like Python, Java, and Go. And the best part is that Apache Parquet is open source!

The intent of OCSF is to provide a common language for data scientists and analysts that work with threat detection and investigation. With a diverse set of sources, you can build a complete view of your security posture on AWS using Security Lake and OpenSearch Service.

Understanding the event architecture for Security Lake

Security Lake provides a subscriber framework that gives access to the data stored in Amazon S3. Services such as Amazon Athena and Amazon SageMaker use query access. The solution in this post uses data access to respond to events generated by Security Lake.

When you subscribe for data access, events arrive via Amazon Simple Queue Service (Amazon SQS). Each SQS event contains a notification object that acts as a “pointer”: it carries the data used to create a URL to the Parquet object on Amazon S3. Your subscriber processes the event, parses the data found in the object, and transforms it to whatever format makes sense for your implementation.
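As a rough illustration of that pointer, the following Python sketch turns one SQS message body into an S3 URL. The field layout (`detail.bucket.name` and `detail.object.key`) is an assumption made for this example, so inspect a real notification from your own queue before relying on it. The function name, bucket, and key are hypothetical.

```python
import json


def s3_url_from_notification(body: str) -> str:
    """Turn one SQS message body into an s3:// URL for the Parquet object."""
    notification = json.loads(body)
    detail = notification["detail"]      # assumed field layout
    bucket = detail["bucket"]["name"]    # bucket that holds the object
    key = detail["object"]["key"]        # key of the Parquet object
    return f"s3://{bucket}/{key}"


# Made-up notification body for illustration only.
sample_body = json.dumps(
    {"detail": {"bucket": {"name": "example-security-lake-bucket"},
                "object": {"key": "example/prefix/part-0000.parquet"}}}
)
sample_url = s3_url_from_notification(sample_body)
```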

The solution we provide in this post uses a subscriber for data access. Let’s drill down into what the implementation looks like so that you understand how it works.

Solution overview

The high-level architecture for integrating Security Lake with OpenSearch Service is as follows.

The workflow contains the following steps:

  1. Security Lake persists Parquet formatted data into an S3 bucket as determined by the administrator of Security Lake.
  2. A notification is placed in Amazon SQS that describes the key to get access to the object.
  3. Java code in an AWS Lambda function reads the SQS notification and prepares to read the object described in the notification.
  4. Java code uses Hadoop, Parquet, and Avro libraries to retrieve the object from Amazon S3 and transform the records in the Parquet object into JSON documents for indexing in your OpenSearch Service domain.
  5. The documents are gathered and then sent to your OpenSearch Service domain, where index templates map the structure into a schema optimized for Security Lake logs in OCSF format.
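Steps 4 and 5 end with a bulk request to OpenSearch Service. The solution does this in Java. The following Python sketch is not the authors' code and only illustrates the newline-delimited shape of a `_bulk` request body, where each document is preceded by an action line. The function name, the index name `ocsf-demo`, and the sample record are made up for illustration.

```python
import json


def build_bulk_body(index_name, records):
    """Build a newline-delimited _bulk body: one action line, then one document line."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


bulk_body = build_bulk_body("ocsf-demo", [{"severity_id": 1}])
```

Gathering many documents into one request, as step 5 describes, reduces the number of round trips to the domain compared with indexing each document individually.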

Steps 1–2 are managed by Security Lake; steps 3–5 are managed by the customer. The shaded components are your responsibility. The subscriber implementation for this solution uses Lambda and OpenSearch Service, and these resources are managed by you.

If you are evaluating this as a solution for your business, remember that Lambda has a 15-minute maximum execution time at the time of this writing. Security Lake can produce objects of up to 256 MB, so this solution may not be effective for your company’s needs at large scale. Various levers in Lambda affect the cost of the solution for log delivery, so make cost-conscious decisions when evaluating sample solutions. This Lambda-based implementation is better suited to smaller companies, where the volume of CloudTrail and VPC flow logs keeps the cost of transforming and delivering logs to Amazon OpenSearch Service budget friendly.

Now that you have some context, let’s start building the implementation for OpenSearch Service!

Prerequisites

Creation of Security Lake for your AWS accounts is a prerequisite for building this solution. Security Lake integrates with an AWS Organizations account to enable the offering for selected accounts in the organization. For a single AWS account that doesn’t use Organizations, you can enable Security Lake without the need for Organizations. You must have administrative access to perform these operations. For multiple accounts, it’s suggested that you delegate the Security Lake activities to another account in your organization. For more information about enabling Security Lake in your accounts, review Getting started.

Additionally, you may need to take the provided template and adjust it to your specific environment. The sample solution relies on access to a public S3 bucket hosted for this blog so egress rules and permissions modifications may be required if you use S3 endpoints.

This solution assumes that you’re using a domain deployed in a VPC. Additionally, it assumes that you have fine-grained access controls enabled on the domain to prevent unauthorized access to data you store as part of the integration with Security Lake. VPC-deployed domains are privately routable and have no access to the public internet by design. If you want to access your domain in a more public setting, you need to create an NGINX proxy to broker requests between public and private settings.

The remaining sections in this post are focused on how to create the integration with OpenSearch Service.

Create the subscriber

To create your subscriber, complete the following steps:

  1. On the Security Lake console, choose Subscribers in the navigation pane.
  2. Choose Create subscriber.
  3. Under Subscriber details, enter a meaningful name and description.
  4. Under Log and event sources, specify what the subscriber is authorized to ingest. For this post, we select All log and event sources.
  5. For Data access method, select S3.
  6. Under Subscriber credentials, provide the account ID of the AWS account you want to grant access to, along with an external ID.
  7. For Notification details, select SQS queue.
  8. Choose Create when you are finished filling in the form.

It will take a minute or so to initialize the subscriber framework, such as the SQS integration and the permission generated so that you can access the data from another AWS account. When the status changes from Creating to Created, you have access to the subscriber endpoint on Amazon SQS.

  1. Save the following values found in the subscriber Details section:
    1. AWS role ID
    2. External ID
    3. Subscription endpoint
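The CloudFormation template in the next section consumes these values for you. Purely to illustrate what the role and the external ID are for, the following Python sketch shows how a subscriber in another account could assume the role with AWS STS. It assumes you have the role's ARN. The function name and the session name are hypothetical, and the client is passed in so you can supply `boto3.client("sts")`.

```python
def assume_subscriber_role(sts_client, role_arn, external_id):
    """Assume the subscriber role; the external ID must match the one set in Security Lake."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="security-lake-subscriber",  # hypothetical session name
        ExternalId=external_id,
    )
    # Temporary credentials for reading the queue and the S3 objects.
    return response["Credentials"]
```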

Use AWS CloudFormation to provision Lambda integration between the two services

An AWS CloudFormation template takes care of a large portion of the setup for the integration. It creates the necessary components to read the data from Security Lake, transform it into JSON, and then index it into your OpenSearch Service domain. The template also provides the necessary AWS Identity and Access Management (IAM) roles for integration, the tooling to create an S3 bucket for the Java JAR file used in the solution by Lambda, and a small Amazon Elastic Compute Cloud (Amazon EC2) instance to facilitate the provisioning of templates in your OpenSearch Service domain.

To deploy your resources, complete the following steps:

  1. On the AWS CloudFormation console, create a new stack.
  2. For Prepare template, select Template is ready.
  3. Specify your template source as Amazon S3 URL.

You can either save the template to your local drive and upload it from your device, or copy its link for use on the AWS CloudFormation console. In this example, we use the template URL that points to a template stored on Amazon S3.

  1. Choose Next.
  2. Enter a name for your stack. For this post, we name the stack blog-lambda.
  3. Populate the parameters with the values you saved or copied from Security Lake and OpenSearch Service, then choose Next. Ensure that the endpoint for the OpenSearch Service domain has a forward slash / at the end of the URL that you copy from OpenSearch Service.
  4. Select Preserve successfully provisioned resources to preserve the resources in case the stack rolls back so you can debug the issues.
  5. Scroll to the bottom of the page and choose Next.
  6. On the summary page, select the check box that acknowledges IAM resources will be created and used in this template.
  7. Choose Submit.

The stack will take a few minutes to deploy.

  1. After the stack has deployed, navigate to the Outputs tab for the stack you created.
  2. Save the CommandProxyInstanceID for executing scripts and save the two role ARNs to use in the role mappings step.

You need to associate the IAM roles for the tooling instance and the Lambda function with OpenSearch Service security roles so that the processes can work with the cluster and the resources within.

Provision role mappings for integrations with OpenSearch Service

With the template-generated IAM roles, you need to map the roles using role mapping to the predefined all_access role in your OpenSearch Service cluster. You should evaluate your specific use of any roles and ensure they are aligned with your company’s requirements.

  1. In OpenSearch Dashboards, choose Security in the navigation pane.
  2. Choose Roles in the navigation pane and look up the all_access role.
  3. On the role details page, on the Mapped users tab, choose Manage mapping.
  4. Add the two IAM roles found in the outputs of the CloudFormation template, then choose Map.

Provision the index templates used for OCSF format in OpenSearch Service

Index templates have been provided as part of the initial setup. These templates are crucial to the format of the data so that ingestion is efficient and tuned for aggregations and visualizations. Data that comes from Security Lake is transformed into a JSON format, and this format is based directly on the OCSF standard.

For example, each OCSF category has a common Base Event class that contains multiple objects that represent details like the cloud provider in a Cloud object, enrichment data using an Enrichment object that has a common structure across events but can have different values based on the event, and even more complex structures that have inner objects, which themselves have more inner objects such as the Metadata object, still part of the Base Event class. The Base Event class is the foundation for all categories in OCSF and helps you with the effort of correlating events written into Security Lake and analyzed in OpenSearch.

OpenSearch is technically schema-less. You don’t have to define a schema up front. The OpenSearch engine will try to guess the data types and the mappings found in the data coming from Security Lake. This is known as dynamic mapping. The OpenSearch engine also provides you with the option to predefine the data you are indexing. This is known as explicit mapping. Using explicit mappings to identify your data source types and how they are stored at the time of ingestion is key to getting high-volume ingest performance for time-centric data indexed at heavy load.

In summary, the mapping templates use composable templates. In this construct, the solution establishes an efficient schema for the OCSF standard and gives you the capability to correlate events and specialize on specific categories in the OCSF standard.
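To make the composable construct concrete, the following Python sketch builds two request bodies: a component template with explicit mappings for a few shared fields, and an index template that composes it for one category. These are not the templates shipped with the solution. The template names, the index pattern, and the small field subset are assumptions chosen for illustration.

```python
import json

# Component template: explicit mappings for fields shared by every event (illustrative subset).
base_component = {
    "template": {
        "mappings": {
            "properties": {
                "time": {"type": "date"},
                "severity_id": {"type": "integer"},
                "cloud": {"properties": {"provider": {"type": "keyword"}}},
            }
        }
    }
}

# Index template: composes the shared component for the indexes of one category.
dns_index_template = {
    "index_patterns": ["ocsf-dns-*"],     # hypothetical index naming
    "composed_of": ["ocsf_base_demo"],    # hypothetical component template name
    "priority": 100,
}


def as_request(method, path, body):
    """Render a request in the form you would paste into Dev Tools."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


print(as_request("PUT", "_component_template/ocsf_base_demo", base_component))
print(as_request("PUT", "_index_template/ocsf_dns_demo", dns_index_template))
```

Defining the shared fields once and reusing them across categories keeps the same field typed the same way in every index, which is what makes cross-category correlation practical.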

You load the templates using the tools proxy created by your CloudFormation template.

  1. On the stack’s Outputs tab, find the parameter CommandProxyInstanceID.

We use that value to find the instance in AWS Systems Manager.

  1. On the Systems Manager console, choose Fleet manager in the navigation pane.
  2. Locate and select your managed node.
  3. On the Node actions menu, choose Start terminal session.
  4. When you’re connected to the instance, run the following commands:
    cd;pwd
    . /usr/share/es-scripts/es-commands.sh | grep -o '{\"acknowledged\":true}' | wc -l

You should see a final result of 42 occurrences of {"acknowledged":true}, which demonstrates that the commands sent were successful. Ignore the warnings you see for migration. The warnings don’t affect the scripts and as of this writing can’t be muted.

  1. Navigate to Dev Tools in OpenSearch Dashboards and run the following command:
    GET _cat/templates

This confirms that the scripts were successful.

Install index patterns, visualizations, and dashboards for the solution

For this solution, we prepackaged a few visualizations so that you can make sense of your data. Download the visualizations to your local desktop, then complete the following steps:

  1. In OpenSearch Dashboards, navigate to Stack Management and Saved Objects.
  2. Choose Import.
  3. Choose the file from your local device, select your import options, and choose Import.

You will see numerous objects that you imported. You can use the visualizations after you start importing data.

Enable the Lambda function to start processing events into OpenSearch Service

The final step is to go into the configuration of the Lambda function and enable the trigger so that the data can be read from the subscriber framework in Security Lake. The trigger is initially disabled; you need to enable it and save the configuration. You will notice the function is throttled, which is by design: the index templates need to be in place in the OpenSearch cluster before data arrives so that the data is indexed in the desired format.

  1. On the Lambda console, navigate to your function.
  2. On the Configuration tab, in the Triggers section, select your SQS trigger and choose Edit.
  3. Select Activate trigger and save the setting.
  4. Choose Edit concurrency.
  5. Configure your concurrency and choose Save.

Enable the function by setting the concurrency setting to 1. You can adjust the setting as needed for your environment.
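If you prefer to script these console steps, a minimal Python sketch could look like the following. It assumes you already know the function name and the UUID of the SQS event source mapping, both of which are placeholders here. The function `enable_ingestion` is hypothetical, and the client is passed in so you can supply `boto3.client("lambda")`.

```python
def enable_ingestion(lambda_client, function_name, mapping_uuid, concurrency=1):
    """Activate the SQS trigger, then lift the throttle by reserving concurrency."""
    # Activate the event source mapping (the trigger) for the function.
    lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=True)
    # Reserve concurrency; 1 matches the starting value suggested in this post.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=concurrency,
    )
    return concurrency
```

Starting at 1 and raising the value gradually lets you watch the effect on the domain's indexing load and on Lambda cost before committing to a higher setting.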

You can review the Amazon CloudWatch logs on the CloudWatch console to confirm the function is working.

You should see startup messages and other event information that indicates logs are being processed. The provided JAR file is set for information-level logging. If you need to debug any concerns, you can use a verbose debug version of the JAR file instead. Your JAR file options are:

If you choose to deploy the debug version, the verbosity of the code will show some error-level details in the Hadoop libraries. To be clear, Hadoop code will display lots of exceptions in debug mode because it tests environment settings and looks for things that aren’t provisioned in your Lambda environment, like a Hadoop metrics collector. Most of these startup errors are not fatal and can be ignored.

Visualize the data

Now that you have data flowing into OpenSearch Service from Security Lake via Lambda, it’s time to put those imported visualizations to work. In OpenSearch Dashboards, navigate to the Dashboards page.

You will see four primary dashboards, each aligned to the OCSF category it supports. The four supported visualization categories are DNS activity, security findings, network activity, and AWS CloudTrail using the Cloud API.

Security findings

The findings dashboard presents high-level summary information that you use for visual inspection of AWS Security Hub findings in a time window that you specify in the dashboard filters. Many of the encapsulated visualizations offer “filter on click” capabilities so you can narrow your discoveries. The following screenshot shows an example.

The Finding Velocity visualization shows findings over time based on severity. The Finding Severity visualization shows which findings have passed or failed, and the Findings table visualization is a tabular view with actual counts. Your goal is to be near zero in all the categories except informational findings.

Network activity

The network traffic dashboard provides an overview for all your accounts in the organization that are enabled for Security Lake. The following example is monitoring 260 AWS accounts, and this dashboard summarizes the top accounts with network activities. Aggregate traffic, top accounts generating traffic, and top accounts with the most activity are found in the first section of the visualizations.

Additionally, the top accounts are summarized by allow and deny actions for connections. In the visualization below, there are fields that you can use to drill down into other visualizations. Some of these visualizations have links to third-party websites that may or may not be allowed in your company. You can edit the links in Saved objects in the Stack Management plugin.

To drill down, choose the account ID to get a summary by account. The list of egress and ingress traffic within a single AWS account is sorted by the volume of bytes transferred between any given two IP addresses.

Finally, if you choose the IP addresses, you’ll be redirected to Project Honey Pot, where you can see if the IP address is a threat or not.

DNS activity

The DNS activity dashboard shows you the requestors for DNS queries in your AWS accounts. Again, this is a summary view of all the events in a time window.

The first visualization in the dashboard shows DNS activity in aggregate across the top five active accounts. Of the 260 accounts in this example, four are active. The next visualization breaks the resolves down by the requesting service or host, and the final visualization breaks out the requestors by account, VPC ID, and instance ID for those queries run by your solutions.

API Activity

The final dashboard gives an overview of API activity via CloudTrail across all your accounts. It summarizes things like API call velocity, operations by service, top operations, and other summary information.

Looking at the first visualization in the dashboard, you get an idea of which services are receiving the most requests. You sometimes need to understand where to focus the majority of your threat discovery efforts based on which services may be consumed differently over time. Next, there are heat maps that break down API activity by Region and service, giving you an idea of what type of API calls are most prevalent in the accounts you are monitoring.

As you scroll down on the form, more details present themselves, such as the top five services with API activity and the top API operations for the organization you are monitoring.

Conclusion

Security Lake integration with OpenSearch Service is easy to achieve by following the steps outlined in this post. Security Lake data is transformed from Parquet to JSON, making it readable and simple to query. Enable your SecOps teams to identify and investigate potential security threats by analyzing Security Lake data in OpenSearch Service. The provided visualizations and dashboards can help to navigate the data, identify trends and rapidly detect any potential security issues in your organization.

As next steps, we recommend using the above framework and associated templates, which provide easy steps to visualize your Security Lake data using OpenSearch Service.

In a series of follow-up posts, we will review the source code and walk through published examples of the Lambda ingestion framework in the AWS Samples GitHub repo. The framework can be modified for use in containers to help address companies that have longer processing times for large files published in Security Lake. Additionally, we will discuss how to detect and respond to security events using example implementations that use OpenSearch plugins such as Security Analytics, Alerting, and the Anomaly Detection available in Amazon OpenSearch Service.


About the authors

Kevin Fallis (@AWSCodeWarrior) is a Principal AWS Specialist Search Solutions Architect. His passion at AWS is to help customers leverage the correct mix of AWS services to achieve success for their business goals. His after-work activities include family, DIY projects, carpentry, playing drums, and all things music.

Jimish Shah is a Senior Product Manager at AWS with 15+ years of experience bringing products to market in log analytics, cybersecurity, and IP video streaming. He’s passionate about launching products that offer delightful customer experiences and solve complex customer problems. In his free time, he enjoys exploring cafes, hiking, and taking long walks.

Ross Warren is a Senior Product SA at AWS for Amazon Security Lake based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. When he is not talking about AWS he likes to spend time with his family, bake bread, make sawdust and enjoy time outside.

AWS Week in Review – Amazon EC2 Instance Connect Endpoint, Detective, Amazon S3 Dual Layer Encryption, Amazon Verified Permission – June 19, 2023

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-week-in-review-amazon-ec2-instance-connect-endpoint-detective-amazon-s3-dual-layer-encryption-amazon-verified-permission-june-19-2023/

This week, I’ll meet you at AWS partner’s Jamf Nation Live in Amsterdam where we’re showing how to use Amazon EC2 Mac to deploy your remote developer workstations or configure your iOS CI/CD pipelines in the cloud.

Last Week’s Launches
While I was traveling last week, I kept an eye on the AWS News. Here are some launches that got my attention.

Amazon EC2 Instance Connect Endpoint. Endpoint for EC2 Instance Connect allows you to securely access Amazon EC2 instances using their private IP addresses, making the use of bastion hosts obsolete. Endpoint for EC2 Instance Connect is by far my favorite launch from last week. With EC2 Instance Connect, you use AWS Identity and Access Management (IAM) policies and principals to control SSH access to your instances. This removes the need to share and manage SSH keys. We also updated the AWS Command Line Interface (AWS CLI) to allow you to easily connect or open a secured tunnel to an instance using only its instance ID. I read and contributed to a couple of threads on social media where you pointed out that AWS Systems Manager Session Manager already offered similar capabilities. You’re right. But the extra advantage of EC2 Instance Connect Endpoint is that it allows you to use your existing SSH-based tools and libraries, such as the scp command.

Amazon Inspector now supports code scanning of AWS Lambda functions. This expands the existing capability to scan Lambda functions and associated layers for software vulnerabilities in application package dependencies. Amazon Detective also extends finding groups to Amazon Inspector. Detective automatically collects findings from Amazon Inspector, GuardDuty, and other AWS security services, such as AWS Security Hub, to help increase situational awareness of related security events.

Amazon Verified Permissions is generally available. If you’re designing or developing business applications that need to enforce user-based permissions, you have a new option to centrally manage application permissions. Verified Permissions is a fine-grained permissions management and authorization service for your applications that can be used at any scale. Verified Permissions centralizes permissions in a policy store and helps developers use those permissions to authorize user actions within their applications. Similarly to the way an identity provider simplifies authentication, a policy store lets you manage authorization in a consistent and scalable way. Read Danilo’s post to discover the details.

Amazon S3 Dual-Layer Server-Side Encryption with keys stored in AWS Key Management Service (DSSE-KMS). Some heavily regulated industries require double encryption to store some types of data at rest. Amazon Simple Storage Service (Amazon S3) offers DSSE-KMS, a new free encryption option that provides two layers of data encryption, using different keys and different implementations of the 256-bit Advanced Encryption Standard with Galois Counter Mode (AES-GCM) algorithm. My colleague Irshad’s post has all the details.

AWS CloudTrail Lake Dashboards provide out-of-the-box visibility and top insights from your audit and security data directly within the CloudTrail Lake console. CloudTrail Lake features a number of AWS curated dashboards so you can get started right away – with no required detailed dashboard setup or SQL experience.

AWS IAM Identity Center now supports automated user provisioning from Google Workspace. You can now connect your Google Workspace to AWS IAM Identity Center (successor to AWS Single Sign-On) once and manage access to AWS accounts and applications centrally in IAM Identity Center.

AWS CloudShell is now available in 12 additional Regions. AWS CloudShell is a browser-based shell that makes it easier to securely manage, explore, and interact with your AWS resources. The list of the 12 new Regions is detailed in the launch announcement.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other updates and news that you might have missed:

  • AWS Extension for Stable Diffusion WebUI. WebUI is a popular open-source web interface that allows you to easily interact with Stable Diffusion generative AI. We built this extension to help you to migrate existing workloads (such as inference, train, and ckpt merge) from your local or standalone servers to the AWS Cloud.
  • GoDaddy developed a multi-Region, event-driven system. Their system handles 400 million events per day. They plan to scale it to process 2 billion messages per day in the near future. My colleague Marcia explains the details of their architecture in her post.
  • The Official AWS Podcast – Listen each week for updates on the latest AWS news and deep dives into exciting use cases. There are also official AWS podcasts in several languages. Check out the podcasts in French, German, Italian, and Spanish.
  • AWS Open Source News and Updates – This is a newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

  • AWS Silicon Innovation Day (June 21) – A one-day virtual event that will allow you to better understand AWS Silicon and how you can use the Amazon EC2 chip offerings to your benefit. My colleague Irshad shared the details in this post. Register today.
  • AWS Global Summits – There are many AWS Summits going on right now around the world: Milano (June 22), Hong Kong (July 20), New York (July 26), Taiwan (Aug 2 & 3), and Sao Paulo (Aug 3).
  • AWS Community Day – Join a community-led conference run by AWS user group leaders in your region: Manila (June 29–30), Chile (July 1), and Munich (September 14).
  • AWS User Group Perú Conf 2023 (September 2023). Some of the AWS News blog writer team will be present: Marcia, Jeff, myself, and our colleague Startup Developer Advocate Mark. Save the date and register today.
  • CDK Day – CDK Day is happening again this year on September 29. The call for papers for this event is open, and this year we’re also accepting talks in Spanish. Submit your talk here.

That’s all for this week. Check back next Monday for another Week in Review!

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!
— seb

THE TRUSTEE – the new enemy in the personal bankruptcy law

Post Syndicated from VassilKendov original http://kendov.com/sindikat/

It's Monday, and following the example of the good Doctor Tupavicharov, I'm starting to pay my 50 stotinki of public tax toward financial education.

Naturally, I won't rest until I'm done with the subject of the new Law on the Bankruptcy of Natural Persons.

Today I'll talk to you about the TRUSTEE (синдик).

For some mysterious reason, our lawmakers have taken Anglo-Saxon legislation as their basis and decided that a trustee is needed for an individual to declare bankruptcy.

Take note!
The trustee is chosen at a meeting of the creditors.

Fine, but a typical individual has one home (unseizable unless it is mortgaged), a house in the village, and some 15-year-old car. Perhaps also a number of undivided shares in inherited farmland. So what makes it necessary to appoint a TRUSTEE instead of using the private enforcement agent (ЧСИ) procedure? How hard can it be to organize an auction for one village house or for an undivided share of a field?

You should also know that the TRUSTEE gets paid.
“Art. 24. (1) The trustee is entitled to a one-time fee equal to one minimum monthly wage for performing each of the following actions:”

Then come 4 listed actions, for each of which the trustee receives one minimum wage. These are actions that a private enforcement agent normally performs as a matter of routine, but for less money than the minimum wage.

In short, with the minimum wage currently at 780 leva, the trustee will take at least 3,120 leva.
On top of that, the law provides for one more minimum wage per year, plus 5% of the liquidated assets.

That is, if you meet the law's requirements to be eligible for personal bankruptcy, you must have run up debts of at least 10 minimum wages over 6 months. In other words, you must owe 7,800 leva.


And now the result

If we stick to the law's minimum requirements for starting the procedure (indebtedness of at least 10 minimum wages, or 7,800 leva) and the trustee's minimum fee of 4 minimum wages (3,120 leva), it turns out that in order to collect the debt, we have to increase it by 40%. Is that good? Can you feel justice settling permanently into our country?
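The 40% figure can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the variable names are illustrative, and it deliberately leaves out the extra yearly minimum wage and the 5% of liquidated assets, so it is a lower bound on the trustee's cost.

```python
# Figures quoted in the post; names are illustrative, not taken from the law.
MINIMUM_WAGE_BGN = 780
PAID_ACTIONS = 4          # actions that each earn the trustee one minimum wage
MIN_DEBT_IN_WAGES = 10    # eligibility threshold for personal bankruptcy

min_fee = PAID_ACTIONS * MINIMUM_WAGE_BGN        # smallest possible trustee fee
min_debt = MIN_DEBT_IN_WAGES * MINIMUM_WAGE_BGN  # smallest qualifying debt
overhead = min_fee / min_debt                    # fee as a share of the debt

print(f"fee={min_fee} BGN, debt={min_debt} BGN, overhead={overhead:.0%}")
```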

Now, besides the private enforcement agent, we'll create one more figure to lie in wait for you: the TRUSTEE! If you don't feed the one, you'll feed the other; that's how it is.

The solution to the problem

My humble opinion is that THERE IS NO NEED FOR A TRUSTEE AT ALL! If you only knew how one gets appointed… They have provided for the trustee to be appointed at a general meeting of the creditors.

I could just picture it. The bank, A1, Electrohold, several private creditors, the water utility (ВиК), and the tax authorities gather around one table for every debtor with 2 properties and three undivided shares of fields in Susurlevo who has requested bankruptcy proceedings, make a wise decision, and choose the most suitable trustee.

Well, if you want trustees so badly and want to create a new enemy of the people, couldn't you at least have set some threshold? For example, a trustee would be sought only if the tax valuations of the person's entire property plus their bank accounts exceed 1 million.

It's another matter that no individual will want bankruptcy under this procedure, because it simply isn't practical or advantageous. But more on that next time.

Until then, I'll keep advising debtors on how to hide their income and what to do before leaving Bulgaria, because the private enforcement agents are ruining them and they see no light at the end of the tunnel.

The post THE TRUSTEE – the new enemy in the personal bankruptcy law appeared first on Kendov.com.

Optimize queries using dataset parameters in Amazon QuickSight

Post Syndicated from Anwar Ali original https://aws.amazon.com/blogs/big-data/optimize-queries-using-dataset-parameters-in-amazon-quicksight/

Amazon QuickSight powers data-driven organizations with unified business intelligence (BI) at hyperscale. With QuickSight, all users can meet varying analytic needs from the same source of truth through modern interactive dashboards, paginated reports, embedded analytics and natural language queries.

We have introduced dataset parameters, a new kind of parameter in QuickSight that can help you create interactive experiences in your dashboards. In this post, we dive deeper into what dataset parameters are, explain the key differences between dataset and analysis parameters, and discuss different use cases for dataset parameters along with their benefits.

Introduction to dataset parameters

Before going deep into dataset parameters, let’s first discuss QuickSight analysis parameters. QuickSight analysis parameters are named variables that can transfer a value for use by an action or an object. Parameters help users create interactive experiences in their dashboards. You can tie parameters with other features in the QuickSight analysis. For example, a dashboard user can reference a parameter value in multiple places, using controls, filters, and actions, and also within calculated fields, narratives, and dynamic titles. Then the visuals in the dashboard react to the user’s selection of parameter value. Parameters can also help connect one dashboard to another, allowing a dashboard user to drill down into data that’s in a different analysis.

Dataset parameters, on the other hand, are defined at the dataset level. With dataset parameters, authors can optimize the experience and load time of dashboards that are connected live to external SQL-based sources. When readers interact with their data, the selection and actions they make in controls, filters, and visuals can be propagated to the data sources via live, custom, parameterized SQL queries. By mapping multiple dataset parameters to analysis parameters, users can create a wide variety of experiences using controls, user actions, parameterized URLs, and calculated fields, as well as dynamic visuals’ titles and insights.

In the following example, dataset owners connected via direct query to a table containing data about taxi rides in New York. They can add a WHERE clause in their custom SQL to filter the dataset based on the end-user’s input of a specific pickup date that will be later provided by the dashboard readers. In the SQL query, the rows are filtered by the date in the dataset parameter <<$pPickupDate>> if it matches the date in the pickupdate column. This way, the dataset size can be significantly smaller for users that are only interested in data for a specific taxi ride date. See the following code:

SELECT *
FROM nytaxidata
WHERE pickupdate = <<$pPickupDate>>

To allow users to provide multiple values in the parameter, you can create a multi-value parameter (for example, pPickupDates), and insert the parameter into an IN phrase as follows:

SELECT *
FROM nytaxidata
WHERE pickupdate in (<<$pPickupDates>>)
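To visualize what the database might receive once values are supplied, here is a minimal sketch that expands `<<$name>>` placeholders into SQL literals. It is not how QuickSight implements substitution; the `render` and `sql_literal` helpers are hypothetical, and application code should normally prefer bound parameters over string substitution.

```python
import re
from datetime import date

# Matches placeholders written in the <<$name>> style used above.
PLACEHOLDER = re.compile(r"<<\$(\w+)>>")

def sql_literal(value):
    """Render one Python value as a SQL literal (illustrative only)."""
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, date):
        return f"'{value.isoformat()}'"
    return "'" + str(value).replace("'", "''") + "'"

def render(template, params):
    """Replace each <<$name>> with its value; lists become comma-separated."""
    def substitute(match):
        value = params[match.group(1)]
        if isinstance(value, (list, tuple)):
            return ", ".join(sql_literal(v) for v in value)
        return sql_literal(value)
    return PLACEHOLDER.sub(substitute, template)

single = render(
    "SELECT * FROM nytaxidata WHERE pickupdate = <<$pPickupDate>>",
    {"pPickupDate": date(2023, 1, 1)},
)
multi = render(
    "SELECT * FROM nytaxidata WHERE pickupdate in (<<$pPickupDates>>)",
    {"pPickupDates": [date(2023, 1, 1), date(2023, 1, 2)]},
)
print(single)
print(multi)
```

The multi-value case shows why the placeholder sits inside the parentheses of the IN phrase: the list expands to comma-separated literals.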

Use cases for dataset parameters

In this section, we discuss common use cases using dataset parameters and their benefits.

Optimized custom SQL in direct queries

With dataset parameters, you no longer have to trade off the flexibility of custom SQL logic against the performance of an optimized SQL query. Parameterized datasets can be filtered to a relatively smaller result set when loaded. Authors and readers benefit from faster loading of analyses and dashboards, both on first load using default values and on later queries when data is sliced and diced using filter controls on the dashboard. Data owners also benefit because their datasets put less load on backend database resources, making them more scalable and performant when serving higher user concurrency.

The performance gains will be evident when you work with direct query datasets that have complex custom SQL, such as nested queries that have to filter the data in the inner sections of the query.

Generic datasets reusable across analyses

Dataset parameters can enable datasets to be largely reused across various analyses, thereby reducing the effort for the data owners to prepare and maintain the datasets. Whether you have a SPICE dataset or direct query dataset, with dataset parameters, you can port calculated fields that reference parameters from the analysis to the dataset. Authors can now reuse calculated fields referencing parameters created by dataset owners in a dataset, rather than recreate these fields across multiple analyses.

With the option to port parameter-dependent calculated fields from the analysis to the underlying datasets, dataset parameters can help you create the same calculated fields in the dataset and reuse them across multiple analyses. This is important for governance use cases as well: dataset owners can move the parameter-dependent calculated fields from the analysis to protect the business logic, ensuring that their calculated fields can’t be modified by analyses’ authors.

Simpler dataset maintenance with repeatable variables

When you have a dataset that refers to a static value (placeholder) in multiple places in custom SQL and calculated fields, you can now create a dataset parameter and reuse it in multiple places. This will help in better code maintainability. (Note that inserting parameters in custom SQL is only available in direct query.)

Solution overview

In this scenario, we create a custom SQL direct query dataset to observe unoptimized SQL queries that are generated without dataset parameters, and demonstrate how your current custom SQL queries run if you don’t use dataset parameters. Then we modify the custom SQL, add the dataset parameter, and show the optimized query generated for the same dataset if we use dataset parameters.

In this example, we use an Amazon RDS for PostgreSQL database. However, this feature will work with any SQL-based data source in QuickSight.

Query your data with analysis parameters

To set up your data source, dataset, and analysis, complete the following steps. If you’re using real data, you can skip to the next section.

  1. Create a QuickSight data source.

The following screenshot shows sample connection details.

create a datasource

  2. Create a new direct query custom SQL dataset.

We are using sample data from NYC OpenData for New York taxi rides with a subset of approximately 1 million records. The data is loaded in an RDS for PostgreSQL database table called nytaxidata.

create a sample dataset nytaxidata

  3. Create a sample analysis using the dataset you just created. Choose the table visual and add a few columns from the Fields list.

create a sample analysis using nytaxidata dataset

  4. Reload the analysis and observe the query generated on the PostgreSQL database.

You will notice it loads the full dataset (select * from nytaxidata), as shown in the following screenshot from Amazon RDS Performance Insights.

SQL from performance insight, unoptimized SQL inner query without where clause

  5. Add an analysis parameter-based filter control to the QuickSight analysis. Change the value of this filter control (analysis parameter in this case).

creating analysis parameter with a control

The inner query over the dataset still uses custom SQL without using the filter in the WHERE clause. This filter control parameter is still part of the WHERE clause of the outer query, so the custom SQL fetches the complete result set as part of the inner query. This may not be the case if you use database tables as a dataset rather than a custom SQL query as a dataset. With a dataset based directly on tables, parameter values are passed to the database in the WHERE clause.

SQL from performance insight, unoptimized SQL inner query without where clause with analysis parameter

So how do we overcome the challenge of being able to include the parameter in the WHERE clause in custom SQL datasets? With dataset parameters!

Optimize your query with dataset parameters

Let’s look at a few scenarios where we can use dataset parameters to send more optimized queries to the database.

  1. Create a dataset parameter (for example, pDSfareamount) and add it to the WHERE clause with an equality predicate in the custom SQL. Observe whether there is any change in the SQL query that was passed to the database.

creating dataset parameter

This time, you will see optimized SQL generated using the default parameter value in the WHERE clause of the inner query (select * from nytaxidata where fare_amount=0). This results in better query performance for direct query datasets.

optimized sql generated with dataset parameter
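The two query shapes discussed above can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL database, with a three-row table invented for illustration; the `custom_sql` alias and the `?` bind markers are assumptions of this sketch and are not the SQL that QuickSight generates. Both shapes return the same rows; the difference on a real backend is how much data the inner query has to produce first.

```python
import sqlite3

# Tiny in-memory stand-in for the nytaxidata table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nytaxidata (pickupdate TEXT, fare_amount REAL)")
conn.executemany(
    "INSERT INTO nytaxidata VALUES (?, ?)",
    [("2023-01-01", 0.0), ("2023-01-01", 12.5), ("2023-01-02", 0.0)],
)

# Shape 1: analysis parameter only. The custom SQL is an unfiltered inner
# query and the filter appears in the outer WHERE clause.
outer_filtered = """
    SELECT * FROM (SELECT * FROM nytaxidata) AS custom_sql
    WHERE fare_amount = ?
"""

# Shape 2: dataset parameter. The filter is part of the custom SQL itself,
# so the inner query already returns the reduced result set.
inner_filtered = """
    SELECT * FROM (SELECT * FROM nytaxidata WHERE fare_amount = ?) AS custom_sql
"""

rows_outer = sorted(conn.execute(outer_filtered, (0,)).fetchall())
rows_inner = sorted(conn.execute(inner_filtered, (0,)).fetchall())
print(rows_outer == rows_inner, len(rows_inner))
```

SQLite's planner may rewrite such a simple subquery, so this sketch demonstrates only that the results match, not the performance gap.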

Map dataset parameters with analysis parameters

Dataset parameters can be mapped to analysis parameters and user-selected values can pass to the dataset parameters from the interactions on the dashboard at run time.

You can use a single analysis parameter and map it to multiple dataset parameters. The parent analysis parameter can now be linked with a filter control or an action, and can help you filter multiple datasets based on custom SQL.

In this section, we map a dataset parameter with an analysis parameter and bind it with a filter control at runtime.

  1. First, we create an analysis parameter and map it to a dataset parameter (we use the dataset parameter we created earlier).

mapping analysis parameter with a dataset parameter

  2. Now the analysis parameter (for this example, pAfareamount) is created. You can create the control object Fare Amount to dynamically change the dataset parameter value from the analysis or dashboard using a parameter control. You can bind pAfareamount with a QuickSight filter to pass values to the dataset parameter dynamically. When you change values in a parameter control, you will find optimized SQL generated on the backend database, with the WHERE predicate in the inner query.

changing value of analysis parameter mapped to dataset parameter via filter control

Additional examples using dataset parameters

So far, we have used dataset parameters with an equality predicate. Let’s look at a few more scenarios using dataset parameters.

  1. The following screenshot demonstrates using a dataset parameter with a range predicate in custom SQL.

dataset parameter with non equality predicate

  2. The following example illustrates using two dataset parameters with a between operator.

two dataset parameters with between operator

  3. The following example shows using a dataset parameter within a calculation.

dataset parameter used in calculated field based on ifelse condition

  4. We can also use a dataset parameter with a scalar user-defined function (UDF). In the following example, we have a scalar function is_holiday(pickupdate), which takes a pickupdate as a parameter and returns a flag of 0 or 1 based on whether pickupdate is a public holiday.

dataset parameter used with scalar user defined function

  5. Additionally, we can use a dataset parameter to derive a calculated field. In the following example, we need to calculate the surcharge_amount dynamically based on a value specified at runtime and the number of passengers. We use a dataset parameter along with a case statement to calculate the desired surcharge_amount.

dataset parameter with calculated field case statement

  6. The final example illustrates how to move calculations using parameters in the analysis to the dataset for reusability.

porting dataset parameter from analysis to dataset
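Because the scenarios above are shown only as screenshots, here is a rough, runnable sketch of three of them (a between operator with two parameters, a scalar UDF, and a case statement) using Python's sqlite3 module as a stand-in database. Everything specific is an assumption made for illustration: the sample rows, the holiday calendar, the parameter names (pMinFare, pMaxFare, pHolidayFlag, pSurcharge), the rule that the surcharge applies to rides with more than two passengers, and the use of sqlite3 `:name` bind markers in place of QuickSight's `<<$name>>` syntax.

```python
import sqlite3

HOLIDAYS = {"2023-01-01"}  # hypothetical holiday calendar

def is_holiday(pickupdate):
    """Scalar UDF: 1 if the pickup date is a public holiday, else 0."""
    return 1 if pickupdate in HOLIDAYS else 0

conn = sqlite3.connect(":memory:")
conn.create_function("is_holiday", 1, is_holiday)
conn.execute(
    "CREATE TABLE nytaxidata "
    "(pickupdate TEXT, fare_amount REAL, passenger_count INTEGER)"
)
conn.executemany(
    "INSERT INTO nytaxidata VALUES (?, ?, ?)",
    [("2023-01-01", 8.0, 1), ("2023-01-01", 30.0, 4), ("2023-01-02", 15.0, 2)],
)

# :pMinFare/:pMaxFare feed a BETWEEN predicate, :pHolidayFlag is compared
# with the UDF result, and :pSurcharge drives the CASE expression.
query = """
    SELECT pickupdate,
           fare_amount,
           CASE WHEN passenger_count > 2 THEN :pSurcharge ELSE 0 END
               AS surcharge_amount
    FROM nytaxidata
    WHERE fare_amount BETWEEN :pMinFare AND :pMaxFare
      AND is_holiday(pickupdate) = :pHolidayFlag
    ORDER BY fare_amount
"""
params = {"pSurcharge": 2.5, "pMinFare": 5, "pMaxFare": 50, "pHolidayFlag": 1}
rows = conn.execute(query, params).fetchall()
print(rows)
```

In a QuickSight custom SQL dataset, each bind marker would instead be a dataset parameter placeholder, so the filtering and the calculation both run in the database.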

Dataset parameter limitations

The following are the known limitations (as of this writing) that you may encounter when working with dataset parameters in QuickSight:

  • Dataset parameters can’t be inserted into custom SQL of datasets stored in SPICE.
  • Dynamic defaults can only be configured on the analysis page of the analysis that is using the dataset. You can’t configure a dynamic default at the dataset level.
  • The Select all option is not supported on multi-value controls of analysis parameters that are mapped to dataset parameters (but there is a workaround that you can follow).
  • Cascading controls are not supported for dataset parameters.
  • Dataset parameters can only be used by dataset filters when the dataset is using a direct query.
  • When dashboard readers schedule emailed reports, selected controls don’t propagate to the dataset parameters that are included in the report that is attached to the email. Instead, the default values of the parameters are used.

Refer to Using dataset parameters in Amazon QuickSight for more information.

Conclusion

In this post, we showed you how to create QuickSight dataset parameters and map them to analysis parameters. Dataset parameters help improve your QuickSight dashboard performance for direct query custom SQL datasets by generating optimized SQL queries. We also showed a few examples of how to use dataset parameters in SQL range predicates, calculated fields, scalar UDFs, and case statements.

Dataset parameters enable dataset owners to centrally create and govern parameter-dependent calculated fields at the dataset level. Such calculated fields can be reused across multiple analyses, and cannot be tampered with by analysis authors.

We hope you will find dataset parameters in QuickSight useful. We have already seen how the feature is creatively used in a wide range of use cases. We recommend that you review your existing direct query custom SQL datasets in your QuickSight deployment to look for candidates for optimization, or take advantage of the other benefits of dataset parameters. For example, BI teams can benefit from dataset parameters by reusing the same dataset with different values in the parameter to analyze different slices of the same data, such as different regions, products, or customers by industry segments.

Are you considering migrating legacy reports to QuickSight? Dataset parameters can help enterprise BI developers reduce the effort of migrating legacy reports that already use parameterized SQL queries. These SQL queries can be passed along with their parameters to QuickSight datasets via automations with the help of QuickSight APIs (and a few adjustments to the queries if the parameters are marked differently).
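As one possible shape for the "few adjustments" mentioned above, the following sketch rewrites legacy parameter markers into the `<<$name>>` form. It assumes the legacy queries mark parameters as `:name` or `@name`; the `to_quicksight_sql` helper is hypothetical, and the regular expression does not parse SQL, so string literals and comments in real queries would need a closer look.

```python
import re

# Assumed legacy marker styles: :name or @name. The lookbehind skips
# PostgreSQL casts (col::int) and text such as 12:30 or user@example.
LEGACY_MARKER = re.compile(r"(?<![:\w])[:@](\w+)")

def to_quicksight_sql(legacy_sql):
    """Rewrite :name / @name markers as <<$name>> placeholders."""
    return LEGACY_MARKER.sub(lambda m: f"<<${m.group(1)}>>", legacy_sql)

legacy = (
    "SELECT * FROM nytaxidata "
    "WHERE pickupdate = :pPickupDate AND fare_amount::int > @pMinFare"
)
converted = to_quicksight_sql(legacy)
print(converted)
```

The converted text could then be supplied as the custom SQL of a dataset, with a matching dataset parameter declared for each placeholder.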

For more information on dataset parameters, refer to Using dataset parameters in Amazon QuickSight.


About the authors

Anwar Ali is a Specialist Solutions Architect for Amazon QuickSight. Anwar has over 18 years of experience implementing enterprise business intelligence (BI), data analytics, and database solutions. He specializes in integration of BI solutions with business applications, helping customers with BI architecture design patterns and best practices.

Salim Khan is a Specialist Solutions Architect for Amazon QuickSight. Salim has over 16 years of experience implementing enterprise business intelligence (BI) solutions. Prior to AWS, Salim worked as a BI consultant catering to industry verticals like Automotive, Healthcare, Entertainment, Consumer, Publishing and Financial Services. He has delivered business intelligence, data warehousing, data integration and master data management solutions across enterprises.

Gil Raviv is a Principal Product Manager for Amazon QuickSight, AWS’ cloud-native, fully managed SaaS BI service. As a thought-leader in BI, Gil accelerated the growth of global BI practices at AWS and Avanade, and has guided Fortune 1000 enterprises in their Data & AI journey. As a passionate evangelist, author and blogger of low-code/no-code data prep and analytic tools, Gil was awarded 5 times as a Microsoft MVP (Most Valuable Professional).

Policy-based access control in application development with Amazon Verified Permissions

Post Syndicated from Marc von Mandel original https://aws.amazon.com/blogs/devops/policy-based-access-control-in-application-development-with-amazon-verified-permissions/

Today, accelerating application development while shifting security and assurance left in the development lifecycle is essential. One of the most critical components of application security is access control. While traditional access control mechanisms such as role-based access control (RBAC) and access control lists (ACLs) are still prevalent, policy-based access control (PBAC) is gaining momentum. PBAC is a more powerful and flexible access control model, allowing developers to apply any combination of coarse-, medium-, and fine-grained access control over resources and data within an application. In this article, we will explore PBAC and how it can be used in application development using Amazon Verified Permissions and how you can define permissions as policies using Cedar, an expressive and analyzable open-source policy language. We will briefly describe here how developers and admins can define policy-based access controls using roles and attributes for fine-grained access.

What is Policy-Based Access Control?

PBAC is an access control model that uses permissions expressed as policies to determine who can access what within an application. Administrators and developers can define application access statically as admin-time authorization where the access is based on users and groups defined by roles and responsibilities. On the other hand, developers set up run-time or dynamic authorization at any time to apply access controls at the time when a user attempts to access a particular application resource. Run-time authorization takes in attributes of application resources, such as contextual elements like time or location, to determine what access should be granted or denied. This combination of policy types makes policy-based access control a more powerful authorization engine.

A central policy store and policy engine evaluate these policies continuously, in real time, to determine access to resources. PBAC is a more dynamic access control model because it allows developers and administrators to create and modify policies according to their needs, such as defining custom roles within an application or enabling secure, delegated authorization. Developers can use PBAC to apply role- and attribute-based access controls across many different types of applications, such as customer-facing web applications, internal workforce applications, multi-tenant software-as-a-service (SaaS) applications, edge device access, and more. PBAC brings together RBAC and attribute-based access control (ABAC), which have been the two most widely used access control models for the past couple of decades (see the figure below).

Policy-based access control with admin-time and run-time authorization

Figure 1: Overview of policy-based access control (PBAC)

Before we try and understand how to modernize permissions, let’s understand how developers implement it in a traditional development process. We typically see developers hardcode access control into each and every application. This creates four primary challenges.

  1. First, you need to update code every time to update access control policies. This is time-consuming for a developer and done at the expense of working on the business logic of the application.
  2. Second, you need to implement these permissions in each and every application you build.
  3. Third, application audits are challenging because you need to run a battery of tests or dig through thousands of lines of code spread across multiple files to demonstrate who has access to application resources. One example is providing evidence to auditors that only authorized users can access a patient’s health record.
  4. Finally, developing hardcoded application access control is often time consuming and error prone.

Amazon Verified Permissions simplifies this process by externalizing access control rules from the application code to a central policy store within the service. Now, when a user tries to take an action in your application, you call Verified Permissions to check if it is authorized. Policy admins can respond faster to changing business requirements, as they no longer need to depend on the development team when updating access controls. They can use a central policy store to make updates to authorization policies. This means that developers can focus on the core application logic, and access control policies can be created, customized, and managed separately or collectively across applications. Developers can use PBAC to define authorization rules for users, user groups, or attributes based on the entity type accessing the application. Restricting access to data and resources using PBAC protects against unintended access to application resources and data.

For example, a developer can define a role-based and attribute-based access control policy that allows only certain users or roles to access a particular API. Imagine a group of users within a Marketing department that can only view specific photos within a photo sharing application. The policy might look something like the following using Cedar.

permit(
  principal in Role::"expo-speakers",
  action == Action::"view",
  resource == Photo::"expoPhoto94.jpg"
)
when {
  principal.department == "Marketing"
};
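To make the logic of this policy concrete, here is a hand-written Python model of the same checks. It is not the Cedar engine and not part of Verified Permissions; the `Principal` class and `is_permitted` function are hypothetical names. It mirrors the single permit statement above and relies on the fact that Cedar denies any request that no permit policy matches.

```python
from dataclasses import dataclass, field

@dataclass
class Principal:
    name: str
    department: str
    roles: set = field(default_factory=set)

def is_permitted(principal, action, resource):
    """Mirror of the one permit policy above; everything else is denied."""
    return (
        "expo-speakers" in principal.roles          # principal in Role::"expo-speakers"
        and action == "view"                        # action == Action::"view"
        and resource == "expoPhoto94.jpg"           # resource == Photo::"expoPhoto94.jpg"
        and principal.department == "Marketing"     # the when { ... } condition
    )

alice = Principal("alice", "Marketing", {"expo-speakers"})
bob = Principal("bob", "Sales", {"expo-speakers"})

print(is_permitted(alice, "view", "expoPhoto94.jpg"))
print(is_permitted(bob, "view", "expoPhoto94.jpg"))
print(is_permitted(alice, "delete", "expoPhoto94.jpg"))
```

The role membership is the role-based part of the rule and the department check is the attribute-based part; both must hold for the request to be allowed.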

How do I get started using PBAC in my applications?

PBAC can be integrated into the application development process in several ways when using Amazon Verified Permissions. Developers begin by defining an authorization model for their application and use this to describe the scope of authorization requests made by the application and the basis for evaluating the requests. Think of this as a narrative or structure to authorization requests. Developers then write a schema which documents the form of the authorization model in a machine-readable syntax. This schema document describes each entity type, including principal types, actions, resource types, and conditions. Developers can then craft policies, as statements, that permit or forbid a principal to one or more actions on a resource.

Next, you define a set of application policies which define the overall framework and guardrails for access controls in your application. For example, a guardrail policy might be that only the owner can access photos that are marked ‘private’. These policies are applicable to a large set of users or resources, and are not user or resource specific. You create these policies in the code of your applications, instantiate them in your CI/CD pipeline using CloudFormation, and test them in beta stages before deploying them to production.

Lastly, you define the shape of your end-user policies using policy templates. These end-user policies are specific to a user (or user group). For example, a policy that states “Alice” can view “expoPhoto94.jpg”. Policy templates simplify managing end-user policies as a group. Now, every time a user in your application tries to take an action, you call Verified Permissions to confirm that the action is authorized.
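The authorization call described above might look roughly like the following from application code. This is a sketch under assumptions: the entity type names ("User", "Action", "Photo") and the policy store ID are placeholders that depend on your own schema and namespace, and `can_view_photo` is a hypothetical helper. The client is passed in so the example runs offline against a stand-in; in a real application it would be created with `boto3.client("verifiedpermissions")`.

```python
def can_view_photo(client, policy_store_id, user_id, photo_id):
    """Ask Verified Permissions whether user_id may view photo_id."""
    response = client.is_authorized(
        policyStoreId=policy_store_id,
        principal={"entityType": "User", "entityId": user_id},
        action={"actionType": "Action", "actionId": "view"},
        resource={"entityType": "Photo", "entityId": photo_id},
    )
    # The service answers ALLOW or DENY; treat anything but ALLOW as a denial.
    return response["decision"] == "ALLOW"

class FakeClient:
    """Offline stand-in for the real client; it allows only Alice."""
    def is_authorized(self, **request):
        allowed = request["principal"]["entityId"] == "alice"
        return {"decision": "ALLOW" if allowed else "DENY"}

client = FakeClient()
print(can_view_photo(client, "example-store-id", "alice", "expoPhoto94.jpg"))
print(can_view_photo(client, "example-store-id", "bob", "expoPhoto94.jpg"))
```

Keeping the check in one small function means the application code never encodes the rule itself; changing who can view the photo is a policy update, not a code change.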

Benefits of using Amazon Verified Permissions policies in application development

Amazon Verified Permissions offers several benefits when it comes to application development.

  1. One of the most significant benefits is the flexibility in using the PBAC model. Amazon Verified Permissions allows application administrators or developers to create and modify policies at any time without going into application code, making it easier to respond to changing security needs.
  2. Secondly, it simplifies the application development process by externalizing access control rules from the application code. Developers can reuse PBAC controls for newly built or acquired applications. This allows developers to focus on the core application logic and mitigates security risks within applications by applying fine-grained access controls.
  3. Lastly, developers can add secure delegated authorization using PBAC and Amazon Verified Permissions. This lets developers give a group, role, or resource owner the ability to manage data sharing within application resources or between services. This has exciting implications for developers wanting to add privacy and consent capabilities for end users while still enforcing guardrails defined within a centralized policy store.

In Summary

PBAC is a more flexible access control model that enables fine-grained control over access to resources in an application. By externalizing access control rules from the application code, PBAC simplifies the application development process and reduces the risks of security vulnerabilities in the application. PBAC also offers flexibility, aligns with compliance mandates for access control, and gives developers and administrators centralized permissions across various stages of the DevOps process. By adopting PBAC in application development, organizations can improve their application security and better align with industry regulations.

Amazon Verified Permissions is a scalable permissions management and fine-grained authorization service for the applications that developers build. The service helps developers build secure applications faster by externalizing authorization and centralizing policy management and administration. Developers can align their application access with Zero Trust principles by implementing least privilege and continuous verification within applications. Security and audit teams can better analyze and audit who has access to what within applications.

DeVault: Reforming the free software message

Post Syndicated from original https://lwn.net/Articles/935193/

Drew DeVault has announced the launch of a new web site that is intended
to be a better introduction to the free-software community.

Some of my criticisms focused on the message: fsf.org and gnu.org
together suffer from no small degree of incomprehensibility and
inaccessibility which makes it difficult for new participants to
learn about the movement and apply it in practice to their own
projects.

This is something which is relatively easily fixed!

[$] PostgreSQL reconsiders its process-based model

Post Syndicated from original https://lwn.net/Articles/934940/

In the fast-moving open-source world, programs can come and go quickly; a
tool that has many users today can easily be eclipsed by something better
next week. Even in this environment, though, some programs endure for a
long time. As an example, consider the PostgreSQL database system, which
traces its history back to 1986. Making fundamental changes to a large
code base with that much history is never an easy task. As fundamental
changes go, moving PostgreSQL away from its process-oriented model is not
a small one, but it is one that the project is considering seriously.

CISPE Code of Conduct Public Register now has 107 compliant AWS services

Post Syndicated from Gokhan Akyuz original https://aws.amazon.com/blogs/security/cispe-code-of-conduct-public-register-now-has-107-compliant-aws-services/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that 107 services are now certified as compliant with the Cloud Infrastructure Services Providers in Europe (CISPE) Data Protection Code of Conduct. This alignment with the CISPE requirements demonstrates our ongoing commitment to adhere to the heightened expectations for data protection by cloud service providers. AWS customers who use AWS certified services can be confident that their data is processed in adherence with the European Union’s General Data Protection Regulation (GDPR).

The CISPE Code of Conduct is the first pan-European, sector-specific code for cloud infrastructure service providers, which received a favorable opinion that it complies with the GDPR. It helps organizations across Europe accelerate the development of GDPR compliant, cloud-based services for consumers, businesses, and institutions.

The accredited monitoring body EY CertifyPoint evaluated AWS on January 26, 2023, and successfully audited 100 certified services. AWS added seven additional services to the current scope in June 2023. As of the date of this blog post, 107 services are in scope of this certification. The Certificate of Compliance that illustrates AWS compliance status is available on the CISPE Public Register. For up-to-date information, including when additional services are added, search the CISPE Public Register by entering AWS as the Seller of Record; or see the AWS CISPE page.

AWS strives to bring additional services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about AWS compliance with CISPE, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs, AWS General Data Protection Regulation (GDPR) Center, and the EU data protection section of the AWS Cloud Security site. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Gokhan Akyuz

Gokhan is a Security Audit Program Manager at AWS based in Amsterdam, Netherlands. He leads security audits, attestations, and certification programs across Europe and the Middle East. Gokhan has more than 15 years of experience in IT and cybersecurity audits, and controls implementation in a wide range of industries.

Security updates for Monday

Post Syndicated from original https://lwn.net/Articles/935184/

Security updates have been issued by Debian (golang-go.crypto, maradns, requests, sofia-sip, and xmltooling), Fedora (chromium, iaito, iniparser, libX11, matrix-synapse, radare2, and thunderbird), Red Hat (c-ares, jenkins and jenkins-2-plugins, and texlive), SUSE (bluez, chromium, go1.19, go1.20, jetty-minimal, kernel, kubernetes1.18, kubernetes1.23, kubernetes1.24, libX11, open-vm-tools, openvswitch3, opera, syncthing, and xen), and Ubuntu (libcap2, libpod, linux, linux-aws, linux-aws-5.15, linux-azure, linux-azure-5.15, linux-azure-fde-5.15, linux-gcp, linux-gcp-5.15, linux-gke, linux-gke-5.15, linux-gkeop, linux-hwe-5.15, linux-hwe-5.19, linux-ibm, linux-kvm, linux-lowlatency, linux-lowlatency-hwe-5.15, linux-oem-5.17, linux-oem-6.1, linux-oracle, linux-oracle-5.15, linux-raspi, pypdf2, and qemu).

Introducing Low-Latency HLS Support for Cloudflare Stream

Post Syndicated from Taylor Smith original http://blog.cloudflare.com/low-latency-hls-support-for-cloudflare-stream/

Stream Live lets users easily scale their live streaming apps and websites to millions of creators and concurrent viewers without having to worry about bandwidth costs or purchasing hardware for real-time encoding at scale. Stream Live lets users focus on the content rather than the infrastructure — taking care of the codecs, protocols, and bitrate automatically. When we launched Stream Live last year, we focused on bringing high quality, feature-rich streaming to websites and applications with HTTP Live Streaming (HLS).

Today, we're excited to introduce support for Low-Latency HTTP Live Streaming (LL-HLS) in a closed beta, offering you an even faster streaming experience. LL-HLS will reduce the latency a viewer may experience on their player from highs of around 30 seconds to less than 10 seconds in many cases. Lower latency brings creators even closer to their viewers, empowering customers to build more interactive features like Q&A or chat and enabling the use of live streaming in more time-sensitive applications like sports, gaming, and live events.

Broadcast with less than 10-second latency

LL-HLS is an extension of HLS and allows us to reduce glass-to-glass latency — the time between something happening on the broadcast end and a user seeing it on their screen. This includes everything from broadcaster encoding to client-side buffering because we know the experience is driven by what a user sees, not when a byte is delivered into a buffer. Depending on encoder and player settings, broadcasters' content can be playing on viewers' screens in less than ten seconds.

Our addition of LL-HLS support builds on all the best parts of Stream including simple, predictable pricing. You never have to pay for ingest (broadcasting to us), compute (encoding), or egress. It costs $5 per 1,000 minutes of video stored per month and $1 per 1,000 minutes of video viewed per month. This allows you to stream with peace of mind, knowing there are no surprise fees.

Other platforms tack on live recordings as a separate add-on feature, and those recordings only become available minutes or even hours after a live stream ends. With Cloudflare Stream, Live segments are automatically recorded and immediately available for on-demand playback.

Stream also provides both a built-in web player and HLS manifests to use in a compatible player of your choosing. This enables you or your users to go live using the same protocols and tools that broadcasters big and small use to go live to YouTube or Twitch, but gives you full control over access and presentation of live streams.

We also provide access control with signed URLs, allowing you to protect your content by sharing it with only certain users. This lets you restrict access so that only logged-in members can watch a particular video, or let users watch your video only for a limited time period. And of course, Stream is powered by Cloudflare's global network for fast delivery worldwide, with points of presence within 50ms of 95% of the Internet-connected population.

Left: Broadcasting to Stream Live using OBS. Right: Watching that same Stream. Note the five-second difference in the NIST clock between the source and the playback.

Powering the LL-HLS experience involved making several improvements to our underlying infrastructure. One of the largest challenges we encountered was that our existing architecture involved a pipeline with multiple buffers as long as the keyframe interval. This meant Stream Live would introduce a delay of up to five times the keyframe interval. To resolve this, we simplified a portion of our pipeline — now, we work with individual frames rather than whole keyframe-intervals, but without giving up the economies of scale our approach to encoding provides. This decoupling of keyframe interval and internal buffer duration lets us dramatically reduce latency in HLS, with a maximum of twice the keyframe interval.
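
To make the bounds above concrete, here is a minimal sketch that multiplies a keyframe interval by the two factors mentioned (five before the change, two after). The function name `max_pipeline_delay` is hypothetical, and the 2-second interval is taken from the beta settings rather than from any measured stream; the result is only the delay added by the pipeline, not the full glass-to-glass latency.

```rust
use std::time::Duration;

/// Upper bound on the delay the pipeline adds when its buffering is
/// expressed as a multiple of the keyframe interval (hypothetical helper).
fn max_pipeline_delay(keyframe_interval: Duration, multiplier: u32) -> Duration {
    keyframe_interval * multiplier
}

fn main() {
    // Assumption: a 2-second keyframe interval, as required for the beta.
    let keyframe_interval = Duration::from_secs(2);

    let before = max_pipeline_delay(keyframe_interval, 5);
    let after = max_pipeline_delay(keyframe_interval, 2);

    println!("previous pipeline: up to {:?} of added delay", before);
    println!("current pipeline:  up to {:?} of added delay", after);
}
```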

Getting started with the LL-HLS beta

As we prepare to ship this new functionality, we're looking for beta testers to help us test non-production workloads. To participate in the beta, your application should be configured with these settings:

  • H.264 video codec
  • Constant bitrate
  • Keyframe interval (GOP size) of 2s
  • No B Frames
  • Using the Stream built-in player

Getting started with Stream Live only takes a few minutes. Create a Live Input in the Cloudflare dashboard, then Stream will automatically provide RTMPS and SRT endpoints to broadcast your feed to us as well as an HTML embed for our built-in player and the HLS manifest for a custom player.

This connection information can be added easily to a broadcast application like OBS to start streaming immediately.

Customers in the LL-HLS beta will need to make a minor adjustment to the built-in player embed code, but there are no changes to Live Input configuration, dashboard interface, API, or existing functionality.

Sign up today

LL-HLS is Stream Live’s latest tool to bring your creators and audiences together. After the beta period, this feature will be generally available to all new and existing Stream subscriptions with no pricing changes or contract requirements — all part of building the fastest, simplest serverless live streaming platform. Join our beta to start test-driving Low-Latency HLS!

Every request, every microsecond: scalable machine learning at Cloudflare

Post Syndicated from Alex Bocharov original http://blog.cloudflare.com/scalable-machine-learning-at-cloudflare/

In this post, we will take you through the advancements we've made in our machine learning capabilities. We'll describe the technical strategies that have enabled us to expand the number of machine learning features and models, all while substantially reducing the processing time for each HTTP request on our network. Let's begin.

Background

For a comprehensive understanding of our evolved approach, it's important to grasp the context within which our machine learning detections operate. Cloudflare, on average, serves over 46 million HTTP requests per second, surging to more than 63 million requests per second during peak times.

Machine learning detection plays a crucial role in ensuring the security and integrity of this vast network. In fact, it classifies the largest volume of requests among all our detection mechanisms, providing the final Bot Score decision for over 72% of all HTTP requests. Going beyond, we run several machine learning models in shadow mode for every HTTP request.

At the heart of our machine learning infrastructure lies our reliable ally, CatBoost. It enables ultra low-latency model inference and ensures high-quality predictions to detect novel threats, such as bots targeting our customers' mobile apps. However, it's worth noting that machine learning model inference is just one component of the overall latency equation. Other critical components include machine learning feature extraction and preparation. In our quest for optimal performance, we've continuously optimized each aspect contributing to the overall latency of our system.

Initially, our machine learning models relied on single-request features, such as presence or value of certain headers. However, given the ease of spoofing these attributes, we evolved our approach. We turned to inter-request features that leverage aggregated information across multiple dimensions of a request in a sliding time window. For example, we now consider factors like the number of unique user agents associated with certain request attributes.

The extraction and preparation of inter-request features were handled by Gagarin, a Go-based feature serving platform we developed. As a request arrived at Cloudflare, we extracted dimension keys from the request attributes. We then looked up the corresponding machine learning features in the multi-layered cache. If the desired machine learning features were not found in the cache, a memcached "get" request was made to Gagarin to fetch those. Then machine learning features were plugged into CatBoost models to produce detections, which were then surfaced to the customers via Firewall and Workers fields and internally through our logging pipeline to ClickHouse. This allowed our data scientists to run further experiments, producing more features and models.

Previous system design for serving machine learning features over Unix socket using Gagarin.

Initially, Gagarin exhibited decent latency, with a median latency around 200 microseconds to serve all machine learning features for given keys. However, as our system evolved and we introduced more features and dimension keys, coupled with increased traffic, the cache hit ratio began to wane. The median latency had increased to 500 microseconds and during peak times, the latency worsened significantly, with the p99 latency soaring to roughly 10 milliseconds. Gagarin underwent extensive low-level tuning, optimization, profiling, and benchmarking. Despite these efforts, we encountered the limits of inter-process communication (IPC) using Unix Domain Socket (UDS), among other challenges, explored below.

Problem definition

In summary, the previous solution had its drawbacks, including:

  • High tail latency: during peak times, a portion of requests experienced increased latency caused by CPU contention on the Unix socket and the Lua garbage collector.
  • Suboptimal resource utilization: CPU and RAM utilization was not optimized to its full potential, leaving fewer resources for other services running on the server.
  • Machine learning features availability: decreased due to memcached timeouts, which resulted in a higher likelihood of false positives or false negatives for a subset of the requests.
  • Scalability constraints: as we added more machine learning features, we approached the scalability limit of our infrastructure.

Equipped with a comprehensive understanding of the challenges and armed with quantifiable metrics, we ventured into the next phase: seeking a more efficient way to fetch and serve machine learning features.

Exploring solutions

In our quest for more efficient methods of fetching and serving machine learning features, we evaluated several alternatives. The key approaches included:

Further optimizing Gagarin: as we pushed our Go-based memcached server to its limits, we encountered a lower bound on latency reductions. This arose from the synchronization overhead and multiple data copies of IPC over UDS, the serialization/deserialization overheads, as well as the inherent latency of the garbage collector and the performance of hashmap lookups in Go.

Considering Quicksilver: we contemplated using Quicksilver, but the volume and update frequency of machine learning features posed capacity concerns and potential negative impacts on other use cases. Moreover, it uses a Unix socket with the memcached protocol, reproducing the same limitations previously encountered.

Increasing multi-layered cache size: we investigated expanding cache size to accommodate tens of millions of dimension keys. However, the associated memory consumption, due to duplication of these keys and their machine learning features across worker threads, rendered this approach untenable.

Sharding the Unix socket: we considered sharding the Unix socket to alleviate contention and improve performance. Despite showing potential, this approach only partially solved the problem and introduced more system complexity.

Switching to RPC: we explored the option of using RPC for communication between our front line server and Gagarin. However, since RPC still requires some form of communication bus (such as TCP, UDP, or UDS), it would not significantly change the performance compared to the memcached protocol over UDS, which was already simple and minimalistic.

After considering these approaches, we shifted our focus towards investigating alternative Inter-Process Communication (IPC) mechanisms.

IPC mechanisms

Adopting a first principles design approach, we questioned: "What is the most efficient low-level method for data transfer between two processes provided by the operating system?" Our goal was to find a solution that would enable the direct serving of machine learning features from memory for corresponding HTTP requests. By eliminating the need to traverse the Unix socket, we aimed to reduce CPU contention, improve latency, and minimize data copying.

To identify the most efficient IPC mechanism, we evaluated various options available within the Linux ecosystem. We used ipc-bench, an open-source benchmarking tool specifically designed for this purpose, to measure the latencies of different IPC methods in our test environment. The measurements were based on sending one million 1,024-byte messages back and forth (i.e., ping pong) between two processes.

IPC method                  Avg duration, μs    Avg throughput, msg/s
eventfd (bi-directional)    9.456               105,533
TCP sockets                 8.74                114,143
Unix domain sockets         5.609               177,573
FIFOs (named pipes)         5.432               183,388
Pipe                        4.733               210,369
Message Queue               4.396               226,421
Unix Signals                2.45                404,844
Shared Memory               0.598               1,616,014
Memory-Mapped Files         0.503               1,908,613

Based on our evaluation, we found that Unix sockets, while taking care of synchronization, were not the fastest IPC method available. The two fastest IPC mechanisms were shared memory and memory-mapped files. Both approaches offered similar performance, with the former using a specific tmpfs volume in /dev/shm and dedicated system calls, while the latter could be stored in any volume, including tmpfs or HDD/SSD.
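
The ping-pong style of measurement described above can be sketched with the Rust standard library alone. This is not ipc-bench and does not reproduce its numbers: as a simplifying assumption it bounces messages between two threads over a connected pair of Unix domain sockets rather than between two processes, and the function name `ping_pong` and the round count are illustrative only.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;
use std::time::{Duration, Instant};

/// Message size used in the measurements described in the text.
const MESSAGE_SIZE: usize = 1024;

/// Sends `rounds` messages to an echo thread and waits for each reply.
/// Returns the total elapsed time for all round trips.
fn ping_pong(rounds: usize) -> std::io::Result<Duration> {
    let (mut client, mut server) = UnixStream::pair()?;

    // Echo side: read one full message, then send it straight back.
    let echo = thread::spawn(move || -> std::io::Result<()> {
        let mut buf = [0u8; MESSAGE_SIZE];
        for _ in 0..rounds {
            server.read_exact(&mut buf)?;
            server.write_all(&buf)?;
        }
        Ok(())
    });

    let ping = [0xABu8; MESSAGE_SIZE];
    let mut pong = [0u8; MESSAGE_SIZE];
    let start = Instant::now();
    for _ in 0..rounds {
        client.write_all(&ping)?;
        client.read_exact(&mut pong)?;
    }
    let elapsed = start.elapsed();

    echo.join().expect("echo thread panicked")?;
    assert_eq!(ping, pong); // the echoed payload must match what was sent
    Ok(elapsed)
}

fn main() -> std::io::Result<()> {
    let rounds = 100_000; // fewer than the one million used in the text
    let elapsed = ping_pong(rounds)?;
    let avg_us = elapsed.as_secs_f64() * 1e6 / rounds as f64;
    println!("average round trip: {avg_us:.3} µs over {rounds} rounds");
    Ok(())
}
```

Results from a sketch like this depend heavily on the machine and on thread scheduling, so they are only useful for comparing mechanisms on the same host.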

Missing ingredients

In light of these findings, we decided to employ memory-mapped files as the IPC mechanism for serving machine learning features. This choice promised reduced latency, decreased CPU contention, and minimal data copying. However, it did not inherently offer data synchronization capabilities like Unix sockets. Unlike Unix sockets, memory-mapped files are simply files in a Linux volume that can be mapped into memory of the process. This sparked several critical questions:

  1. How could we efficiently fetch an array of hundreds of float features for given dimension keys when dealing with a file?
  2. How could we ensure safe, concurrent and frequent updates for tens of millions of keys?
  3. How could we avert the CPU contention previously encountered with Unix sockets?
  4. How could we effectively support the addition of more dimensions and features in the future?

To address these challenges we needed to further evolve this new approach by adding a few key ingredients to the recipe.

Augmenting the Idea

To realize our vision of memory-mapped files as a method for serving machine learning features, we needed to employ several key strategies, touching upon aspects like data synchronization, data structure, and deserialization.

Wait-free synchronization

When dealing with concurrent data, ensuring safe, concurrent, and frequent updates is paramount. Traditional locks are often not the most efficient solution, especially when dealing with high concurrency environments. Here's a rundown on three different synchronization techniques:

With-lock synchronization: a common approach using mechanisms like mutexes or spinlocks. It ensures only one thread can access the resource at a given time, but can suffer from contention, blocking, and priority inversion, as was evident with Unix sockets.

Lock-free synchronization: this non-blocking approach employs atomic operations to ensure at least one thread always progresses. It eliminates traditional locks but requires careful handling of edge cases and race conditions.

Wait-free synchronization: a more advanced technique that guarantees every thread makes progress and completes its operation without being blocked by other threads. It provides stronger progress guarantees compared to lock-free synchronization, ensuring that each thread completes its operation within a finite number of steps.

[Table: with-lock, lock-free, and wait-free synchronization compared on disjoint access parallelism, starvation freedom, and finite execution time; the per-cell marks were images and are not preserved.]
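
The difference between the lock-free and wait-free guarantees can be illustrated with a shared counter. This is a generic sketch and not code from mmap-sync; the function names are hypothetical. The compare-and-swap loop is lock-free because some thread always wins, but a given thread may retry an unbounded number of times. The `fetch_add` version has no retry loop, which makes it wait-free on the assumption that the target CPU provides a native atomic add.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free: retry until our compare-and-swap wins. System-wide progress is
/// guaranteed, but this particular thread may loop many times under contention.
fn increment_lock_free(counter: &AtomicU64) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return,
            Err(observed) => current = observed, // someone else won; retry
        }
    }
}

/// Wait-free (assuming a native atomic add instruction): a single step,
/// with no retry loop that other threads could prolong.
fn increment_wait_free(counter: &AtomicU64) {
    counter.fetch_add(1, Ordering::AcqRel);
}

/// Runs `op` `per_thread` times on each of `threads` threads and returns the total.
fn run(threads: usize, per_thread: u64, op: fn(&AtomicU64)) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    op(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    counter.load(Ordering::Acquire)
}

fn main() {
    println!("lock-free total: {}", run(4, 100_000, increment_lock_free));
    println!("wait-free total: {}", run(4, 100_000, increment_wait_free));
}
```

Both variants produce the same total; what differs is the bound on how long any single thread can be delayed by the others.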

Our wait-free data access pattern draws inspiration from Linux kernel's Read-Copy-Update (RCU) pattern and the Left-Right concurrency control technique. In our solution, we maintain two copies of the data in separate memory-mapped files. Write access to this data is managed by a single writer, with multiple readers able to access the data concurrently.

We store the synchronization state, which coordinates access to these data copies, in a third memory-mapped file, referred to as "state". This file contains an atomic 64-bit integer, which represents an InstanceVersion, and a pair of additional atomic 32-bit variables that track the number of active readers for each data copy. The InstanceVersion consists of the currently active data file index (1 bit), the data size (39 bits, accommodating data sizes up to 549 GB), and a data checksum (24 bits).
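
As an illustration of how three fields of 1, 39, and 24 bits fit in one 64-bit word, here is a minimal packing sketch. The field widths come from the description above, but the bit positions chosen here (index in the lowest bit, then size, then checksum) and all method names are assumptions, not the actual mmap-sync layout.

```rust
const SIZE_BITS: u32 = 39;
const CHECKSUM_BITS: u32 = 24;
const SIZE_MASK: u64 = (1 << SIZE_BITS) - 1;
const CHECKSUM_MASK: u64 = (1 << CHECKSUM_BITS) - 1;

/// Packed version word: 1-bit data file index, 39-bit size, 24-bit checksum.
/// The ordering of the fields inside the word is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InstanceVersion(u64);

impl InstanceVersion {
    /// Returns `None` if any field does not fit in its allotted bits.
    fn new(index: usize, size: u64, checksum: u32) -> Option<Self> {
        if index > 1 || size > SIZE_MASK || u64::from(checksum) > CHECKSUM_MASK {
            return None;
        }
        let packed = (index as u64)
            | (size << 1)
            | (u64::from(checksum) << (1 + SIZE_BITS));
        Some(Self(packed))
    }

    fn index(self) -> usize {
        (self.0 & 1) as usize
    }

    fn size(self) -> u64 {
        (self.0 >> 1) & SIZE_MASK
    }

    fn checksum(self) -> u32 {
        ((self.0 >> (1 + SIZE_BITS)) & CHECKSUM_MASK) as u32
    }
}

fn main() {
    let version = InstanceVersion::new(1, 1_000_000, 0xABCDEF).expect("fields fit");
    println!(
        "raw={:#018x} index={} size={} checksum={:#x}",
        version.0,
        version.index(),
        version.size(),
        version.checksum()
    );
    // The largest representable size is 2^39 - 1 bytes, roughly 549 GB.
    println!("max size: {} bytes", SIZE_MASK);
}
```

Packing everything into a single word is what allows the whole version to be read or swapped with one atomic 64-bit operation.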

Zero-copy deserialization

To efficiently store and fetch machine learning features, we needed to address the challenge of deserialization latency. Here, zero-copy deserialization provides an answer. This technique reduces the time and memory required to access and use data by directly referencing bytes in the serialized form.

We turned to rkyv, a zero-copy deserialization framework in Rust, to help us with this task. rkyv implements total zero-copy deserialization, meaning no data is copied during deserialization and no work is done to deserialize data. It achieves this by structuring its encoded representation to match the in-memory representation of the source type.
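
The general idea of reading fields in place, rather than first building an owned structure, can be sketched without any external crate. This is not rkyv's API or format: the byte layout (`version`, `count`, then `count` little-endian floats), the `FeaturesView` type, and the `encode` helper are all hypothetical. rkyv goes further by matching the in-memory representation of the source type, which this sketch does not attempt.

```rust
/// Borrowed view over a serialized record with an assumed layout:
/// [version: u32 LE][count: u32 LE][count x f32 LE].
/// No field is copied into an owned structure; accessors read the bytes in place.
struct FeaturesView<'a> {
    bytes: &'a [u8],
}

impl<'a> FeaturesView<'a> {
    /// Validates the length once so that later accessors cannot go out of bounds.
    fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() < 8 {
            return None;
        }
        let view = Self { bytes };
        let needed = 8usize.checked_add(view.count().checked_mul(4)?)?;
        if bytes.len() < needed {
            return None;
        }
        Some(view)
    }

    fn read_u32(&self, offset: usize) -> u32 {
        let b = &self.bytes[offset..offset + 4];
        u32::from_le_bytes([b[0], b[1], b[2], b[3]])
    }

    fn version(&self) -> u32 {
        self.read_u32(0)
    }

    fn count(&self) -> usize {
        self.read_u32(4) as usize
    }

    /// Reads the i-th feature directly from the underlying buffer.
    fn feature(&self, i: usize) -> Option<f32> {
        if i >= self.count() {
            return None;
        }
        Some(f32::from_bits(self.read_u32(8 + 4 * i)))
    }
}

/// Hypothetical writer for the layout above, used only to produce demo input.
fn encode(version: u32, features: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 * features.len());
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(&(features.len() as u32).to_le_bytes());
    for feature in features {
        out.extend_from_slice(&feature.to_le_bytes());
    }
    out
}

fn main() {
    let buffer = encode(3, &[0.5, 1.5, 2.5]);
    let view = FeaturesView::new(&buffer).expect("buffer is well formed");
    println!(
        "version={} count={} second={:?}",
        view.version(),
        view.count(),
        view.feature(1)
    );
}
```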

One of the key features of rkyv that our solution relies on is its ability to access HashMap data structures in a zero-copy fashion. This is a unique capability among Rust serialization libraries and one of the main reasons we chose rkyv for our implementation. It also has a vibrant Discord community, eager to offer best-practice advice and accommodate feature requests.

Feature comparison: rkyv vs FlatBuffers and Cap'n Proto

Enter mmap-sync crate

Leveraging the benefits of memory-mapped files, wait-free synchronization and zero-copy deserialization, we've crafted a unique and powerful tool for managing high-performance, concurrent data access between processes. We've packaged these concepts into a Rust crate named mmap-sync, which we're thrilled to open-source for the wider community.

At the core of the mmap-sync package is a structure named Synchronizer. It offers an avenue to read and write any data expressible as a Rust struct. Users simply have to implement or derive a specific Rust trait on the struct definition, a task requiring just a single line of code. The Synchronizer presents an elegantly simple interface, equipped with "write" and "read" methods.

impl Synchronizer {
    /// Write a given `entity` into the next available memory mapped file.
    pub fn write<T>(&mut self, entity: &T, grace_duration: Duration) -> Result<(usize, bool), SynchronizerError> {
        …
    }

    /// Reads and returns `entity` struct from mapped memory wrapped in `ReadResult`
    pub fn read<T>(&mut self) -> Result<ReadResult<T>, SynchronizerError> {
        …
    }
}

/// FeaturesMetadata stores features along with their metadata
#[derive(Archive, Deserialize, Serialize, Debug, PartialEq)]
#[archive_attr(derive(CheckBytes))]
pub struct FeaturesMetadata {
    /// Features version
    pub version: u32,
    /// Features creation Unix timestamp
    pub created_at: u32,
    /// Features represented by vector of hash maps
    pub features: Vec<HashMap<u64, Vec<f32>>>,
}

A read operation through the Synchronizer performs zero-copy deserialization and returns a "guarded" Result encapsulating a reference to the Rust struct, using the RAII design pattern. This operation also increments the atomic counter of active readers using the struct. Once the Result goes out of scope, the Synchronizer decrements the number of readers.
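
The guard behavior described above can be sketched in-process as follows. This is not the mmap-sync implementation: `ReaderCounts` and `ReadGuard` are hypothetical names, the counters live in ordinary memory rather than in the shared state file, and the value is a plain reference rather than a zero-copy view.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicU32, Ordering};

/// Active-reader counters for the two data copies
/// (an in-process stand-in for the counters kept in the state file).
struct ReaderCounts {
    active: [AtomicU32; 2],
}

/// Guard handed out by a read; it keeps its counter incremented until dropped.
struct ReadGuard<'a, T> {
    value: &'a T,
    counter: &'a AtomicU32,
}

impl<'a, T> Deref for ReadGuard<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        self.value
    }
}

impl<'a, T> Drop for ReadGuard<'a, T> {
    /// Leaving scope releases the reader slot automatically (RAII).
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::AcqRel);
    }
}

impl ReaderCounts {
    fn new() -> Self {
        Self {
            active: [AtomicU32::new(0), AtomicU32::new(0)],
        }
    }

    /// Registers a reader on data copy `index` and returns a guard over `value`.
    fn read<'a, T>(&'a self, index: usize, value: &'a T) -> ReadGuard<'a, T> {
        self.active[index].fetch_add(1, Ordering::AcqRel);
        ReadGuard {
            value,
            counter: &self.active[index],
        }
    }

    fn readers(&self, index: usize) -> u32 {
        self.active[index].load(Ordering::Acquire)
    }
}

fn main() {
    let counts = ReaderCounts::new();
    let features = vec![0.25f32, 0.75];
    {
        let guard = counts.read(0, &features);
        println!("readers while guard is alive: {}", counts.readers(0));
        println!("features seen through guard: {:?}", *guard);
    }
    println!("readers after guard is dropped: {}", counts.readers(0));
}
```

A writer following this scheme would watch the counter for the inactive copy and only overwrite that copy once it has drained to zero or a grace period has elapsed.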

The synchronization mechanism used in mmap-sync is not only "lock-free" but also "wait-free". This ensures an upper bound on the number of steps an operation will take before it completes, thus providing a performance guarantee.

The data is stored in shared mapped memory, which allows the Synchronizer to “write” to it and “read” from it concurrently. This design makes mmap-sync a highly efficient and flexible tool for managing shared, concurrent data access.

Now, with an understanding of the underlying mechanics of mmap-sync, let's explore how this package plays a key role in the broader context of our Bot Management platform, particularly within the newly developed components: the bliss service and library.

System design overhaul

Our previous design used a Lua-based module that made memcached requests over a Unix socket to Gagarin, written in Go, to fetch machine learning features. The new design is a significant evolution built around mmap-sync, our newly developed Rust package, which lays the groundwork for a substantial performance upgrade. This led to a comprehensive system redesign and introduced two new components that form the backbone of our Bots Liquidation Intelligent Security System (BLISS for short): the bliss service and the bliss library.

Bliss service

The bliss service operates as a Rust-based, multi-threaded sidecar daemon. It has been designed for optimal batch processing of vast data quantities and extensive I/O operations. Among its key functions, it fetches, parses, and stores machine learning features and dimensions for effortless data access and manipulation. This has been made possible through the incorporation of the Tokio event-driven platform, which allows for efficient, non-blocking I/O operations.

Bliss library

Operating as a single-threaded dynamic library, the bliss library seamlessly integrates into each worker thread using the Foreign Function Interface (FFI) via a Lua module. Optimized for minimal resource usage and ultra-low latency, this lightweight library performs tasks without the need for heavy I/O operations. It efficiently serves machine learning features and generates corresponding detections.

In addition to leveraging the mmap-sync package for efficient machine learning feature access, our new design includes several other performance enhancements:

  • Allocations-free operation: bliss library re-uses pre-allocated data structures and performs no heap allocations, only low-cost stack allocations. To enforce our zero-allocation policy, we run integration tests using the dhat heap profiler.
  • SIMD optimizations: wherever possible, the bliss library employs vectorized CPU instructions. For instance, AVX2 and SSE4 instruction sets are used to expedite hex-decoding of certain request attributes, making it ten times faster.
  • Compiler tuning: We compile both the bliss service and library with the following flags for superior performance:

[profile.release]
codegen-units = 1
debug = true
lto = "fat"
opt-level = 3

  • Benchmarking & profiling: We use Criterion for benchmarking every major feature or component within bliss. We can also run the Go pprof profiler on Criterion benchmarks to view flame graphs and more:

cargo bench -p integration -- --verbose --profile-time 100

go tool pprof -http=: ./target/criterion/process_benchmark/process/profile/profile.pb

This comprehensive overhaul of our system has not only streamlined our operations but also has been instrumental in enhancing the overall performance of our Bot Management platform. Stay tuned to witness the remarkable changes brought about by this new architecture in the next section.

Rollout results

Our system redesign has brought some truly "blissful" dividends. Above all, our commitment to a seamless user experience and the trust of our customers have guided our innovations. We ensured that the transition to the new design was seamless, maintaining full backward compatibility, with no customer-reported false positives or negatives encountered. This is a testament to the robustness of the new system.

As the old adage goes, the proof of the pudding is in the eating. This couldn't be truer when examining the dramatic latency improvements achieved by the redesign. Our overall processing latency for HTTP requests at Cloudflare improved by an average of 12.5% compared to the previous system.

This improvement is even more significant in the Bot Management module, where latency improved by an average of 55.93%.

Bot Management module latency, in microseconds.

More specifically, our machine learning features fetch latency has improved by several orders of magnitude:

Latency metric    Before (μs)    After (μs)    Change
p50               532            9             -98.30% or x59
p99               9510           18            -99.81% or x528
p999              16000          29            -99.82% or x551

To truly grasp this impact, consider this: with Cloudflare’s average rate of 46 million requests per second, a saving of 523 microseconds per request equates to saving over 24,000 days or 65 years of processing time every single day!
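
The figures above can be rechecked with a few lines of arithmetic. The 523-microsecond saving matches the difference between the p50 values in the table (532 and 9), and the request rate is the average quoted earlier in the post. The function names are hypothetical, and using 365 days per year is an assumption of this sketch.

```rust
const REQUESTS_PER_SECOND: f64 = 46_000_000.0;
const SECONDS_PER_DAY: f64 = 86_400.0;

/// Returns (percentage reduction, speedup factor) for a before/after latency pair.
fn improvement(before_us: f64, after_us: f64) -> (f64, f64) {
    let reduction_pct = (1.0 - after_us / before_us) * 100.0;
    let speedup = before_us / after_us;
    (reduction_pct, speedup)
}

/// Processing time saved per calendar day, expressed in days.
fn processing_days_saved_per_day(saved_us_per_request: f64) -> f64 {
    let saved_seconds_per_day =
        REQUESTS_PER_SECOND * SECONDS_PER_DAY * saved_us_per_request / 1e6;
    saved_seconds_per_day / SECONDS_PER_DAY
}

fn main() {
    for (name, before, after) in [
        ("p50", 532.0, 9.0),
        ("p99", 9510.0, 18.0),
        ("p999", 16000.0, 29.0),
    ] {
        let (pct, speedup) = improvement(before, after);
        println!("{name}: -{pct:.2}% (x{speedup:.1})");
    }

    let days = processing_days_saved_per_day(532.0 - 9.0);
    println!("saved per day: {days:.0} days, about {:.1} years", days / 365.0);
}
```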

In addition to latency improvements, we also reaped other benefits from the rollout:

  • Enhanced feature availability: thanks to eliminating Unix socket timeouts, machine learning feature availability is now a robust 100%, resulting in fewer false positives and negatives in detections.
  • Improved resource utilization: our system overhaul liberated resources equivalent to thousands of CPU cores and hundreds of gigabytes of RAM – a substantial enhancement of our server fleet's efficiency.
  • Code cleanup: another positive spin-off has been in our Lua and Go code. Thousands of lines of less performant and less memory-safe code have been weeded out, reducing technical debt.
  • Upscaled machine learning capabilities: last but certainly not least, we've significantly expanded our machine learning features, dimensions, and models. This upgrade empowers our machine learning inference to handle hundreds of machine learning features and dozens of dimensions and models.

Conclusion

In the wake of our redesign, we've constructed a powerful and efficient system that truly embodies the essence of 'bliss'. Harnessing the advantages of memory-mapped files, wait-free synchronization, allocation-free operations, and zero-copy deserialization, we've established a robust infrastructure that maintains peak performance while achieving remarkable reductions in latency. As we navigate towards the future, we're committed to leveraging this platform to further improve our Security machine learning products and cultivate innovative features. Additionally, we're excited to share parts of this technology through an open-sourced Rust package mmap-sync.

As we leap into the future, we are building upon our platform's impressive capabilities, exploring new avenues to amplify the power of machine learning. We are deploying a new machine learning model built on BLISS with select customers. If you are a Bot Management subscriber and want to test the new model, please reach out to your account team.

Separately, we are on the lookout for more Cloudflare customers who want to run their own machine learning models at the edge today. If you’re a developer considering making the switch to Workers for your application, sign up for our Constellation AI closed beta. If you’re a Bot Management customer and looking to run an already trained, lightweight model at the edge, we would love to hear from you. Let's embark on this path to bliss together.
