Top 5: Featured Architecture Content for August

Post Syndicated from Elyse Lopez original https://aws.amazon.com/blogs/architecture/top-5-featured-architecture-content-for-august/

The AWS Architecture Center provides new and notable reference architecture diagrams, vetted architecture solutions, AWS Well-Architected best practices, whitepapers, and more. This blog post features some of our best picks from the new and newly updated content we released in the past month.

1. Implementing Travel & Hospitality Data Mesh

The travel and hospitality industries are facing new challenges when generating, accessing, and analyzing data at scale. This new reference architecture diagram shows how to implement a travel and hospitality-related data mesh that you can use to work with data such as travel reservations, loyalty programs, and digital revenue.

2. AWS Outposts High Availability Design and Architecture Considerations

This new whitepaper discusses architecture considerations and recommended practices that systems architects and IT managers can apply to build highly available on-premises application environments with AWS Outposts.

3. Using Computer Vision for Product Quality Analysis in Plants

This reference architecture is designed for manufacturers that want to enhance their Internet of Things (IoT) infrastructure by using computer vision for product quality analysis. It shows how to detect and act on product defect classification using AWS IoT and artificial intelligence/machine learning (AI/ML) services.

4. Tamper Proof Quality Data Using Amazon QLDB

Data tampering is costly for manufacturing companies, and quality data can be particularly vulnerable. This new AWS Solutions Implementation protects quality data by using cryptographic hashing in Amazon Quantum Ledger Database (Amazon QLDB) to maintain an accurate history of data changes.

5. Back to Basics: Handling Communications Between Applications

Many developers are realizing the benefits of moving from centralized to distributed architectures. Because of this, information on how to establish communication between the application services in those environments is very popular. This episode of Back to Basics covers best practices to establish communication, including trade-offs, common anti-patterns, and how to handle errors in less than 5 minutes!


Figure 1. A diagram from Back to Basics video

Automating Multi-Armed Bandit testing during feature rollout

Post Syndicated from Grab Tech original https://engineering.grab.com/multi-armed-bandit-system-recommendation

A/B testing is an experiment in which e-commerce platform users are randomly assigned to one of two versions of a variable, a control group or a treatment group, to discover which version maximises conversion. When running an A/B test, you can take the Multi-Armed Bandit optimisation approach to minimise the conversion lost to the lower-performing variant.

In the traditional software development process, Multi-Armed Bandit (MAB) testing and rolling out a new feature are usually separate processes. The novel Multi-Armed Bandit System for Recommendation solution, hereafter the Multi-Armed Bandit Optimiser, proposes running the Multi-Armed Bandit testing automatically while the new feature is being rolled out.

Advantages

  • Automates the MAB testing process during new feature rollouts.
  • Selects the optimal parameters based on predefined metrics of each use case, which results in an end-to-end solution without the need for user intervention.
  • Uses the Batched Multi-Armed Bandit and Monte Carlo Simulation, which enables it to process large-scale business scenarios.
  • Uses a feedback loop to automatically collect recommendation metrics from user event logs and to feed them to the Multi-Armed Bandit Optimiser.
  • Uses an adaptive rollout method to automatically roll out the best model to the maximum distribution capacity according to the feedback metrics.

Architecture

The following diagram illustrates the system architecture.

System architecture

 

The novel Multi-Armed Bandit System for Recommendation solution contains three building blocks.

  • Stream processing framework

A lightweight system that performs basic operations on Kafka Streams, such as aggregation, filtering, and mapping. The proposed solution relies on this framework to pre-process raw events published by mobile apps and backend processes into the proper format that can be fed into the feedback loop.

  • Feedback loop

A system that calculates the goal metrics and optimises the model traffic distribution. It runs a metrics server that pulls data from Stalker, a time series database that stores the processed events from the last hour. The metrics server periodically invokes a Spark job to run the SQL queries that compute the pre-defined goal metrics provided by users, such as the Clickthrough Rate and Conversion Rate. The output of the job is written to an S3 bucket and picked up by the optimiser runtime, which runs the Multi-Armed Bandit Optimiser to optimise the model traffic distribution based on the latest goal metrics.

  • Dynamic value receiver, or the GrabX variable

Multi-Armed Bandit Optimiser modules

The Multi-Armed Bandit Optimiser consists of the following modules:

  • Reward Update
  • Batched Multi-Armed Bandit Agent
  • Monte-Carlo Simulation
  • Adaptive Rollout
Multi-Armed Bandit Optimiser modules

 

The goal of the Multi-Armed Bandit Optimisation is to find the optimal Arm that results in the best predefined metrics, and then allocate the maximum traffic to that Arm.

The solution can be illustrated by the following problem. For K Arms with action space A = {1, 2, …, K}, the Multi-Armed Bandit Optimiser's goal is to solve the one-shot optimisation problem of finding the Arm with the highest expected reward, a* = argmax_{a in A} E[r(a)].

 

Reward Update module

The Reward Update module collects a batch of the metrics. It calculates the Success and Failure counts, then updates the Beta distribution of each Arm with the Batched Multi-Armed Bandit algorithm.

Multi-Armed Bandit Agent module

In the Multi-Armed Bandit Agent module, each Arm's metrics are modelled as a Beta distribution, which is sampled with Thompson Sampling. The Beta distribution formula is:

f(x; α, β) = x^(α−1) (1−x)^(β−1) / B(α, β), where α and β track the Arm's Success and Failure counts and B(α, β) is the Beta function.

 

The Batched Multi-Armed Bandit algorithm updates the Beta distribution with the batch metrics. The optimisation algorithm can be described in the following method.

Batched Multi-Armed Bandit algorithm
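
The algorithm figure is not reproduced here, but the batched update it describes can be sketched roughly as follows. This is a minimal illustration, not Grab's implementation; the class name, uniform prior, and counts are my own assumptions.

import random

class BetaArm:
    """Posterior over an Arm's success rate, kept as a Beta(alpha, beta) distribution."""
    def __init__(self):
        self.alpha = 1.0  # assumed uniform prior: 1 + successes
        self.beta = 1.0   # assumed uniform prior: 1 + failures

    def batch_update(self, successes, failures):
        # Batched reward update: add the whole batch's counts in one step.
        self.alpha += successes
        self.beta += failures

    def sample(self):
        # Thompson Sampling draw from the current Beta posterior.
        return random.betavariate(self.alpha, self.beta)

def choose_arm(arms):
    # Pick the Arm with the highest sampled success rate.
    draws = {arm_id: arm.sample() for arm_id, arm in arms.items()}
    return max(draws, key=draws.get)

# Example: two models ("Arms") updated with one batch of feedback metrics (made-up numbers).
arms = {"model_a": BetaArm(), "model_b": BetaArm()}
arms["model_a"].batch_update(successes=120, failures=880)
arms["model_b"].batch_update(successes=150, failures=850)
print(choose_arm(arms))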

 

Monte-Carlo Simulation module

The Monte-Carlo Simulation module runs the simulation N times to find the best Arm over a configurable simulation window. It then applies the simulated results as each Arm's traffic distribution percentage for the next round.

To handle different scenarios, we designed two strategies, sketched in code after this list.

  • Max strategy: We count each Arm's Success results across the Monte-Carlo Simulation runs, and then compute the next round's distribution according to the success rate.
  • Mean strategy: We average each Arm's sampled Beta distribution probabilities across the Monte-Carlo Simulation runs, and then compute the next round's distribution according to each Arm's averaged probability.
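
A rough sketch of how the two strategies could be computed, reusing the BetaArm class from the earlier sketch. Treating a "success" per run as the Arm drawing the highest sampled value is my assumption; this is not Grab's code.

import random

def monte_carlo_distribution(arms, n_runs=10000, strategy="mean"):
    # Simulate Thompson Sampling draws n_runs times and return the next-round
    # traffic split per Arm, using either the max or the mean strategy.
    wins = {arm_id: 0 for arm_id in arms}     # max strategy: times each Arm drew the highest value
    sums = {arm_id: 0.0 for arm_id in arms}   # mean strategy: running sum of sampled probabilities
    for _ in range(n_runs):
        draws = {arm_id: random.betavariate(arm.alpha, arm.beta) for arm_id, arm in arms.items()}
        wins[max(draws, key=draws.get)] += 1
        for arm_id, value in draws.items():
            sums[arm_id] += value
    weights = wins if strategy == "max" else {a: s / n_runs for a, s in sums.items()}
    total = sum(weights.values())
    return {arm_id: weight / total for arm_id, weight in weights.items()}

# Example: next-round traffic split for the Arms updated in the previous sketch.
# print(monte_carlo_distribution(arms, strategy="mean"))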

Adaptive Rollout module

The Adaptive Rollout module rolls out the sampled distribution of each Multi-Armed Bandit Arm, in the form of a Multi-Armed Bandit Arm Model ID and its distribution percentage, to the experimentation platform's configuration variable. The online service then reads the resulting variable. The process repeats as the feedback loop collects the metrics that result from the Adaptive Rollout.

Multi-Armed Bandit for Recommendation Solution

In the GrabFood Recommended for You widget, several food recommendation models categorise lists of merchants. The choice of model is controlled through experiments at rollout, and the results of the experiments are analysed offline. After the analysis, data scientists and product managers adjust the model choice based on the experiment results.

The Multi-Armed Bandit System for Recommendation solution improves the process by speeding up the feedback loop with the Multi-Armed Bandit system. Instead of depending on offline data, which only becomes available at T+N, the solution responds to minute-level metrics and adjusts the model faster.

This converges on the optimal model faster. The proposed Multi-Armed Bandit for Recommendation solution workflow is illustrated in the following diagram.

Multi-Armed Bandit for Recommendation solution workflow

 

Optimisation metrics

The GrabFood recommendation uses the Effective Conversion Rate metric as the optimisation objective. The Effective Conversion Rate is defined as the total number of checkouts made through the Recommended for You widget, divided by the total number of widget views and multiplied by the coverage rate.

View, click, and checkout events, together with the coverage rate, are collected over a 30-minute aggregation window. A request with a checkout is considered a success event, while a non-converted request is considered a failure event.
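
As a plain restatement of the metric definition above (the numbers in the example are made up):

def effective_conversion_rate(checkouts, views, coverage_rate):
    # ECR = checkouts through the widget / widget views * coverage rate
    return 0.0 if views == 0 else checkouts / views * coverage_rate

# One 30-minute window: 42 checkouts out of 1,500 widget views at 80% coverage.
print(effective_conversion_rate(checkouts=42, views=1500, coverage_rate=0.8))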

Multi-Armed Bandit strategy

With the Multi-Armed Bandit Optimiser, the Beta distribution is selected to model the Effective Conversion Rate. The use of the mean strategy in the Monte-Carlo Simulation results in a more stable distribution.

Rollout policy

The Multi-Armed Bandit Optimiser uses the eater ID as the unique entity. It applies a rollout policy and assigns different percentages of eaters to each model, based on the computed distribution at the beginning of each loop.

Fallback logic

The Multi-Armed Bandit Optimiser first runs model validation to ensure all candidates are suitable for rolling out. If the scheduled MAB job fails, it falls back to a default distribution that is set to 50-50% for each model.

Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving South East Asia forward, apply to join our team today.

Amazon Managed Grafana Is Now Generally Available with Many New Features

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-managed-grafana-is-now-generally-available-with-many-new-features/

In December, we introduced the preview of Amazon Managed Grafana, a fully managed service developed in collaboration with Grafana Labs that makes it easy to use the open-source and the enterprise versions of Grafana to visualize and analyze your data from multiple sources. With Amazon Managed Grafana, you can analyze your metrics, logs, and traces without having to provision servers, or configure and update software.

During the preview, Amazon Managed Grafana was updated with new capabilities. Today, I am happy to announce that Amazon Managed Grafana is now generally available with additional new features:

  • Grafana has been upgraded to version 8 and offers new data sources, visualizations, and features, including library panels that you can build once and re-use on multiple dashboards, a Prometheus metrics browser to quickly find and query metrics, and new state timeline and status history visualizations.
  • To centralize the querying of additional data sources within an Amazon Managed Grafana workspace, you can now query data using the JSON data source plugin. You can now also query Redis, SAP HANA, Salesforce, ServiceNow, Atlassian Jira, and many more data sources.
  • You can use Grafana API keys to publish your own dashboards or give programmatic access to your Grafana workspace. For example, you can use a Terraform recipe to add data sources and dashboards; a minimal Python example of calling the Grafana HTTP API with an API key is sketched after this list.
  • You can enable single sign-on to your Amazon Managed Grafana workspaces using Security Assertion Markup Language 2.0 (SAML 2.0). We have worked with these identity providers (IdP) to have them integrated at launch: CyberArk, Okta, OneLogin, Ping Identity, and Azure Active Directory.
  • All calls from the Amazon Managed Grafana console and code calls to Amazon Managed Grafana API operations are captured by AWS CloudTrail. In this way, you can have a record of actions taken in Amazon Managed Grafana by a user, role, or AWS service. Additionally, you can now audit mutating changes that occur in your Amazon Managed Grafana workspace, such as when a dashboard is deleted or data source permissions are changed.
  • The service is available in ten AWS Regions (full list at the end of the post).
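
Here is a minimal sketch, under my own assumptions, of publishing a dashboard to a workspace with an API key by calling the standard Grafana HTTP API. The workspace URL and API key are placeholders, the requests library is assumed to be installed, and this is an illustration rather than the Terraform recipe referenced above.

import requests

# Placeholders: substitute your workspace URL and an API key with editor permissions.
WORKSPACE_URL = "https://g-abc123.grafana-workspace.us-east-1.amazonaws.com"
API_KEY = "<grafana-api-key>"

payload = {
    "dashboard": {
        "id": None,
        "uid": None,
        "title": "Published via the Grafana HTTP API",
        "panels": [],
        "schemaVersion": 30,
    },
    "overwrite": False,
}

response = requests.post(
    f"{WORKSPACE_URL}/api/dashboards/db",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=10,
)
response.raise_for_status()
print(response.json())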

Let’s do a quick walkthrough to see how this works in practice.

Using Amazon Managed Grafana
In the Amazon Managed Grafana console, I choose Create workspace. A workspace is a logically isolated, highly available Grafana server. I enter a name and a description for the workspace, and then choose Next.

Console screenshot.

I can use AWS Single Sign-On (AWS SSO) or an external identity provider via SAML to authenticate the users of my workspace. For simplicity, I select AWS SSO. Later in the post, I’ll show how SAML authentication works. If this is your first time using AWS SSO, you can see the prerequisites (such as having AWS Organizations set up) in the documentation.

Console screenshot.

Then, I choose the Service managed permission type. In this way, Amazon Managed Grafana will automatically provision the necessary IAM permissions to access the AWS Services that I select in the next step.

Console screenshot.

In Service managed permission settings, I choose to monitor resources in my current AWS account. If you use AWS Organizations to centrally manage your AWS environment, you can use Grafana to monitor resources in your organizational units (OUs).

Console screenshot.

I can optionally select the AWS data sources that I am planning to use. This configuration creates an AWS Identity and Access Management (IAM) role that enables Amazon Managed Grafana to access those resources in my account. Later, in the Grafana console, I can set up the selected services as data sources. For now, I select Amazon CloudWatch so that I can quickly visualize CloudWatch metrics in my Grafana dashboards.

Here I also configure permissions to use Amazon Managed Service for Prometheus (AMP) as a data source and have a fully managed monitoring solution for my applications. For example, I can collect Prometheus metrics from Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (Amazon ECS) environments, using AWS Distro for OpenTelemetry or Prometheus servers as collection agents.

Console screenshot.

In this step I also select Amazon Simple Notification Service (SNS) as a notification channel. Similar to the data sources before, this option gives Amazon Managed Grafana access to SNS but does not set up the notification channel. I can do that later in the Grafana console. Specifically, this setting adds SNS publish permissions to topics that start with grafana to the IAM role created by the Amazon Managed Grafana console. If you prefer to have tighter control on permissions for SNS or any data source, you can edit the role in the IAM console or use customer-managed permissions for your workspace.

Finally, I review all the options and create the workspace.
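
If you prefer to script this instead of clicking through the console, the same workspace can be created with the AWS SDK. The following is a rough sketch only; the boto3 client is named grafana, but the exact parameter names below are my reading of the console fields, so verify them against the SDK reference before relying on them.

import boto3

grafana = boto3.client("grafana")

# Rough SDK equivalent of the console steps above (parameter names are assumptions).
response = grafana.create_workspace(
    workspaceName="my-workspace",
    workspaceDescription="Demo workspace",
    accountAccessType="CURRENT_ACCOUNT",       # monitor resources in this account only
    authenticationProviders=["AWS_SSO"],       # or ["SAML"] for an external IdP
    permissionType="SERVICE_MANAGED",          # let the service create the IAM role
    workspaceDataSources=["CLOUDWATCH", "PROMETHEUS"],
    workspaceNotificationDestinations=["SNS"],
)
print(response["workspace"]["id"])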

After a few minutes, the workspace is ready, and I find the workspace URL that I can use to access the Grafana console.

Console screenshot.

I need to assign at least one user or group to the Grafana workspace to be able to access the workspace URL. I choose Assign new user or group and then select one of my AWS SSO users.

Console screenshot.

By default, the user is assigned a Viewer user type and has view-only access to the workspace. To give this user permissions to create and manage dashboards and alerts, I select the user and then choose Make admin.

Console screenshot.

Back to the workspace summary, I follow the workspace URL and sign in using my AWS SSO user credentials. I am now using the open-source version of Grafana. If you are a Grafana user, everything is familiar. For my first configurations, I will focus on AWS data sources so I choose the AWS logo on the left vertical bar.

Console screenshot.

Here, I choose CloudWatch. Permissions are already set because I selected CloudWatch in the service-managed permission settings earlier. I select the default AWS Region and add the data source. I choose the CloudWatch data source and on the Dashboards tab, I find a few dashboards for AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (EBS), AWS Lambda, Amazon Relational Database Service (RDS), and CloudWatch Logs.

Console screenshot.

I import the AWS Lambda dashboard. I can now use Grafana to monitor invocations, errors, and throttles for Lambda functions in my account. I’ll save you the screenshot because I don’t have any interesting data in this Region.

Using SAML Authentication
If I don’t have AWS SSO enabled, I can authenticate users to the Amazon Managed Grafana workspace using an external identity provider (IdP) by selecting the SAML authentication option when I create the workspace. For existing workspaces, I can choose Setup SAML configuration in the workspace summary.

First, I have to provide the workspace ID and URL information to my IdP in order to generate IdP metadata for configuring this workspace.

Console screenshot.

After my IdP is configured, I import the IdP metadata by specifying a URL or copying and pasting to the editor.

Console screenshot.

Finally, I can map user permissions in my IdP to Grafana user permissions, such as specifying which users will have Administrator, Editor, and Viewer permissions in my Amazon Managed Grafana workspace.

Console screenshot.

Availability and Pricing
Amazon Managed Grafana is available today in ten AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Europe (London), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), and Asia Pacific (Seoul). For more information, see the AWS Regional Services List.

With Amazon Managed Grafana, you pay for the active users per workspace each month. Grafana API keys used to publish dashboards are billed as an API user license per workspace each month. You can upgrade to Grafana Enterprise to have access to enterprise plugins, support, and on-demand training directly from Grafana Labs. For more information, see the Amazon Managed Grafana pricing page.

To learn more, you are invited to this webinar on Thursday, September 9 at 9:00 am PDT / 12:00 pm EDT / 6:00 pm CEST.

Start using Amazon Managed Grafana today to visualize and analyze your operational data at any scale.

Danilo

Translation from the Balkan Free Media Initiative: The Bulgarian Telegraph Agency functions as a bureau of TASS

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%B1%D1%8A%D0%BB%D0%B3%D0%B0%D1%80%D1%81%D0%BA%D0%B0%D1%82%D0%B0-%D1%82%D0%B5%D0%BB%D0%B5%D0%B3%D1%80%D0%B0%D1%84%D0%BD%D0%B0-%D0%B0%D0%B3%D0%B5%D0%BD%D1%86%D0%B8%D1%8F-%D1%84%D1%83%D0%BD%D0%BA.html

Wednesday, 1 September 2021


Nikolay Marchenko of Bivol analyses for the Balkan Free Media Initiative (BFMI), using several high-profile stories from the world news covered by the Bulgarian national news agency BTA. Bulgaria's national news agency…

What part of it isn't true?

Post Syndicated from original http://www.gatchev.info/blog/?p=2376

You know, I too used to think that this business with the evil mafias was just some fairy tale. But it turns out it isn't, take it from me! They exist, and they do all kinds of evil to humanity. And the greatest evil of all: wherever there's a clever, capable, useful person, they lock him up in these special prisons. And there they drug them and wash their brains so they forget who and what they are. And stop helping people. I saw it with my own eyes!

When they retired me from the DAP, I went looking for cleaning work somewhere else, to add to my pension. Found it easily enough: I'm a modern man, I've got this computer thing at home, my granddaughter bought it for me so we can talk and see each other on it. I'm a bit sore at my daughter for turning her back on her roots, ditching our beautiful homeland and taking off to Canada. But the grandkid isn't to blame. She's even picked up some Bulgarian from me. Sometimes she invites classmates over and translates what we talk about; they enjoy it a lot. It's good for them too, to see a Bulgarian and learn how lovely Bulgaria is… Anyway, it was on the computer that I found the job ad. How was I to know it was for that kind of prison! I went, we talked, they hired me.

So I started work in one of the wards of the prison. The one for schizo… schizo… ah, I keep forgetting that word, it's a complicated one. And the talents and benefactors of humanity I saw locked up and tormented in there, I have no words!

For starters there wasn't one but two whole Jesus Christs! Come to save humanity. Things must be dire if the Lord is sending his sons in pairs. True, one of them was a daughter, but who are we to tell the Lord whom to send? And it makes sense, it adds up, them being two! They kept quarrelling over which one was the real one, but I reckon they both are. Just one would hardly be enough in these fallen times; the Lord is right to send two!

The more ordinary ones were even more numerous. There was a Nikola Tesla. I didn't know who that was, but on the computer there's this Facebook that writes everything about everything. So I read it there: in case you don't know, he's supposedly a Serb but actually a Bulgarian who fled to America and invented the death rays and the sending of energy through the air. The Americans must have caught him and sent him back for us to keep locked up; that's what they do with all decent people, catch them and lock them up, it says so on Facebook. I'll admit, at first I didn't believe him. Facebook said when he was born, which would make him about a hundred and fifty, yet he doesn't look even thirty. But he explained that he had also invented rays for eternal youth… A genius like that, and they keep him in prison! And stuff him with drugs to wash his brain, so he stops helping people!

There was also a Nostradamus. Reincarnated: whenever he died, he was reborn, because it was important for the world to have a prophet. I read up on Facebook about who he is too. At first I didn't believe he could prophesy; he's no Vanga, after all! But he explained that this was exactly why he'd been reborn a Bulgarian, because we are a sacred people and it is given to us to be prophets. I still didn't believe him, but he just looked me over, like he was taking my measure, and guessed straight off that I never finished primary school! And he told me: that's why they couldn't wash your brain, old man. The ones who studied at university have had so many lies poured into their heads that they can never see the truth. But the unschooled aren't brainwashed, and they can! I always knew I was a cut above all those briefcase-carriers with diplomas, but it takes a prophet to explain it so you can understand it. And the logic of it, and everything.

The ones with Nobel Prizes must number about ten. (Those are prizes for the greatest scientists; it says so on Facebook.) Half of them have invented perpetual motion machines, to free humanity from the oil and machine mafias. But they were naive, and the mafias caught them and locked them all up in there. (Officially they call it a hospital, for I-don't-remember-what illnesses, another complicated word. But looking at them, I didn't see any sick people inside. It's a mafia lie.) Of the others, one discovered a means of immortality, but when he started handing it out to people, the mafias got wind of it and locked him up. And everyone he'd given it to was, that very hour, either poisoned or struck down with some bad illness. They're constantly spraying chemtrails from the planes, you know, and in them there are these chips for remote control, and through those… There were others too, but can you remember them all! Martyrs for the good of humanity!

As for the staff, I watch them: they go around disguised in white coats, as if they really were doctors. Among the younger ones there may have been one or two genuine ones, mind. Young, naive things, thinking they're actually helping somebody. Poor souls… But the older ones were mafiosi to a man. The boss, for instance, supposedly a professor, a fellow with glasses, does nothing but give orders. No way he isn't a Bilderberg; they're important types like that. Or at least an Illuminatus: one of the inventors inside explained to me how you can tell, by how he wants the lights on all the time. And the head nurse, the viper that she is, has to be a reptilian. (The reptilians, it turns out, are some kind of Martian snakes.) The moment she sees me it's always: mop here, why haven't you swept there… Can't a man have a little nap? Mafiosi, the lot of them…

Oh, I forgot: among the prisoners there's also a doctor. With two Nobel Prizes. His name is Hippocrates. Well, he explained the current epidemic to me. There's no virus at all; microbes are an invention of the mafias. People weren't dying of the virus, only with it. Really they die of the chemtrails that Soros and Bill Gates spray from the planes. This started back in the fourteenth century, when two thirds of the people died out. Supposedly there was a great plague, but in fact there was no plague at all; people died only with it, not of it, but of the chemtrails. There were no planes back then, so they sprayed them from helicopters… The masks were originally made to stop people's mouths so they couldn't speak the truth, but then Ivan Kostov perfected them, made viruses and put them inside, and the moment you put your mask on, you get infected and die. That's why they push so hard to make schoolchildren wear masks: to wipe out the Bulgarian people, so that on the freed-up land they can settle Taliban with turbans and yataghans, on Merkel's orders. Hippocrates knew it for certain; he had personally seen the order with Merkel's and Kostov's signatures. That's why they locked him up.

And if you only knew what they do to them, the poor things! Most of them keep quiet, they're people with dignity, but one couldn't take it any more and complained to me. He'd invented a pill that you drop into a tank of tap water and it turns into petrol, better quality than the stuff at the filling stations. And a whole pack of pills costs mere pennies. He got a Nobel Prize for it and a gold medal bigger than your palm. But the oil mafia caught him and locked him up here, so as not to lose its profits. And they kept stuffing him with all sorts of poisons, so he'd forget who and what he is. And when two weeks passed and they saw the poisons weren't working, one night they stole his Nobel medal while he slept. Swapped it for a rusty jar lid, to mock him. And to shake his resolve… Cursed tormentors!

The whole Universe pities us, and any number of aliens have gathered to help us. They choose various people and speak to them and through them, giving us wisdom and knowledge. And the mafia catches those people and locks them up too. There are four or five of them locked up inside. It's hard to say exactly, because one of them is actually two. And he's constantly arguing with himself over which of the two the aliens told what. But both agreed that our only salvation is Mother Russia. She is the most sacred country in the whole universe; both aliens had confirmed it. The assorted Eurogays, on the other hand, are perverts and want to make us like them too. To pass a law that only homo… well, queer marriages are allowed. And if normal people fall in love, their children are to be taken away and given up for adoption to black Norwegian perverts, who are queueing up and paying a lot for white children. Especially if they're from a sacred people. They drink their blood so they can become sacred too.

(That last bit, if you only knew how angry it made me, I have no words. To have no shame about going after children! That very evening I told my granddaughter about it on the computer, and warned her to be careful. She was with a few other kids, classmates of hers. And she translated it for them too, so they'd be warned. If you only knew how delighted they all were! At one point my daughter came into the room and heard me, and pulled a bit of a face, I didn't understand why. But the granddaughter said something to her in some language, I don't know which, after all in Canada they've got over a hundred languages, and my daughter calmed down. I think she even praised her, probably for warning the other children too, so they know what to watch out for. And what horrors there are in this world. Even here at home, never mind that ours is sacred Bulgarian soil…)

And then on Facebook there are some people who say it isn't so. Pah! One fellow there told them good and proper, a professor, an academician, a very clever man anyway. That they're paid agents of the mafia and take money to lie to people that microbes exist. And that tinfoil hats don't protect you from the 5G, and that the GMOs aren't going to eat us… Liars! So tell me now, is there anything untrue in what I've told you? What part of it isn't true?

FSF copyright handling: A basis for distribution, licensing and enforcement

Post Syndicated from original https://lwn.net/Articles/867944/rss

The Free Software Foundation (FSF) clarifies the purpose of its copyright policies and examines the impact of potential alternatives.

For some GNU packages, the ones that are FSF-copyrighted, we ask contributors for two kinds of legal papers: copyright assignments, and employer copyright disclaimers. We drew up these policies working with lawyers in the 1980s, and they make possible our steady and continuing enforcement of the GNU General Public License (GPL).

These papers serve four different but related legal purposes, all of which help ensure that the GNU Project’s goals of freedom for the community are met.

Get started with the Amazon Redshift Data API

Post Syndicated from David Zhang original https://aws.amazon.com/blogs/big-data/get-started-with-the-amazon-redshift-data-api/

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that enables you to analyze your data at scale. Tens of thousands of customers use Amazon Redshift to process exabytes of data to power their analytical workloads.

The Amazon Redshift Data API is an Amazon Redshift feature that simplifies access to your Amazon Redshift data warehouse by removing the need to manage database drivers, connections, network configurations, data buffering, credentials, and more. You can run SQL statements using the AWS Software Development Kit (AWS SDK), which supports different languages such as C++, Go, Java, JavaScript, .Net, Node.js, PHP, Python, and Ruby.


With the Data API, you can programmatically access data in your Amazon Redshift cluster from different AWS services such as AWS Lambda, Amazon SageMaker notebooks, AWS Cloud9, and also your on-premises applications using the AWS SDK. This allows you to build cloud-native, containerized, serverless, web-based, and event-driven applications on the AWS Cloud.

The Data API also enables you to run analytical queries on Amazon Redshift’s native tables, external tables in your data lake via Amazon Redshift Spectrum, and also across Amazon Redshift clusters, which is known as data sharing. You can also perform federated queries with external data sources such as Amazon Aurora.

In an earlier post, we shared in great detail how you can use the Data API to interact with your Amazon Redshift data warehouse. In this post, we learn how to get started with the Data API in different languages and also discuss various use cases in which customers are using it to build modern applications combining modular, serverless, and event-driven architectures. The Data API was launched in September 2020, and thousands of our customers are already using it for a variety of use cases:

  • Extract, transform, and load (ETL) orchestration with AWS Step Functions
  • Access Amazon Redshift from applications
  • Access Amazon Redshift from SageMaker Jupyter notebooks
  • Access Amazon Redshift with REST endpoints
  • Event-driven extract, load, transformation
  • Event-driven web application design
  • Serverless data processing workflows

Key features of the Data API

In this section, we discuss the key features of the Data API.

Use the programming language of your choice

The Data API integrates with the AWS SDK to run queries. Therefore, you can use any language supported by the AWS SDK to build your application with it, such as C++, Go, Java, JavaScript, .NET, Node.js, PHP, Python, and Ruby.

Run individual or batch SQL statements

With the Data API, you can run individual queries from your application or submit a batch of SQL statements within a transaction, which is useful to simplify your workload.

Run SQL statements with parameters

With the Data API, you can run parameterized SQL queries, which lets you write reusable ETL code by passing parameters into a SQL template instead of concatenating them into each query. This also makes it easier to migrate code from existing applications that need parameterization. In addition, parameterization makes your code more secure by helping to prevent SQL injection.
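
For illustration, a parameterized query might look like the following sketch with the AWS SDK for Python (Boto3); the cluster, database, user, and table names are placeholders for your own resources.

import boto3

client = boto3.client("redshift-data")

response = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT data FROM redshift_table WHERE id = :record_id",
    Parameters=[{"name": "record_id", "value": "42"}],
)
print(response["Id"])  # statement ID used to track the query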

No drivers needed

With the Data API, you can interact with Amazon Redshift without having to configure JDBC or ODBC drivers. The Data API eliminates the need for configuring drivers and managing database connections. You can run SQL commands to your Amazon Redshift cluster by calling a Data API secured API endpoint.

No network or security group setup

The Data API doesn’t need a persistent connection with Amazon Redshift. Instead, it provides a secure HTTP endpoint, which you can use to run SQL statements. Therefore, you don’t need to set up and manage a VPC, security groups, and related infrastructure to access Amazon Redshift with the Data API.

No password management

The Data API provides two options to provide credentials:

  • IAM temporary credentials – With this approach, you only need to provide the username and AWS Identity and Access Management (IAM) GetClusterCredentials permission to access Amazon Redshift with no password. The Data API automatically handles retrieving your credentials using IAM temporary credentials.
  • AWS Secrets Manager secret – With this approach, you store your username and password credentials in an AWS Secrets Manager secret and allow the Data API to access those credentials from that secret.

You can also use the Data API when working with federated logins through IAM credentials. You don’t have to pass database credentials via API calls when using identity providers such as Okta, Azure Active Directory, or database credentials stored in Secrets Manager. If you’re using Lambda, the Data API provides a secure way to access your database without the additional overhead of launching Lambda functions in Amazon Virtual Private Cloud (Amazon VPC).
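
As a quick sketch of the Secrets Manager option described above, you pass the secret's ARN instead of a database user; the cluster name and ARN below are placeholders.

import boto3

client = boto3.client("redshift-data")

response = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds-AbCdEf",
    Sql="SELECT 1",
)
print(response["Id"])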

Asynchronous

The Data API is asynchronous. You can run long-running queries without having to wait for them to complete, which is key in developing a serverless, microservices-based architecture. Each query returns a query ID, and you can use this ID to check the status and retrieve the response of the query. In addition, query results are stored for 24 hours.
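
A minimal polling sketch of that flow follows, using describe_statement and get_statement_result; in practice, the EventBridge notifications described next let you avoid polling. The connection details are placeholders.

import time
import boto3

client = boto3.client("redshift-data")

def run_and_fetch(sql, cluster, database, secret_arn):
    # Submit a statement, poll until it finishes, then fetch the result set.
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster, Database=database,
        SecretArn=secret_arn, Sql=sql)["Id"]
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)
    if status != "FINISHED":
        raise RuntimeError(f"Statement {statement_id} ended with status {status}")
    return client.get_statement_result(Id=statement_id)["Records"]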

Event notifications

You can monitor Data API events in Amazon EventBridge, which delivers a stream of real-time data from your source application to targets such as Lambda. This option is available when you’re running your SQL statements in the Data API using the WithEvent parameter set to true. When a query is complete, the Data API can automatically send event notifications to EventBridge, which you may use to take further actions. This helps you design event-driven applications with Amazon Redshift. For more information, see Monitoring events for the Amazon Redshift Data API in Amazon EventBridge.

Get started with the Data API

It’s easy to get started with the Data API using the AWS SDK. You can use the Data API to run your queries on Amazon Redshift using different languages such as C++, Go, Java, JavaScript, .Net, Node.js, PHP, Python and Ruby.

All API calls from different programming languages follow similar parameter signatures. For instance, you can run the ExecuteStatement API to run individual SQL statements in the AWS Command Line Interface (AWS CLI) or different languages such as Python and JavaScript (NodeJS).

The following code is an example using the AWS CLI:

aws redshift-data execute-statement \
    --cluster-identifier my-redshift-cluster \
    --database dev \
    --db-user awsuser \
    --sql 'select data from redshift_table'

The following example code uses Python:

boto3.client("redshift-data").execute_statement(
	ClusterIdentifier = ‘my-redshift-cluster’,
	Database = ‘dev’,
	DbUser = ‘awsuser’,
	Sql = 'select data from redshift_table’)

The following code uses JavaScript (NodeJS):

new AWS.RedshiftData().executeStatement({
	ClusterIdentifier: 'my-redshift-cluster',
	Database: 'dev',
	DbUser: 'awsuser',
	Sql: 'select data from redshift_table'
}).promise();

We have also published a GitHub repository showcasing how to get started with the Data API in different languages such as Go, Java, JavaScript, Python, and TypeScript. You may go through the step-by-step process explained in the repository to build your custom application in all these languages using the Data API.

Use cases with the Data API

You can use the Data API to modernize and simplify your application architectures by creating modular, serverless, event-driven applications with Amazon Redshift. In this section, we discuss some common use cases.

ETL orchestration with Step Functions

When performing ETL workflows, you have to complete a number of steps. As your business scales, the steps and dependencies often become complex and difficult to manage. With the Data API and Step Functions, you can easily orchestrate complex ETL workflows. You can explore the following example use case and AWS CloudFormation template demonstrating ETL orchestration using the Data API and Step Functions.

Access Amazon Redshift from custom applications

If you’re designing your custom application in any programming language that is supported by the AWS SDK, the Data API simplifies data access from your applications, which may be an application hosted on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon Elastic Container Service (Amazon ECS) and other compute services or a serverless application built with Lambda. You can explore an example use case and CloudFormation template showcasing how to easily work with the Data API from Amazon EC2 based applications.

Access Amazon Redshift from SageMaker Jupyter notebooks

SageMaker notebooks are very popular in the data science community for analyzing and solving machine learning problems. The Data API makes it easy to access and visualize data from your Amazon Redshift data warehouse without having to troubleshoot password management, VPC, or network issues. You can learn more about this use case along with a CloudFormation template showcasing how to use the Data API to interact with Amazon Redshift from a SageMaker Jupyter notebook.

Access Amazon Redshift with REST endpoints

In addition to the AWS SDK, you can invoke the Data API directly as REST API calls, using standard HTTP methods such as GET and POST. For more information, see REST for Redshift Data API.

Event-driven ELT

Event-driven applications are popular with many customers, where applications run in response to events. A primary benefit of this architecture is the decoupling of producer and consumer processes, which allows greater flexibility in application design and in building decoupled processes. For more information, see Building an event-driven application with AWS Lambda and the Amazon Redshift Data API. An accompanying CloudFormation template demonstrates the same pattern.

Event-driven web application design

Similar to event-driven ELT applications, event-driven web applications are also becoming popular, especially if you want to avoid long-running database queries, which create bottlenecks for the application servers. For example, you may be running a web application that has a long-running database query taking a minute to complete. Instead of designing that web application with long-running API calls, you can use the Data API and Amazon API Gateway WebSockets, which creates a lightweight websocket connection with the browser and submits the query to Amazon Redshift using the Data API. When the query is finished, the Data API sends a notification to EventBridge about its completion. When the data is available in the Data API, it’s pushed back to this browser session and the end-user can view the dataset. You can explore an example use case along with a CloudFormation template showcasing how to build an event-driven web application using the Data API and API Gateway WebSockets.

Serverless data processing workflows

With the Data API, you can design a serverless data processing workflow, where you can design an end-to-end data processing pipeline orchestrated using serverless AWS components such as Lambda, EventBridge, and the Data API client. Typically, a data pipeline involves multiple steps, for example:

  1. Load raw sales and customer data to a data warehouse.
  2. Integrate and transform the raw data.
  3. Build summary tables or unload this data to a data lake so subsequent steps can consume this data.

The example use case Serverless Data Processing Workflow using Amazon Redshift Data Api demonstrates how to chain multiple Lambda functions in a decoupled fashion and build an end-to-end data pipeline. In that code sample, a Lambda function is run through a scheduled event that loads raw data from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift. On its completion, the Data API generates an event that triggers an event rule in EventBridge to invoke another Lambda function that prepares and transforms raw data. When that process is complete, it generates another event triggering a third EventBridge rule to invoke another Lambda function and unloads the data to Amazon S3. The Data API enables you to chain this multi-step data pipeline in a decoupled fashion.
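
As a minimal sketch of the first step in such a pipeline (not the code from the linked sample), a scheduled Lambda function might submit the COPY through the Data API with WithEvent set to true, so that EventBridge can trigger the next function when the load completes. The cluster, secret, table, IAM role, and bucket below are placeholders.

import boto3

client = boto3.client("redshift-data")

def handler(event, context):
    response = client.execute_statement(
        ClusterIdentifier="my-redshift-cluster",
        Database="dev",
        SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds-AbCdEf",
        Sql=("COPY raw_sales FROM 's3://my-bucket/raw/' "
             "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
             "FORMAT AS CSV"),
        WithEvent=True,  # emit a completion event to EventBridge for the next step
    )
    return {"statementId": response["Id"]}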

Conclusion

The Data API offers many additional benefits when integrating Amazon Redshift into your analytical workload. The Data API simplifies and modernizes current analytical workflows and custom applications. You can perform long-running queries without having to pause your application for the queries to complete. This enables you to build event-driven applications as well as fully serverless ETL pipelines.

The Data API functionalities are available in many different programming languages to suit your environment. As mentioned earlier, there are a wide variety of use cases and possibilities where you can use the Data API to improve your analytical workflow. To learn more, see Using the Amazon Redshift Data API.


About the Authors

David Zhang is an AWS Solutions Architect who helps customers design robust, scalable, and data-driven solutions across multiple industries. With a background in software engineering, David is an active leader and contributor to AWS open-source initiatives. He is passionate about solving real-world business problems and continuously strives to work from the customer’s perspective.

 

 

Bipin Pandey is a Data Architect at AWS. He loves to build data lakes and analytics platforms for his customers. He is passionate about automating and simplifying customer problems with the use of cloud solutions.

 

 

Manash Deb is a Senior Analytics Specialist Solutions Architect at AWS. He has worked on building end-to-end data-driven solutions in different database and data warehousing technologies for over 15 years. He loves learning new technologies and solving, automating, and simplifying customer problems with easy-to-use cloud data solutions on AWS.

 

 

Bhanu Pittampally is an Analytics Specialist Solutions Architect based out of Dallas. He specializes in building analytical solutions. His background is in data warehouse architecture, development, and administration. He has been in the data and analytics field for over 13 years. His LinkedIn profile is here.

 

 

Chao Duan is a software development manager at Amazon Redshift, where he leads the development team focusing on enabling self-maintenance and self-tuning with comprehensive monitoring for Redshift. Chao is passionate about building high-availability, high-performance, and cost-effective databases to empower customers with data-driven decision making.

 

 

Debu Panda, a Principal Product Manager at AWS, is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

[$] Cooperative package management for Python

Post Syndicated from original https://lwn.net/Articles/867657/rss

A longstanding tug-of-war between system package managers and Python's own installation mechanisms (primarily pip, but there are others) looks on its way to being resolved—or at least regularized. PEP 668 ("Graceful cooperation between external and Python package managers") has been created to provide ways for the two types of package installation to work together, rather than at cross-purposes at times. Since many operating systems depend on Python tools, with package versions that may differ from those of users' Python applications, making them play together nicely should result in more stable systems.

How to use ACM Private CA for enabling mTLS in AWS App Mesh

Post Syndicated from Raj Jain original https://aws.amazon.com/blogs/security/how-to-use-acm-private-ca-for-enabling-mtls-in-aws-app-mesh/

Securing east-west traffic in service meshes, such as AWS App Mesh, by using mutual Transport Layer Security (mTLS) adds an additional layer of defense beyond perimeter control. mTLS adds bidirectional peer-to-peer authentication on top of the one-way authentication in normal TLS. This is done by adding a client-side certificate during the TLS handshake, through which a client proves possession of the corresponding private key to the server, and as a result the server trusts the client. This prevents an arbitrary client from connecting to an App Mesh service, because the client wouldn’t possess a valid certificate.

In this blog post, you’ll learn how to enable mTLS in App Mesh by using certificates derived from AWS Certificate Manager Private Certificate Authority (ACM Private CA). You’ll also learn how to reuse AWS CloudFormation templates, which we make available through a companion open-source project, for configuring App Mesh and ACM Private CA.

You’ll first see how to derive server-side certificates from ACM Private CA into App Mesh internally by using the native integration between the two services. You’ll then see a method and code for installing client-side certificates issued from ACM Private CA into App Mesh; this method is needed because client-side certificates aren’t integrated natively.

You’ll learn how to use AWS Lambda to export a client-side certificate from ACM Private CA and store it in AWS Secrets Manager. You’ll then see Envoy proxies in App Mesh retrieve the certificate from Secrets Manager and use it in an mTLS handshake. The solution is designed to ensure confidentiality of the private key of a client-side certificate, in transit and at rest, as it moves from ACM to Envoy.

The solution described in this blog post simplifies and automates the configuration and operations of mTLS-enabled App Mesh deployments, because all of the certificates are derived from a single managed private public key infrastructure (PKI) service, ACM Private CA, eliminating the need to run your own private PKI. The solution uses Amazon Elastic Container Service (Amazon ECS) with AWS Fargate as the App Mesh hosting environment, although the design presented here can be applied to any compute environment that is supported by App Mesh.

Solution overview

ACM Private CA provides a highly available managed private PKI service that enables creation of private CA hierarchies—including root and subordinate CAs—without the investment and maintenance costs of operating your own private PKI service. The service allows you to choose among several CA key algorithms and key sizes and makes it easier for you to export and deploy private certificates anywhere by using API-based automation.

App Mesh is a service mesh that provides application-level networking across multiple types of compute infrastructure. It standardizes how your microservices communicate, giving you end-to-end visibility and helping to ensure transport security and high availability for your applications. In order to communicate securely between mesh endpoints, App Mesh directs the Envoy proxy instances that are running within the mesh to use one-way or mutual TLS.

TLS provides authentication, privacy, and data integrity between two communicating endpoints. The authentication in TLS communications is governed by the PKI system. The PKI system allows certificate authorities to issue certificates that are used by clients and servers to prove their identity. The authentication process in TLS happens by exchanging certificates via the TLS handshake protocol. By default, the TLS handshake protocol proves the identity of the server to the client by using X.509 certificates, while the authentication of the client to the server is left to the application layer. This is called one-way TLS. TLS also supports two-way authentication through mTLS. In mTLS, in addition to the one-way TLS server authentication with a certificate, a client presents its certificate and proves possession of the corresponding private key to a server during the TLS handshake.

Example application

The following sections describe one-way and mutual TLS integrations between App Mesh and ACM Private CA in the context of an example application. This example application exposes an API to external clients that returns a text string name of a color—for example, “yellow”. It’s an extension of the Color App that’s used to demonstrate several existing App Mesh examples.

The example application is comprised of two services running in App Mesh—ColorGateway and ColorTeller. An external client request enters the mesh through the ColorGateway service and is proxied to the ColorTeller service. The ColorTeller service responds back to the ColorGateway service with the name of a color. The ColorGateway service proxies the response to the external client. Figure 1 shows the basic design of the application.
 

Figure 1: App Mesh services in the Color App example application

The two services are mapped onto the following constructs in App Mesh:

  • ColorGateway is mapped as a Virtual gateway. A virtual gateway in App Mesh allows resources that are outside of a mesh to communicate to resources that are inside the mesh. A virtual gateway represents Envoy deployed by itself. In this example, the virtual gateway represents an Envoy proxy that is running as an Amazon ECS service. This Envoy proxy instance acts as a TLS client, since it initiates TLS connections to the Envoy proxy that is running in the ColorTeller service.
  • ColorTeller is mapped as a Virtual node. A virtual node in App Mesh acts as a logical pointer to a particular task group. In this example, the virtual node—ColorTeller—runs as another Amazon ECS service. The service runs two tasks—an Envoy proxy instance and a ColorTeller application instance. The Envoy proxy instance acts as a TLS server, receiving inbound TLS connections from ColorGateway.

Let’s review running the example application in one-way TLS mode. Although optional, starting with one-way TLS allows you to compare the two methods and establish how to look at certain Envoy proxy statistics to distinguish and verify one-way TLS versus mTLS connections.

For practice, you can deploy the example application project in your own AWS account and perform the steps described in your own test environment.

Note: In both the one-way TLS and mTLS descriptions in the following sections, we’re using a flat certificate hierarchy for demonstration purposes. The root CAs are issuing end-entity certificates. The AWS ACM Private CA best practices recommend that root CAs should only be used to issue certificates for intermediate CAs. When intermediate CAs are involved, your certificate chain has multiple certificates concatenated in it, but the mechanisms are the same as those described here.

One-way TLS in App Mesh using ACM Private CA

Because this is a one-way TLS authentication scenario, you need only one Private CA—ColorTeller—and issue one end-entity certificate from it that’s used as the server-side certificate for the ColorTeller virtual node.

Figure 2, following, shows the architecture for this setup, including notations and color codes for certificates and a step-by-step process that shows how the system is configured and functions. Because this architecture uses a server-side certificate only, you use the native integration between App Mesh and ACM Private CA and don’t need an external mechanism for certificate integration.
 

Figure 2: One-way TLS in App Mesh integrated with ACM Private CA

The steps in Figure 2 are:

Step 1: A Private CA instance—ColorTeller—is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA. This certificate is used as the server-side certificate in ColorTeller.

Step 2: The CloudFormation templates configure the ColorGateway to validate server certificates against the ColorTeller private CA certificate chain. As the App Mesh endpoints are starting up, the ColorTeller CA certificate trust chain is ingested into the ColorGateway Envoy instance. The TLS configuration for ColorGateway in App Mesh is shown in Figure 3.
 

Figure 3: One-way TLS configuration in the client policy of ColorGateway

Figure 3 shows that the client policy attributes for outbound transport connections for ColorGateway have been configured as follows:

  • Enforce TLS is set to Enforced. This enforces use of TLS while communicating with backends.
  • TLS validation method is set to AWS Certificate Manager Private Certificate Authority (ACM-PCA hosting). This instructs App Mesh to derive the certificate trust chain from ACM PCA.
  • Certificate is set to the Amazon Resource Name (ARN) of the ColorTeller Private CA, which is the identifier of the certificate trust chain in ACM PCA.

This configuration ensures that ColorGateway makes outbound TLS-only connections towards ColorTeller, extracts the CA trust chain from ACM-PCA, and validates the server certificate presented by the ColorTeller virtual node against the configured CA ARN.

Step 3: The CloudFormation templates configure the ColorTeller virtual node with the ColorTeller end-entity certificate ARN in ACM Private CA. While the App Mesh endpoints are started, the ColorTeller end-entity certificate is ingested into the ColorTeller Envoy instance.

The TLS configuration for the ColorTeller virtual node in App Mesh is shown in Figure 4.
 

Figure 4: One-way TLS configuration in the listener configuration of ColorTeller

Figure 4 shows that various TLS-related attributes are configured as follows:

  • Enable TLS termination is on.
  • Mode is set to Strict to limit connections to TLS only.
  • TLS certificate method is set to AWS Certificate Manager (ACM) hosting as the source of the end-entity certificate.
  • Certificate is set to ARN of the ColorTeller end-entity certificate.

Note: Figure 4 shows an annotation where the certificate ARN has been superimposed by the cert icon in green color. This icon follows the color convention from Figure 2 and can help you relate how the individual resources are configured to construct the architecture shown in Figure 2. The cert shown (and the associated private key that is not shown) in the diagram is necessary for ColorTeller to run the TLS stack and serve the certificate. The exchange of this material happens over the internal communications between App Mesh and ACM Private CA.

Step 4: The ColorGateway service receives a request from an external client.

Step 5: This step includes multiple sub-steps:

  • The ColorGateway Envoy initiates a one-way TLS handshake towards the ColorTeller Envoy.
  • The ColorTeller Envoy presents its server-side certificate to the ColorGateway Envoy.
  • The ColorGateway Envoy validates the certificate against its configured CA trust chain—the ColorTeller CA trust chain—and the TLS handshake succeeds.

Verifying one-way TLS

To verify that a TLS connection was established and that it is one-way TLS authenticated, run the following command on your bastion host:

$ curl -s http://colorteller.mtls-ec2.svc.cluster.local:9901/stats |grep -E 'ssl.handshake|ssl.no_certificate'

listener.0.0.0.0_15000.ssl.handshake: 1
listener.0.0.0.0_15000.ssl.no_certificate: 1

This command queries the runtime statistics that are maintained in ColorTeller Envoy and filters the output for certain SSL-related counts. The count for ssl.handshake should be one. If the ssl.handshake count is more than one, that means there’s been more than one TLS handshake. The count for ssl.no_certificate should also be one, or equal to the count for ssl.handshake. The ssl.no_certificate count tracks the total successful TLS connections with no client certificate. Since this is a one-way TLS handshake that doesn’t involve client certificates, this count is the same as the count of ssl.handshake.

The preceding statistics verify that a TLS handshake was completed and the authentication was one-way, where the ColorGateway authenticated the ColorTeller but not vice-versa. You’ll see in the next section how the ssl.no_certificate count differs when mTLS is enabled.

Mutual TLS in App Mesh using ACM Private CA

In the one-way TLS discussion in the previous section, you saw that App Mesh and ACM Private CA integration works without needing external enhancements. You also saw that App Mesh retrieved the server-side end-entity certificate in ColorTeller and the root CA trust chain in ColorGateway from ACM Private CA internally, by using the native integration between the two services.

However, a native integration between App Mesh and ACM Private CA isn’t currently available for client-side certificates. Client-side certificates are necessary for mTLS. In this section, you’ll see how you can issue and export client-side certificates from ACM Private CA and ingest them into App Mesh.

The solution uses Lambda to export the client-side certificate from ACM Private CA and store it in Secrets Manager. The solution includes an enhanced startup script embedded in the Envoy image to retrieve the certificate from Secrets Manager and place it on the Envoy file system before the Envoy process is started. The Envoy process reads the certificate, loads it in memory, and uses it in the TLS stack for the client-side certificate exchange of the mTLS handshake.

The choice of Lambda is based on this being an ephemeral workflow that needs to run only during system configuration. You need a short-lived, runtime compute context that lets you run the logic for exporting certificates from ACM Private CA and store them in Secrets Manager. Because this compute doesn’t need to run beyond this step, Lambda is an ideal choice for hosting this logic, for cost and operational effectiveness.

The choice of Secrets Manager for storing certificates is based on the confidentiality requirements of the passphrase that is used for encrypting the private key (PKCS #8) of the certificate. You also need a higher throughput data store that can support secrets retrieval from large meshes. Secrets Manager supports a higher API rate limit than the API for exporting certificates from ACM Private CA, and thus serves as a high-throughput front end for ACM Private CA for serving certificates without compromising data confidentiality.

The resulting architecture is shown in Figure 5. The figure includes notations and color codes for certificates—such as root certificates, endpoint certificates, and private keys—and a step-by-step process showing how the system is configured, started, and functions at runtime. The example uses two CA hierarchies for ColorGateway and ColorTeller to demonstrate an mTLS setup where the client and server belong to separate CA hierarchies but trust each other’s CAs.
 

Figure 5: mTLS in App Mesh integrated with ACM Private CA

The numbered steps in Figure 5 are:

Step 1: A Private CA instance representing the ColorGateway trust hierarchy is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA, which is used as the client-side certificate in ColorGateway.

Step 2: Another Private CA instance representing the ColorTeller trust hierarchy is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA, which is used as the server-side certificate in ColorTeller.

Step 3: As part of running CloudFormation, the Lambda function is invoked. This Lambda function is responsible for exporting the client-side certificate from ACM Private CA and storing it in Secrets Manager. The function begins by requesting a random password from Secrets Manager, which ACM Private CA uses as the passphrase for encrypting the private key before it's returned to the function. Generating the password with Secrets Manager lets you specify its complexity.

Step 4: The Lambda function issues an export certificate request to ACM, requesting the ColorGateway end-entity certificate. The request conveys the private key passphrase retrieved from Secrets Manager in the previous step so that ACM Private CA can use it to encrypt the private key that’s sent in the response.

Step 5: The ACM Private CA responds to the Lambda function. The response carries the following elements of the ColorGateway end-entity certificate.

{
  "Certificate": "..",
  "CertificateChain": "..",
  "PrivateKey": ".."
}

Step 6: The Lambda function processes the response that is returned from ACM. It extracts the individual fields from the JSON-formatted response and stores the following four values in Secrets Manager (a code sketch covering steps 3 through 6 follows this list):

  • The ColorGateway endpoint certificate
  • The ColorGateway certificate trust chain, which contains the ColorGateway Private CA root certificate
  • The encrypted private key for the ColorGateway end-entity certificate
  • The passphrase that was used to encrypt the private key
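
To make steps 3 through 6 concrete, here is a minimal sketch of what such a Lambda handler could look like with boto3. The certificate ARN and secret name are placeholders rather than the values used in the example repository, and error handling and idempotency checks are omitted.

# Sketch of the export-and-store workflow (steps 3-6). Resource names are placeholders.
import json
import boto3

acm = boto3.client('acm')
secretsmanager = boto3.client('secretsmanager')

CERTIFICATE_ARN = 'arn:aws:acm:region:account:certificate/EXAMPLE'   # ColorGateway end-entity cert (placeholder)
SECRET_NAME = 'colorgateway/client-cert'                             # placeholder secret name

def handler(event, context):
    # Step 3: ask Secrets Manager for a random passphrase of known complexity.
    passphrase = secretsmanager.get_random_password(
        PasswordLength=32,
        ExcludePunctuation=True
    )['RandomPassword']

    # Steps 4 and 5: export the end-entity certificate; ACM encrypts the private key with the passphrase.
    export = acm.export_certificate(
        CertificateArn=CERTIFICATE_ARN,
        Passphrase=passphrase.encode('utf-8')
    )

    # Step 6: store the certificate material and the passphrase in Secrets Manager.
    secretsmanager.create_secret(
        Name=SECRET_NAME,
        SecretString=json.dumps({
            'certificate': export['Certificate'],
            'certificate_chain': export['CertificateChain'],
            'private_key': export['PrivateKey'],   # still encrypted with the passphrase
            'passphrase': passphrase
        })
    )
    return {'status': 'stored', 'secret': SECRET_NAME}

In a real deployment you would typically pass the certificate ARN and secret name to the function through environment variables rather than hard-coding them.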

Step 7: The App Mesh services—ColorGateway and ColorTeller—are started, which then start their Envoy proxy containers. A custom startup script embedded in the Envoy Docker image fetches a certificate from Secrets Manager and places it on the Envoy file system.

Note: App Mesh publishes its own custom Envoy proxy Docker container image that is fully tested and kept up to date with the latest security and performance patches. You'll notice in the example source code that a custom Envoy image is built on top of the base image published by App Mesh. In this solution, we add an Envoy startup script and certain utilities, such as the AWS Command Line Interface (AWS CLI) and jq, to help retrieve the certificate from Secrets Manager and place it on the Envoy file system during Envoy startup.
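
The repository's startup script relies on the AWS CLI and jq, as noted above; purely to illustrate the retrieval logic, an equivalent sketch in Python might look like the following. The secret name and target paths are placeholders, and decrypting the private key with the stored passphrase before Envoy starts is omitted here.

# Illustrative equivalent of the Envoy startup retrieval step; secret name and paths are placeholders.
import json
import boto3

SECRET_NAME = 'colorgateway/client-cert'          # placeholder
CERT_DIR = '/keys'                                # placeholder path on the Envoy file system

secret = boto3.client('secretsmanager').get_secret_value(SecretId=SECRET_NAME)
material = json.loads(secret['SecretString'])

# Write the certificate chain and private key where Envoy expects them.
# (In the real solution, the key is decrypted with the stored passphrase before Envoy starts.)
with open(f'{CERT_DIR}/colorgateway_cert_chain.pem', 'w') as f:
    f.write(material['certificate'] + '\n' + material['certificate_chain'])
with open(f'{CERT_DIR}/colorgateway_key.pem', 'w') as f:
    f.write(material['private_key'])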

Step 8: The CloudFormation scripts configure the client policy for mTLS in ColorGateway in App Mesh, as shown in Figure 6. The following attributes are configured:

  • Provide client certificate is enabled. This ensures that the client certificate is exchanged as part of the mTLS handshake.
  • Certificate method is set to Local file hosting so that the certificate is read from the local file system.
  • Certificate chain is set to the path for the file that contains the ColorGateway certificate chain.
  • Private key is set to the path for the file that contains the private key for the ColorGateway certificate.

Figure 6: Client-side mTLS configuration in ColorGateway

At the end of the custom Envoy startup script described in step 7, the core Envoy process in the ColorGateway service is started. It retrieves the ColorTeller CA root certificate from ACM Private CA and configures it internally as a trusted CA. This retrieval happens due to the native integration between App Mesh and ACM Private CA. This allows ColorGateway Envoy to validate the certificate presented by ColorTeller Envoy during the TLS handshake.

Step 9: The CloudFormation scripts configure the listener configuration for mTLS in ColorTeller in App Mesh, as shown in Figure 7. The following attributes are configured:

  • Require client certificate is enabled, which enforces mTLS.
  • Validation Method is set to Local file hosting, which causes Envoy to read the certificate from the local file system.
  • Certificate chain is set to the path for the file that contains the ColorGateway certificate chain.

Figure 7: Server-side mTLS configuration in ColorTeller

At the end of the Envoy startup script described in step 7, the core Envoy process in the ColorTeller service is started. It retrieves its own server-side end-entity certificate and corresponding private key from ACM Private CA. This retrieval happens internally, driven by the native integration between App Mesh and ACM Private CA. This allows ColorTeller Envoy to present its server-side certificate to ColorGateway Envoy during the TLS handshake.

The system startup concludes with this step, and the application is ready to process external client requests.

Step 10: The ColorGateway service receives a request from an external client.

Step 11: The ColorGateway Envoy initiates a TLS handshake with the ColorTeller Envoy. During the first half of the TLS handshake protocol, the ColorTeller Envoy presents its server-side certificate to the ColorGateway Envoy. The ColorGateway Envoy validates the certificate. Because the ColorGateway Envoy has been configured with the ColorTeller CA trust chain in step 8, the validation succeeds.

Step 12: During the second half of the TLS handshake, the ColorTeller Envoy requests the ColorGateway Envoy to provide its client-side certificate. This step is what distinguishes an mTLS exchange from a one-way TLS exchange.

The ColorGateway Envoy responds with its end-entity certificate that had been placed on its file system in step 7. The ColorTeller Envoy validates the received certificate with its CA trust chain, which contains the ColorGateway root CA that was placed on its file system (in step 7). The validation succeeds, and so an mTLS session is established.

Verifying mTLS

You can now verify that an mTLS exchange happened by running the following command on your bastion host.

$ curl -s http://colorteller.mtls-ec2.svc.cluster.local:9901/stats | grep -E 'ssl.handshake|ssl.no_certificate'

listener.0.0.0.0_15000.ssl.handshake: 1
listener.0.0.0.0_15000.ssl.no_certificate: 0

The count for ssl.handshake should be one. If the ssl.handshake count is more than one, that means that you’ve gone through more than one TLS handshake. It’s important to note that the count for ssl.no_certificate—the total successful TLS connections with no client certificate—is zero. This shows that mTLS configuration is working as expected. Recall that this count was one or higher—equal to the ssl.handshake count—in the previous section that described one-way TLS. The ssl.no_certificate count being zero indicates that this was an mTLS authenticated connection, where the ColorGateway authenticated the ColorTeller and vice-versa.

Certificate renewal

The ACM Private CA certificates that are imported into App Mesh are not eligible for managed renewal, so an external certificate renewal method is needed. This example solution uses an external renewal method as recommended in Renewing certificates in a private PKI, which you can adapt for your own implementations.

The certificate renewal mechanism can be broken down into six steps, which are outlined in Figure 8.
 

Figure 8: Certificate renewal process in ACM Private CA and App Mesh on ECS integration

Here are the steps illustrated in Figure 8:

Step 1: ACM generates an Amazon CloudWatch Events event when a certificate is close to expiring.

Step 2: CloudWatch Events triggers a Lambda function that is responsible for certificate renewal.

Step 3: The Lambda function renews the certificate in ACM and exports the new certificate by calling ACM APIs.

Step 4: The Lambda function writes the certificate to Secrets Manager.

Step 5: The Lambda function triggers a new service deployment in an Amazon ECS cluster. This will cause the ECS services to go through a graceful update process to acquire a renewed certificate.

Step 6: The Envoy processes in App Mesh fetch the client-side certificate from Secrets Manager using external integration, and the server-side certificate from ACM using native integration.
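
As a rough sketch of steps 3 through 5, the renewal Lambda could chain the corresponding boto3 calls as follows. The certificate ARN extraction, secret name, and ECS identifiers are placeholders, renewal completion is not awaited, and error handling is omitted.

# Simplified sketch of the renewal Lambda (steps 3-5 in Figure 8); identifiers are placeholders.
import json
import boto3

acm = boto3.client('acm')
secretsmanager = boto3.client('secretsmanager')
ecs = boto3.client('ecs')

def handler(event, context):
    certificate_arn = event['resources'][0]          # expiring certificate ARN from the CloudWatch event
    passphrase = secretsmanager.get_random_password(PasswordLength=32)['RandomPassword']

    # Step 3: renew the certificate in ACM, then export the renewed certificate.
    # In practice, poll DescribeCertificate until the renewal completes before exporting.
    acm.renew_certificate(CertificateArn=certificate_arn)
    export = acm.export_certificate(CertificateArn=certificate_arn,
                                    Passphrase=passphrase.encode('utf-8'))

    # Step 4: write the renewed certificate material to Secrets Manager.
    secretsmanager.put_secret_value(
        SecretId='colorgateway/client-cert',          # placeholder secret name
        SecretString=json.dumps({
            'certificate': export['Certificate'],
            'certificate_chain': export['CertificateChain'],
            'private_key': export['PrivateKey'],
            'passphrase': passphrase
        })
    )

    # Step 5: force a new ECS deployment so tasks restart and pick up the renewed certificate.
    ecs.update_service(cluster='mtls-ec2',            # placeholder cluster and service names
                       service='colorgateway',
                       forceNewDeployment=True)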

Conclusion

In this post, you learned a method for enabling mTLS authentication between App Mesh endpoints based on certificates issued by ACM Private CA. mTLS enhances security of App Mesh deployments due to its bidirectional authentication capability. While server-side certificates are integrated natively, you saw how to use Lambda and Secrets Manager to integrate client-side certificates externally. Because ACM Private CA certificates aren’t eligible for managed renewal, you also learned how to implement an external certificate renewal process.

This solution enhances your App Mesh security posture by simplifying configuration of mTLS-enabled App Mesh deployments. It achieves this because all mTLS certificate requirements are met by a single, managed private PKI service—ACM Private CA—which allows you to centrally manage certificates and eliminates the need to run your own private PKI.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Raj Jain

Raj is an engineering leader at Amazon in the FinTech space. He is passionate about building SaaS applications for Amazon internal and external customers using AWS. He is currently working on an AI/ML application in the governance, risk and compliance domain. Raj is a published author in the Bell Labs Technical Journal, has authored 3 IETF standards, and holds 12 patents in internet telephony and applied cryptography. In his spare time, he enjoys the outdoors, cooking, reading, and travel.

Author

Nagmesh Kumar

Nagmesh is a Cloud Architect with the Worldwide Public Sector Professional Services team. He enjoys working with customers to design and implement well-architected solutions in the cloud. He was a researcher who stumbled into IT operations as a database administrator. After spending all day in the cloud, you can spot him in the wild with his family, reading, or gaming.

Cybersecurity in the Infrastructure Bill

Post Syndicated from Harley Geiger original https://blog.rapid7.com/2021/08/31/cybersecurity-in-the-infrastructure-bill/

On August 10, 2021, the U.S. Senate passed the Infrastructure Investment and Jobs Act of 2021 (H.R.3684). The bill comes in at 2,700+ pages, provides for $1.2T in spending, and includes several cybersecurity items. We expect this legislation to become law around late September and do not expect significant changes to the content. This post provides highlights on cybersecurity from the legislation.

(Check out our joint letter calling for cybersecurity in infrastructure legislation here.)

Cybersecurity is a priority — that’s progress

Cybersecurity is essential to ensure modern infrastructure is safe, and Rapid7 commends Congress and the Administration for including cybersecurity in the Infrastructure Investment and Jobs Act. Rapid7 led industry calls to include cybersecurity in the bill, and we are encouraged that several priorities identified by industry are reflected in the text, such as cybersecurity-specific funding for state and local governments and the electrical grid.

On the other hand, cybersecurity will be competing with natural disasters and extreme weather for funding in many (not all) grants created under the bill. In addition, not all critical infrastructure sectors receive cybersecurity resources through the legislation, with healthcare being a notable exclusion. Congress should address these gaps in the upcoming budget reconciliation package.

What’s in the bill for infrastructure cybersecurity

Below is a brief-ish summary of cybersecurity-related items in the bill. The infrastructure sectors with the most allocations appear to be energy, water, transportation, and state and local governments. Many of these funding opportunities take the form of federal grants for infrastructure resilience, which includes cybersecurity as well as natural hazards. Other funds are dedicated solely to cybersecurity.

Please note that this list aims to include major infrastructure cybersecurity funding items, but is not comprehensive. (For example, the bill also provides funding for the National Cyber Director.) Citations to the Senate-passed legislation are included.

  1. State and local governments: $1B over 4 years for the State, Local, Tribal, and Territorial (SLTT) Grant Program. This new grant program will help SLTT governments to develop or implement cybersecurity plans. FEMA will administer the program. This is also known as The State and Local Cybersecurity Improvement Act. [Sec. 70611]

  2. Energy: $250M over five years for the Rural and Municipal Utility Advanced Cybersecurity Grant and Technological Assistance Program. The Department of Energy (DOE) must create a new program to provide grants and technical assistance to improve electric utilities’ ability to detect, respond to, and recover from cybersecurity threats. [Sec. 40124]

  3. Energy: Enhanced grid security. The DOE must create a program to develop advanced cybersecurity applications and technologies for the energy sector, among other things. Over a period of five years, this section authorizes $250M for the Cybersecurity for the Energy Sector RD&D program, $50M for the Energy Sector Operational Support for Cyberresilience Program, and $50M for Modeling and Assessing Energy Infrastructure Risk. [Sec. 40125]

  4. Energy: State energy security plans. This creates federal financial and technical assistance for states to develop or implement an energy security plan that secures state energy infrastructure against cybersecurity threats, among other things. [Sec. 40108]

  5. Water: $250M over 5 years for the Midsize and Large Drinking Water System Infrastructure Resilience and Sustainability Program. This creates a new grant program to assist midsize and large drinking water systems with increasing resilience to cybersecurity vulnerabilities, as well as natural hazards. [Sec. 50107]

  6. Water: $175M over five years for technical assistance and grants for emergencies affecting public water systems. This extends an expired fund to help mitigate threats and emergencies to drinking water. This includes, among other things, emergency situations caused by a cybersecurity incident. [Sec. 50101]

  7. Water: $25M over five years for the Clean Water Infrastructure Resiliency and Sustainability Program. This creates a new program providing grants to owners/operators of publicly owned treatment works to increase the resiliency of water systems against cybersecurity vulnerabilities, as well as natural hazards. [Sec. 50205]

  8. Transportation: Cybersecurity eligible for National Highway Performance Program (NHPP). This expands on the existing NHPP grant program to allow states to use funds for resiliency of the National Highway System. "Resiliency" includes cybersecurity, as well as natural hazards. [Sec. 11105]

  9. Transportation: Cybersecurity eligible for Surface Transportation Block Grant Program. This expands the existing grant program to allow funding measures to protect transportation facilities from cybersecurity threats, among other things. [Sec. 11109]

  10. General: $100M over five years for the Cyber Response and Recovery Fund. This creates a fund for CISA to provide direct support to public or private entities that respond to and recover from cyberattacks and breaches designated as a “significant incident.” The support can include technical assistance and response activities, such as vulnerability assessment, threat detection, network protection, and more. The program ends in 2028. [Sec. 70602, Div. J]

Other sectors next?

These cybersecurity items are significant down payments to safeguard the nation’s investment in infrastructure modernization. Together with the recent Executive Order and memorandum on industrial control systems security, they demonstrate that the Biden Administration considers cybersecurity a high priority.

However, more work must be done to address cybersecurity weaknesses in critical infrastructure. While the Infrastructure Investment and Jobs Act provides cybersecurity resources for some sectors, most of the 16 critical infrastructure sectors are excluded. Healthcare is an especially notable example, as the sector faces a serious ransomware problem in the middle of a deadly pandemic.

Congress is now preparing a larger budget reconciliation bill, to be advanced at roughly the same time as the infrastructure legislation. We encourage Congress and the Administration to take this opportunity to boost cybersecurity for other sectors, especially healthcare. As with the infrastructure bill, we suggest providing grants dedicated to cybersecurity, and requiring that grant funds be used to adopt or implement standards-based security safeguards and risk management practices.

Congress’ activity during the COVID-19 crisis continues to be punctuated by large, ambitious bills. To secure the modern economy and essential services, we hope the Infrastructure Investment and Jobs Act sets a precedent that sound cybersecurity policies will be integrated into transformative legislation to come.

Field Notes: How to Boost Your Search Results Using Relevance Tuning with Amazon Kendra

Post Syndicated from Sam Palani original https://aws.amazon.com/blogs/architecture/field-notes-how-to-boost-your-search-results-using-amazon-kendra-relevance-tuning/

One challenge enterprises face when they implement an intelligent search solution for their large data sources is the ability to quickly provide relevant search results. When working with large data sources, not all features or attributes within your data will be equally relevant to all your users. You want to identify and boost specific attributes to give your users the most relevant search results.

Relevance tuning in Amazon Kendra allows you to boost a result in the response when the query includes terms that match the attribute. For example, you might have two similar documents but one is created more recently. A good practice is to boost the relevance of the newer (or, depending on your use case, the older) document.

Relevance tuning in Amazon Kendra can be performed manually at the Index level or at the query level. In this blog post, we show how to tune an existing index that is connected to external data sources, and ultimately optimize your internal search results.
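
Index-level tuning is what the rest of this post walks through in the console. For query-level tuning, the boost is passed with the query itself; the following boto3 sketch uses placeholder index and data source IDs, and you should confirm the available DocumentRelevanceOverrideConfigurations options in the Amazon Kendra API reference.

# Sketch of query-level relevance tuning; index and data source IDs are placeholders.
import boto3

kendra = boto3.client('kendra')

response = kendra.query(
    IndexId='11111111-2222-3333-4444-555555555555',
    QueryText='Amazon Kendra VPC',
    DocumentRelevanceOverrideConfigurations=[
        {
            'Name': '_data_source_id',
            'Relevance': {
                # Boost results whose data source ID matches the preferred data source.
                'ValueImportanceMap': {'<sample-datasource-id>': 8}
            }
        }
    ]
)

for item in response['ResultItems']:
    print(item['Type'], item.get('DocumentTitle', {}).get('Text'))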

Solution overview

We will walk through how you can manually tune your index using boosting techniques to achieve the best results. This enables you to prioritize the results from a specific data source so your users get the most relevant results when they perform searches.

Figure 1. Amazon Kendra setup

Figure 1 illustrates a standard Amazon Kendra setup. An Amazon Kendra index is connected to different Amazon Simple Storage Service (Amazon S3) buckets with multiple data sources.

There are two types of user personas. The first persona is administrators who are responsible for managing the index and performing administrative tasks such as access control, index tuning, and so forth. The second persona is users who access Amazon Kendra either directly or through a custom application that can make API search requests on an Amazon Kendra index. You can use relevance tuning to boost results from one of these data sources to provide a more relevant search result.

Prerequisites

This solution requires the following:

If you do not have these prerequisites set up, check out Create and query an index, which walks you through how to create and query an index in Amazon Kendra.

Furthermore, the AWS services you use in this tutorial fall within the AWS Free Tier during a 30-day trial.

Step 1 – Check facet definition

First, review your facet definition and confirm it is facetable, displayable, searchable, and sortable.

In the Amazon Kendra console, select your Amazon Kendra index, then select Facet definition in the Data management panel. Confirm that _data_source_id has all of its attributes checked.

Figure 2. Check facet definition

Step 2 – Review data sources

Next, verify that you have at least two data sources for your Amazon Kendra index.

In the Amazon Kendra console, select your Amazon Kendra index, and then select Data sources in the Data management panel. Confirm that your data sources are correctly synced and available. In our example, data-source-2 is an earlier version that contains unprocessed documents, whereas sample-datasource has newer versions and more relevant content.

Figure 3. Verify data sources

Step 3 – Perform a regular Amazon Kendra search

Next, we will test a regular search without any relevance tuning. Select Search console, and enter the search term Amazon Kendra VPC. Review your search results.

Figure 4. Regular Amazon Kendra search

In our example search results, the document from the second data source 39_kendra-dg_kendra-dg appears as the third result.

Step 4 – Relevance tuning through boosting

Now we will boost the first data source so that its documents are displayed ahead of documents from the other data sources.

Select data source, and boost the first data source, sample-datasource, to 8. Choose Save to save your tuning. Wait several seconds for the changes to propagate.

Figure 5. Boost sample-datasource

Step 5 – Perform the search after boosting

Next, we will test the search with relevance tuning applied. In the search text box enter the search term Amazon Kendra VPC. Review your search results.

Figure 6. Searching after boosting

Notice that the search result no longer contains the document from the second data source.

Cleaning up

To avoid incurring any future charges, remove any index created specifically for this tutorial. In the Amazon Kendra console, select your index. Then select Actions, and choose Delete.

Figure 7. Delete index

Conclusion

In this blog post, we showed you how relevance tuning can be used to boost the most relevant results in your searches. We also walked you through an example of how to manually perform relevance tuning at the index level in Amazon Kendra.

In addition to relevance tuning at the index level, you can also perform relevance tuning at the query level. Finally, check out the What is Amazon Kendra? and Relevance tuning with Amazon Kendra blog posts to learn more.

Realtime preemption locking core merged

Post Syndicated from original https://lwn.net/Articles/867919/rss

The 5.15 merge window is off to a fast start; stay tuned for our usual full summary. It is worth mentioning, though, that the realtime preemption locking code has been pulled into the mainline with little fanfare. This work began in 2004 and has fundamentally changed many parts of the core kernel. With this pull, the sleepable locks that make deterministic realtime response possible have finally joined all of that other work (though the kernel must be built with the PREEMPT_RT configuration option to use them).

Congratulations are due to all of the realtime developers who pushed this project forward for nearly two decades.

Security updates for Tuesday

Post Syndicated from original https://lwn.net/Articles/867917/rss

Security updates have been issued by CentOS (libsndfile and libX11), Debian (ledgersmb, libssh, and postgresql-9.6), Fedora (squashfs-tools), openSUSE (389-ds, nodejs12, php7, spectre-meltdown-checker, and thunderbird), Oracle (kernel, libsndfile, and libX11), Red Hat (bind, cloud-init, edk2, glibc, hivex, kernel, kernel-rt, kpatch-patch, microcode_ctl, python3, and sssd), SUSE (bind, mysql-connector-java, nodejs12, sssd, and thunderbird), and Ubuntu (apr, squashfs-tools, thunderbird, and uwsgi).

Building well-architected serverless applications: Optimizing application performance – part 3

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-optimizing-application-performance-part-3/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

PERF 1. Optimizing your serverless application’s performance

This post continues part 2 of this performance question. Previously, I looked at designing your function to take advantage of concurrency via asynchronous and stream-based invocations. I covered measuring, evaluating, and selecting optimal capacity units.

Best practice: Integrate with managed services directly over functions when possible

Consider using native integrations between managed services, as opposed to AWS Lambda functions, when no custom logic or data transformation is required. This can enable optimal performance, requires fewer resources to manage, and increases security. There are also a number of AWS application integration services that enable communication between decoupled components in microservice architectures.

Use native cloud services integration

When using Amazon API Gateway APIs, you can use the AWS integration type to connect to other AWS services natively. With this integration type, API Gateway uses Apache Velocity Template Language (VTL) and HTTPS to directly integrate with other AWS services.

Timeouts and errors must be managed by the API consumer. For more information on using VTL, see “Amazon API Gateway Apache Velocity Template Reference”. For an example application that uses API Gateway to read and write directly to/from Amazon DynamoDB, see “Building a serverless URL shortener app without AWS Lambda”.

API Gateway direct service integration

There is also a tutorial available, Build an API Gateway REST API with AWS integration.

When using AWS AppSync, you can use VTL, direct integration with Amazon Aurora, Amazon Elasticsearch Service, and any publicly available HTTP endpoint. AWS AppSync can use multiple integration types and can maximize throughput at the data field level. For example, you can run full-text searches on the orderDescription field against Elasticsearch while fetching the remaining data from DynamoDB. For more information, see the AWS AppSync resolver tutorials.

In the serverless airline example used in this series, the catalog service uses AWS AppSync to provide a GraphQL API for searching flights. AWS AppSync uses DynamoDB as a database, and all compute logic is contained in the Apache Velocity Template (VTL).

Serverless airline catalog service using VTL

AWS Step Functions integrates with multiple AWS services using service integrations. For example, this allows you to fetch and put data into DynamoDB, or run an AWS Batch job. You can also publish messages to Amazon Simple Notification Service (SNS) topics, and send messages to Amazon Simple Queue Service (SQS) queues. For more details on the available integrations, see “Using AWS Step Functions with other services”.

Using Amazon EventBridge, you can connect your applications with data from a variety of sources. You can connect to various AWS services natively, and act as an event bus across multiple AWS accounts to ease integration. You can also use the API destination feature to route events to services outside of AWS. EventBridge handles the authentication, retries, and throughput. For more details on available EventBridge targets, see the documentation.

Amazon EventBridge
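
Purely as an illustration of publishing to a custom event bus with the EventBridge API, a sketch might look like this; the bus name, source, and event detail are placeholders.

# Sketch: publish an event to a custom EventBridge bus; bus name, source, and detail are placeholders.
import json
import boto3

events = boto3.client('events')

events.put_events(
    Entries=[
        {
            'EventBusName': 'orders-bus',                 # placeholder bus name
            'Source': 'com.example.orders',               # placeholder source
            'DetailType': 'OrderCreated',
            'Detail': json.dumps({'orderId': '1234', 'total': 42}),
        }
    ]
)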

Good practice: Optimize access patterns and apply caching where applicable

Consider caching when clients may not require up-to-date data. Optimize access patterns to only fetch data that is necessary for end users. This improves the overall responsiveness of your workload and makes more efficient use of compute and data resources across components.

Implement caching for suitable access patterns

For REST APIs, you can use API Gateway caching to reduce the number of calls made to your endpoint and also improve the latency of requests to your API. When you enable caching for a stage or method, API Gateway caches responses for a specified time-to-live (TTL) period. API Gateway then responds to the request by looking up the endpoint response from the cache, instead of making a request to your endpoint.

API Gateway caching

For more information, see “Enabling API caching to enhance responsiveness”.
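
If you want to script the change rather than use the console, stage-level caching can be enabled with a call along these lines. This is a sketch: the API ID, stage name, cache size, and TTL are placeholders, and you should confirm the patch paths in the API Gateway update-stage documentation (cache clusters are billed hourly).

# Sketch: enable stage-level caching on a REST API; API ID and stage name are placeholders.
import boto3

apigateway = boto3.client('apigateway')

apigateway.update_stage(
    restApiId='abc123def4',          # placeholder REST API ID
    stageName='prod',
    patchOperations=[
        {'op': 'replace', 'path': '/cacheClusterEnabled', 'value': 'true'},
        {'op': 'replace', 'path': '/cacheClusterSize', 'value': '0.5'},       # cache size in GB
        {'op': 'replace', 'path': '/*/*/caching/enabled', 'value': 'true'},   # all methods
        {'op': 'replace', 'path': '/*/*/caching/ttlInSeconds', 'value': '300'},
    ]
)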

For geographically distributed clients, Amazon CloudFront or a third-party CDN can cache results at the edge, further reducing network round-trip latency.

For GraphQL APIs, AWS AppSync provides built-in server-side caching at the API level. This reduces the need to access data sources directly by making data available in a high-speed in-memory cache. This improves performance and decreases latency. For queries with common arguments or a restricted set of arguments, you can also enable caching at the resolver level to improve overall responsiveness. For more information, see “Improving GraphQL API performance and consistency with AWS AppSync Caching”.

When using databases, cache results and only connect to and fetch data when needed. This reduces the load on the downstream database and improves performance. Include a caching expiration mechanism to prevent serving stale records. For more information on caching implementation patterns and considerations, see “Caching Best Practices”.

For DynamoDB, you can enable caching with Amazon DynamoDB Accelerator (DAX). DAX enables you to benefit from fast in-memory read performance in microseconds, rather than milliseconds. DAX is suitable for use cases that may not require strongly consistent reads. Some examples include real-time bidding, social gaming, and trading applications. For more information, read “Use cases for DAX“.

For general caching purposes, Amazon ElastiCache provides a distributed in-memory data store or cache environment. ElastiCache supports a variety of caching patterns through key-value stores using the Redis and Memcached engines. Define what is safe to cache, even when using popular caching patterns like lazy caching or write-through. Set a TTL and eviction policy that fits your baseline performance and access patterns. This ensures that you don’t serve stale records or cache data that should have a strongly consistent read. For more information on ElastiCache caching and time-to-live strategies, see the documentation.
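
As a generic illustration of the lazy caching pattern with a TTL (not specific to any application in this series), a sketch using the redis-py client could look like this; the endpoint, key format, and TTL are assumptions you would tune to your own access patterns.

# Generic lazy-caching sketch with a TTL; endpoint, key format, and TTL are illustrative.
import json
import redis

cache = redis.Redis(host='my-cache.xxxxxx.0001.use1.cache.amazonaws.com', port=6379)  # placeholder endpoint
TTL_SECONDS = 300   # tune to how stale a record is allowed to be

def get_product(product_id, fetch_from_db):
    key = f'product:{product_id}'
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)           # cache hit
    record = fetch_from_db(product_id)      # cache miss: read from the database
    cache.setex(key, TTL_SECONDS, json.dumps(record))   # populate the cache with an expiry
    return record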

For additional serverless caching suggestions, see the AWS Serverless Hero blog post “All you need to know about caching for serverless applications”.

Reduce overfetching and underfetching

Overfetching is when a client downloads too much data from a database or endpoint. This results in data in the response that you don’t use. Underfetching is not having enough data in the response. The client then needs to make additional requests to receive the data. Overfetching and underfetching can both affect performance.

To fetch a collection of items from a DynamoDB table, you can perform a query or a scan. A scan operation always scans the entire table or secondary index. It then filters out values to provide the result you want, essentially adding the extra step of removing data from the result set. A query operation finds items directly based on primary key values.

For faster response times, design your tables and indexes so that your applications can use query instead of scan. Use global secondary indexes (GSIs) in addition to composite sort keys to help you query hierarchical relationships in your data. For more information, see “Best Practices for Querying and Scanning Data”.
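
To illustrate the difference, the following sketch fetches one customer's items through the key schema instead of scanning and filtering; the table, key, and attribute names are placeholders.

# Sketch: query by key instead of scanning and filtering; table and key names are placeholders.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Orders')

# Fetch only one customer's 2021 orders directly via the key schema...
orders = table.query(
    KeyConditionExpression=Key('customerId').eq('C-1001') & Key('orderDate').begins_with('2021-')
)['Items']

# ...instead of reading the whole table and discarding most of it:
# table.scan(FilterExpression=...)  # reads every item, then filters the result set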

Consider GraphQL and AWS AppSync for interactive web applications, mobile, real-time, or for use cases where data drives the user interface. AWS AppSync provides data fetching flexibility, which allows your client to query only for the data it needs, in the format it needs it. Ensure you do not make too many nested queries where a long response may result in timeouts. GraphQL helps you adapt access patterns as your workload evolves. This makes it more flexible as it allows you to move to purpose-built databases if necessary.

Compress payload and data storage

Some AWS services allow you to compress the payload or compress data storage. This can improve performance by sending and receiving less data, and can save on data storage, which can also reduce costs.

If your content supports deflate, gzip, or identity content encoding, API Gateway allows your client to call your API with compressed payloads. By default, API Gateway supports decompression of the method request payload. However, you must configure your API to enable compression of the method response payload. Compression in API Gateway and decompression in the client might increase overall latency and require more compute time. Run test cases against your API to determine an optimal value. For more information, see “Enabling payload compression for an API”.
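
For reference, response compression is switched on by setting a minimum compression size on the REST API; a sketch of doing this programmatically follows, with a placeholder API ID and a threshold you would tune through testing.

# Sketch: enable response payload compression by setting a minimum compression size (bytes).
# The REST API ID is a placeholder; tune the threshold based on your own test cases.
import boto3

apigateway = boto3.client('apigateway')

apigateway.update_rest_api(
    restApiId='abc123def4',
    patchOperations=[
        {'op': 'replace', 'path': '/minimumCompressionSize', 'value': '1024'}
    ]
)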

Amazon Kinesis Data Firehose supports compressing streaming data using gzip, snappy, or zip. This minimizes the amount of storage used at the destination. The Amazon Kinesis Data Firehose FAQs has more information on compression. Kinesis Data Firehose also supports converting your streaming data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.

Conclusion

Evaluate and optimize your serverless application’s performance based on access patterns, scaling mechanisms, and native integrations. You can improve your overall experience and make more efficient use of the platform in terms of both value and resources.

In part 1, I cover measuring and optimizing function startup time. I explain cold and warm starts and how to reuse the Lambda execution environment to improve performance. I explain how only importing necessary libraries and dependencies increases application performance.

In part 2, I look at designing your function to take advantage of concurrency via asynchronous and stream-based invocations. I cover measuring, evaluating, and selecting optimal capacity units.

In this post, I look at integrating with managed services directly over functions when possible. I cover optimizing access patterns and applying caching where applicable.

In the next post in the series, I cover the cost optimization pillar from the Well-Architected Serverless Lens.

For more serverless learning resources, visit Serverless Land.

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Post Syndicated from Chris Howells original https://blog.cloudflare.com/the-epyc-journey-continues-to-rome-in-cloudflares-11th-generation-edge-server/

When I was interviewing to join Cloudflare in 2014 as a member of the SRE team, we had just introduced our generation 4 server, and I was excited about the prospects. Since then, Cloudflare, the industry and I have all changed dramatically. The best thing about working for a rapidly growing company like Cloudflare is that as the company grows, new roles open up to enable career development. And so, having left the SRE team last year, I joined the recently formed hardware engineering team, a team that simply didn’t exist in 2014.

We aim to introduce a new server platform to our edge network every 12 to 18 months or so, to ensure that we keep up with the latest industry technologies and developments. We announced the generation 9 server in October 2018 and the generation 10 server in February 2020. We consider this length of cycle optimal: short enough to stay nimble and take advantage of the latest technologies, but long enough to offset the time taken by our hardware engineers to test and validate the entire platform. When we are shipping servers to over 200 cities around the world with a variety of regulatory standards, it’s essential to get things right the first time.

We continually work with our silicon vendors to receive product roadmaps and stay on top of the latest technologies. Since mid-2020, the hardware engineering team at Cloudflare has been working on our generation 11 server.

Requests per Watt is one of our defining characteristics when testing new hardware and we use it to identify how much more efficient a new hardware generation is than the previous generation. We continually strive to reduce our operational costs and power consumption reduction is one of the most important parts of this. It’s good for the planet and we can fit more servers into a rack, reducing our physical footprint.

The design of these generation 11 x86 servers has been in parallel with our efforts to design next-generation edge servers using the Ampere Altra Arm architecture. You can read more about our tests in a blog post by my colleague Sung, and we will document our work on Arm at the edge in a subsequent blog post.

We evaluated Intel’s latest generation of “Ice Lake” Xeon processors. Although Intel’s chips were able to compete with AMD in terms of raw performance, the power consumption was several hundred watts higher per server – that’s enormous. This meant that Intel’s Performance per Watt was unattractive.

We previously described how we had deployed AMD EPYC 7642 processors in our generation 10 server. This processor has 48 cores and is based on AMD’s 2nd generation EPYC architecture, code named Rome. For our generation 11 server, we evaluated 48, 56 and 64 core samples based on AMD’s 3rd generation EPYC architecture, code named Milan. We were interested to find that, comparing the two 48 core processors directly, we saw a performance boost of several percent from the 3rd generation EPYC architecture. We therefore had high hopes for the 56 core and 64 core chips.

So, based on the samples we received from our vendors and our subsequent testing, hardware from AMD and Ampere made the shortlist for our generation 11 server. On this occasion, we decided that Intel did not meet our requirements. However, it’s healthy that Intel and AMD compete and innovate in the x86 space and we look forward to seeing how Intel’s next generation shapes up.

Testing and validation process

Before we go on to talk about the hardware, I’d like to say a few words about the testing process we went through to test our generation 11 servers.

As we elected to proceed with AMD chips, we were able to use our generation 10 servers as our Engineering Validation Test platform, with the only changes being the new silicon and updated firmware. We were able to perform these upgrades ourselves in our hardware validation lab.

Cloudflare’s network is built with cheap commodity hardware, and we source the hardware from multiple vendors, known as ODMs (Original Design Manufacturers), who build the servers to our specifications.

When you are working with bleeding edge silicon and experimental firmware, not everything is plain sailing. We worked with one of our ODMs to eliminate an issue which was causing the Linux kernel to panic on boot. Once resolved, we used a variety of synthetic benchmarking tools to verify the performance including cf_benchmark, as well as an internal tool which applies a synthetic load to our entire software stack.

Once we were satisfied, we ordered Design Validation Test samples, which were manufactured by our ODMs with the new silicon. We continued to test these and iron out the inevitable issues that arise when you are developing custom hardware. To ensure that performance matched our expectations, we used synthetic benchmarking to test the new silicon. We also began testing the servers in our production environment by gradually introducing customer traffic as confidence grew.

Once the issues were resolved, we ordered the Product Validation Test samples, which were again manufactured by our ODMs, taking into account the feedback obtained in the DVT phase. As these are intended to be production grade, we work with the broader Cloudflare teams to deploy these units like a mass production order.

CPU

Previously: AMD EPYC 7642 48-Core Processor
Now: AMD EPYC 7713 64-Core Processor

                   AMD EPYC 7642   AMD EPYC 7643   AMD EPYC 7663   AMD EPYC 7713
Status             Incumbent       Candidate       Candidate       Candidate
Core Count         48              48              56              64
Thread Count       96              96              112             128
Base Clock         2.3GHz          2.3GHz          2.0GHz          2.0GHz
Max Boost Clock    3.3GHz          3.6GHz          3.5GHz          3.675GHz
Total L3 Cache     256MB           256MB           256MB           256MB
Default TDP        225W            225W            240W            225W
Configurable TDP   240W            240W            240W            240W

In the above chart, TDP refers to Thermal Design Power, a measure of the heat dissipated. All of the above processors have a configurable TDP – assuming the cooling solution is capable – giving more performance at the expense of increased power consumption. We tested all processors configured at their highest supported TDP.

The 64 core processors have 33% more cores than the 48 core processors, so you might hypothesize a corresponding 33% increase in performance, although our benchmarks saw slightly more modest gains. This is because the 64 core processors run at lower base clock frequencies to fit within the same 225W power envelope.

In production testing, we found that the 64 core EPYC 7713 gave us around a 29% performance boost over the incumbent, whilst having similar power consumption and thermal properties.

Memory

Previously: 256GB DDR4-2933
Now: 384GB DDR4-3200

Having made a decision about the processor, the next step was to determine the optimal amount of memory for our workload. We ran a series of experiments with our chosen EPYC 7713 processor and 256GB, 384GB and 512GB memory configurations. We started off by running synthetic benchmarks with tools such as STREAM to ensure that none of the configurations performed unexpectedly poorly and to generate a baseline understanding of the performance.

After the synthetic benchmarks, we proceeded to test the various configurations with production workloads to empirically determine the optimal quantity. We use Prometheus and Grafana to gather and display a rich set of metrics from all of our servers so that we can monitor and spot trends, and we re-used the same infrastructure for our performance analysis.

As well as measuring available memory, previous experience has shown us that one of the best ways to ensure that we have enough memory is to observe request latency and disk I/O performance. If there is insufficient memory, we expect request latency, disk I/O volume, and disk I/O latency to increase. The reason is that our core HTTP server uses memory to cache web assets; if memory is insufficient, assets are evicted from the cache prematurely and more of them are fetched from disk instead of memory, degrading performance.

Like most things in life, it’s a balancing act. We want enough memory to take advantage of the fact that serving web assets directly from memory is much faster than even the best NVMe disks. We also want to future-proof our platform to enable new features such as those we recently announced during Security Week and Developer Week. However, we don’t want to spend unnecessarily on excess memory that will never be used. We found that the 512GB configuration did not provide a performance boost that justified the extra cost, and settled on the 384GB configuration.

We also tested the performance impact of switching from DDR4-2933 to DDR4-3200 memory. We found that it provided a performance boost of several percent and the pricing has improved to the point where it is cost beneficial to make the change.

Disk

Previously: 3x Samsung PM983 x 960GB
Now: 2x Samsung PM9A3 x 1.92TB

We validated samples by studying the manufacturer’s data sheets and testing with fio to ensure that the results obtained in our test environment were in line with the published specifications. We also developed an automation framework to help compare different drive models using fio. The framework helps us restore the drives close to factory settings, precondition them, run the sequential and random tests in our environment, and analyze the resulting bandwidth and latency data. Since our SSD samples arrived at our test center in different months, the automated framework sped up evaluations by reducing the time spent on testing and analysis.
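
Our framework itself is internal, but the general idea is easy to sketch: drive fio with JSON output and pull out the bandwidth and latency figures for comparison. The job parameters below are illustrative rather than the ones we actually use.

# Rough sketch of a fio wrapper: run a job with JSON output and extract bandwidth and latency.
# Job parameters are illustrative only.
import json
import subprocess

def run_fio(device, rw='randread', bs='4k', runtime=60):
    result = subprocess.run(
        ['fio', '--name=bench', f'--filename={device}', f'--rw={rw}', f'--bs={bs}',
         '--ioengine=libaio', '--direct=1', '--iodepth=32',
         f'--runtime={runtime}', '--time_based', '--output-format=json'],
        capture_output=True, check=True, text=True)
    job = json.loads(result.stdout)['jobs'][0]
    side = 'read' if 'read' in rw else 'write'
    return {
        'bw_MBps': job[side]['bw'] / 1024,                       # fio reports bandwidth in KiB/s
        'mean_latency_us': job[side]['clat_ns']['mean'] / 1000,  # completion latency
    }

print(run_fio('/dev/nvme0n1'))   # placeholder device path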

For Gen 11 we decided to move from the original 3x 1TB configuration to a 2x 2TB configuration, giving us an extra 1TB of storage. This also meant we could use the higher performance of a 2TB drive and save around 6W of power, since there is one fewer SSD.

After analyzing the performance, latency, and endurance of various 2TB drives, we chose Samsung’s PM9A3 SSDs as our Gen 11 drives. The results below were consistent with the manufacturer’s claims.

Sequential performance:

(Charts: sequential read and write performance of the evaluated drives)

Random Performance:

(Charts: random read and write performance of the evaluated drives)

Compared to our previous generation drives, we could see a 1.5x – 2x improvement in read and write bandwidths. The higher values for the PM9A3 can be attributed to the fact that these are PCIe 4.0 drives with more intelligent SSD controllers and an upgraded NAND architecture.

Network

Previously: Mellanox ConnectX-4 dual-port 25G
Now: Mellanox ConnectX-4 dual-port 25G

There is no change on the network front; the Mellanox ConnectX-4 is a solid performer which continues to meet our needs. We investigated higher speed Ethernet, but we do not currently see this as beneficial. Cloudflare’s network is built on cheap commodity hardware and the highly distributed nature of Cloudflare’s network means we don’t have discrete DDoS scrubbing centres. All points of presence operate as scrubbing centres. This means that we distribute the load across our entire network and do not need to employ higher speed and more expensive Ethernet devices.

Open source firmware

Transparency, security and integrity are absolutely critical to us at Cloudflare. Last year, we described how we had deployed Platform Secure Boot to create trust that we were running the software that we thought we were.

Now, we are pleased to announce that we are deploying open source firmware to our servers using OpenBMC. With access to the source code, we have been able to configure BMC features such as the fan PID controller, having BIOS POST codes recorded and accessible, and managing networking ports and devices. Prior to OpenBMC, requesting these features from our vendors led to varying results and misunderstandings of the scope and capabilities of the BMC. After working with the BMC source code much more directly, we have the flexibility to to work on features ourselves to our liking, or understand why the BMC is incapable of running our desired software.

Whilst our current BMC is an industry standard, we feel that OpenBMC better suits our needs and gives us advantages such as allowing us to deal with upstream security issues without a dependency on our vendors. Some opportunities with security include integration of desired authentication modules, usage of specific software packages, staying up to date with the latest Linux kernel, and controlling a variety of attack vectors. Because we have a kernel lockdown implemented, flashing tooling is difficult to use in our environment. With access to the source code of the flashing tools, we understand what the tools need access to and can assess whether or not this meets our standard of security.

Summary

The jump between our generation 9 and generation 10 servers was enormous. To summarise, we changed from a dual-socket Intel platform to a single socket AMD platform. We upgraded the SATA SSDs to NVMe storage devices, and physically the multi-node chassis changed to a 1U form factor.

At the start of the generation 11 project we weren’t sure if we would be making such radical changes again. However, after thorough testing of the latest chips and a review of how well the generation 10 server has performed in production for over a year, our generation 11 server built upon the solid foundations of generation 10 and ended up as a refinement rather than a total revamp. Despite this, and bearing in mind that performance varies by time of day and geography, we are pleased that generation 11 is capable of serving approximately 29% more requests than generation 10 without an increase in power consumption.

Thanks to Denny Mathew and Ryan Chow for their work on benchmarking and OpenBMC, respectively.

If you are interested in working with bleeding edge hardware, open source server firmware, solving interesting problems, helping to improve our performance, and are interested in helping us work on our generation 12 server platform (amongst many other things!), we’re hiring.
