Tag Archives: Uncategorized

Managing AWS Lambda Function Concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/

One of the key benefits of serverless applications is the ease in which they can scale to meet traffic demands or requests, with little to no need for capacity planning. In AWS Lambda, which is the core of the serverless platform at AWS, the unit of scale is a concurrent execution. This refers to the number of executions of your function code that are happening at any given time.

Thinking about concurrent executions as a unit of scale is a fairly unique concept. In this post, I dive deeper into this and talk about how you can make use of per function concurrency limits in Lambda.

Understanding concurrency in Lambda

Instead of diving right into the guts of how Lambda works, here’s an appetizing analogy: a magical pizza.
Yes, a magical pizza!

This magical pizza has some unique properties:

  • It has a fixed maximum number of slices, such as 8.
  • Slices automatically re-appear after they are consumed.
  • When you take a slice from the pizza, it does not re-appear until it has been completely consumed.
  • One person can take multiple slices at a time.
  • You can easily ask to have the number of slices increased, but they remain fixed at any point in time otherwise.

Now that the magical pizza’s properties are defined, here’s a hypothetical situation of some friends sharing this pizza.

Shawn, Kate, Daniela, Chuck, Ian and Avleen get together every Friday to share a pizza and catch up on their week. As there is just six of them, they can easily all enjoy a slice of pizza at a time. As they finish each slice, it re-appears in the pizza pan and they can take another slice again. Given the magical properties of their pizza, they can continue to eat all they want, but with two very important constraints:

  • If any of them take too many slices at once, the others may not get as much as they want.
  • If they take too many slices, they might also eat too much and get sick.

One particular week, some of the friends are hungrier than the rest, taking two slices at a time instead of just one. If more than two of them try to take two pieces at a time, this can cause contention for pizza slices. Some of them would wait hungry for the slices to re-appear. They could ask for a pizza with more slices, but then run the same risk again later if more hungry friends join than planned for.

What can they do?

If the friends agreed to accept a limit for the maximum number of slices they each eat concurrently, both of these issues are avoided. Some could have a maximum of 2 of the 8 slices, or other concurrency limits that were more or less. Just so long as they kept it at or under eight total slices to be eaten at one time. This would keep any from going hungry or eating too much. The six friends can happily enjoy their magical pizza without worry!

Concurrency in Lambda

Concurrency in Lambda actually works similarly to the magical pizza model. Each AWS Account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed, just like the count of slices in the pizza. As of May 2017, the default limit is 1000 “slices” of concurrency per AWS Region.

Also like the magical pizza, each concurrency “slice” can only be consumed individually one at a time. After consumption, it becomes available to be consumed again. Services invoking Lambda functions can consume multiple slices of concurrency at the same time, just like the group of friends can take multiple slices of the pizza.

Let’s take our example of the six friends and bring it back to AWS services that commonly invoke Lambda:

  • Amazon S3
  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon Cognito

In a single account with the default concurrency limit of 1000 concurrent executions, any of these four services could invoke enough functions to consume the entire limit or some part of it. Just like with the pizza example, there is the possibility for two issues to pop up:

  • One or more of these services could invoke enough functions to consume a majority of the available concurrency capacity. This could cause others to be starved for it, causing failed invocations.
  • A service could consume too much concurrent capacity and cause a downstream service or database to be overwhelmed, which could cause failed executions.

For Lambda functions that are launched in a VPC, you have the potential to consume the available IP addresses in a subnet or the maximum number of elastic network interfaces to which your account has access. For more information, see Configuring a Lambda Function to Access Resources in an Amazon VPC. For information about elastic network interface limits, see Network Interfaces section in the Amazon VPC Limits topic.

One way to solve both of these problems is applying a concurrency limit to the Lambda functions in an account.

Configuring per function concurrency limits

You can now set a concurrency limit on individual Lambda functions in an account. The concurrency limit that you set reserves a portion of your account level concurrency for a given function. All of your functions’ concurrent executions count against this account-level limit by default.

If you set a concurrency limit for a specific function, then that function’s concurrency limit allocation is deducted from the shared pool and assigned to that specific function. AWS also reserves 100 units of concurrency for all functions that don’t have a specified concurrency limit set. This helps to make sure that future functions have capacity to be consumed.

Going back to the example of the consuming services, you could set throttles for the functions as follows:

Amazon S3 function = 350
Amazon Kinesis function = 200
Amazon DynamoDB function = 200
Amazon Cognito function = 150
Total = 900

With the 100 reserved for all non-concurrency reserved functions, this totals the account limit of 1000.

Here’s how this works. To start, create a basic Lambda function that is invoked via Amazon API Gateway. This Lambda function returns a single “Hello World” statement with an added sleep time between 2 and 5 seconds. The sleep time simulates an API providing some sort of capability that can take a varied amount of time. The goal here is to show how an API that is underloaded can reach its concurrency limit, and what happens when it does.
To create the example function

  1. Open the Lambda console.
  2. Choose Create Function.
  3. For Author from scratch, enter the following values:
    1. For Name, enter a value (such as concurrencyBlog01).
    2. For Runtime, choose Python 3.6.
    3. For Role, choose Create new role from template and enter a name aligned with this function, such as concurrencyBlogRole.
  4. Choose Create function.
  5. The function is created with some basic example code. Replace that code with the following:

import time
from random import randint
seconds = randint(2, 5)

def lambda_handler(event, context):
time.sleep(seconds)
return {"statusCode": 200,
"body": ("Hello world, slept " + str(seconds) + " seconds"),
"headers":
{
"Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
"Access-Control-Allow-Methods": "GET,OPTIONS",
}}

  1. Under Basic settings, set Timeout to 10 seconds. While this function should only ever take up to 5-6 seconds (with the 5-second max sleep), this gives you a little bit of room if it takes longer.

  1. Choose Save at the top right.

At this point, your function is configured for this example. Test it and confirm this in the console:

  1. Choose Test.
  2. Enter a name (it doesn’t matter for this example).
  3. Choose Create.
  4. In the console, choose Test again.
  5. You should see output similar to the following:

Now configure API Gateway so that you have an HTTPS endpoint to test against.

  1. In the Lambda console, choose Configuration.
  2. Under Triggers, choose API Gateway.
  3. Open the API Gateway icon now shown as attached to your Lambda function:

  1. Under Configure triggers, leave the default values for API Name and Deployment stage. For Security, choose Open.
  2. Choose Add, Save.

API Gateway is now configured to invoke Lambda at the Invoke URL shown under its configuration. You can take this URL and test it in any browser or command line, using tools such as “curl”:


$ curl https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Hello world, slept 2 seconds

Throwing load at the function

Now start throwing some load against your API Gateway + Lambda function combo. Right now, your function is only limited by the total amount of concurrency available in an account. For this example account, you might have 850 unreserved concurrency out of a full account limit of 1000 due to having configured a few concurrency limits already (also the 100 concurrency saved for all functions without configured limits). You can find all of this information on the main Dashboard page of the Lambda console:

For generating load in this example, use an open source tool called “hey” (https://github.com/rakyll/hey), which works similarly to ApacheBench (ab). You test from an Amazon EC2 instance running the default Amazon Linux AMI from the EC2 console. For more help with configuring an EC2 instance, follow the steps in the Launch Instance Wizard.

After the EC2 instance is running, SSH into the host and run the following:


sudo yum install go
go get -u github.com/rakyll/hey

“hey” is easy to use. For these tests, specify a total number of tests (5,000) and a concurrency of 50 against the API Gateway URL as follows(replace the URL here with your own):


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

The output from “hey” tells you interesting bits of information:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 381.9978 secs
Slowest: 9.4765 secs
Fastest: 0.0438 secs
Average: 3.2153 secs
Requests/sec: 13.0891
Total data: 140024 bytes
Size/request: 28 bytes

Response time histogram:
0.044 [1] |
0.987 [2] |
1.930 [0] |
2.874 [1803] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.817 [1518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.760 [719] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.703 [917] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.647 [13] |
7.590 [14] |
8.533 [9] |
9.477 [4] |

Latency distribution:
10% in 2.0224 secs
25% in 2.0267 secs
50% in 3.0251 secs
75% in 4.0269 secs
90% in 5.0279 secs
95% in 5.0414 secs
99% in 5.1871 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0000 secs, 0.0332 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0046 secs
req write: 0.0000 secs, 0.0000 secs, 0.0005 secs
resp wait: 3.2149 secs, 0.0438 secs, 9.4472 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
[200] 4997 responses
[502] 3 responses

You can see a helpful histogram and latency distribution. Remember that this Lambda function has a random sleep period in it and so isn’t entirely representational of a real-life workload. Those three 502s warrant digging deeper, but could be due to Lambda cold-start timing and the “second” variable being the maximum of 5, causing the Lambda functions to time out. AWS X-Ray and the Amazon CloudWatch logs generated by both API Gateway and Lambda could help you troubleshoot this.

Configuring a concurrency reservation

Now that you’ve established that you can generate this load against the function, I show you how to limit it and protect a backend resource from being overloaded by all of these requests.

  1. In the console, choose Configure.
  2. Under Concurrency, for Reserve concurrency, enter 25.

  1. Click on Save in the top right corner.

You could also set this with the AWS CLI using the Lambda put-function-concurrency command or see your current concurrency configuration via Lambda get-function. Here’s an example command:


$ aws lambda get-function --function-name concurrencyBlog01 --output json --query Concurrency
{
"ReservedConcurrentExecutions": 25
}

Either way, you’ve set the Concurrency Reservation to 25 for this function. This acts as both a limit and a reservation in terms of making sure that you can execute 25 concurrent functions at all times. Going above this results in the throttling of the Lambda function. Depending on the invoking service, throttling can result in a number of different outcomes, as shown in the documentation on Throttling Behavior. This change has also reduced your unreserved account concurrency for other functions by 25.

Rerun the same load generation as before and see what happens. Previously, you tested at 50 concurrency, which worked just fine. By limiting the Lambda functions to 25 concurrency, you should see rate limiting kick in. Run the same test again:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

While this test runs, refresh the Monitoring tab on your function detail page. You see the following warning message:

This is great! It means that your throttle is working as configured and you are now protecting your downstream resources from too much load from your Lambda function.

Here is the output from a new “hey” command:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Summary:
Total: 379.9922 secs
Slowest: 7.1486 secs
Fastest: 0.0102 secs
Average: 1.1897 secs
Requests/sec: 13.1582
Total data: 164608 bytes
Size/request: 32 bytes

Response time histogram:
0.010 [1] |
0.724 [3075] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.438 [0] |
2.152 [811] |∎∎∎∎∎∎∎∎∎∎∎
2.866 [11] |
3.579 [566] |∎∎∎∎∎∎∎
4.293 [214] |∎∎∎
5.007 [1] |
5.721 [315] |∎∎∎∎
6.435 [4] |
7.149 [2] |

Latency distribution:
10% in 0.0130 secs
25% in 0.0147 secs
50% in 0.0205 secs
75% in 2.0344 secs
90% in 4.0229 secs
95% in 5.0248 secs
99% in 5.0629 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0000 secs, 0.0537 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0184 secs
req write: 0.0000 secs, 0.0000 secs, 0.0016 secs
resp wait: 1.1892 secs, 0.0101 secs, 7.1038 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
[502] 3076 responses
[200] 1924 responses

This looks fairly different from the last load test run. A large percentage of these requests failed fast due to the concurrency throttle failing them (those with the 0.724 seconds line). The timing shown here in the histogram represents the entire time it took to get a response between the EC2 instance and API Gateway calling Lambda and being rejected. It’s also important to note that this example was configured with an edge-optimized endpoint in API Gateway. You see under Status code distribution that 3076 of the 5000 requests failed with a 502, showing that the backend service from API Gateway and Lambda failed the request.

Other uses

Managing function concurrency can be useful in a few other ways beyond just limiting the impact on downstream services and providing a reservation of concurrency capacity. Here are two other uses:

  • Emergency kill switch
  • Cost controls

Emergency kill switch

On occasion, due to issues with applications I’ve managed in the past, I’ve had a need to disable a certain function or capability of an application. By setting the concurrency reservation and limit of a Lambda function to zero, you can do just that.

With the reservation set to zero every invocation of a Lambda function results in being throttled. You could then work on the related parts of the infrastructure or application that aren’t working, and then reconfigure the concurrency limit to allow invocations again.

Cost controls

While I mentioned how you might want to use concurrency limits to control the downstream impact to services or databases that your Lambda function might call, another resource that you might be cautious about is money. Setting the concurrency throttle is another way to help control costs during development and testing of your application.

You might want to prevent against a function performing a recursive action too quickly or a development workload generating too high of a concurrency. You might also want to protect development resources connected to this function from generating too much cost, such as APIs that your Lambda function calls.

Conclusion

Concurrent executions as a unit of scale are a fairly unique characteristic about Lambda functions. Placing limits on how many concurrency “slices” that your function can consume can prevent a single function from consuming all of the available concurrency in an account. Limits can also prevent a function from overwhelming a backend resource that isn’t as scalable.

Unlike monolithic applications or even microservices where there are mixed capabilities in a single service, Lambda functions encourage a sort of “nano-service” of small business logic directly related to the integration model connected to the function. I hope you’ve enjoyed this post and configure your concurrency limits today!

Колективно управление на права: иск на ЕК срещу България

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/12/10/2014-26-ip/

На  7 декември 2017 r. стана известно решението на ЕК да предяви иск пред Съда на ЕС срещу България, Испания, Люксембург и Румъния поради липса на уведомление за цялостно транспониране в националното законодателство  на разпоредбите на ЕС в областта на колективното управление на авторското право и сродните му права и на многотериториалното лицензиране на правата върху музикални произведения за използване онлайн, т. е. транспониране на Директивата относно колективното управление на правата (Директива 2014/26/ЕС), което  е предвидено да се извърши до 10 април 2016 г. Процедурите за нарушение са открити през май 2016 г. , но Комисията все още не е получила уведомление, че са предприети необходимите мерки за транспониране на директивата.

Комисията реагира при липса на уведомление, при частично или неправилно транспониране.  По-нататък в съобщението се казва, че

Комисията призовава Съда да наложи финансови санкции на тези четири държави (България — 19 121,60 EUR на ден, Испания — 123 928,64 EUR на ден, Люксембург — 12 920,00 EUR на ден и Румъния — 42 377,60 EUR на ден).

Относно процедурата

Макар че по-известната реакция е Комисията да предложи финансова санкция (едва) в случай на неизпълнение на решение на Съда,  с нововъведение от Договора от Лисабон (чл.260, нов параграф 3 ДФЕС)  се предвижда следното: „Когато Комисията сезира Съда на Европейския съюз с иск по силата на член 258, тъй като счита, че тази държава  не е изпълнила задължението си да съобщи за мерките за транспониране на директива, приета съгласно определена законодателна процедура, тя може, ако счете за уместно, да определи размера на еднократно платимата сума или периодичната имуществена санкция, която тази държава трябва да заплати, и която според нея е съобразена с обстоятелствата.

Ако Съдът на Европейския съюз установи, че има неизпълнение на горепосоченото задължение, той може да наложи на тази държава  да заплати еднократно платимата сума или периодичната имуществена санкция, в рамките на размера, определен от Комисията. Задължението за плащане влиза в сила на датата, определена в решението на Съда на Европейския съюз.“

Както се посочва и в SEC(2010) 1371 окончателен, този параграф създава изцяло нов инструмент. Комисията може да предложи на Съда на Европейския съюз  още от момента на подаване на иска си за установяване на неизпълнение на задължение по силата на член 258 ДФЕС  да наложи заплащането на еднократно платимата сума или на периодичната имуществена санкция със същото решение, с което се установява неизпълнението от страна на държава-членка на задължението ѝ да съобщи за мерки за транспонирането на директива, приета съгласно законодателната процедура.

Комисията може да използва новата възможност  „ако счете  за уместно“ – какъвто очевидно е нашият случай.

Комисията вече е прилагала чл.260, параграф 3 ДФЕС по отношение на България – по дело С-203/13 искането на ЕК е Съдът освен отстраняване на допуснатите нарушения в законодателство, да наложи на Република България в съответствие c член 260, параграф 3 от ДФЕС и имуществена санкция в размер на 8448 евро на ден за всяка частично транспонирана директива  в областта на енергетиката.  Комисията  впоследствие е оттеглила иска си в резултат на поведението на Република България, която е предприела необходимите за изпълнение на задълженията си мерки едва след предявяването на иска.

По същество

Очаква въвеждане  Директива относно колективното управление на правата (Директива 2014/26/ЕС)

Директивата  има за цел да подобри начина, по който се управляват всички организации за колективно управление чрез създаване на общи стандарти за управление, прозрачност и финансова дисциплина. С нея също така се определят общи стандарти за многотериториалното лицензиране на правата върху музикални произведения за използване онлайн на вътрешния пазар. Директивата относно колективното управление на правата е важна част от законодателството за авторското право в Европа. Всички организации за колективно управление трябва да подобрят своите стандарти за управление и за прозрачност.

В Румъния според ЕК има неправилно прилагане – законодателството на ЕС предвижда, че авторите могат да разрешават или забраняват разгласяването пред публика на своите произведения, но в Румъния авторите нямат друг избор, освен да оставят управлението на своето право на публично разгласяване на музикални произведения на организация за колективно управление. Това води до лишаване от изключителното авторско право на публично разгласяване, което, по мнението на Комисията, не е оправдано по смисъла на правото на ЕС.  Другите три държави, вкл. България,  не са уведомили ЕК за транспониране.

Рискът от санкция и необходимостта от експедитивност не намалява изискванията към националния законодателен процес за обоснованост, балансираност, съобразяване на различните интереси. Става дума за стандарти за прозрачност на организациите за колективно управление – и това прави националната мярка особено важна.

Събщение за медиите

Filed under: Uncategorized Tagged: съд на ес

Три билборда извън града – по Маклуън

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/12/09/three_billboards/

 

.

Последният засега филм на Мартин Макдона Три билборда извън града  (оригинално заглавие  Three Billboards Outside Ebbing, Missouri)  беше показан и в София. Филмът вече е взел наградата на публиката в Сан Себастиан и Торонто, наградата за най-добър сценарий във Венеция и две награди BIFA.

Главната героиня е Милдред Хейс. Живее в малък град  в Мисури, разведена, майка на син и дъщеря,  преживяла преди месеци огромна лична трагедия. Дъщеря й Анжела, в трудната гимназиална възраст, една вечер е изнасилена и убита. Месеци наред убиецът остава неизвестен, мъката не отстъпва, но се засилва гневът от липсата на справедливост и правосъдие.  На входа на града, покрай магистралата, има три изоставени билборда – и филмът започва с  решението на  Милдред да ги наеме, за да постави директно и видимо въпросите си  към хората, от които очаква действие – и лично към шефа на полицията : какво правиш, за да разкриеш престъплението? Въпросите  от билбордовете  – както и се очаква – привличат вниманието  на местната полиция, на телевизията,  на хората от града.

Какво казва законът, пита Милдред Хейс в началото, какво можеш и какво не можеш да напишеш на билборд?  Билбордовете са самостоятелен герой във филма: те движат действието, агресията към майката прелива в агресия  към билбордовете, подкрепата за майката се изразява в подкрепа за нейните въпроси на билбордовете, медията е съобщението*.  Ако в Ани Хол Маклуън се появява на екрана лично,   Три билборда извън града  е филмова реализация на идеите на Маклуън:  “Всяка медия застава между нас и реалността и въздейства върху начина ни на възприемане на света.”

Ролята на майката е поверена на Франсис Масдорманд, носителка на Оскар за ролята си във Фарго  на братята Джоел и Итън Коен. Макдорманд е съпруга на Джоел Коен от 1984 г.  и това като че ли по особен начин има отношение и към играта й в този филм на Макдона.  Може би не – но така или иначе вече писаха, че съсредоточената отдаденост, съчетана с пълна липса на актьорска суета, правят  Франсис Масдорманд  претендент за втори Оскар.

Както и наградите показват, в основата на успеха е текстът на Макдона.  Историята е пълна с обрати   и в края на филма трудно можем да се посочи  отрицателен герой – напълно отрицателен герой липсва,  героите показват различни страни от характера си в неочаквани развития. Конфликтът, разгърнат чрез билбордовете, е с местния полицейски шеф – който  на практика е престанал да търси убиеца, но пък го  е търсил добросъвестно – според собствените си разбирания, просто засега няма резултат  – и, за съжаление, е на път да си отиде от тежка болест. Какво може да направи? – Ако аз бях на твое място, бих направила база с ДНК за всеки мъж  в тази страна още от раждането му, и като стане нещо – установява се със сигурност – и смърт, казва майката.  – Това със сигурност не се позволява  от законите за правата на човека, обяснява полицаят, и точно в този момент той изглежда прав. Милдред Хейс също има грешки – може би и вини   – в трудните отношения майка – дъщеря, може би Анжела е щяла да остане  жива, ако майката с  търпение беше намирала път към нея: става ясно, че отношенията им са били обтегнати и скоро преди убийството дъщерята е искала да се премести да живее при баща си. Обтегнати отношения не заради липса на любов, а заради моментна небрежност или липса на внимание, никой не е съвършен – но само някои плащат толкова висока цена.

По подобен начин се развива и образът на полицая Диксън, класически расист и насилник, власт, облечена в сила. Още ли мъчиш негри, Диксън, пита майката в началото на филма. – Не негри, а цветнокожи, отговаря Диксън (известна фраза и от Ръкомахане в Спокан) –  и след известна пауза добавя – и не ги мъча.  Мъчи ги, филмът показва сцени на насилие, но ето, Диксън случайно чува непознат мъж да споделя, че е участвал в изнасилване – и от този момент се променя,  проявява изобретателност, рискува, за да получи ДНК от заподозрения – и  започва да се държи наистина като полицай, все пак повелята  е  да служи и да защитава.

Филмът не дава отговори, не знаем дали в далечината се вижда правосъдие за убитото момиче, но на финала камерата  проследява удивителна двойка – Милдред Хейс и Диксън, заели се с установяване на истината   – ако не за дъщерята на Милдред, то може би за друго насилие. Как ще продължи историята? Ще видим, ще решим по пътя.

Звучи и като отворен финал за Америка, за американската действителност. Ще решим по пътя.

Прекрасен филм, а най-хубавото е, че със сигурност за другите зрители важни ще се окажат други послания, те ще открият друго там. Отново време за Маклуън: “Как е възможно един поет като Елиът да заяви: Никога не съм мислил така, но това съм искал да кажа, след като вие сте го открили там. Ето това е моят начин на мислене по повод реакциите на критиката.”

*

Маршал Маклуън – исках да сложа връзка към статията за Маклуън в  българската Уикипедия – с удивление виждам, че няма статия, но поне има два реда за Медията е съобщението.

Filed under: Uncategorized

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

Glenn’s Take on re:Invent 2017 – Part 3

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-3/

Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week — with 43K others — for re:Invent 2017. I checked in to the Architecture blog here and here with my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.

In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.

Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly-global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing for read/write operations to occur in any region where the table was replicated. This update is important for providing a globally-consistent view of information — as users may transition from one region to another — or for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.

There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support the “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly-consistent read/writes then you must perform all of your read/writes in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally-distributed, replicated DynamoDB table simplifies many different use cases and allows for the logic of replication, which may have been pushed up into the application layers to be simplified back down into the data layer.

The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size and ranges — from minutes to hours for very large tables.

This feature is super important for those of you who work in regulated industries that often have strict requirements around data retention and backups of data, which sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature in the past. This often incurred significant, additional costs due to increased read transactions on their DynamoDB tables.

Amazon Simple Storage Service (Amazon S3) was our first-released AWS service over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to write/read the object as a whole or use the range feature to retrieve a part of the object — if possible — in your individual use case.

Now, with Amazon S3 Select, an SQL-like query language is used that can work with delimited text and JSON files, as well as work with GZIP compressed files. We don’t support encryption during the preview of Amazon S3 Select.

Amazon S3 Select provides two major benefits:

  • Faster access
  • Lower running costs

Serverless Lambda functions, where every millisecond matters when you are being charged, will benefit greatly from Amazon S3 Select as data retrieval and processing of your Lambda function will experience significant speedups and cost reductions. For example, we have seen 2x speed improvement and 80% cost reduction with the Serverless MapReduce code.

Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR will support Amazon S3 Select as well as partner offerings including Cloudera and Hortonworks. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.

As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.

Thanks for reading!

Glenn contemplating why CSV format is still relevant in 2017 (Italy).

ЕСПЧ: Видеокамери и видеозаписи в лекционни зали нарушават правото на личен живот на преподавателите

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/12/03/echr-art8-21/

 На 28 ноември 2017 г. ЕСПЧ оповести решението си по делото  Antović и Mirković v. Черна гора ( № 70838/13). С  четири гласа за  и три особени мнения Съдът реши, че има  нарушение на член 8 ЕКПЧ (право на зачитане на личния и семейния живот), като присъди съответно обезщетение.

Двама професори в университета на Черна гора, Невенка Антович и Йован Миркович, са жалбоподатели  по делото.  В университета са монтирали видеокамери.  Професорите са обжалвали решението, но националните съдилища приемат, че аудиториите, в които  Антович и Миркович  преподават, са публични пространства и няма неправомерна намеса в личния живот на преподавателите.

Съдът в Страсбург отхвърля доводите на правителството, че делото е недопустимо, тъй като аудиторията е публично пространство. Съдът преди всичко намира, че  – както вече е установил в други дела – личният живот може да включва професионални дейности. Следователно член 8 ЕКПЧ е приложим:

43.  Съдът  провери дали жалбоподателите имат оправдано очакване за неприкосновеност  на личния живот. […]

44. Що се отнася до настоящия случай, Съдът отбелязва, че университетските аудитории са работните места на преподавателите. Тайно видеонаблюдение на   работното място трябва да се разглежда като  значителна намеса в личния живот […] Това води до   възпроизводима документация  за поведението на човека в неговото работно място [...] Няма причина Съдът да се отклони от тази констатация, дори ако се отнася до случаите на явно видеонаблюдение на служител на  работното място […]   Правото на личен живот съществува, дори и да е ограничено, доколкото е необходимо (вж. Bărbulescu, § 80).

По същество:

59. според националния закон оборудването за видеонаблюдение  може да се инсталира в официални или търговски помещения, но само ако целите –  по-специално безопасността на хора или имущество или за защита на поверителни данни –  не могат да бъдат постигнати в по друг начин. Съдът отбелязва, че в настоящия случай е въведено видеонаблюдение, за да се гарантира безопасността на имуществото и хората, включително студентите, и за наблюдение на преподаването. Една от тези цели, а именно надзорът върху преподаването, изобщо не е предвидена от закона като основание за видеонаблюдение. Освен това  изрично е постановено, че няма доказателства, че  имущество или хората са били в опасност […]. От своя страна   правителството не предоставя никакво доказателство за противното в това отношение, нито пък показва, че  преди това е разглеждало други мерки като алтернатива.

Професорите смятат също, че  нямат ефективен контрол над събраната информация, поради което наблюдението е незаконно.

ЕСПЧ потвърди : наблюдението е неправомерна намеса в правото на личен живот, нарушение на чл.8 ЕКПЧ.

 В едното особено мнение се приема, че преподаването не е част от личния живот:

За  мнозинството е достатъчно, че  видеонаблюдението или мониторингът се провежда на  работното място – залата, където  професорите преподават и общуват   със студенти. Според нас това е много широко  разбиране на понятието “личен живот”.

В другите две особени мнения се приема, че чл. 8 е приложим и има нарушение, но с различна аргументация:

Настоящият случай не се отнася до камери за сигурност, поставени например на входовете и изходите на университетските сгради. Тя се отнася до видео наблюдението на аудиториите. Гарантиране на  безопасността на хората и имуществото  е  една от целите на мярката , другата цел е мониторингът на учебните дейности  –  фактът, че деканът е имал достъп до записите, изглежда потвърждава, че това наистина е преследвана цел.
Университетските аудитории не са нито частно, нито обществено място. Те са места, където преподавателите срещат своите студенти и взаимодействат с тях.  Струва ни се, че в такова взаимодействие преподавателят може да има очаквания за неприкосновеност на личния живот –  в смисъл, че   обикновено се очаква, че това, което се случва в класната стая,  може да се наблюдава само от тези, които са оправомощени да присъстват в клас  и действително   присъстват.  […]

Filed under: Media Law, Uncategorized Tagged: еспч, чл.8

Glenn’s Take on re:Invent Part 2

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-part-2/

Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We’ve got a lot of exciting announcements this week. I’m going to check in to the Architecture blog with my take on what’s interesting about some of the announcements from an cloud architectural perspective. My first post can be found here.

The Media and Entertainment industry has been a rapid adopter of AWS due to the scale, reliability, and low costs of our services. This has enabled customers to create new, online, digital experiences for their viewers ranging from broadcast to streaming to Over-the-Top (OTT) services that can be a combination of live, scheduled, or ad-hoc viewing, while supporting devices ranging from high-def TVs to mobile devices. Creating an end-to-end video service requires many different components often sourced from different vendors with different licensing models, which creates a complex architecture and a complex environment to support operationally.

AWS Media Services
Based on customer feedback, we have developed AWS Media Services to help simplify distribution of video content. AWS Media Services is comprised of five individual services that can either be used together to provide an end-to-end service or individually to work within existing deployments: AWS Elemental MediaConvert, AWS Elemental MediaLive, AWS Elemental MediaPackage, AWS Elemental MediaStore and AWS Elemental MediaTailor. These services can help you with everything from storing content safely and durably to setting up a live-streaming event in minutes without having to be concerned about the underlying infrastructure and scalability of the stream itself.

In my role, I participate in many AWS and industry events and often work with the production and event teams that put these shows together. With all the logistical tasks they have to deal with, the biggest question is often: “Will the live stream work?” Compounding this fear is the reality that, as users, we are also quick to jump on social media and make noise when a live stream drops while we are following along remotely. Worse is when I see event organizers actively selecting not to live stream content because of the risk of failure and and exposure — leading them to decide to take the safe option and not stream at all.

With AWS Media Services addressing many of the issues around putting together a high-quality media service, live streaming, and providing access to a library of content through a variety of mechanisms, I can’t wait to see more event teams use live streaming without the concern and worry I’ve seen in the past. I am excited for what this also means for non-media companies, as video becomes an increasingly common way of sharing information and adding a more personalized touch to internally- and externally-facing content.

AWS Media Services will allow you to focus more on the content and not worry about the platform. Awesome!

Amazon Neptune
As a civilization, we have been developing new ways to record and store information and model the relationships between sets of information for more than a thousand years. Government census data, tax records, births, deaths, and marriages were all recorded on medium ranging from knotted cords in the Inca civilization, clay tablets in ancient Babylon, to written texts in Western Europe during the late Middle Ages.

One of the first challenges of computing was figuring out how to store and work with vast amounts of information in a programmatic way, especially as the volume of information was increasing at a faster rate than ever before. We have seen different generations of how to organize this information in some form of database, ranging from flat files to the Information Management System (IMS) used in the 1960s for the Apollo space program, to the rise of the relational database management system (RDBMS) in the 1970s. These innovations drove a lot of subsequent innovations in information management and application development as we were able to move from thousands of records to millions and billions.

Today, as architects and developers, we have a vast variety of database technologies to select from, which have different characteristics that are optimized for different use cases:

  • Relational databases are well understood after decades of use in the majority of companies who required a database to store information. Amazon Relational Database (Amazon RDS) supports many popular relational database engines such as MySQL, Microsoft SQL Server, PostgreSQL, MariaDB, and Oracle. We have even brought the traditional RDBMS into the cloud world through Amazon Aurora, which provides MySQL and PostgreSQL support with the performance and reliability of commercial-grade databases at 1/10th the cost.
  • Non-relational databases (NoSQL) provided a simpler method of storing and retrieving information that was often faster and more scalable than traditional RDBMS technology. The concept of non-relational databases has existed since the 1960s but really took off in the early 2000s with the rise of web-based applications that required performance and scalability that relational databases struggled with at the time. AWS published this Dynamo whitepaper in 2007, with DynamoDB launching as a service in 2012. DynamoDB has quickly become one of the critical design elements for many of our customers who are building highly-scalable applications on AWS. We continue to innovate with DynamoDB, and this week launched global tables and on-demand backup at re:Invent 2017. DynamoDB excels in a variety of use cases, such as tracking of session information for popular websites, shopping cart information on e-commerce sites, and keeping track of gamers’ high scores in mobile gaming applications, for example.
  • Graph databases focus on the relationship between data items in the store. With a graph database, we work with nodes, edges, and properties to represent data, relationships, and information. Graph databases are designed to make it easy and fast to traverse and retrieve complex hierarchical data models. Graph databases share some concepts from the NoSQL family of databases such as key-value pairs (properties) and the use of a non-SQL query language such as Gremlin. Graph databases are commonly used for social networking, recommendation engines, fraud detection, and knowledge graphs. We released Amazon Neptune to help simplify the provisioning and management of graph databases as we believe that graph databases are going to enable the next generation of smart applications.

A common use case I am hearing every week as I talk to customers is how to incorporate chatbots within their organizations. Amazon Lex and Amazon Polly have made it easy for customers to experiment and build chatbots for a wide range of scenarios, but one of the missing pieces of the puzzle was how to model decision trees and and knowledge graphs so the chatbot could guide the conversation in an intelligent manner.

Graph databases are ideal for this particular use case, and having Amazon Neptune simplifies the deployment of a graph database while providing high performance, scalability, availability, and durability as a managed service. Security of your graph database is critical. To help ensure this, you can store your encrypted data by running AWS in Amazon Neptune within your Amazon Virtual Private Cloud (Amazon VPC) and using encryption at rest integrated with AWS Key Management Service (AWS KMS). Neptune also supports Amazon VPC and AWS Identity and Access Management (AWS IAM) to help further protect and restrict access.

Our customers now have the choice of many different database technologies to ensure that they can optimize each application and service for their specific needs. Just as DynamoDB has unlocked and enabled many new workloads that weren’t possible in relational databases, I can’t wait to see what new innovations and capabilities are enabled from graph databases as they become easier to use through Amazon Neptune.

Look for more on DynamoDB and Amazon S3 from me on Monday.

 

Glenn at Tour de Mont Blanc

 

 

Glenn’s Take on re:Invent 2017 Part 1

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-1/

GREETINGS FROM LAS VEGAS

Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We have a lot of exciting announcements this week. I’m going to post to the AWS Architecture blog each day with my take on what’s interesting about some of the announcements from a cloud architectural perspective.

Why not start at the beginning? At the Midnight Madness launch on Sunday night, we announced Amazon Sumerian, our platform for VR, AR, and mixed reality. The hype around VR/AR has existed for many years, though for me, it is a perfect example of how a working end-to-end solution often requires innovation from multiple sources. For AR/VR to be successful, we need many components to come together in a coherent manner to provide a great experience.

First, we need lightweight, high-definition goggles with motion tracking that are comfortable to wear. Second, we need to track movement of our body and hands in a 3-D space so that we can interact with virtual objects in the virtual world. Third, we need to build the virtual world itself and populate it with assets and define how the interactions will work and connect with various other systems.

There has been rapid development of the physical devices for AR/VR, ranging from iOS devices to Oculus Rift and HTC Vive, which provide excellent capabilities for the first and second components defined above. With the launch of Amazon Sumerian we are solving for the third area, which will help developers easily build their own virtual worlds and start experimenting and innovating with how to apply AR/VR in new ways.

Already, within 48 hours of Amazon Sumerian being announced, I have had multiple discussions with customers and partners around some cool use cases where VR can help in training simulations, remote-operator controls, or with new ideas around interacting with complex visual data sets, which starts bringing concepts straight out of sci-fi movies into the real (virtual) world. I am really excited to see how Sumerian will unlock the creative potential of developers and where this will lead.

Amazon MQ
I am a huge fan of distributed architectures where asynchronous messaging is the backbone of connecting the discrete components together. Amazon Simple Queue Service (Amazon SQS) is one of my favorite services due to its simplicity, scalability, performance, and the incredible flexibility of how you can use Amazon SQS in so many different ways to solve complex queuing scenarios.

While Amazon SQS is easy to use when building cloud-native applications on AWS, many of our customers running existing applications on-premises required support for different messaging protocols such as: Java Message Service (JMS), .Net Messaging Service (NMS), Advanced Message Queuing Protocol (AMQP), MQ Telemetry Transport (MQTT), Simple (or Streaming) Text Orientated Messaging Protocol (STOMP), and WebSockets. One of the most popular applications for on-premise message brokers is Apache ActiveMQ. With the release of Amazon MQ, you can now run Apache ActiveMQ on AWS as a managed service similar to what we did with Amazon ElastiCache back in 2012. For me, there are two compelling, major benefits that Amazon MQ provides:

  • Integrate existing applications with cloud-native applications without having to change a line of application code if using one of the supported messaging protocols. This removes one of the biggest blockers for integration between the old and the new.
  • Remove the complexity of configuring Multi-AZ resilient message broker services as Amazon MQ provides out-of-the-box redundancy by always storing messages redundantly across Availability Zones. Protection is provided against failure of a broker through to complete failure of an Availability Zone.

I believe that Amazon MQ is a major component in the tools required to help you migrate your existing applications to AWS. Having set up cross-data center Apache ActiveMQ clusters in the past myself and then testing to ensure they work as expected during critical failure scenarios, technical staff working on migrations to AWS benefit from the ease of deploying a fully redundant, managed Apache ActiveMQ cluster within minutes.

Who would have thought I would have been so excited to revisit Apache ActiveMQ in 2017 after using SQS for many, many years? Choice is a wonderful thing.

Amazon GuardDuty
Maintaining application and information security in the modern world is increasingly complex and is constantly evolving and changing as new threats emerge. This is due to the scale, variety, and distribution of services required in a competitive online world.

At Amazon, security is our number one priority. Thus, we are always looking at how we can increase security detection and protection while simplifying the implementation of advanced security practices for our customers. As a result, we released Amazon GuardDuty, which provides intelligent threat detection by using a combination of multiple information sources, transactional telemetry, and the application of machine learning models developed by AWS. One of the biggest benefits of Amazon GuardDuty that I appreciate is that enabling this service requires zero software, agents, sensors, or network choke points. which can all impact performance or reliability of the service you are trying to protect. Amazon GuardDuty works by monitoring your VPC flow logs, AWS CloudTrail events, DNS logs, as well as combing other sources of security threats that AWS is aggregating from our own internal and external sources.

The use of machine learning in Amazon GuardDuty allows it to identify changes in behavior, which could be suspicious and require additional investigation. Amazon GuardDuty works across all of your AWS accounts allowing for an aggregated analysis and ensuring centralized management of detected threats across accounts. This is important for our larger customers who can be running many hundreds of AWS accounts across their organization, as providing a single common threat detection of their organizational use of AWS is critical to ensuring they are protecting themselves.

Detection, though, is only the beginning of what Amazon GuardDuty enables. When a threat is identified in Amazon GuardDuty, you can configure remediation scripts or trigger Lambda functions where you have custom responses that enable you to start building automated responses to a variety of different common threats. Speed of response is required when a security incident may be taking place. For example, Amazon GuardDuty detects that an Amazon Elastic Compute Cloud (Amazon EC2) instance might be compromised due to traffic from a known set of malicious IP addresses. Upon detection of a compromised EC2 instance, we could apply an access control entry restricting outbound traffic for that instance, which stops loss of data until a security engineer can assess what has occurred.

Whether you are a customer running a single service in a single account, or a global customer with hundreds of accounts with thousands of applications, or a startup with hundreds of micro-services with hourly release cycle in a devops world, I recommend enabling Amazon GuardDuty. We have a 30-day free trial available for all new customers of this service. As it is a monitor of events, there is no change required to your architecture within AWS.

Stay tuned for tomorrow’s post on AWS Media Services and Amazon Neptune.

 

Glenn during the Tour du Mont Blanc

AWS Fargate: A Product Overview

Post Syndicated from Deepak Dayama original https://aws.amazon.com/blogs/compute/aws-fargate-a-product-overview/

It was just about three years ago that AWS announced Amazon Elastic Container Service (Amazon ECS), to run and manage containers at scale on AWS. With Amazon ECS, you’ve been able to run your workloads at high scale and availability without having to worry about running your own cluster management and container orchestration software.

Today, AWS announced the availability of AWS Fargate – a technology that enables you to use containers as a fundamental compute primitive without having to manage the underlying instances. With Fargate, you don’t need to provision, configure, or scale virtual machines in your clusters to run containers. Fargate can be used with Amazon ECS today, with plans to support Amazon Elastic Container Service for Kubernetes (Amazon EKS) in the future.

Fargate has flexible configuration options so you can closely match your application needs and granular, per-second billing.

Amazon ECS with Fargate

Amazon ECS enables you to run containers at scale. This service also provides native integration into the AWS platform with VPC networking, load balancing, IAM, Amazon CloudWatch Logs, and CloudWatch metrics. These deep integrations make the Amazon ECS task a first-class object within the AWS platform.

To run tasks, you first need to stand up a cluster of instances, which involves picking the right types of instances and sizes, setting up Auto Scaling, and right-sizing the cluster for performance. With Fargate, you can leave all that behind and focus on defining your application and policies around permissions and scaling.

The same container management capabilities remain available so you can continue to scale your container deployments. With Fargate, the only entity to manage is the task. You don’t need to manage the instances or supporting software like Docker daemon or the Amazon ECS agent.

Fargate capabilities are available natively within Amazon ECS. This means that you don’t need to learn new API actions or primitives to run containers on Fargate.

Using Amazon ECS, Fargate is a launch type option. You continue to define the applications the same way by using task definitions. In contrast, the EC2 launch type gives you more control of your server clusters and provides a broader range of customization options.

For example, a RunTask command example is pasted below with the Fargate launch type:

ecs run-task --launch-type FARGATE --cluster fargate-test --task-definition nginx --network-configuration
"awsvpcConfiguration={subnets=[subnet-b563fcd3]}"

Key features of Fargate

Resource-based pricing and per second billing
You pay by the task size and only for the time for which resources are consumed by the task. The price for CPU and memory is charged on a per-second basis. There is a one-minute minimum charge.

Flexible configurations options
Fargate is available with 50 different combinations of CPU and memory to closely match your application needs. You can use 2 GB per vCPU anywhere up to 8 GB per vCPU for various configurations. Match your workload requirements closely, whether they are general purpose, compute, or memory optimized.

Networking
All Fargate tasks run within your own VPC. Fargate supports the recently launched awsvpc networking mode and the elastic network interface for a task is visible in the subnet where the task is running. This provides the separation of responsibility so you retain full control of networking policies for your applications via VPC features like security groups, routing rules, and NACLs. Fargate also supports public IP addresses.

Load Balancing
ECS Service Load Balancing  for the Application Load Balancer and Network Load Balancer is supported. For the Fargate launch type, you specify the IP addresses of the Fargate tasks to register with the load balancers.

Permission tiers
Even though there are no instances to manage with Fargate, you continue to group tasks into logical clusters. This allows you to manage who can run or view services within the cluster. The task IAM role is still applicable. Additionally, there is a new Task Execution Role that grants Amazon ECS permissions to perform operations such as pushing logs to CloudWatch Logs or pulling image from Amazon Elastic Container Registry (Amazon ECR).

Container Registry Support
Fargate provides seamless authentication to help pull images from Amazon ECR via the Task Execution Role. Similarly, if you are using a public repository like DockerHub, you can continue to do so.

Amazon ECS CLI
The Amazon ECS CLI provides high-level commands to help simplify to create and run Amazon ECS clusters, tasks, and services. The latest version of the CLI now supports running tasks and services with Fargate.

EC2 and Fargate Launch Type Compatibility
All Amazon ECS clusters are heterogeneous – you can run both Fargate and Amazon ECS tasks in the same cluster. This enables teams working on different applications to choose their own cadence of moving to Fargate, or to select a launch type that meets their requirements without breaking the existing model. You can make an existing ECS task definition compatible with the Fargate launch type and run it as a Fargate service, and vice versa. Choosing a launch type is not a one-way door!

Logging and Visibility
With Fargate, you can send the application logs to CloudWatch logs. Service metrics (CPU and Memory utilization) are available as part of CloudWatch metrics. AWS partners for visibility, monitoring and application performance management including Datadog, Aquasec, Splunk, Twistlock, and New Relic also support Fargate tasks.

Conclusion

Fargate enables you to run containers without having to manage the underlying infrastructure. Today, Fargate is availabe for Amazon ECS, and in 2018, Amazon EKS. Visit the Fargate product page to learn more, or get started in the AWS Console.

–Deepak Dayama

Well-Architected Lens: Focus on Specific Workload Types

Post Syndicated from Philip Fitzsimons original https://aws.amazon.com/blogs/architecture/well-architected-lens-focus-on-specific-workload-types/

Customers have been building their innovations on AWS for over 11 years. During that time, our solutions architects have conducted tens of thousands of architecture reviews for our customers. In 2012 we created the “Well-Architected” initiative to share with you best practices for building in the cloud, and started publishing them in 2015. We recently released an updated Framework whitepaper, and a new Operational Excellence Pillar whitepaper to reflect what we learned from working with customers every day. Today, we are pleased to announce a new concept called a “lens” that allows you to focus on specific workload types from the well-architected perspective.

A well-architected review looks at a workload from a general technology perspective, which means it can’t provide workload-specific advice. For example, there are additional best practices when you are building high-performance computing (HPC) or serverless applications. Therefore, we created the concept of a lens to focus on what is different for those types of workloads.

In each lens, we document common scenarios we see — specific to that workload — providing reference architectures and a walkthrough. The lens also provides design principles to help you understand how to architect these types of workloads for the cloud, and questions for assessing your own architecture.

Today, we are releasing two lenses:

Well-Architected: High-Performance Computing (HPC) Lens <new>
Well-Architected: Serverless Applications Lens <new>

We expect to create more lenses over time, and evolve them based on customer feedback.

Philip Fitzsimons, Leader, AWS Well-Architected Team

Serverless Automated Cost Controls, Part1

Post Syndicated from Shankar Ramachandran original https://aws.amazon.com/blogs/compute/serverless-automated-cost-controls-part1/

This post courtesy of Shankar Ramachandran, Pubali Sen, and George Mao

In line with AWS’s continual efforts to reduce costs for customers, this series focuses on how customers can build serverless automated cost controls. This post provides an architecture blueprint and a sample implementation to prevent budget overruns.

This solution uses the following AWS products:

  • AWS Budgets – An AWS Cost Management tool that helps customers define and track budgets for AWS costs, and forecast for up to three months.
  • Amazon SNS – An AWS service that makes it easy to set up, operate, and send notifications from the cloud.
  • AWS Lambda – An AWS service that lets you run code without provisioning or managing servers.

You can fine-tune a budget for various parameters, for example filtering by service or tag. The Budgets tool lets you post notifications on an SNS topic. A Lambda function that subscribes to the SNS topic can act on the notification. Any programmatically implementable action can be taken.

The diagram below describes the architecture blueprint.

In this post, we describe how to use this blueprint with AWS Step Functions and IAM to effectively revoke the ability of a user to start new Amazon EC2 instances, after a budget amount is exceeded.

Freedom with guardrails

AWS lets you quickly spin up resources as you need them, deploying hundreds or even thousands of servers in minutes. This means you can quickly develop and roll out new applications. Teams can experiment and innovate more quickly and frequently. If an experiment fails, you can always de-provision those servers without risk.

This improved agility also brings in the need for effective cost controls. Your Finance and Accounting department must budget, monitor, and control the AWS spend. For example, this could be a budget per project. Further, Finance and Accounting must take appropriate actions if the budget for the project has been exceeded, for example. Call it “freedom with guardrails” – where Finance wants to give developers freedom, but with financial constraints.

Architecture

This section describes how to use the blueprint introduced earlier to implement a “freedom with guardrails” solution.

  1. The budget for “Project Beta” is set up in Budgets. In this example, we focus on EC2 usage and identify the instances that belong to this project by filtering on the tag Project with the value Beta. For more information, see Creating a Budget.
  2. The budget configuration also includes settings to send a notification on an SNS topic when the usage exceeds 100% of the budgeted amount. For more information, see Creating an Amazon SNS Topic for Budget Notifications.
  3. The master Lambda function receives the SNS notification.
  4. It triggers execution of a Step Functions state machine with the parameters for completing the configured action.
  5. The action Lambda function is triggered as a task in the state machine. The function interacts with IAM to effectively remove the user’s permissions to create an EC2 instance.

This decoupled modular design allows for extensibility.  New actions (serially or in parallel) can be added by simply adding new steps.

Implementing the solution

All the instructions and code needed to implement the architecture have been posted on the Serverless Automated Cost Controls GitHub repo. We recommend that you try this first in a Dev/Test environment.

This implementation description can be broken down into two parts:

  1. Create a solution stack for serverless automated cost controls.
  2. Verify the solution by testing the EC2 fleet.

To tie this back to the “freedom with guardrails” scenario, the Finance department performs a one-time implementation of the solution stack. To simulate resources for Project Beta, the developers spin up the test EC2 fleet.

Prerequisites

There are two prerequisites:

  • Make sure that you have the necessary IAM permissions. For more information, see the section titled “Required IAM permissions” in the README.
  • Define and activate a cost allocation tag with the key Project. For more information, see Using Cost Allocation Tags. It can take up to 12 hours for the tags to propagate to Budgets.

Create resources

The solution stack includes creating the following resources:

  • Three Lambda functions
  • One Step Functions state machine
  • One SNS topic
  • One IAM group
  • One IAM user
  • IAM policies as needed
  • One budget

Two of the Lambda functions were described in the previous section, to a) receive the SNS notification and b) trigger the Step Functions state machine. Another Lambda function is used to create the budget, as a custom AWS CloudFormation resource. The SNS topic connects Budgets with Lambda function A. Lambda function B is configured as a task in Step Functions. A budget for $2 is created which is filtered by Service: EC2 and Tag: Project, Beta. A test IAM group and user is created to enable you to validate this Cost Control Solution.

To create the serverless automated cost control solution stack, choose the button below. It takes few minutes to spin up the stack. You can monitor the progress in the CloudFormation console.

When you see the CREATE_COMPLETE status for the stack you had created, choose Outputs. Copy the following four values that you need later:

  • TemplateURL
  • UserName
  • SignInURL
  • Password

Verify the stack

The next step is to verify the serverless automated cost controls solution stack that you just created. To do this, spin up an EC2 fleet of t2.micro instances, representative of the resources needed for Project Beta, and tag them with Project, Beta.

  1. Browse to the SignInURL, and log in using the UserName and Password values copied on from the stack output.
  2. In the CloudFormation console, choose Create Stack.
  3. For Choose a template, select Choose an Amazon S3 template URL and paste the TemplateURL value from the preceding section. Choose Next.
  4. Give this stack a name, such as “testEc2FleetForProjectBeta”. Choose Next.
  5. On the Specify Details page, enter parameters such as the UserName and Password copied in the previous section. Choose Next.
  6. Ignore any errors related to listing IAM roles. The test user has a minimal set of permissions that is just sufficient to spin up this test stack (in line with security best practices).
  7. On the Options page, choose Next.
  8. On the Review page, choose Create. It takes a few minutes to spin up the stack, and you can monitor the progress in the CloudFormation console. 
  9. When you see the status “CREATE_COMPLETE”, open the EC2 console to verify that four t2.micro instances have been spun up, with the tag of Project, Beta.

The hourly cost for these instances depends on the region in which they are running. On the average (irrespective of the region), you can expect the aggregate cost for this EC2 fleet to exceed the set $2 budget in 48 hours.

Verify the solution

The first step is to identify the test IAM group that was created in the previous section. The group should have “projectBeta” in the name, prepended with the CloudFormation stack name and appended with an alphanumeric string. Verify that the managed policy associated is: “EC2FullAccess”, which indicates that the users in this group have unrestricted access to EC2.

There are two stages of verification for this serverless automated cost controls solution: simulating a notification and waiting for a breach.

Simulated notification

Because it takes at least a few hours for the aggregate cost of the EC2 fleet to breach the set budget, you can verify the solution by simulating the notification from Budgets.

  1. Log in to the SNS console (using your regular AWS credentials).
  2. Publish a message on the SNS topic that has “budgetNotificationTopic” in the name. The complete name is appended by the CloudFormation stack identifier.  
  3. Copy the following text as the body of the notification: “This is a mock notification”.
  4. Choose Publish.
  5. Open the IAM console to verify that the policy for the test group has been switched to “EC2ReadOnly”. This prevents users in this group from creating new instances.
  6. Verify that the test user created in the previous section cannot spin up new EC2 instances.  You can log in as the test user and try creating a new EC2 instance (via the same CloudFormation stack or the EC2 console). You should get an error message indicating that you do not have the necessary permissions.
  7. If you are proceeding to stage 2 of the verification, then you must switch the permissions back to “EC2FullAccess” for the test group, which can be done in the IAM console.

Automatic notification

Within 48 hours, the aggregate cost of the EC2 fleet spun up in the earlier section breaches the budget rule and triggers an automatic notification. This results in the permissions getting switched out, just as in the simulated notification.

Clean up

Use the following steps to delete your resources and stop incurring costs.

  1. Open the CloudFormation console.
  2. Delete the EC2 fleet by deleting the appropriate stack (for example, delete the stack named “testEc2FleetForProjectBeta”).                                               
  3. Next, delete the “costControlStack” stack.                                                                                                                                                    

Conclusion

Using Lambda in tandem with Budgets, you can build Serverless automated cost controls on AWS. Find all the resources (instructions, code) for implementing the solution discussed in this post on the Serverless Automated Cost Controls GitHub repo.

Stay tuned to this series for more tips about building serverless automated cost controls. In the next post, we discuss using smart lighting to influence developer behavior and describe a solution to encourage cost-aware development practices.

If you have questions or suggestions, please comment below.

 

Using AWS CodeCommit Pull Requests to request code reviews and discuss code

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/devops/using-aws-codecommit-pull-requests-to-request-code-reviews-and-discuss-code/

Thank you to Michael Edge, Senior Cloud Architect, for a great blog on CodeCommit pull requests.

~~~~~~~

AWS CodeCommit is a fully managed service for securely hosting private Git repositories. CodeCommit now supports pull requests, which allows repository users to review, comment upon, and interactively iterate on code changes. Used as a collaboration tool between team members, pull requests help you to review potential changes to a CodeCommit repository before merging those changes into the repository. Each pull request goes through a simple lifecycle, as follows:

  • The new features to be merged are added as one or more commits to a feature branch. The commits are not merged into the destination branch.
  • The pull request is created, usually from the difference between two branches.
  • Team members review and comment on the pull request. The pull request might be updated with additional commits that contain changes made in response to comments, or include changes made to the destination branch.
  • Once team members are happy with the pull request, it is merged into the destination branch. The commits are applied to the destination branch in the same order they were added to the pull request.

Commenting is an integral part of the pull request process, and is used to collaborate between the developers and the reviewer. Reviewers add comments and questions to a pull request during the review process, and developers respond to these with explanations. Pull request comments can be added to the overall pull request, a file within the pull request, or a line within a file.

To make the comments more useful, sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user. The username will then be associated with the comment, indicating the owner of the comment. Pull request comments are a great quality improvement tool as they allow the entire development team visibility into what reviewers are looking for in the code. They also serve as a record of the discussion between team members at a point in time, and shouldn’t be deleted.

AWS CodeCommit is also introducing the ability to add comments to a commit, another useful collaboration feature that allows team members to discuss code changed as part of a commit. This helps you discuss changes made in a repository, including why the changes were made, whether further changes are necessary, or whether changes should be merged. As is the case with pull request comments, you can comment on an overall commit, on a file within a commit, or on a specific line or change within a file, and other repository users can respond to your comments. Comments are not restricted to commits, they can also be used to comment on the differences between two branches, or between two tags. Commit comments are separate from pull request comments, i.e. you will not see commit comments when reviewing a pull request – you will only see pull request comments.

A pull request example

Let’s get started by running through an example. We’ll take a typical pull request scenario and look at how we’d use CodeCommit and the AWS Management Console for each of the steps.

To try out this scenario, you’ll need:

  • An AWS CodeCommit repository with some sample code in the master branch. We’ve provided sample code below.
  • Two AWS Identity and Access Management (IAM) users, both with the AWSCodeCommitPowerUser managed policy applied to them.
  • Git installed on your local computer, and access configured for AWS CodeCommit.
  • A clone of the AWS CodeCommit repository on your local computer.

In the course of this example, you’ll sign in to the AWS CodeCommit console as one IAM user to create the pull request, and as the other IAM user to review the pull request. To learn more about how to set up your IAM users and how to connect to AWS CodeCommit with Git, see the following topics:

  • Information on creating an IAM user with AWS Management Console access.
  • Instructions on how to access CodeCommit using Git.
  • If you’d like to use the same ‘hello world’ application as used in this article, here is the source code:
package com.amazon.helloworld;

public class Main {
	public static void main(String[] args) {

		System.out.println("Hello, world");
	}
}

The scenario below uses the us-east-2 region.

Creating the branches

Before we jump in and create a pull request, we’ll need at least two branches. In this example, we’ll follow a branching strategy similar to the one described in GitFlow. We’ll create a new branch for our feature from the main development branch (the default branch). We’ll develop the feature in the feature branch. Once we’ve written and tested the code for the new feature in that branch, we’ll create a pull request that contains the differences between the feature branch and the main development branch. Our team lead (the second IAM user) will review the changes in the pull request. Once the changes have been reviewed, the feature branch will be merged into the development branch.

Figure 1: Pull request link

Sign in to the AWS CodeCommit console with the IAM user you want to use as the developer. You can use an existing repository or you can go ahead and create a new one. We won’t be merging any changes to the master branch of your repository, so it’s safe to use an existing repository for this example. You’ll find the Pull requests link has been added just above the Commits link (see Figure 1), and below Commits you’ll find the Branches link. Click Branches and create a new branch called ‘develop’, branched from the ‘master’ branch. Then create a new branch called ‘feature1’, branched from the ‘develop’ branch. You’ll end up with three branches, as you can see in Figure 2. (Your repository might contain other branches in addition to the three shown in the figure).

Figure 2: Create a feature branch

If you haven’t cloned your repo yet, go to the Code link in the CodeCommit console and click the Connect button. Follow the instructions to clone your repo (detailed instructions are here). Open a terminal or command line and paste the git clone command supplied in the Connect instructions for your repository. The example below shows cloning a repository named codecommit-demo:

git clone https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo

If you’ve previously cloned the repo you’ll need to update your local repo with the branches you created. Open a terminal or command line and make sure you’re in the root directory of your repo, then run the following command:

git remote update origin

You’ll see your new branches pulled down to your local repository.

$ git remote update origin
Fetching origin
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
 * [new branch]      develop    -> origin/develop
 * [new branch]      feature1   -> origin/feature1

You can also see your new branches by typing:

git branch --all

$ git branch --all
* master
  remotes/origin/develop
  remotes/origin/feature1
  remotes/origin/master

Now we’ll make a change to the ‘feature1’ branch. Open a terminal or command line and check out the feature1 branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Make code changes

Edit a file in the repo using your favorite editor and save the changes. Commit your changes to the local repository, and push your changes to CodeCommit. For example:

git commit -am 'added new feature'
git push origin feature1

$ git commit -am 'added new feature'
[feature1 8f6cb28] added new feature
1 file changed, 1 insertion(+), 1 deletion(-)

$ git push origin feature1
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (9/9), 617 bytes | 617.00 KiB/s, done.
Total 9 (delta 2), reused 0 (delta 0)
To https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   2774a53..8f6cb28  feature1 -> feature1

Creating the pull request

Now we have a ‘feature1’ branch that differs from the ‘develop’ branch. At this point we want to merge our changes into the ‘develop’ branch. We’ll create a pull request to notify our team members to review our changes and check whether they are ready for a merge.

In the AWS CodeCommit console, click Pull requests. Click Create pull request. On the next page select ‘develop’ as the destination branch and ‘feature1’ as the source branch. Click Compare. CodeCommit will check for merge conflicts and highlight whether the branches can be automatically merged using the fast-forward option, or whether a manual merge is necessary. A pull request can be created in both situations.

Figure 3: Create a pull request

After comparing the two branches, the CodeCommit console displays the information you’ll need in order to create the pull request. In the ‘Details’ section, the ‘Title’ for the pull request is mandatory, and you may optionally provide comments to your reviewers to explain the code change you have made and what you’d like them to review. In the ‘Notifications’ section, there is an option to set up notifications to notify subscribers of changes to your pull request. Notifications will be sent on creation of the pull request as well as for any pull request updates or comments. And finally, you can review the changes that make up this pull request. This includes both the individual commits (a pull request can contain one or more commits, available in the Commits tab) as well as the changes made to each file, i.e. the diff between the two branches referenced by the pull request, available in the Changes tab. After you have reviewed this information and added a title for your pull request, click the Create button. You will see a confirmation screen, as shown in Figure 4, indicating that your pull request has been successfully created, and can be merged without conflicts into the ‘develop’ branch.

Figure 4: Pull request confirmation page

Reviewing the pull request

Now let’s view the pull request from the perspective of the team lead. If you set up notifications for this CodeCommit repository, creating the pull request would have sent an email notification to the team lead, and he/she can use the links in the email to navigate directly to the pull request. In this example, sign in to the AWS CodeCommit console as the IAM user you’re using as the team lead, and click Pull requests. You will see the same information you did during creation of the pull request, plus a record of activity related to the pull request, as you can see in Figure 5.

Figure 5: Team lead reviewing the pull request

Commenting on the pull request

You now perform a thorough review of the changes and make a number of comments using the new pull request comment feature. To gain an overall perspective on the pull request, you might first go to the Commits tab and review how many commits are included in this pull request. Next, you might visit the Changes tab to review the changes, which displays the differences between the feature branch code and the develop branch code. At this point, you can add comments to the pull request as you work through each of the changes. Let’s go ahead and review the pull request. During the review, you can add review comments at three levels:

  • The overall pull request
  • A file within the pull request
  • An individual line within a file

The overall pull request
In the Changes tab near the bottom of the page you’ll see a ‘Comments on changes’ box. We’ll add comments here related to the overall pull request. Add your comments as shown in Figure 6 and click the Save button.

Figure 6: Pull request comment

A specific file in the pull request
Hovering your mouse over a filename in the Changes tab will cause a blue ‘comments’ icon to appear to the left of the filename. Clicking the icon will allow you to enter comments specific to this file, as in the example in Figure 7. Go ahead and add comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 7: File comment

A specific line in a file in the pull request
A blue ‘comments’ icon will appear as you hover over individual lines within each file in the pull request, allowing you to create comments against lines that have been added, removed or are unchanged. In Figure 8, you add comments against a line that has been added to the source code, encouraging the developer to review the naming standards. Go ahead and add line comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 8: Line comment

A pull request that has been commented at all three levels will look similar to Figure 9. The pull request comment is shown expanded in the ‘Comments on changes’ section, while the comments at file and line level are shown collapsed. A ‘comment’ icon indicates that comments exist at file and line level. Clicking the icon will expand and show the comment. Since you are expecting the developer to make further changes based on your comments, you won’t merge the pull request at this stage, but will leave it open awaiting feedback. Each comment you made results in a notification being sent to the developer, who can respond to the comments. This is great for remote working, where developers and team lead may be in different time zones.

Figure 9: Fully commented pull request

Adding a little complexity

A typical development team is going to be creating pull requests on a regular basis. It’s highly likely that the team lead will merge other pull requests into the ‘develop’ branch while pull requests on feature branches are in the review stage. This may result in a change to the ‘Mergable’ status of a pull request. Let’s add this scenario into the mix and check out how a developer will handle this.

To test this scenario, we could create a new pull request and ask the team lead to merge this to the ‘develop’ branch. But for the sake of simplicity we’ll take a shortcut. Clone your CodeCommit repo to a new folder, switch to the ‘develop’ branch, and make a change to one of the same files that were changed in your pull request. Make sure you change a line of code that was also changed in the pull request. Commit and push this back to CodeCommit. Since you’ve just changed a line of code in the ‘develop’ branch that has also been changed in the ‘feature1’ branch, the ‘feature1’ branch cannot be cleanly merged into the ‘develop’ branch. Your developer will need to resolve this merge conflict.

A developer reviewing the pull request would see the pull request now looks similar to Figure 10, with a ‘Resolve conflicts’ status rather than the ‘Mergable’ status it had previously (see Figure 5).

Figure 10: Pull request with merge conflicts

Reviewing the review comments

Once the team lead has completed his review, the developer will review the comments and make the suggested changes. As a developer, you’ll see the list of review comments made by the team lead in the pull request Activity tab, as shown in Figure 11. The Activity tab shows the history of the pull request, including commits and comments. You can reply to the review comments directly from the Activity tab, by clicking the Reply button, or you can do this from the Changes tab. The Changes tab shows the comments for the latest commit, as comments on previous commits may be associated with lines that have changed or been removed in the current commit. Comments for previous commits are available to view and reply to in the Activity tab.

In the Activity tab, use the shortcut link (which looks like this </>) to move quickly to the source code associated with the comment. In this example, you will make further changes to the source code to address the pull request review comments, so let’s go ahead and do this now. But first, you will need to resolve the ‘Resolve conflicts’ status.

Figure 11: Pull request activity

Resolving the ‘Resolve conflicts’ status

The ‘Resolve conflicts’ status indicates there is a merge conflict between the ‘develop’ branch and the ‘feature1’ branch. This will require manual intervention to restore the pull request back to the ‘Mergable’ state. We will resolve this conflict next.

Open a terminal or command line and check out the develop branch by running the following command:

git checkout develop

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

To incorporate the changes the team lead made to the ‘develop’ branch, merge the remote ‘develop’ branch with your local copy:

git pull

$ git pull
remote: Counting objects: 9, done.
Unpacking objects: 100% (9/9), done.
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   af13c82..7b36f52  develop    -> origin/develop
Updating af13c82..7b36f52
Fast-forward
 src/main/java/com/amazon/helloworld/Main.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Then checkout the ‘feature1’ branch:

git checkout feature1

$ git checkout feature1
Switched to branch 'feature1'
Your branch is up-to-date with 'origin/feature1'.

Now merge the changes from the ‘develop’ branch into your ‘feature1’ branch:

git merge develop

$ git merge develop
Auto-merging src/main/java/com/amazon/helloworld/Main.java
CONFLICT (content): Merge conflict in src/main/java/com/amazon/helloworld/Main.java
Automatic merge failed; fix conflicts and then commit the result.

Yes, this fails. The file Main.java has been changed in both branches, resulting in a merge conflict that can’t be resolved automatically. However, Main.java will now contain markers that indicate where the conflicting code is, and you can use these to resolve the issues manually. Edit Main.java using your favorite IDE, and you’ll see it looks something like this:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

<<<<<<< HEAD
        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earthling. Today's date is: " + todaysdate);
=======
      System.out.println("Hello, earth");
>>>>>>> develop
   }
}

The code between HEAD and ‘===’ is the code the developer added in the ‘feature1’ branch (HEAD represents ‘feature1’ because this is the current checked out branch). The code between ‘===’ and ‘>>> develop’ is the code added to the ‘develop’ branch by the team lead. We’ll resolve the conflict by manually merging both changes, resulting in an updated Main.java:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysdate);
   }
}

After saving the change you can add and commit it to your local repo:

git add src/
git commit -m 'fixed merge conflict by merging changes'

Fixing issues raised by the reviewer

Now you are ready to address the comments made by the team lead. If you are no longer pointing to the ‘feature1’ branch, check out the ‘feature1’ branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Edit the source code in your favorite IDE and make the changes to address the comments. In this example, the developer has updated the source code as follows:

package com.amazon.helloworld;

import java.util.*;

/**
 *  This class prints a hello world message
 *
 * @author Michael Edge
 * @see HelloEarth
 * @version 1.0
 */

public class Main {
   public static void main(String[] args) {

        Date todaysDate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysDate);
   }
}

After saving the changes, commit and push to the CodeCommit ‘feature1’ branch as you did previously:

git commit -am 'updated based on review comments'
git push origin feature1

Responding to the reviewer

Now that you’ve fixed the code issues you will want to respond to the review comments. In the AWS CodeCommit console, check that your latest commit appears in the pull request Commits tab. You now have a pull request consisting of more than one commit. The pull request in Figure 12 has four commits, which originated from the following activities:

  • 8th Nov: the original commit used to initiate this pull request
  • 10th Nov, 3 hours ago: the commit by the team lead to the ‘develop’ branch, merged into our ‘feature1’ branch
  • 10th Nov, 24 minutes ago: the commit by the developer that resolved the merge conflict
  • 10th Nov, 4 minutes ago: the final commit by the developer addressing the review comments

Figure 12: Pull request with multiple commits

Let’s reply to the review comments provided by the team lead. In the Activity tab, reply to the pull request comment and save it, as shown in Figure 13.

Figure 13: Replying to a pull request comment

At this stage, your code has been committed and you’ve updated your pull request comments, so you are ready for a final review by the team lead.

Final review

The team lead reviews the code changes and comments made by the developer. As team lead, you own the ‘develop’ branch and it’s your decision on whether to merge the changes in the pull request into the ‘develop’ branch. You can close the pull request with or without merging using the Merge and Close buttons at the bottom of the pull request page (see Figure 13). Clicking Close will allow you to add comments on why you are closing the pull request without merging. Merging will perform a fast-forward merge, incorporating the commits referenced by the pull request. Let’s go ahead and click the Merge button to merge the pull request into the ‘develop’ branch.

Figure 14: Merging the pull request

After merging a pull request, development of that feature is complete and the feature branch is no longer needed. It’s common practice to delete the feature branch after merging. CodeCommit provides a check box during merge to automatically delete the associated feature branch, as seen in Figure 14. Clicking the Merge button will merge the pull request into the ‘develop’ branch, as shown in Figure 15. This will update the status of the pull request to ‘Merged’, and will close the pull request.

Conclusion

This blog has demonstrated how pull requests can be used to request a code review, and enable reviewers to get a comprehensive summary of what is changing, provide feedback to the author, and merge the code into production. For more information on pull requests, see the documentation.

How AWS Managed Microsoft AD Helps to Simplify the Deployment and Improve the Security of Active Directory–Integrated .NET Applications

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/how-aws-managed-microsoft-ad-helps-to-simplify-the-deployment-and-improve-the-security-of-active-directory-integrated-net-applications/

Companies using .NET applications to access sensitive user information, such as employee salary, Social Security Number, and credit card information, need an easy and secure way to manage access for users and applications.

For example, let’s say that your company has a .NET payroll application. You want your Human Resources (HR) team to manage and update the payroll data for all the employees in your company. You also want your employees to be able to see their own payroll information in the application. To meet these requirements in a user-friendly and secure way, you want to manage access to the .NET application by using your existing Microsoft Active Directory identities. This enables you to provide users with single sign-on (SSO) access to the .NET application and to manage permissions using Active Directory groups. You also want the .NET application to authenticate itself to access the database, and to limit access to the data in the database based on the identity of the application user.

Microsoft Active Directory supports these requirements through group Managed Service Accounts (gMSAs) and Kerberos constrained delegation (KCD). AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, enables you to manage gMSAs and KCD through your administrative account, helping you to migrate and develop .NET applications that need these native Active Directory features.

In this blog post, I give an overview of how to use AWS Managed Microsoft AD to manage gMSAs and KCD and demonstrate how you can configure a gMSA and KCD in six steps for a .NET application:

  1. Create your AWS Managed Microsoft AD.
  2. Create your Amazon RDS for SQL Server database.
  3. Create a gMSA for your .NET application.
  4. Deploy your .NET application.
  5. Configure your .NET application to use the gMSA.
  6. Configure KCD for your .NET application.

Solution overview

The following diagram shows the components of a .NET application that uses Amazon RDS for SQL Server with a gMSA and KCD. The diagram also illustrates authentication and access and is numbered to show the six key steps required to use a gMSA and KCD. To deploy this solution, the AWS Managed Microsoft AD directory must be in the same Amazon Virtual Private Cloud (VPC) as RDS for SQL Server. For this example, my company name is Example Corp., and my directory uses the domain name, example.com.

Diagram showing the components of a .NET application that uses Amazon RDS for SQL Server with a gMSA and KCD

Deploy the solution

The following six steps (numbered to correlate with the preceding diagram) walk you through configuring and using a gMSA and KCD.

1. Create your AWS Managed Microsoft AD directory

Using the Directory Service console, create your AWS Managed Microsoft AD directory in your Amazon VPC. In my example, my domain name is example.com.

Image of creating an AWS Managed Microsoft AD directory in an Amazon VPC

2. Create your Amazon RDS for SQL Server database

Using the RDS console, create your Amazon RDS for SQL Server database instance in the same Amazon VPC where your directory is running, and enable Windows Authentication. To enable Windows Authentication, select your directory in the Microsoft SQL Server Windows Authentication section in the Configure Advanced Settings step of the database creation workflow (see the following screenshot).

In my example, I create my Amazon RDS for SQL Server db-example database, and enable Windows Authentication to allow my db-example database to authenticate against my example.com directory.

Screenshot of configuring advanced settings

3. Create a gMSA for your .NET application

Now that you have deployed your directory, database, and application, you can create a gMSA for your .NET application.

To perform the next steps, you must install the Active Directory administration tools on a Windows server that is joined to your AWS Managed Microsoft AD directory domain. If you do not have a Windows server joined to your directory domain, you can deploy a new Amazon EC2 for Microsoft Windows Server instance and join it to your directory domain.

To create a gMSA for your .NET application:

  1. Log on to the instance on which you installed the Active Directory administration tools by using a user that is a member of the Admins security group or the Managed Service Accounts Admins security group in your organizational unit (OU). For my example, I use the Admin user in the example OU.

Screenshot of logging on to the instance on which you installed the Active Directory administration tools

  1. Identify which .NET application servers (hosts) will run your .NET application. Create a new security group in your OU and add your .NET application servers as members of this new group. This allows a group of application servers to use a single gMSA, instead of creating one gMSA for each server. In my example, I create a group, App_server_grp, in my example OU. I also add Appserver1, which is my .NET application server computer name, as a member of this new group.

Screenshot of creating a new security group

  1. Create a gMSA in your directory by running Windows PowerShell from the Start menu. The basic syntax to create the gMSA at the Windows PowerShell command prompt follows.
    PS C:\Users\admin> New-ADServiceAccount -name [gMSAname] -DNSHostName [domainname] -PrincipalsAllowedToRetrieveManagedPassword [AppServersSecurityGroup] -TrustedForDelegation $truedn <Enter>

    In my example, the gMSAname is gMSAexample, the DNSHostName is example.com, and the PrincipalsAllowedToRetrieveManagedPassword is the recently created security group, App_server_grp.

    PS C:\Users\admin> New-ADServiceAccount -name gMSAexample -DNSHostName example.com -PrincipalsAllowedToRetrieveManagedPassword App_server_grp -TrustedForDelegation $truedn <Enter>

    To confirm you created the gMSA, you can run the Get-ADServiceAccount command from the PowerShell command prompt.

    PS C:\Users\admin> Get-ADServiceAccount gMSAexample <Enter>
    
    DistinguishedName : CN=gMSAexample,CN=Managed Service Accounts,DC=example,DC=com
    Enabled           : True
    Name              : gMSAexample
    ObjectClass       : msDS-GroupManagedServiceAccount
    ObjectGUID        : 24d8b68d-36d5-4dc3-b0a9-edbbb5dc8a5b
    SamAccountName    : gMSAexample$
    SID               : S-1-5-21-2100421304-991410377-951759617-1603
    UserPrincipalName :

    You also can confirm you created the gMSA by opening the Active Directory Users and Computers utility located in your Administrative Tools folder, expand the domain (example.com in my case), and expand the Managed Service Accounts folder.
    Screenshot of confirming the creation of the gMSA

4. Deploy your .NET application

Deploy your .NET application on IIS on Amazon EC2 for Windows Server instances. For this step, I assume you are the application’s expert and already know how to deploy it. Make sure that all of your instances are joined to your directory.

5. Configure your .NET application to use the gMSA

You can configure your .NET application to use the gMSA to enforce strong password security policy and ensure password rotation of your service account. This helps to improve the security and simplify the management of your .NET application. Configure your .NET application in two steps:

  1. Grant to gMSA the required permissions to run your .NET application in the respective application folders. This is a critical step because when you change the application pool identity account to use gMSA, downtime can occur if the gMSA does not have the application’s required permissions. Therefore, make sure you first test the configurations in your development and test environments.
  2. Configure your application pool identity on IIS to use the gMSA as the service account. When you configure a gMSA as the service account, you include the $ at the end of the gMSA name. You do not need to provide a password because AWS Managed Microsoft AD automatically creates and rotates the password. In my example, my service account is gMSAexample$, as shown in the following screenshot.

Screenshot of configuring application pool identity

You have completed all the steps to use gMSA to create and rotate your .NET application service account password! Now, you will configure KCD for your .NET application.

6. Configure KCD for your .NET application

You now are ready to allow your .NET application to have access to other services by using the user identity’s permissions instead of the application service account’s permissions. Note that KCD and gMSA are independent features, which means you do not have to create a gMSA to use KCD. For this example, I am using both features to show how you can use them together. To configure a regular service account such as a user or local built-in account, see the Kerberos constrained delegation with ASP.NET blog post on MSDN.

In my example, my goal is to delegate to the gMSAexample account the ability to enforce the user’s permissions to my db-example SQL Server database, instead of the gMSAexample account’s permissions. For this, I have to update the msDS-AllowedToDelegateTo gMSA attribute. The value for this attribute is the service principal name (SPN) of the service instance that you are targeting, which in this case is the db-example Amazon RDS for SQL Server database.

The SPN format for the msDS-AllowedToDelegateTo attribute is a combination of the service class, the Kerberos authentication endpoint, and the port number. The Amazon RDS for SQL Server Kerberos authentication endpoint format is [database_name].[domain_name]. The value for my msDS-AllowedToDelegateTo attribute is MSSQLSvc/db-example.example.com:1433, where MSSQLSvc and 1433 are the SQL Server Database service class and port number standards, respectively.

Follow these steps to perform the msDS-AllowedToDelegateTo gMSA attribute configuration:

  1. Log on to your Active Directory management instance with a user identity that is a member of the Kerberos Delegation Admins security group. In this case, I will use admin.
  2. Open the Active Directory Users and Groups utility located in your Administrative Tools folder, choose View, and then choose Advanced Features.
  3. Expand your domain name (example.com in this example), and then choose the Managed Service Accounts security group. Right-click the gMSA account for the application pool you want to enable for Kerberos delegation, choose Properties, and choose the Attribute Editor tab.
  4. Search for the msDS-AllowedToDelegateTo attribute on the Attribute Editor tab and choose Edit.
  5. Enter the MSSQLSvc/db-example.example.com:1433 value and choose Add.
    Screenshot of entering the value of the multi-valued string
  6. Choose OK and Apply, and your KCD configuration is complete.

Congratulations! At this point, your application is using a gMSA rather than an embedded static user identity and password, and the application is able to access SQL Server using the identity of the application user. The gMSA eliminates the need for you to rotate the application’s password manually, and it allows you to better scope permissions for the application. When you use KCD, you can enforce access to your database consistently based on user identities at the database level, which prevents improper access that might otherwise occur because of an application error.

Summary

In this blog post, I demonstrated how to simplify the deployment and improve the security of your .NET application by using a group Managed Service Account and Kerberos constrained delegation with your AWS Managed Microsoft AD directory. I also outlined the main steps to get your .NET environment up and running on a managed Active Directory and SQL Server infrastructure. This approach will make it easier for you to build new .NET applications in the AWS Cloud or migrate existing ones in a more secure way.

For additional information about using group Managed Service Accounts and Kerberos constrained delegation with your AWS Managed Microsoft AD directory, see the AWS Directory Service documentation.

To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions about this post or its solution, start a new thread on the Directory Service forum.

– Peter

Одобрението за членството на България в ЕС се запазва

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/11/17/eu_poll/

Одобрението за членството на България в ЕС се запазва в рамките на 50 на сто, срещу 16% неодобрение и 34% неутрални оценки.

Най-силни еврооптимисти остават по-младите, с по-високо образование, активност и мобилност, които са се възползвали най-пълноценно  от предимствата на членството.

Национално представително проучване на Алфа Рисърч, проведено по поръчка на Представителството на ЕК в България, в периода 16 – 25 септември 2017 г.

 

Filed under: Uncategorized

Introducing Cloud Native Networking for Amazon ECS Containers

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/introducing-cloud-native-networking-for-ecs-containers/

This post courtesy of ECS Sr. Software Dev Engineer Anirudh Aithal.

Today, AWS announced Task Networking for Amazon ECS. This feature brings Amazon EC2 networking capabilities to tasks using elastic network interfaces.

An elastic network interface is a virtual network interface that you can attach to an instance in a VPC. When you launch an EC2 virtual machine, an elastic network interface is automatically provisioned to provide networking capabilities for the instance.

A task is a logical group of running containers. Previously, tasks running on Amazon ECS shared the elastic network interface of their EC2 host. Now, the new awsvpc networking mode lets you attach an elastic network interface directly to a task.

This simplifies network configuration, allowing you to treat each container just like an EC2 instance with full networking features, segmentation, and security controls in the VPC.

In this post, I cover how awsvpc mode works and show you how you can start using elastic network interfaces with your tasks running on ECS.

Background:  Elastic network interfaces in EC2

When you launch EC2 instances within a VPC, you don’t have to configure an additional overlay network for those instances to communicate with each other. By default, routing tables in the VPC enable seamless communication between instances and other endpoints. This is made possible by virtual network interfaces in VPCs called elastic network interfaces. Every EC2 instance that launches is automatically assigned an elastic network interface (the primary network interface). All networking parameters—such as subnets, security groups, and so on—are handled as properties of this primary network interface.

Furthermore, an IPv4 address is allocated to every elastic network interface by the VPC at creation (the primary IPv4 address). This primary address is unique and routable within the VPC. This effectively makes your VPC a flat network, resulting in a simple networking topology.

Elastic network interfaces can be treated as fundamental building blocks for connecting various endpoints in a VPC, upon which you can build higher-level abstractions. This allows elastic network interfaces to be leveraged for:

  • VPC-native IPv4 addressing and routing (between instances and other endpoints in the VPC)
  • Network traffic isolation
  • Network policy enforcement using ACLs and firewall rules (security groups)
  • IPv4 address range enforcement (via subnet CIDRs)

Why use awsvpc?

Previously, ECS relied on the networking capability provided by Docker’s default networking behavior to set up the network stack for containers. With the default bridge network mode, containers on an instance are connected to each other using the docker0 bridge. Containers use this bridge to communicate with endpoints outside of the instance, using the primary elastic network interface of the instance on which they are running. Containers share and rely on the networking properties of the primary elastic network interface, including the firewall rules (security group subscription) and IP addressing.

This means you cannot address these containers with the IP address allocated by Docker (it’s allocated from a pool of locally scoped addresses), nor can you enforce finely grained network ACLs and firewall rules. Instead, containers are addressable in your VPC by the combination of the IP address of the primary elastic network interface of the instance, and the host port to which they are mapped (either via static or dynamic port mapping). Also, because a single elastic network interface is shared by multiple containers, it can be difficult to create easily understandable network policies for each container.

The awsvpc networking mode addresses these issues by provisioning elastic network interfaces on a per-task basis. Hence, containers no longer share or contend use these resources. This enables you to:

  • Run multiple copies of the container on the same instance using the same container port without needing to do any port mapping or translation, simplifying the application architecture.
  • Extract higher network performance from your applications as they no longer contend for bandwidth on a shared bridge.
  • Enforce finer-grained access controls for your containerized applications by associating security group rules for each Amazon ECS task, thus improving the security for your applications.

Associating security group rules with a container or containers in a task allows you to restrict the ports and IP addresses from which your application accepts network traffic. For example, you can enforce a policy allowing SSH access to your instance, but blocking the same for containers. Alternatively, you could also enforce a policy where you allow HTTP traffic on port 80 for your containers, but block the same for your instances. Enforcing such security group rules greatly reduces the surface area of attack for your instances and containers.

ECS manages the lifecycle and provisioning of elastic network interfaces for your tasks, creating them on-demand and cleaning them up after your tasks stop. You can specify the same properties for the task as you would when launching an EC2 instance. This means that containers in such tasks are:

  • Addressable by IP addresses and the DNS name of the elastic network interface
  • Attachable as ‘IP’ targets to Application Load Balancers and Network Load Balancers
  • Observable from VPC flow logs
  • Access controlled by security groups

­This also enables you to run multiple copies of the same task definition on the same instance, without needing to worry about port conflicts. You benefit from higher performance because you don’t need to perform any port translations or contend for bandwidth on the shared docker0 bridge, as you do with the bridge networking mode.

Getting started

If you don’t already have an ECS cluster, you can create one using the create cluster wizard. In this post, I use “awsvpc-demo” as the cluster name. Also, if you are following along with the command line instructions, make sure that you have the latest version of the AWS CLI or SDK.

Registering the task definition

The only change to make in your task definition for task networking is to set the networkMode parameter to awsvpc. In the ECS console, enter this value for Network Mode.

 

If you plan on registering a container in this task definition with an ECS service, also specify a container port in the task definition. This example specifies an NGINX container exposing port 80:

This creates a task definition named “nginx-awsvpc" with networking mode set to awsvpc. The following commands illustrate registering the task definition from the command line:

$ cat nginx-awsvpc.json
{
        "family": "nginx-awsvpc",
        "networkMode": "awsvpc",
        "containerDefinitions": [
            {
                "name": "nginx",
                "image": "nginx:latest",
                "cpu": 100,
                "memory": 512,
                "essential": true,
                "portMappings": [
                  {
                    "containerPort": 80,
                    "protocol": "tcp"
                  }
                ]
            }
        ]
}

$ aws ecs register-task-definition --cli-input-json file://./nginx-awsvpc.json

Running the task

To run a task with this task definition, navigate to the cluster in the Amazon ECS console and choose Run new task. Specify the task definition as “nginx-awsvpc“. Next, specify the set of subnets in which to run this task. You must have instances registered with ECS in at least one of these subnets. Otherwise, ECS can’t find a candidate instance to attach the elastic network interface.

You can use the console to narrow down the subnets by selecting a value for Cluster VPC:

 

Next, select a security group for the task. For the purposes of this example, create a new security group that allows ingress only on port 80. Alternatively, you can also select security groups that you’ve already created.

Next, run the task by choosing Run Task.

You should have a running task now. If you look at the details of the task, you see that it has an elastic network interface allocated to it, along with the IP address of the elastic network interface:

You can also use the command line to do this:

$ aws ecs run-task --cluster awsvpc-ecs-demo --network-configuration "awsvpcConfiguration={subnets=["subnet-c070009b"],securityGroups=["sg-9effe8e4"]}" nginx-awsvpc $ aws ecs describe-tasks --cluster awsvpc-ecs-demo --task $ECS_TASK_ARN --query tasks[0]
{
    "taskArn": "arn:aws:ecs:us-west-2:xx..x:task/f5xx-...",
    "group": "family:nginx-awsvpc",
    "attachments": [
        {
            "status": "ATTACHED",
            "type": "ElasticNetworkInterface",
            "id": "xx..",
            "details": [
                {
                    "name": "subnetId",
                    "value": "subnet-c070009b"
                },
                {
                    "name": "networkInterfaceId",
                    "value": "eni-b0aaa4b2"
                },
                {
                    "name": "macAddress",
                    "value": "0a:47:e4:7a:2b:02"
                },
                {
                    "name": "privateIPv4Address",
                    "value": "10.0.0.35"
                }
            ]
        }
    ],
    ...
    "desiredStatus": "RUNNING",
    "taskDefinitionArn": "arn:aws:ecs:us-west-2:xx..x:task-definition/nginx-awsvpc:2",
    "containers": [
        {
            "containerArn": "arn:aws:ecs:us-west-2:xx..x:container/62xx-...",
            "taskArn": "arn:aws:ecs:us-west-2:xx..x:task/f5x-...",
            "name": "nginx",
            "networkBindings": [],
            "lastStatus": "RUNNING",
            "networkInterfaces": [
                {
                    "privateIpv4Address": "10.0.0.35",
                    "attachmentId": "xx.."
                }
            ]
        }
    ]
}

When you describe an “awsvpc” task, details of the elastic network interface are returned via the “attachments” object. You can also get this information from the “containers” object. For example:

$ aws ecs describe-tasks --cluster awsvpc-ecs-demo --task $ECS_TASK_ARN --query tasks[0].containers[0].networkInterfaces[0].privateIpv4Address
"10.0.0.35"

Conclusion

The nginx container is now addressable in your VPC via the 10.0.0.35 IPv4 address. You did not have to modify the security group on the instance to allow requests on port 80, thus improving instance security. Also, you ensured that all ports apart from port 80 were blocked for this application without modifying the application itself, which makes it easier to manage your task on the network. You did not have to interact with any of the elastic network interface API operations, as ECS handled all of that for you.

You can read more about the task networking feature in the ECS documentation. For a detailed look at how this new networking mode is implemented on an instance, see Under the Hood: Task Networking for Amazon ECS.

Please use the comments section below to send your feedback.

Now You Can Monitor DDoS Attack Trends with AWS Shield Advanced

Post Syndicated from Ritwik Manan original https://aws.amazon.com/blogs/security/now-you-can-monitor-ddos-attack-trends-with-aws-shield-advanced/

AWS Shield Advanced has always notified you about DDoS attacks on your applications via the AWS Management Console and API as well as Amazon CloudWatch metrics. Today, we added the global threat environment dashboard to AWS Shield Advanced to allow you to view trends and metrics about DDoS attacks across Amazon CloudFront, Elastic Load Balancing, and Amazon Route 53. This information can help you understand the DDoS target profile of the AWS services you use and, in turn, can help you create a more resilient and distributed architecture for your application.

The global threat environment dashboard shows comprehensive and easy-to-understand data about DDoS attacks. The dashboard displays a summary of the global threat environment, including the largest attacks, top vectors, and the relative number of significant attacks. You also can view the dashboard for different time durations to give you a history of DDoS attacks.

To get started with the global threat environment dashboard:

  1. Sign in to the AWS Management Console and navigate to the AWS WAF and AWS Shield console.
  2. To activate AWS Shield Advanced, choose Protected resources in the navigation pane, choose Activate AWS Shield Advanced, and then accept the terms by typing I accept.
  3. Navigate to the global threat environment dashboard through the navigation pane.
  4. Choose your desired time period from the time period drop-down menu in the top right part of the page.

You can use the information on the global threat environment dashboard to understand the threat landscape as well as to inform decisions you make that will help to better protect your AWS resources.

To learn more information, see Global Threat Environment Dashboard: View DDoS Attack Trends Across AWS.

– Ritwik

Amazon SES Now Supports DMARC Validation and Reporting for Incoming Email

Post Syndicated from Nic Webb original https://aws.amazon.com/blogs/ses/amazon-ses-now-supports-dmarc-validation-and-reporting-for-incoming-email/

Amazon SES now adds DMARC verdicts to incoming emails, and publishes aggregate DMARC reports to domain owners. These two new features will help combat email spoofing and phishing, making the email ecosystem a safer and more secure place.

What is DMARC?

DMARC stands for Domain-based Message Authentication, Reporting, and Conformance. The DMARC standard was designed to prevent malicious actors from sending messages that appear to be from legitimate senders. Domain owners can tell email receivers how to handle unauthenticated messages that appear to be from their domains. The DMARC standard also specifies certain reports that email senders and receivers send to each other. The cooperative nature of this reporting process helps improve the email authentication infrastructure.

How does Amazon SES Implement DMARC?

When you receive an email message through Amazon SES, the headers of that message will include a DMARC policy verdict alongside the DKIM and SPF verdicts (both of which are already present). This additional information helps you verify the authenticity of all email messages you receive.

Messages you receive through Amazon SES will contain one of the following DMARC verdicts:

  • PASS – The message passed DMARC authentication.
  • FAIL – The message failed DMARC authentication.
  • GRAY – The sending domain does not have a DMARC policy.
  • PROCESSING_FAILED – An issue occurred that prevented Amazon SES from providing a DMARC verdict.

If the DMARC verdict is FAIL, Amazon SES will also provide information about the sending domain’s DMARC settings. In this situation, you will see one of the following verdicts:

  • NONE – The owner of the sending domain requests that no specific action be taken on messages that fail DMARC authentication.
  • QUARANTINE – The owner of the sending domain requests that messages that fail DMARC authentication be treated by receivers as suspicious.
  • REJECT – The owner of the sending domain requests that messages that fail DMARC authentication be rejected.

In addition to publishing the DMARC verdict on each incoming message, Amazon SES now sends DMARC aggregate reports to domain owners. These reports help domain owners identify systemic authentication failures, and avoid potential domain spoofing attacks.

Note: Domain owners only receive aggregate information about emails that do not pass DMARC authentication. These reports, known as RUA reports, only include information about the IP addresses that send unauthenticated emails to you. These reports do not include information about legitimate email senders.

How do I configure DMARC?

As is the case with SPF and DKIM, domain owners must publish their DMARC policies as DNS records for their domains. For more information about setting up DMARC, see Complying with DMARC Using Amazon SES in the Amazon SES Developer Guide.

DMARC reporting is now available in the following AWS Regions: US West (Oregon), US East (N. Virginia), and EU (Ireland). You can find more information about the dmarcVerdict and dmarcPolicy objects in the Amazon SES Developer Guide. The Developer Guide also includes a sample Lambda function that you can use to bounce incoming emails that fail DMARC authentication.

Using AWS Step Functions State Machines to Handle Workflow-Driven AWS CodePipeline Actions

Post Syndicated from Marcilio Mendonca original https://aws.amazon.com/blogs/devops/using-aws-step-functions-state-machines-to-handle-workflow-driven-aws-codepipeline-actions/

AWS CodePipeline is a continuous integration and continuous delivery service for fast and reliable application and infrastructure updates. It offers powerful integration with other AWS services, such as AWS CodeBuildAWS CodeDeployAWS CodeCommit, AWS CloudFormation and with third-party tools such as Jenkins and GitHub. These services make it possible for AWS customers to successfully automate various tasks, including infrastructure provisioning, blue/green deployments, serverless deployments, AMI baking, database provisioning, and release management.

Developers have been able to use CodePipeline to build sophisticated automation pipelines that often require a single CodePipeline action to perform multiple tasks, fork into different execution paths, and deal with asynchronous behavior. For example, to deploy a Lambda function, a CodePipeline action might first inspect the changes pushed to the code repository. If only the Lambda code has changed, the action can simply update the Lambda code package, create a new version, and point the Lambda alias to the new version. If the changes also affect infrastructure resources managed by AWS CloudFormation, the pipeline action might have to create a stack or update an existing one through the use of a change set. In addition, if an update is required, the pipeline action might enforce a safety policy to infrastructure resources that prevents the deletion and replacement of resources. You can do this by creating a change set and having the pipeline action inspect its changes before updating the stack. Change sets that do not conform to the policy are deleted.

This use case is a good illustration of workflow-driven pipeline actions. These are actions that run multiple tasks, deal with async behavior and loops, need to maintain and propagate state, and fork into different execution paths. Implementing workflow-driven actions directly in CodePipeline can lead to complex pipelines that are hard for developers to understand and maintain. Ideally, a pipeline action should perform a single task and delegate the complexity of dealing with workflow-driven behavior associated with that task to a state machine engine. This would make it possible for developers to build simpler, more intuitive pipelines and allow them to use state machine execution logs to visualize and troubleshoot their pipeline actions.

In this blog post, we discuss how AWS Step Functions state machines can be used to handle workflow-driven actions. We show how a CodePipeline action can trigger a Step Functions state machine and how the pipeline and the state machine are kept decoupled through a Lambda function. The advantages of using state machines include:

  • Simplified logic (complex tasks are broken into multiple smaller tasks).
  • Ease of handling asynchronous behavior (through state machine wait states).
  • Built-in support for choices and processing different execution paths (through state machine choices).
  • Built-in visualization and logging of the state machine execution.

The source code for the sample pipeline, pipeline actions, and state machine used in this post is available at https://github.com/awslabs/aws-codepipeline-stepfunctions.

Overview

This figure shows the components in the CodePipeline-Step Functions integration that will be described in this post. The pipeline contains two stages: a Source stage represented by a CodeCommit Git repository and a Prod stage with a single Deploy action that represents the workflow-driven action.

This action invokes a Lambda function (1) called the State Machine Trigger Lambda, which, in turn, triggers a Step Function state machine to process the request (2). The Lambda function sends a continuation token back to the pipeline (3) to continue its execution later and terminates. Seconds later, the pipeline invokes the Lambda function again (4), passing the continuation token received. The Lambda function checks the execution state of the state machine (5,6) and communicates the status to the pipeline. The process is repeated until the state machine execution is complete. Then the Lambda function notifies the pipeline that the corresponding pipeline action is complete (7). If the state machine has failed, the Lambda function will then fail the pipeline action and stop its execution (7). While running, the state machine triggers various Lambda functions to perform different tasks. The state machine and the pipeline are fully decoupled. Their interaction is handled by the Lambda function.

The Deploy State Machine

The sample state machine used in this post is a simplified version of the use case, with emphasis on infrastructure deployment. The state machine will follow distinct execution paths and thus have different outcomes, depending on:

  • The current state of the AWS CloudFormation stack.
  • The nature of the code changes made to the AWS CloudFormation template and pushed into the pipeline.

If the stack does not exist, it will be created. If the stack exists, a change set will be created and its resources inspected by the state machine. The inspection consists of parsing the change set results and detecting whether any resources will be deleted or replaced. If no resources are being deleted or replaced, the change set is allowed to be executed and the state machine completes successfully. Otherwise, the change set is deleted and the state machine completes execution with a failure as the terminal state.

Let’s dive into each of these execution paths.

Path 1: Create a Stack and Succeed Deployment

The Deploy state machine is shown here. It is triggered by the Lambda function using the following input parameters stored in an S3 bucket.

Create New Stack Execution Path

{
    "environmentName": "prod",
    "stackName": "sample-lambda-app",
    "templatePath": "infra/Lambda-template.yaml",
    "revisionS3Bucket": "codepipeline-us-east-1-418586629775",
    "revisionS3Key": "StepFunctionsDrivenD/CodeCommit/sjcmExZ"
}

Note that some values used here are for the use case example only. Account-specific parameters like revisionS3Bucket and revisionS3Key will be different when you deploy this use case in your account.

These input parameters are used by various states in the state machine and passed to the corresponding Lambda functions to perform different tasks. For example, stackName is used to create a stack, check the status of stack creation, and create a change set. The environmentName represents the environment (for example, dev, test, prod) to which the code is being deployed. It is used to prefix the name of stacks and change sets.

With the exception of built-in states such as wait and choice, each state in the state machine invokes a specific Lambda function.  The results received from the Lambda invocations are appended to the state machine’s original input. When the state machine finishes its execution, several parameters will have been added to its original input.

The first stage in the state machine is “Check Stack Existence”. It checks whether a stack with the input name specified in the stackName input parameter already exists. The output of the state adds a Boolean value called doesStackExist to the original state machine input as follows:

{
  "doesStackExist": true,
  "environmentName": "prod",
  "stackName": "sample-lambda-app",
  "templatePath": "infra/lambda-template.yaml",
  "revisionS3Bucket": "codepipeline-us-east-1-418586629775",
  "revisionS3Key": "StepFunctionsDrivenD/CodeCommit/sjcmExZ",
}

The following stage, “Does Stack Exist?”, is represented by Step Functions built-in choice state. It checks the value of doesStackExist to determine whether a new stack needs to be created (doesStackExist=true) or a change set needs to be created and inspected (doesStackExist=false).

If the stack does not exist, the states illustrated in green in the preceding figure are executed. This execution path creates the stack, waits until the stack is created, checks the status of the stack’s creation, and marks the deployment successful after the stack has been created. Except for “Stack Created?” and “Wait Stack Creation,” each of these stages invokes a Lambda function. “Stack Created?” and “Wait Stack Creation” are implemented by using the built-in choice state (to decide which path to follow) and the wait state (to wait a few seconds before proceeding), respectively. Each stage adds the results of their Lambda function executions to the initial input of the state machine, allowing future stages to process them.

Path 2: Safely Update a Stack and Mark Deployment as Successful

Safely Update a Stack and Mark Deployment as Successful Execution Path

If the stack indicated by the stackName parameter already exists, a different path is executed. (See the green states in the figure.) This path will create a change set and use wait and choice states to wait until the change set is created. Afterwards, a stage in the execution path will inspect  the resources affected before the change set is executed.

The inspection procedure represented by the “Inspect Change Set Changes” stage consists of parsing the resources affected by the change set and checking whether any of the existing resources are being deleted or replaced. The following is an excerpt of the algorithm, where changeSetChanges.Changes is the object representing the change set changes:

...
var RESOURCES_BEING_DELETED_OR_REPLACED = "RESOURCES-BEING-DELETED-OR-REPLACED";
var CAN_SAFELY_UPDATE_EXISTING_STACK = "CAN-SAFELY-UPDATE-EXISTING-STACK";
for (var i = 0; i < changeSetChanges.Changes.length; i++) {
    var change = changeSetChanges.Changes[i];
    if (change.Type == "Resource") {
        if (change.ResourceChange.Action == "Delete") {
            return RESOURCES_BEING_DELETED_OR_REPLACED;
        }
        if (change.ResourceChange.Action == "Modify") {
            if (change.ResourceChange.Replacement == "True") {
                return RESOURCES_BEING_DELETED_OR_REPLACED;
            }
        }
    }
}
return CAN_SAFELY_UPDATE_EXISTING_STACK;

The algorithm returns different values to indicate whether the change set can be safely executed (CAN_SAFELY_UPDATE_EXISTING_STACK or RESOURCES_BEING_DELETED_OR_REPLACED). This value is used later by the state machine to decide whether to execute the change set and update the stack or interrupt the deployment.

The output of the “Inspect Change Set” stage is shown here.

{
  "environmentName": "prod",
  "stackName": "sample-lambda-app",
  "templatePath": "infra/lambda-template.yaml",
  "revisionS3Bucket": "codepipeline-us-east-1-418586629775",
  "revisionS3Key": "StepFunctionsDrivenD/CodeCommit/sjcmExZ",
  "doesStackExist": true,
  "changeSetName": "prod-sample-lambda-app-change-set-545",
  "changeSetCreationStatus": "complete",
  "changeSetAction": "CAN-SAFELY-UPDATE-EXISTING-STACK"
}

At this point, these parameters have been added to the state machine’s original input:

  • changeSetName, which is added by the “Create Change Set” state.
  • changeSetCreationStatus, which is added by the “Get Change Set Creation Status” state.
  • changeSetAction, which is added by the “Inspect Change Set Changes” state.

The “Safe to Update Infra?” step is a choice state (its JSON spec follows) that simply checks the value of the changeSetAction parameter. If the value is equal to “CAN-SAFELY-UPDATE-EXISTING-STACK“, meaning that no resources will be deleted or replaced, the step will execute the change set by proceeding to the “Execute Change Set” state. The deployment is successful (the state machine completes its execution successfully).

"Safe to Update Infra?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.taskParams.changeSetAction",
          "StringEquals": "CAN-SAFELY-UPDATE-EXISTING-STACK",
          "Next": "Execute Change Set"
        }
      ],
      "Default": "Deployment Failed"
 }

Path 3: Reject Stack Update and Fail Deployment

Reject Stack Update and Fail Deployment Execution Path

If the changeSetAction parameter is different from “CAN-SAFELY-UPDATE-EXISTING-STACK“, the state machine will interrupt the deployment by deleting the change set and proceeding to the “Deployment Fail” step, which is a built-in Fail state. (Its JSON spec follows.) This state causes the state machine to stop in a failed state and serves to indicate to the Lambda function that the pipeline deployment should be interrupted in a fail state as well.

 "Deployment Failed": {
      "Type": "Fail",
      "Cause": "Deployment Failed",
      "Error": "Deployment Failed"
    }

In all three scenarios, there’s a state machine’s visual representation available in the AWS Step Functions console that makes it very easy for developers to identify what tasks have been executed or why a deployment has failed. Developers can also inspect the inputs and outputs of each state and look at the state machine Lambda function’s logs for details. Meanwhile, the corresponding CodePipeline action remains very simple and intuitive for developers who only need to know whether the deployment was successful or failed.

The State Machine Trigger Lambda Function

The Trigger Lambda function is invoked directly by the Deploy action in CodePipeline. The CodePipeline action must pass a JSON structure to the trigger function through the UserParameters attribute, as follows:

{
  "s3Bucket": "codepipeline-StepFunctions-sample",
  "stateMachineFile": "state_machine_input.json"
}

The s3Bucket parameter specifies the S3 bucket location for the state machine input parameters file. The stateMachineFile parameter specifies the file holding the input parameters. By being able to specify different input parameters to the state machine, we make the Trigger Lambda function and the state machine reusable across environments. For example, the same state machine could be called from a test and prod pipeline action by specifying a different S3 bucket or state machine input file for each environment.

The Trigger Lambda function performs two main tasks: triggering the state machine and checking the execution state of the state machine. Its core logic is shown here:

exports.index = function (event, context, callback) {
    try {
        console.log("Event: " + JSON.stringify(event));
        console.log("Context: " + JSON.stringify(context));
        console.log("Environment Variables: " + JSON.stringify(process.env));
        if (Util.isContinuingPipelineTask(event)) {
            monitorStateMachineExecution(event, context, callback);
        }
        else {
            triggerStateMachine(event, context, callback);
        }
    }
    catch (err) {
        failure(Util.jobId(event), callback, context.invokeid, err.message);
    }
}

Util.isContinuingPipelineTask(event) is a utility function that checks if the Trigger Lambda function is being called for the first time (that is, no continuation token is passed by CodePipeline) or as a continuation of a previous call. In its first execution, the Lambda function will trigger the state machine and send a continuation token to CodePipeline that contains the state machine execution ARN. The state machine ARN is exposed to the Lambda function through a Lambda environment variable called stateMachineArn. Here is the code that triggers the state machine:

function triggerStateMachine(event, context, callback) {
    var stateMachineArn = process.env.stateMachineArn;
    var s3Bucket = Util.actionUserParameter(event, "s3Bucket");
    var stateMachineFile = Util.actionUserParameter(event, "stateMachineFile");
    getStateMachineInputData(s3Bucket, stateMachineFile)
        .then(function (data) {
            var initialParameters = data.Body.toString();
            var stateMachineInputJSON = createStateMachineInitialInput(initialParameters, event);
            console.log("State machine input JSON: " + JSON.stringify(stateMachineInputJSON));
            return stateMachineInputJSON;
        })
        .then(function (stateMachineInputJSON) {
            return triggerStateMachineExecution(stateMachineArn, stateMachineInputJSON);
        })
        .then(function (triggerStateMachineOutput) {
            var continuationToken = { "stateMachineExecutionArn": triggerStateMachineOutput.executionArn };
            var message = "State machine has been triggered: " + JSON.stringify(triggerStateMachineOutput) + ", continuationToken: " + JSON.stringify(continuationToken);
            return continueExecution(Util.jobId(event), continuationToken, callback, message);
        })
        .catch(function (err) {
            console.log("Error triggering state machine: " + stateMachineArn + ", Error: " + err.message);
            failure(Util.jobId(event), callback, context.invokeid, err.message);
        })
}

The Trigger Lambda function fetches the state machine input parameters from an S3 file, triggers the execution of the state machine using the input parameters and the stateMachineArn environment variable, and signals to CodePipeline that the execution should continue later by passing a continuation token that contains the state machine execution ARN. In case any of these operations fail and an exception is thrown, the Trigger Lambda function will fail the pipeline immediately by signaling a pipeline failure through the putJobFailureResult CodePipeline API.

If the Lambda function is continuing a previous execution, it will extract the state machine execution ARN from the continuation token and check the status of the state machine, as shown here.

function monitorStateMachineExecution(event, context, callback) {
    var stateMachineArn = process.env.stateMachineArn;
    var continuationToken = JSON.parse(Util.continuationToken(event));
    var stateMachineExecutionArn = continuationToken.stateMachineExecutionArn;
    getStateMachineExecutionStatus(stateMachineExecutionArn)
        .then(function (response) {
            if (response.status === "RUNNING") {
                var message = "Execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + " is still " + response.status;
                return continueExecution(Util.jobId(event), continuationToken, callback, message);
            }
            if (response.status === "SUCCEEDED") {
                var message = "Execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + " has: " + response.status;
                return success(Util.jobId(event), callback, message);
            }
            // FAILED, TIMED_OUT, ABORTED
            var message = "Execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + " has: " + response.status;
            return failure(Util.jobId(event), callback, context.invokeid, message);
        })
        .catch(function (err) {
            var message = "Error monitoring execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + ", Error: " + err.message;
            failure(Util.jobId(event), callback, context.invokeid, message);
        });
}

If the state machine is in the RUNNING state, the Lambda function will send the continuation token back to the CodePipeline action. This will cause CodePipeline to call the Lambda function again a few seconds later. If the state machine has SUCCEEDED, then the Lambda function will notify the CodePipeline action that the action has succeeded. In any other case (FAILURE, TIMED-OUT, or ABORT), the Lambda function will fail the pipeline action.

This behavior is especially useful for developers who are building and debugging a new state machine because a bug in the state machine can potentially leave the pipeline action hanging for long periods of time until it times out. The Trigger Lambda function prevents this.

Also, by having the Trigger Lambda function as a means to decouple the pipeline and state machine, we make the state machine more reusable. It can be triggered from anywhere, not just from a CodePipeline action.

The Pipeline in CodePipeline

Our sample pipeline contains two simple stages: the Source stage represented by a CodeCommit Git repository and the Prod stage, which contains the Deploy action that invokes the Trigger Lambda function. When the state machine decides that the change set created must be rejected (because it replaces or deletes some the existing production resources), it fails the pipeline without performing any updates to the existing infrastructure. (See the failed Deploy action in red.) Otherwise, the pipeline action succeeds, indicating that the existing provisioned infrastructure was either created (first run) or updated without impacting any resources. (See the green Deploy stage in the pipeline on the left.)

The Pipeline in CodePipeline

The JSON spec for the pipeline’s Prod stage is shown here. We use the UserParameters attribute to pass the S3 bucket and state machine input file to the Lambda function. These parameters are action-specific, which means that we can reuse the state machine in another pipeline action.

{
  "name": "Prod",
  "actions": [
      {
          "inputArtifacts": [
              {
                  "name": "CodeCommitOutput"
              }
          ],
          "name": "Deploy",
          "actionTypeId": {
              "category": "Invoke",
              "owner": "AWS",
              "version": "1",
              "provider": "Lambda"
          },
          "outputArtifacts": [],
          "configuration": {
              "FunctionName": "StateMachineTriggerLambda",
              "UserParameters": "{\"s3Bucket\": \"codepipeline-StepFunctions-sample\", \"stateMachineFile\": \"state_machine_input.json\"}"
          },
          "runOrder": 1
      }
  ]
}

Conclusion

In this blog post, we discussed how state machines in AWS Step Functions can be used to handle workflow-driven actions. We showed how a Lambda function can be used to fully decouple the pipeline and the state machine and manage their interaction. The use of a state machine greatly simplified the associated CodePipeline action, allowing us to build a much simpler and cleaner pipeline while drilling down into the state machine’s execution for troubleshooting or debugging.

Here are two exercises you can complete by using the source code.

Exercise #1: Do not fail the state machine and pipeline action after inspecting a change set that deletes or replaces resources. Instead, create a stack with a different name (think of blue/green deployments). You can do this by creating a state machine transition between the “Safe to Update Infra?” and “Create Stack” stages and passing a new stack name as input to the “Create Stack” stage.

Exercise #2: Add wait logic to the state machine to wait until the change set completes its execution before allowing the state machine to proceed to the “Deployment Succeeded” stage. Use the stack creation case as an example. You’ll have to create a Lambda function (similar to the Lambda function that checks the creation status of a stack) to get the creation status of the change set.

Have fun and share your thoughts!

About the Author

Marcilio Mendonca is a Sr. Consultant in the Canadian Professional Services Team at Amazon Web Services. He has helped AWS customers design, build, and deploy best-in-class, cloud-native AWS applications using VMs, containers, and serverless architectures. Before he joined AWS, Marcilio was a Software Development Engineer at Amazon. Marcilio also holds a Ph.D. in Computer Science. In his spare time, he enjoys playing drums, riding his motorcycle in the Toronto GTA area, and spending quality time with his family.