
AWS Security Profiles: Phil Rodrigues, Principal Security Solutions Architect

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-phil-rodrigues-principal-security-solutions-architect/


In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do in your current role?

I’m a Principal Security Solutions Architect based in Sydney, Australia. I look after both Australia and New Zealand. I just had my two-year anniversary with AWS. As I tell new hires, the first few months at AWS are a blur; after 6-12 months you start to get some ideas, and then after a year or two you start to really own and implement your decisions. That’s the phase I’m in now. I’m working to figure out new ways to help customers with cloud security.

What are you currently working on that you’re excited about?

In Australia, AWS has a mature set of financial services customers who are leading the way in terms of how large, regulated institutions can consume cloud services at scale. Many Aussie banks started this process as soon as we opened the region six years ago. They’re over the first hump, in terms of understanding what’s appropriate to put into the cloud, how they should be controlling it, and how to get regulatory support for it. Now they’re looking to pick up steam and do this at scale. I’m excited to be a part of that process.

What’s the most challenging part of your job?

Among our customers’ senior leadership there’s still a difference of opinion on whether or not the public cloud is the right place to be running, say, critical banking workloads. Based on anecdotal evidence, I think we’re at a tipping point leading to broad adoption of public cloud for the industry’s most critical workloads. It’s challenging to figure out the right messaging that will resonate with the boards of large, multi-national banks to help them understand that the technology control benefits of the cloud are far superior when it comes to security.

What’s your favorite part of your job?

We had a private customer security event in Australia recently, and I realized that we now have the chance to do things that security professionals have always wanted to do. That is, we can automatically apply the most secure configurations at scale, ubiquitously across all workloads, and we can build environments that are quick to respond to security problems and that can automatically fix those problems. For people in the security industry, that’s always been the dream, and it’s a dream that some of our customers are now able to realize. I love getting to hear from customers how AWS helped make that happen.

How did you choose your particular topic for re:Invent this year?

Myles Hosford and I are presenting a session called Top Cloud Security Myths – Dispelled! It’s a very practical session. We’ve talked with hundreds of customers about security over the past two years, and we’ve noticed the types of questions that they ask tend to follow a pattern that’s largely dependent on where they are in their cloud journey. Our talk covers these questions — from the simple to the complex. We want the talk to be accessible for people who are new to cloud security, but still interesting for people who have more experience. We hope we’ll be able to guide everyone through the journey, starting with basics like, “Why is AWS more secure than my data center?”, up through more advanced questions, like “How does AWS protect and prevent administrative access to the customer environment?”

What are you hoping that your audience will take away from it?

There are only a few 200-level talks on the Security track. Our session is for people who don’t have a high level of expertise in cloud security — people who aren’t planning to go to the 300- and 400-level builder talks — but who still have some important, foundational questions about how secure the cloud is and what AWS does to keep it secure. We’re hoping that someone who has questions about cloud security can come to the session and, in less than an hour, get a number of the answers that they need in order to make them more comfortable about migrating their most important workloads to the cloud.

Any tips for first-time conference attendees?

You’ll never see it all, so don’t exhaust yourself by trying to crisscross the entire length of the Strip. Focus on the sessions that will be the most beneficial to you, stay close to the people that you’d like to share the experience with, and enjoy it. This isn’t a scientific measure, but I estimate that last year I saw maybe 1% of re:Invent — so I tried to make it the best 1% that I could. You can catch up on new service announcements and talks later, via video.

What’s the most common misperception you encounter about cloud security?

One common misperception stems from the fact that cloud is a broad term. On one side of the spectrum, you have global hyperscale providers, but on the opposite end, you have small operations with what I’d call “a SaaS platform and a dream” who might sell business ideas to individual parts of a larger organization. The organization might want to process important information on the SaaS platform, but the provider doesn’t always have the experience to put the correct controls into place. Now, AWS does an awesome job of keeping the cloud itself secure, and we give customers a lot of options to create secure workloads, but many times, if an organization asks the SaaS provider if they’re secure, the SaaS provider says, “Of course we’re secure. We use AWS.” They’ll give out AWS audit reports that show what AWS does to keep the cloud secure, but that’s not the full story. The software providers operating on top of AWS also play a role in keeping their customers’ data secure, and not all of these providers are following the same mature, rigorous processes that we follow — for example, undergoing external third-party audits. It’s important for AWS to be secure, but it’s also important for the ecosystem of partners building on top of us to be secure.

In your opinion, what’s the biggest challenge facing cloud security right now?

The number of complex choices that customers must make when deciding which of our services to use and how to configure them. We offer great guidance through best practices, Well-Architected reviews, and a number of other mechanisms that guide the industry, but our overall model is still that of providing building blocks that customers must assemble themselves. We hope customers are making great decisions regarding security configurations while they’re building, and we provide a number of tools to help them do this — as do a number of third-parties. But staying secure in the cloud still requires a lot of choices.

Five years from now, what changes do you think we’ll see across the security/compliance landscape?

I’m not losing much sleep over quantum computing and its impact on cryptography. I think that’s a while away. For me, the near future is more likely to feature developments like broad adoption of automated assurance. We’ll move away from paper-based, once-a-year audits for assessing organizations’ technology risk and toward persistent automation, near-instant visibility, and the ability to react to things that happen in real time. I also think we’ll see a requirement for large organizations who want to move important workloads to the cloud to use security automation. Regulators and the external audit community have started to realize that automated security is possible, and they’ll push to require it. We’re already seeing a handful of examples in Australia, where regulators who understand the cloud are asking to see evidence of AWS best practices being applied. Some customers are also asking third-party auditors not to bring in a spreadsheet but rather to query the state of their security controls via an API in real time or through a dashboard. I think these trends will continue. The future will be very automated, and much more secure.

What does cloud security mean to you, personally?

My customer base in Australia includes banks, governments, healthcare, energy, telcos, and utilities. For me, this drives home the realization that the cloud is the critical digital infrastructure of the future. I have a young family who will be using these services for a long time. They rely on the cloud either as the infrastructure underneath another service they’re consuming — including services as important as transportation and education — or else they access the cloud directly themselves. How we keep this infrastructure safe and secure, and how we keep people’s information private but available, affects my family.

Professionally, I’ve been interested in security since before it was a big business, and it’s rewarding to see stuff that we toiled on in the corner of a university lab two decades ago gaining attention and becoming best practice. At the same time, I think everyone who works in security thrives on the challenge that it’s not simple, it’s certainly not “done” yet, and there’s always someone on the other side trying to make it harder. What drives me is both that professional sense of competition, and the personal realization that getting it right impacts me and my family.

What’s the one thing a visitor should do on a trip to Sydney?

Australia is a fascinating place, and visitors tend to be struck by how physically beautiful it is. I agree; I think Sydney is one of the most beautiful cities in the world. My advice is to take a walk, whether along the Opera House, at Sydney Harbor, up through the botanical gardens, or along the beaches. Or take a ferry across to the Manly beachfront community to walk down the promenade. It’s easy to see the physical beauty of Sydney when you visit — just take a walk.

The AWS Security team is hiring! Want to find out more? Check out our career page.

Want more AWS Security news? Follow us on Twitter.

Author

Phil Rodrigues

Phil Rodrigues is a Principal Security Solutions Architect for AWS based in Sydney, Australia. He works with AWS’s largest customers to improve their security, risk, and compliance in the cloud. Phil is a frequent speaker at AWS and cloud events across Australia. Prior to AWS, he worked for over 17 years in Information Security in the US, Europe, and Asia-Pacific.

AWS Security Profiles: Ken Beer, General Manager, AWS Key Management Service

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-ken-beer-general-manager-aws-key-management-service/


In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do in your current role?

I’ve been here a little over six years. I’m the General Manager of AWS Key Management Service (AWS KMS).

How do you explain your job to non-tech friends?

For any kind of product development, you have the builders and the sellers. I manage all of the builders of the Key Management Service.

What are you currently working on that you’re excited about?

The work that gets me excited isn’t always about new features. There’s also essential work going on to keep AWS KMS up and running, which enables more and more people to use it whenever they want. We have to maintain a high level of availability, low latency, and a good customer experience 24 hours a day. Ensuring a good customer experience and providing operational excellence is a big source of adrenaline for service teams at AWS — you get to determine in real time when things aren’t going well, and then try to address the issue before customers notice.

In addition, we’re always looking for new features to add to enable customers to run new workloads in AWS. At a high level, my teams are responsible for ensuring that customers can easily encrypt all their data, whether it resides in an AWS service or not. To date, that’s primarily been an opt-in exercise for customers. Depending on the service, they might choose to have it encrypted — and, more specifically, they might choose to have it encrypted under keys that they have more control over. In a classic model of encryption, your data is encrypted at storage — think of BitLocker on your laptop — and that’s it. But if you don’t understand how encryption works, you might not appreciate that encryption by default doesn’t necessarily provide the security you think it does: if the person or application that has access to your encrypted data can also cause your keys to be used whenever it wants, it can decrypt your data. AWS KMS was invented to help provide that separation: customers can control who has access to their keys and how AWS services or their own applications make use of those keys, in an easy, reliable, and low-cost way.
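To make that separation concrete, here is a minimal boto3 sketch (the key alias is a placeholder) showing that reading the data back depends on permission to use the KMS key, not just on having the ciphertext:

import boto3

kms = boto3.client("kms")

# Encrypt application data under a KMS key. The caller needs kms:Encrypt
# on this key; the alias below is a hypothetical example.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-data-key",
    Plaintext=b"database-password-example",
)["CiphertextBlob"]

# Anyone who obtains the ciphertext still can't read it: this call only
# succeeds if the caller is allowed kms:Decrypt on the same key, which the
# key policy and IAM policies control independently of storage access.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
print(plaintext.decode())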

What’s the most challenging part of your job?

It depends on the project that’s in front of me. Sometimes it’s finding the right people. That’s always a challenge for any manager at AWS, considering that we’re still in a growth phase. In my case, finding people who meet the engineering bar for being great computer scientists is often not enough — I’ve got to find people who appreciate security and have a strong ethos for maintaining the confidentiality of customer data. That makes it tougher to find people who will be a good fit on my teams.

Outside of hiring good people, the biggest challenge is to minimize risk as we constantly improve the service. We’re trying to improve the feature set and the customer experience of our service APIs, which means that we’re always pushing new software — and every deployment introduces risk.

What’s your favorite part of your job?

Working with very smart, committed, passionate people.

How did you choose your particular topic for re:Invent this year?

For the past four years, I’ve given a re:Invent talk that offered an overview of how to encrypt things at AWS. When I started giving this talk, not every service supported encryption, AWS KMS was relatively new, and we were adding a lot of new features. This year, I was worried the presentation wouldn’t include enough new material, so I decided to broaden the scope. The new session is Data Protection: Encryption, Availability, Resiliency, and Durability, which I’ll be co-presenting with Peter O’Donnell, one of our solutions architects. This session focuses on how to approach data security and how to think about access control of data in the cloud holistically, where encryption is just a part of the solution. When we talk to our customers directly, we often hear that they’re struggling to figure out which of their well-established on-premises security controls they should take with them to the cloud — and what new things they should be doing once they get there. We’re using the session to give people a sense of what it means to own logical access control, and of all the ways they can control access to AWS resources and their data within those resources. Encryption is another access control mechanism that can provide strong confidentiality if used correctly. If a customer delegates to an AWS managed service to encrypt data at rest, the actual encipherment of data is going to happen on a server that they can’t touch. It’s going to use code that they don’t own. All of the encryption occurs in a black box, so to speak. AWS KMS gives customers the confidence to say, “In order to get access to this piece of data, not only does someone have to have permission to the encrypted data itself in storage, that person also has to have permission to use the right decryption key.” Customers now have two independent access control mechanisms, as a belt-and-suspenders approach for stronger data security.
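A rough illustration of that belt-and-suspenders model (bucket name, object key, and key alias here are placeholders): reading back an object written with SSE-KMS requires permission on both the object and the key.

import boto3

s3 = boto3.client("s3")

# Write an object encrypted server-side under a specific KMS key.
s3.put_object(
    Bucket="example-bucket",
    Key="reports/confidential.csv",
    Body=b"example,data\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-data-key",
)

# Reading it back requires BOTH s3:GetObject on the object and kms:Decrypt
# on the key; revoking either permission blocks access to the plaintext.
obj = s3.get_object(Bucket="example-bucket", Key="reports/confidential.csv")
data = obj["Body"].read()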

What are you hoping that your audience will take away from it?

I want people to think about the classification of their data and which access control mechanisms they should apply to it. In many cases, a given AWS service won’t let you apply a specific classification to an individual piece of data. It’ll be applied to a collection or a container of data — for example, a database. I want people to think about how they’re going to define the containers and resources that hold their data, how they want to organize them, and how they’re going to manage access to create, modify, and delete them. People should focus on the data itself as opposed to the physical connections and physical topology of the network since with most AWS services they can’t control that topology or network security — AWS does it all for them.

Any tips for first-time conference attendees?

Wear comfortable shoes. Because so many different hotels are involved, getting from point A to point B often requires walking at a brisk pace. We hope a renewed investment in shuttle buses will help make transitions easier.

What’s the most common misperception you encounter about encryption in the cloud?

I encounter a lot of people who think, “If I use my cloud provider’s encryption services, then they must have access to my data.” That’s the most common misperception. AWS services that integrate with AWS KMS are designed so that AWS does not have access to your data unless you explicitly give us that access. This can be a hard concept to grasp for some, but we put a lot of effort into the secure design of our encryption features and we hold ourselves accountable to that design with all the compliance schemes we follow. After that, I see a lot of people under the impression that there’s a huge performance penalty for using encryption. This is often based on experiences from years ago: At the time, a lot of CPU cycles were spent on encryption, which meant they weren’t available for interesting things like database searches or vending web pages. Using the latest hardware in the cloud, that’s mostly changed. While there’s a non-zero cost to doing encryption (it’s math and physics, after all), AWS can hide a lot of that and absorb the overhead on behalf of customers. Especially when customers are trying to do full-disk encryptions for workloads running on Amazon Elastic Compute Cloud (Amazon EC2) with Amazon Elastic Block Store (Amazon EBS), we actually perform the encryption on dedicated hardware that’s not exposed to the customer’s memory space or compute. We’re minimizing the perceived latency of encryption, and we all but erase the performance cost in terms of CPU cycles for customers.
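As a small sketch of what that looks like in practice (the Availability Zone and key alias are placeholders), requesting an encrypted EBS volume is a single flag; the cipher work happens off the instance rather than consuming guest CPU:

import boto3

ec2 = boto3.client("ec2")

# Create an EBS volume that is encrypted at rest under a KMS key.
# Data moving between the instance and the volume is encrypted on dedicated
# hardware, so the guest's CPU isn't spent on the cipher operations.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,                     # GiB
    VolumeType="gp2",
    Encrypted=True,
    KmsKeyId="alias/my-ebs-key",  # omit to use the default aws/ebs key
)
print(volume["VolumeId"])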

What are some of the blockers that customers face when it comes to using cryptography?

There are some customers who’ve heard encryption is a good idea — but every time they’ve looked at it, they’ve decided that it’s too hard or too expensive. A lot of times, that’s because they’ve brought in a consultant or vendor who’s influenced them to think that it would be expensive, not just from a licensing standpoint but also in terms of having people on staff who understand how to do it right. We’d like to convince those customers that they can take advantage of encryption, and that it’s incredibly easy in AWS. We make sure it’s done right, and in a way that doesn’t introduce new risks for their data.

There are other customers, like banks and governments, who have been doing encryption for years. They don’t realize that we’ve made encryption better, faster, and cheaper. AWS has hundreds of people tasked with making sure encryption works properly for all of the millions of AWS customers. Most companies don’t have hundreds of people on staff who care about encryption and key management the way we do. These companies should absolutely perform due diligence and force us to prove that our security controls are in place and they do what we claim they do. We’ve found the customers that have done this diligence understand that we’re providing a consistent way to enforce the use of encryption across all of their workloads. We’re also on the cutting edge of trying to protect them against tomorrow’s encryption-related problems, such as quantum-safe cryptography.

Five years from now, what changes do you think we’ll see across the security and compliance landscape?

I think we’ll see a couple of changes. The first is that we’ll see more customers use encryption by default, making encryption a critical part of their operational security. It won’t just be used in regulated industries or by very large companies.

The second change is more fundamental, and has to do with a perceived threat to some of today’s cryptography: There’s some evidence that quantum computing will become affordable and usable at some point in time — although it’s unclear if that time is 5 or 50 years away. But when it comes, it will make certain types of cryptography very weak, including the kind we use for data in transit security protocols like HTTPS and TLS. The industry is currently working on what’s called quantum-safe or post-quantum cryptography, in which you use different algorithms and different key sizes to provide the same level of security that we have today, even in the face of an adversary that has a quantum computer and can capture your communications. As encryption algorithms and protocols evolve to address this potential future risk, we’ll see a shift in the way our devices connect to each other. Our phones, our laptops, and our servers will adopt this new technology to ensure privacy in our communications.

The AWS Security team is hiring! Want to find out more? Check out our career page.

Want more AWS Security news? Follow us on Twitter.

Author

Ken Beer

Ken is the General Manager of the AWS Key Management Service. Ken has worked in identity and access management, encryption, and key management for over 6 years at AWS. Before joining AWS, Ken was in charge of the network security business at Trend Micro. Before Trend Micro, he was at Tumbleweed Communications. Ken has spoken on a variety of security topics at events such as the RSA Conference, the DoD PKI User’s Forum, and AWS re:Invent.

New – Predictive Scaling for EC2, Powered by Machine Learning

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-predictive-scaling-for-ec2-powered-by-machine-learning/

When I look back on the history of AWS and think about the launches that truly signify the fundamentally dynamic, on-demand nature of the cloud, two stand out in my memory: the launch of Amazon EC2 in 2006 and the concurrent launch of CloudWatch Metrics, Auto Scaling, and Elastic Load Balancing in 2009. The first launch provided access to compute power; the second made it possible to use that access to rapidly respond to changes in demand. We have added a multitude of features to all of these services since then, but as far as I am concerned they are still central and fundamental!

New Predictive Scaling
Today we are making Auto Scaling even more powerful with the addition of predictive scaling. Using data collected from your actual EC2 usage and further informed by billions of data points drawn from our own observations, we use well-trained Machine Learning models to predict your expected traffic (and EC2 usage) including daily and weekly patterns. The model needs at least one day’s worth of historical data to start making predictions; it is re-evaluated every 24 hours to create a forecast for the next 48 hours.

We’ve done our best to make this really easy to use. You enable it with a single click, and then use a 3-step wizard to choose the resources that you want to observe and scale. You can configure some warm-up time for your EC2 instances, and you also get to see actual and predicted usage in a cool visualization! The prediction process produces a scaling plan that can drive one or more groups of Auto Scaled EC2 instances.

Once your new scaling plan is in action, you will be able to scale proactively, ahead of daily and weekly peaks. This will improve the overall user experience for your site or business, and it can also help you to avoid over-provisioning, which will reduce your EC2 costs.

Let’s take a look…

Predictive Scaling in Action
The first step is to open the Auto Scaling Console and click Get started:

I can select the resources to be observed and predictively scaled in three different ways:

I select an EC2 Auto Scaling group (not shown), then I assign my group a name, pick a scaling strategy, and leave both Enable predictive scaling and Enable dynamic scaling checked:

As you can see from the screen above, I can use predictive scaling, dynamic scaling, or both. Predictive scaling works by forecasting load and scheduling minimum capacity; dynamic scaling uses target tracking to adjust a designated CloudWatch metric to a specific target. The two models work well together because of the scheduled minimum capacity already set by predictive scaling.

I can also fine-tune the predictive scaling, but the default values will work well to get started:

I can forecast on one of three pre-chosen metrics (this is in the General settings):

Or on a custom metric:

I have the option to do predictive forecasting without actually scaling:

And I can set up a buffer time so that newly launched instances can warm up and be ready to handle traffic at the predicted time:

After a couple more clicks, the scaling plan is created and the learning/prediction process begins! I return to the console and I can see the forecasts for CPU Utilization (my chosen metric) and for the number of instances:

I can see the scaling actions that will implement the predictions:

I can also see the CloudWatch metrics for the Auto Scaling group:

And that’s all you need to do!
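If you prefer to script the same setup rather than click through the console, a scaling plan can also be created through the AWS Auto Scaling Plans API. The sketch below is only an approximation of what the wizard produces; the tag, group name, and capacity values are placeholders, and the parameter names reflect my reading of the API, so check them against the current API reference.

import boto3

plans = boto3.client("autoscaling-plans")

plans.create_scaling_plan(
    ScalingPlanName="my-predictive-plan",
    # The wizard lets you pick resources by tag; here we assume the Auto
    # Scaling group carries the tag scaling-plan=web-fleet.
    ApplicationSource={
        "TagFilters": [{"Key": "scaling-plan", "Values": ["web-fleet"]}]
    },
    ScalingInstructions=[
        {
            "ServiceNamespace": "autoscaling",
            "ResourceId": "autoScalingGroup/my-asg",
            "ScalableDimension": "autoscaling:autoScalingGroup:DesiredCapacity",
            "MinCapacity": 2,
            "MaxCapacity": 20,
            # Dynamic scaling: target-track average CPU at 50%.
            "TargetTrackingConfigurations": [
                {
                    "PredefinedScalingMetricSpecification": {
                        "PredefinedScalingMetricType": "ASGAverageCPUUtilization"
                    },
                    "TargetValue": 50.0,
                }
            ],
            # Predictive scaling: forecast on total CPU and schedule capacity,
            # launching instances 300 seconds ahead of the predicted need.
            "PredefinedLoadMetricSpecification": {
                "PredefinedLoadMetricType": "ASGTotalCPUUtilization"
            },
            "PredictiveScalingMode": "ForecastAndScale",
            "ScheduledActionBufferTime": 300,
        }
    ],
)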

Here are a couple of things to keep in mind about predictive scaling:

Timing – Once the initial set of predictions has been made and the scaling plans are in place, the plans are updated daily and forecasts are made for the following 2 days.

Cost – You can use predictive scaling at no charge, and may even reduce your AWS charges.

Resources – We are launching with support for EC2 instances, and plan to support other AWS resource types over time.

Applicability – Predictive scaling is a great match for web sites and applications that undergo periodic traffic spikes. It is not designed to help in situations where spikes in load are not cyclic or predictable.

Long-Term Baseline – Predictive scaling maintains the minimum capacity based on historical demand; this ensures that any gaps in the metrics won’t cause an inadvertent scale-in.

Available Now
Predictive scaling is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Singapore) Regions.

Jeff;

 

Event-based campaigns let you automatically send messages to your customers when they perform certain actions

Post Syndicated from Zach Barbitta original https://aws.amazon.com/blogs/messaging-and-targeting/event-based-campaigns-let-you-automatically-send-messages-to-your-customer-when-they-perform-certain-actions/

Today, we added a new campaign management feature to Amazon Pinpoint: event-based campaigns. You can now use Amazon Pinpoint to set up campaigns that send messages (such as text messages, push notifications, and emails) to your customers when they take specific actions. For example, you can set up a campaign to send a message when a customer creates a new account, or when they spend a certain dollar amount in your app, or when they add an item to their cart but don’t purchase it. Event-based campaigns help you send messages that are timely, personalized, and relevant to your customers, which ultimately increases their trust in your brand and gives them a reason to return.

You can create event-based campaigns by using the Amazon Pinpoint console, or by using the Amazon Pinpoint API. Event-based campaigns are an effective way to implement both transactional and targeted campaign use cases. For transactional workloads, imagine that you want to send an email to customers immediately after they choose the Password reset button in your app. You can create an event-based campaign that addresses this use case in as few as four clicks.

For targeted campaign workloads, event-based campaigns are a great way to introduce cross-sale opportunities for complementary items. For example, if a customer adds a phone to their shopping cart, you can send an in-app push notification that offers them a special price on a case that fits the device that they’re purchasing.

When you use the campaign wizard in the Amazon Pinpoint console, you now have the option to create event-based campaigns. Rather than define a time to send your message to customers, you select specific events, attributes, and metric values. As you go through the event scheduling component of the wizard, you define the event that you want Amazon Pinpoint to look for, as shown in the following image.

In order to use this feature, you have to set up your mobile and web apps to send event data to Amazon Pinpoint. To learn more about sending event data to Amazon Pinpoint, see Reporting Events in Your Application in the Amazon Pinpoint Developer Guide.
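As a rough sketch of the shape of the event data Amazon Pinpoint expects (the project ID, endpoint ID, and values here are invented for illustration), reporting a song.played event through the PutEvents API might look like this:

import uuid
from datetime import datetime, timezone

import boto3

pinpoint = boto3.client("pinpoint")

now = datetime.now(timezone.utc).isoformat()

pinpoint.put_events(
    ApplicationId="1234567890abcdef1234567890abcdef",  # your Pinpoint project ID
    EventsRequest={
        "BatchItem": {
            "example-endpoint-id": {
                "Endpoint": {},  # endpoint attributes could also be updated here
                "Events": {
                    str(uuid.uuid4()): {
                        "EventType": "song.played",
                        "Timestamp": now,
                        "Attributes": {"artistName": "The Alexandrians"},
                        "Metrics": {"timePlayed": 712.0},
                    }
                },
            }
        }
    },
)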

A real-world scenario

Let’s look at a scenario that explores how you can use event-based campaigns. Say you have a music streaming service such as Amazon Music. You’ve secured a deal for a popular artist to record a set of live tracks that will only be available on your service, and you want to let people who’ve previously listened to that artist know about this exciting exclusive.

To create an event-based campaign, you complete the usual steps involved in creating a campaign in Amazon Pinpoint (you can learn more about these steps in the Campaigns section of the Amazon Pinpoint User Guide). When you arrive at step 4 of the campaign creation wizard, it asks you when the campaign should be sent. At this point, you have two options: you can send the campaign at a specific time, or you can send it when an event occurs. In this example, we choose When an event occurs.

Next, you choose the event that causes the campaign to be sent. In our example, we’ll choose the event named song.played. If we stopped here and launched the campaign, Amazon Pinpoint would send our message to every customer who generated that event—in this case, every customer who played any song in our app, ever. That’s not what we want, but fortunately we can refine our criteria by choosing attributes and metrics.

In the Attributes box, we’ll choose the artistName attribute, and then choose the attribute value that corresponds with the name of the band we want to promote, The Alexandrians.

We’ve already narrowed down our list of recipients quite a bit, but we can go even further. In the Metrics box, we can specify certain quantitative values. For example, we could refine our event so that only customers who spent more than 10 minutes listening to songs by The Alexandrians are sent notifications. To complete this step, we choose the timePlayed metric and set it to look for endpoints for which this metric is greater than 600 seconds. When we finish setting up the triggering event, we see something similar to the following example:

Now that we’ve finished setting up the event that will cause our campaign to be sent, we can finish creating the campaign as we normally would.
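The same campaign can also be defined programmatically. Here’s a loose sketch using the CreateCampaign API; the project ID, segment ID, dates, and message text are placeholders, and the exact schedule shape is worth double-checking against the API reference.

import boto3

pinpoint = boto3.client("pinpoint")

pinpoint.create_campaign(
    ApplicationId="1234567890abcdef1234567890abcdef",
    WriteCampaignRequest={
        "Name": "Alexandrians-live-tracks",
        "SegmentId": "exampleSegmentId",  # must be a dynamic segment
        "Schedule": {
            "Frequency": "EVENT",
            "StartTime": "2018-11-26T00:00:00Z",
            "EndTime": "2018-12-26T00:00:00Z",
            "EventFilter": {
                "FilterType": "ENDPOINT",
                "Dimensions": {
                    "EventType": {
                        "DimensionType": "INCLUSIVE",
                        "Values": ["song.played"],
                    },
                    "Attributes": {
                        "artistName": {
                            "DimensionType": "INCLUSIVE",
                            "Values": ["The Alexandrians"],
                        }
                    },
                    "Metrics": {
                        "timePlayed": {
                            "ComparisonOperator": "GREATER_THAN",
                            "Value": 600.0,
                        }
                    },
                },
            },
        },
        "MessageConfiguration": {
            "DefaultMessage": {
                "Body": "New live tracks from The Alexandrians, only on our service."
            }
        },
    },
)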

Transparent and affordable pricing

There are no additional charges associated with creating event-based campaigns. You pay only for the number of endpoints that you target, the number of messages that you send, and the number of analytics events that you send to Amazon Pinpoint. To learn more about the costs associated with using Amazon Pinpoint, see our Pricing page.

As an example, assume that your app has 50,000 monthly active users, and that you target each user once a day with a push notification. In this scenario, you’d pay around $55.50 per month to send messages to all 50,000 users. Note that this pricing is subject to change, and is not necessarily reflective of what your actual costs may be.

Limitations and best practices

There are a few limitations and best practices that you should consider when you create event-based campaigns:

  • You can only create an event-based campaign if the campaign uses a dynamic segment (as opposed to an imported segment).
  • Event-based campaigns only consider events that are reported by recent versions of the AWS Mobile SDK. Your apps should use the following versions of the SDK in order to work with event-based campaigns:
    • AWS Mobile SDK for Android: version 2.7.2 or later
    • AWS Mobile SDK for iOS: version 2.6.30 or later
  • Because of this restriction, we recommend that you set up your segments so that they only include customers who use a version of your app that runs a compatible version of the SDK.
  • Choose your events carefully. For example, if you send an event-based campaign every time a session.start event occurs, you might quickly overwhelm your users with messages. You can limit the number of messages that Amazon Pinpoint sends to a single endpoint in a 24-hour period. For more information, see General Settings in the Amazon Pinpoint User Guide.

Ready to get started?

You can use these features today in every AWS Region where Amazon Pinpoint is available. We hope that these additions help you to better understand your customers and find exciting new ways to use Amazon Pinpoint to connect and engage with them.

 

How to automate replication of secrets in AWS Secrets Manager across AWS Regions

Post Syndicated from Tracy Pierce original https://aws.amazon.com/blogs/security/how-to-automate-replication-of-secrets-in-aws-secrets-manager-across-aws-regions/

Assume that you make snapshot copies or read-replicas of your RDS databases in a secondary or backup AWS Region as a best practice. By using AWS Secrets Manager, you can store your RDS database credentials securely, encrypted at rest with AWS Key Management Service (AWS KMS) Customer Master Keys, otherwise known as CMKs. With the integration of AWS Lambda, you can now more easily rotate these credentials regularly and replicate them for disaster recovery situations. This automation keeps credentials stored in AWS Secrets Manager for Amazon Relational Database Service (Amazon RDS) in sync between the origin Region, where your Amazon RDS database lives, and the replica Region, where your read-replicas live. While using the same credentials for all databases is not ideal, it can make for a quicker recovery in a disaster recovery situation.

In this post, I show you how to set up secret replication using an AWS CloudFormation template to create an AWS Lambda Function. By replicating secrets across AWS Regions, you can reduce the time required to get back up and running in production in a disaster recovery situation by ensuring your credentials are securely stored in the replica Region as well.

Solution overview

The solution described in this post uses a combination of AWS Secrets Manager, AWS CloudTrail, Amazon CloudWatch Events, and AWS Lambda. You create a secret in Secrets Manager that houses your RDS database credentials. This secret is encrypted using AWS KMS. Lambda automates the replication of the secret’s value in your origin AWS Region by performing a PUT operation on a secret of the same name in the same AWS Region as your read-replica. CloudWatch Events ensures that each time the secret housing your AWS RDS database credentials is rotated, it triggers the Lambda function to copy the secret’s value to your read-replica Region. By doing this, your RDS database credentials are always in sync for recovery.

Note: You might incur charges for using the services used in this solution, including Lambda. For information about potential costs, see the AWS pricing page.

The following diagram illustrates the process covered in this post.
 

Figure 1: Process diagram

Figure 1: Process diagram

This process assumes you have already created a secret housing your RDS database credentials in your main AWS Region and configured your CloudTrail Logs to send to CloudWatch Logs. Once this is complete, the replication process works as follows:

  1. Secrets Manager rotates a secret in your original AWS Region.
  2. CloudTrail receives a log with “eventName”: “RotationSucceeded”.
  3. CloudTrail passes this log to CloudWatch Events.
  4. A filter in CloudWatch Events for this EventName triggers a Lambda function.
  5. The Lambda function retrieves the secret value from the origin AWS Region.
  6. The Lambda function then performs PutSecretValue on a secret with the same name in the replica AWS Region.

The Lambda function is triggered by a CloudWatch Event passed by CloudTrail. The triggering event is raised whenever a secret successfully rotates, which creates a CloudTrail log with the EventName property set to RotationSucceeded. You will know the rotation was successful when the new secret version has the staging label AWSCURRENT. You can read more about the secret labels and how they change during the rotation process here. The Lambda function retrieves the new secret, then calls PutSecretValue on a secret with the same name in the replica AWS Region. This AWS Region is specified by an environment variable inside the Lambda function.

Note: If the origin secret uses a customer-managed Customer Master Key (CMK), then the cloned secret must, as well. If the origin secret uses an AWS-managed CMK, then the cloned secret must, as well. You can’t mix them or the Lambda function will fail. AWS recommends you use customer-managed CMKs because you have full control of the permissions regarding which entities can use the CMK.

The CloudFormation template also creates an AWS Identity and Access Management (IAM) role with the required permissions to create and update secret replicas. Next, you’ll launch the CloudFormation template by using the AWS CloudFormation CLI.

Deploy the solution

Now that you understand how the Lambda function copies your secrets to the replica AWS Region, I’ll explain the commands to launch your CloudFormation stack. This stack creates the necessary resources to perform the automation. It includes the Lambda function, an IAM role, and the CloudWatch Event trigger.

First, make sure your CLI is configured with credentials for an IAM user or role that can launch all resources included in this template. To launch the template, run the command below. You’ll choose a unique Stack Name to easily identify its purpose and the URL of the provided template you uploaded to your own S3 Bucket. For the following examples, I will use US-EAST-1 as my origin Region, and EU-WEST-1 as my replica Region. Make sure to replace these values and the other placeholder values with actual values from your own account.


$ aws cloudformation create-stack --stack-name Replication_Stack --template-url S3_URL --parameters ParameterKey=ReplicaKmsKeyArn,ParameterValue=arn:aws:kms:eu-west-1:111122223333:key/Example_Key_ID_12345 ParameterKey=TargetRegion,ParameterValue=eu-west-1 --capabilities CAPABILITY_NAMED_IAM --region us-east-1

After the previous command is successful, you will see an output similar to the following with your CloudFormation Stack ARN:


{
    "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/Replication_Stack/Example_additional_id_0123456789"
}

You can verify that the stack has completed successfully by running the following command:


$ aws cloudformation describe-stacks --stack-name Replication_Stack --region us-east-1

Verify that the StackStatus shows CREATE_COMPLETE. After you verify this, you’ll see three resources created in your account. The first is the IAM role, which allows the Lambda function to perform its API calls. The name of this role is SecretsManagerRegionReplicatorRole, and it can be found in the IAM console under Roles. There are two policies attached to this role. The first policy is the managed permissions policy AWSLambdaBasicExecutionRole, which grants permissions for the Lambda function to write to Amazon CloudWatch Logs. These logs will be useful later on, when the Event Rule triggers the cloning of the origin secret to the replica Region.

The second policy attached to the SecretsManagerRegionReplicatorRole role is an inline policy that grants permissions to decrypt and encrypt the secret in both your original AWS Region and in the replica AWS Region. This policy also grants permissions to retrieve the secret from the original AWS Region, and to store the secret in the replica AWS Region. You can see an example of this policy granting access to specific secrets below. Should you choose to use this policy, remember to replace the placeholder values with your own parameters.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KMSPermissions",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt",
                "kms:GenerateDataKey"
            ],
            "Resource": [
          "arn:aws:kms:us-east-1:111122223333:key/Example_Key_ID_12345",
     "arn:aws:kms:eu-west-1:111122223333:key/Example_Key_ID_12345"
      ]
        },
        {
            "Sid": "SecretsManagerOriginRegion",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:DescribeSecret",
                "secretsmanager:GetSecretValue"
            ],
            "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:replica/myexamplereplica*"
        },
        {
            "Sid": "SecretsManagerReplicaRegion",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:CreateSecret",
                "secretsmanager:UpdateSecretVersionStage",
                "secretsmanager:PutSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": "arn:aws:secretsmanager:eu-west-1:111122223333:secret:replica/myexamplereplica*"
        }
    ]
}

The next resource created is the CloudWatch Events rule SecretsManagerCrossRegionReplicator. You can find this rule by going to the AWS CloudWatch console, and, under Events, selecting Rules. Here’s an example Rule for the EventName I described earlier that’s used to trigger the Lambda function:


{
    "detail-type": [
        "AWS Service Event via CloudTrail"
    ],
    "source": [
        "aws.secretsmanager"
    ],
    "detail": {
        "eventSource": [
            "secretsmanager.amazonaws.com"
        ],
        "eventName": [
            "RotationSucceeded"
        ]
    }
}

The last resource created by the CloudFormation template is the Lambda function, which will do most of the actual work for the replication. After it’s created, you’ll be able to find this resource in your Lambda console with the name SecretsManagerRegionReplicator. You can download a copy of the Python script here, and you can see the full script below. In the function’s environment variables, you’ll also notice the parameter names and values you entered when launching your stack.


import boto3
from os import environ

targetRegion = environ.get('TargetRegion')
if targetRegion == None:
    raise Exception('Environment variable "TargetRegion" must be set')

smSource = boto3.client('secretsmanager')
smTarget = boto3.client('secretsmanager', region_name=targetRegion)

def lambda_handler(event, context):
    detail = event['detail']

    print('Retrieving SecretArn from event data')
    secretArn = detail['additionalEventData']['SecretId']

    print('Retrieving new version of Secret "{0}"'.format(secretArn))
    newSecret = smSource.get_secret_value(SecretId = secretArn)

    secretName = newSecret['Name']
    currentVersion = newSecret['VersionId']

    replicaSecretExists = True
    print('Replicating secret "{0}" (Version {1}) to region "{2}"'.format(secretName, currentVersion, targetRegion))
    try:
        smTarget.put_secret_value(
            SecretId = secretName,
            ClientRequestToken = currentVersion,
            SecretString = newSecret['SecretString']
        )
        pass
    except smTarget.exceptions.ResourceNotFoundException:
        print('Secret "{0}" does not exist in target region "{1}". Creating it now with default values'.format(secretName, targetRegion))
        replicaSecretExists = False
    except smTarget.exceptions.ResourceExistsException:
        print('Secret version "{0}" has already been created, this must be a duplicate invocation'.format(currentVersion))
        pass

    if replicaSecretExists == False:
        secretMeta = smSource.describe_secret(SecretId = secretArn)
        # Note: describe_secret omits 'KmsKeyId' when the secret uses the default AWS managed key
        if secretMeta.get('KmsKeyId') is not None:
            replicaKmsKeyArn = environ.get('ReplicaKmsKeyArn')
            if replicaKmsKeyArn == None:
                raise Exception('Cannot create replica of a secret that uses a custom KMS key unless the "ReplicaKmsKeyArn" environment variable is set. Alternatively, you can also create the key manually in the replica region with the same name')

            smTarget.create_secret(
                Name = secretName,
                ClientRequestToken = currentVersion,
                KmsKeyId = replicaKmsKeyArn,
                SecretString = newSecret['SecretString'],
                Description = secretMeta['Description']
            )
        else:
            smTarget.create_secret(
                Name = secretName,
                ClientRequestToken = currentVersion,
                SecretString = newSecret['SecretString'],
                Description = secretMeta['Description']
            )
    else:
        secretMeta = smTarget.describe_secret(SecretId = secretName)
        for previousVersion, labelList in secretMeta['VersionIdsToStages'].items():
            if 'AWSCURRENT' in labelList and previousVersion != currentVersion:
                print('Moving "AWSCURRENT" label from version "{0}" to new version "{1}"'.format(previousVersion, currentVersion))
                smTarget.update_secret_version_stage(
                    SecretId = secretName,
                    VersionStage = 'AWSCURRENT',
                    MoveToVersionId = currentVersion,
                    RemoveFromVersionId = previousVersion
                )
                break

    print('Secret {0} replicated successfully to region "{1}"'.format(secretName, targetRegion))

Now that your CloudFormation Stack is completed, and all necessary resources are set up, you are ready to begin secret replication. To verify the setup works, you can modify the secret in the origin Region that houses your RDS database credentials. It will take a few moments for the secret label to return to AWSCURRENT, and then roughly another 5-15 minutes for the CloudTrail Logs to populate and then send the Events to CloudWatch Logs. Once received, the Event will trigger the Lambda function to complete the replication process. You will be able to verify the secret has correctly replicated by going to the Secrets Manager Console in your replication Region, selecting the replicated secret, and viewing the values. If the values in the replicated secret are the same as those in the origin secret, and both labels are AWSCURRENT, you know replication completed successfully and your credentials will be ready if needed.
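If you’d rather verify from code than from the console, a quick comparison of the secret values in the two Regions might look like the following sketch (the secret name and Regions are the same placeholders used earlier):

import boto3

secret_name = "replica/myexamplereplica"

origin = boto3.client("secretsmanager", region_name="us-east-1")
replica = boto3.client("secretsmanager", region_name="eu-west-1")

# Fetch the AWSCURRENT version from each Region and confirm they match.
origin_value = origin.get_secret_value(SecretId=secret_name)
replica_value = replica.get_secret_value(SecretId=secret_name)

print("Origin version: ", origin_value["VersionId"])
print("Replica version:", replica_value["VersionId"])
print("Values match:   ", origin_value["SecretString"] == replica_value["SecretString"])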

Summary

In this post, you learned how you can use AWS Lambda and Amazon CloudWatch Events to automate replication of your secrets in AWS Secrets Manager across AWS Regions. You used a CloudFormation template to create the necessary resources for the replication setup. The CloudWatch Events rule watches for the CloudTrail log entry that signals a successful rotation and triggers the Lambda function, which then pulls the secret name and value and replicates them to the AWS Region of your choice. Should a disaster occur, you’ll have increased your chances for a smooth recovery of your databases, and you’ll be back in production quicker.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Secrets Manager forum.

Want more AWS Security news? Follow us on Twitter.

Author

Tracy Pierce

Tracy Pierce is a Senior Cloud Support Engineer at AWS. She enjoys the peculiar culture of Amazon and uses that to ensure every day is exciting for her fellow engineers and customers alike. Customer Obsession is her highest priority and she shows this by improving processes, documentation, and building tutorials. She has her AS in Computer Security & Forensics from SCTD, SSCP certification, AWS Developer Associate certification, and AWS Security Specialist certification. Outside of work, she enjoys time with friends, her Great Dane, and three cats. She keeps work interesting by drawing cartoon characters on the walls at request.

AWS Security Profiles: Sam Elmalak, Enterprise Solutions Architect

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-sam-elmalak-enterprise-solutions-architect/


In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do in your current role?

I’ve been with AWS for three and a half years. I’m an Enterprise Solutions Architect, which means that I help enterprise customers think through their cloud strategy. I work with customers on everything from business goals and how to align those goals with their technology strategy to helping individual developers create well-architected cloud solutions. I also have an area of focus around security by helping a broader set of customers with their cloud journey and security practices.

How do you explain your job to non-tech friends?

I help my customers figure out how to use AWS and the cloud in a way that delivers business value.

What are you currently working on that you’re excited about?

From a project perspective, the AWS Landing Zone initiative (which also happens to be my 2018 re:Invent topic) is the most exciting. For the last two to three years, we’ve been providing guidance to help customers decide how to build environments in a way that incorporates best practices. But the AWS Landing Zone has a team that’s building out a solution that makes it easier for customers to implement those best practices. We’re no longer just telling customers, “Here’s how you should do it.” Instead, we’re providing a real implementation. It’s a prescriptive approach that customers can set up in just a few hours. This can help customers accelerate their cloud journey and reduce the work that goes into setting up governance. And the solution can be used by any company — including enterprises, educational institutions, small businesses, and startups.

What’s the most challenging part of your job?

I need to strike a balance between different initiatives, which means being able to focus on the right priorities for the moment. I don’t always get it right, but my hope is that I can always help customers achieve their goals. Another challenge is the sheer number of launches and releases—it can be difficult to stay on top of everything that’s being released while maintaining expert-level knowledge about it all. But that’s just a side effect of how quickly AWS innovates.

What’s your favorite part of your job?

The people I work with. I get to interact with so many smart, talented achievers and builders, and they’re always so humble and willing to help. Being around people like that is an amazing experience. Also, I get to learn nonstop. There are a lot of challenging problems to figure out, but there are also so many opportunities for growth. The job ends up being whatever you make of it.

In your opinion, what’s the biggest challenge facing cloud security right now?

Often, security organizations take the approach of saying “No.” They block things instead of making things happen by partnering with their business and development teams. I think the biggest challenge is trying to change that mindset. Skillset is also a challenge: Sometimes, people need to learn how to “do” security in the cloud in a way that keeps pace with their development team, and that can require additional skills. I believe training your entire organization to develop automation and approach problems and processes in an automated manner will help remove these barriers.

Five years from now, what changes do you think we’ll see across the security/compliance landscape?

I think we’ll see more automation, more tooling, more partners, and more products — all of which will make it simpler for customers to adopt the cloud and operate there in an efficient, secure manner. As customers adopting the cloud mature, I also think the job of the security practitioner will change slightly — the work will become a matter of how to use all the available tooling and other resources in the most efficient manner. I suspect that artificial intelligence and machine learning, predictive analytics, and anomaly detection will start to play a more prominent role, allowing customers to do more and be more secure. I also think customers will be starting to think more of security in terms of users and devices rather than perimeter security.

How did you choose your session topics for re:Invent 2018?

This is my third year holding sessions on establishing a Landing Zone. Back in 2016, I had a few customers who asked me about how to set up their AWS environment. I spent quite a bit of time researching but couldn’t find a solid, well-rounded answer. So I took it upon myself to figure out what that guidance should include. I spoke with a number of more experienced people in AWS, and then proposed a re:Invent session around it. At the time, I thought it would sound boring and no one would want to attend. But after the session, feedback from customers was overwhelmingly positive and I realized that people were hungry for this kind of foundational AWS info. We put a team together to develop more guidance for our customers. The AWS Landing Zone initiative leverages that guidance by implementing best practices built by a talented team whose vision is to make our customers’ lives easier and more secure. Since then, re:Invent sessions on Landing Zone have expanded. We’re up to at least 18 sessions, workshops, and chalk talks this year, and we’ve even added a tag (awslandingzone) so they’re all searchable in the session catalog and customers can find them. In my presentations at re:Invent, we have a customer who will talk through what their journey looked like and how the AWS Landing Zone Solution has helped them.

What are you hoping that your audience will take away from these sessions?

I want customers to start thinking differently about a few areas. One is how to enable their organizations to innovate, build and release services/products more quickly. To do that, central teams need to think of the rest of their organization as their customers, then think of ways to onboard those customers faster by means of automated, self-service processes. Their idea of an application or a team also needs to be smaller than the traditional definition of an entire business unit. I actually want customers to think smaller — and more agile. I want them to think, “What if I have to accommodate thousands of different projects, and I want them all in different accounts and isolated workspaces, sitting under this Landing Zone umbrella?”

Thinking about that type of design and approach from the beginning will help customers start, innovate, and move forward while avoiding the pitfalls of trying to fit everything into a single AWS account. It’s a cultural mindshift. I want them to start thinking in terms of the people and the groups within their organizations. I want them to think about how to enable those groups and get them to move forward and to spend less time focused on how to control everything that those groups do. I want people to think of the balance between governance/security and control.

Any tips for first-time conference attendees?

Plan to do a lot of walking and have comfortable shoes. If you’ve signed up for sessions, get there early and remember that there are at least five venues this year — it’s important to factor in travel time. Other than that, I’d say visit the partner expo, meet other customers, and learn from each other. And ask us questions; we’ll do everything we can to help. Most importantly, enjoy it and learn!

If you had to pick any other job, what would you want to do with your life?

My current role comes down to helping empower people, which I love, so I’d look for a way to replicate that feeling elsewhere by helping people realize their talents and potential.

As a backup plan, I’d downsize, go live somewhere cheap and enjoy life, nature, music and tango…

The AWS Security team is hiring! Want to find out more? Check out our career page.

Want more AWS Security news? Follow us on Twitter.

Author

Sam Elmalak

Sam is an Enterprise Solutions Architect at AWS and a member of the AWS security community. In addition to helping customers solve their technical issues, he helps customers navigate organizational complexity and address cultural challenges. Sam is passionate about enabling teams to apply technology to address business challenges and unmet needs. He’s largely an optimist and a believer in people’s abilities to thrive and achieve amazing things.

Introducing the Amazon Pinpoint Voice Channel

Post Syndicated from Kadir Rathnavelu original https://aws.amazon.com/blogs/messaging-and-targeting/introducing-the-amazon-pinpoint-voice-channel/

Today, Amazon Pinpoint added a new voice channel. This new channel enables you to deliver voice messages to your customers over the phone. When you use the voice channel, you provide a text script, and, optionally, instructions for converting that text to speech. Amazon Pinpoint converts your script into lifelike speech and then calls your customers over the phone to deliver your message. Pinpoint Voice is a great way to deliver transactional messages such as one-time passwords, appointment reminders, order confirmations, and more.

Phone calls offer several advantages over other channels; they’re highly reliable and very extensible, and just about everybody has a phone number. The addition of voice messages to Pinpoint’s existing email, SMS, and push notification channels provides you with even more flexibility in delivering the right message on the right channel for your customers.

For example, to send a One-Time Password (OTP) to your customer over the phone, you first provide the Amazon Pinpoint SMS and Voice API with a set of instructions, including the content of the message and the destination phone number. The content can be a plain-text message or a Speech Synthesis Markup Language (SSML) message.

You can use SSML to personalize your messages by adding pauses, changing the pronunciation, emphasizing key words, and much more. Amazon Pinpoint takes your message and uses Amazon Polly to convert it into lifelike speech in one of the dozens of languages and voices available. Amazon Pinpoint then calls the destination number and delivers the message.

{
  "DestinationPhoneNumber":"+12065550199",
  "OriginationPhoneNumber":"+14155550142",
  "CallerId":"+12065550199",
  "Content":{
    "SSMLMessage":{
      "Text":"<speak> Your one-time password is 1 <break time='0.5s'/> 4 <break time='0.5s'/> 5 <break time='0.5s'/> 7 </speak>",
      "Voice":"Russell",
      "Language":"en-AU"
    }
  }
}
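
If you use the AWS SDK for Java, you can send the same request programmatically. The following is a minimal sketch, not taken from the original post: it assumes the SDK’s Pinpoint SMS and Voice client and model classes (AmazonPinpointSMSVoiceClientBuilder, SendVoiceMessageRequest, VoiceMessageContent, SSMLMessageType), and the SDK’s field names (for example, VoiceId and LanguageCode) may differ slightly from the JSON keys shown above, so verify them against the API reference for your SDK version.

import com.amazonaws.services.pinpointsmsvoice.AmazonPinpointSMSVoice;
import com.amazonaws.services.pinpointsmsvoice.AmazonPinpointSMSVoiceClientBuilder;
import com.amazonaws.services.pinpointsmsvoice.model.SSMLMessageType;
import com.amazonaws.services.pinpointsmsvoice.model.SendVoiceMessageRequest;
import com.amazonaws.services.pinpointsmsvoice.model.VoiceMessageContent;

public class SendOtpCall {
    public static void main(String[] args) {
        // Build a client for the Pinpoint SMS and Voice API
        AmazonPinpointSMSVoice client = AmazonPinpointSMSVoiceClientBuilder.defaultClient();

        // The SSML script that Amazon Polly converts to lifelike speech
        SSMLMessageType ssml = new SSMLMessageType()
                .withText("<speak> Your one-time password is 1 <break time='0.5s'/> 4 "
                        + "<break time='0.5s'/> 5 <break time='0.5s'/> 7 </speak>")
                .withVoiceId("Russell")
                .withLanguageCode("en-AU");

        // Place the call to the destination number
        client.sendVoiceMessage(new SendVoiceMessageRequest()
                .withDestinationPhoneNumber("+12065550199")
                .withOriginationPhoneNumber("+14155550142")
                .withCallerId("+12065550199")
                .withContent(new VoiceMessageContent().withSSMLMessage(ssml)));
    }
}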

As the call progresses, Amazon Pinpoint provides call metrics in real time to Amazon Kinesis Data Firehose and Amazon CloudWatch Logs. These metrics tell you when a call has been initiated, when it’s ringing, when it’s been answered, when it’s completed, and whether or not the recipient answered. You can use this information to measure the effectiveness of your messages and to optimize your future voice engagements.

In addition to delivering one-to-one transactional messages, you can also use voice messages as a backup channel when you aren’t able to deliver messages through email, SMS, or push notifications. For example, it’s not unusual for customers to provide you with phone numbers that aren’t able to receive text messages, such as land-line or VoIP numbers. You can use the Phone Number Validate feature of Amazon Pinpoint to determine the type of phone number provided. When you detect that a phone number can’t receive text messages, you can send voice messages instead. By using Amazon Pinpoint Voice, you can reach a segment of customers who you wouldn’t be able to reach by sending text messages.
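
As a rough sketch of how that fallback decision might look with the AWS SDK for Java (the PhoneNumberValidateRequest and NumberValidateRequest class names are assumptions based on the Pinpoint API reference, the phone number is a placeholder, and the branching is only illustrative):

import com.amazonaws.services.pinpoint.AmazonPinpoint;
import com.amazonaws.services.pinpoint.AmazonPinpointClientBuilder;
import com.amazonaws.services.pinpoint.model.NumberValidateRequest;
import com.amazonaws.services.pinpoint.model.NumberValidateResponse;
import com.amazonaws.services.pinpoint.model.PhoneNumberValidateRequest;

public class ChooseChannel {
    public static void main(String[] args) {
        AmazonPinpoint pinpoint = AmazonPinpointClientBuilder.defaultClient();

        // Look up the line type for the customer's phone number
        NumberValidateResponse response = pinpoint.phoneNumberValidate(
                new PhoneNumberValidateRequest().withNumberValidateRequest(
                        new NumberValidateRequest().withPhoneNumber("+12065550199")))
                .getNumberValidateResponse();

        // Landline and VoIP numbers can't receive SMS, so fall back to a voice message
        if ("MOBILE".equals(response.getPhoneType())) {
            // send an SMS message
        } else {
            // send a voice message instead
        }
    }
}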

To send voice messages using Amazon Pinpoint, you need to have at least one dedicated phone number. You can lease local phone numbers in a variety of countries directly from the Amazon Pinpoint console. In certain countries and regions, you can also use these dedicated phone numbers to send SMS messages to your customers. See Supported Countries and Regions for more information about leasing phone numbers.

To learn more, see the Amazon Pinpoint User Guide and the Amazon Pinpoint SMS and Voice API Reference.

Literature dispenser

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/literature-dispenser/

There are many things I do not know about Argentina. Until today, one of them was this: if you’re in an Argentinian bank, you may not use electronic devices. That includes phones and e-readers like the Kindle (and I can’t be the only person here who is pretty much surgically attached to their Kindle).

Enter the literature dispenser.

Roni Bandini, an Argentinian author, found himself twiddling his thumbs in a Buenos Aires bank queue, and thought that perhaps the 50 other people he could see in the same situation might benefit from a little distraction. How about a machine, owned by the bank, that could furnish you with one of a curated selection of short stories at the touch of a button? The short stories bit was easy: he writes them for a living.

Ticket literature dispenser (an Argentine invention)

Ticket literature dispenser developed by @ronibandini. Version 2. Parts used to build it: Raspberry Pi, thermal printer, 16×2 LCD display, custom 3D-printed case. More information at https://medium.com/@Bandini

He chose a Raspberry Pi because there are so many libraries for thermal printers and LCD displays available (and because it’s tiny, and you can fit a heck of a lot of short stories on an SD card these days).

Roni says:

This project was “trial and error” in many aspects. I had troubles with power source amperage due to thermal printer requirements, conflicts with previous software running in the Raspberry – since the same one was used for other projects – and I had to write some routines to avoid words being split due to ticket width. Since the machine could be working for 12 hours in a row, I have added a small 5v cooling fan in the back.

He built a wooden prototype, and was helped out by Z-lab, a small, local 3d print design studio, with permanent casing (which is rather lovely).

literature dispenser

The UI’s very simple: press the green button, be rewarded with a short story, printed to order on a till strip. We’d love to see businesses use these in real life (and we’re thinking one of these would be a lovely addition to the Pi Towers lobby, to help soothe anxious interview candidates). Thanks Roni – I’m off to try to find some of your work in translation, and we’re all agreed that we’re very grateful for internet banking.

The post Literature dispenser appeared first on Raspberry Pi.

Migrating your Amazon ECS deployment to the new ARN and resource ID format

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-deployment-to-the-new-arn-and-resource-id-format-2/

Starting today, you can opt in to a new Amazon Resource Name (ARN) and resource ID format for Amazon ECS tasks, container instances, and services. The new format enables enhanced capabilities, such as tagging resources in your cluster and tracking the cost of services and tasks running in it.

In most cases, you don’t need to change your system beyond opting in to the new format. However, if your deployment mechanism uses regular expressions to parse the old format ARNs or task IDs, this may be a breaking change. It may also be a breaking change if you were storing the old format ARNs and IDs in a fixed-width database field or data structure.

After you have opted in, any new ECS services or tasks have the new ARN and ID format. Existing resources do not receive the new format. If you decide to opt out, any new resources that you later create then use the old format.

You can read more about the exact details of the change, including a format comparison, in the FAQ. I recommend that you test the new format in isolation before opting in globally.

Approaches to opting in

Depending on whether your software interacts with instance, service, or task ARNs, you may want to be careful about how you opt in to avoid breaking your existing implementation. There are several ways to do this.

Separate testing and production accounts

The easiest way to opt in is to use separate AWS accounts for your testing and QA environment and your production environment. In this case, you could choose to opt in your entire test account by opting in for the AWS account root user. All IAM users and roles within the account are then opted in. You can easily reverse the opt-in decision by opting out the root user. After you have verified that all your deployments work as expected, you can opt in for your main production account.
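
If you would rather script the opt-in than click through the console, the ECS API exposes account settings for the new formats. The sketch below is not from the original post: it assumes the AWS SDK for Java’s ECS client, the PutAccountSettingDefault operation, and the setting names serviceLongArnFormat, taskLongArnFormat, and containerInstanceLongArnFormat. Because it sets the default for every IAM user and role in the account, run it only with appropriately privileged credentials and confirm the details against the ECS documentation first.

import com.amazonaws.services.ecs.AmazonECS;
import com.amazonaws.services.ecs.AmazonECSClientBuilder;
import com.amazonaws.services.ecs.model.PutAccountSettingDefaultRequest;

public class OptInAccountDefaults {
    public static void main(String[] args) {
        AmazonECS ecs = AmazonECSClientBuilder.defaultClient();

        // Make the new ARN/ID format the default for all principals in this account
        String[] settings = {
                "serviceLongArnFormat", "taskLongArnFormat", "containerInstanceLongArnFormat"};
        for (String setting : settings) {
            ecs.putAccountSettingDefault(new PutAccountSettingDefaultRequest()
                    .withName(setting)
                    .withValue("enabled"));
        }
    }
}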

Single region

If you use a single account to host all your resources and want to test the new ARNs in a gradual rollout, you need to be more careful about how you opt in. You may consider testing by opting in one AWS Region that you don’t normally use. Spin up your resources there to verify that they work before opting in your primary Region.

Single IAM user or role

Alternatively, you can decide to opt in for a specific test IAM user or role and conduct testing before opting in for the entire account. In this post, I explore how to opt in for a specific IAM user and role and to migrate your resources.

Prerequisites to opting in

There are a few prerequisites to opting into the new ARN and resource ID format.

Root user access

First, make sure that you can access the AWS account root user, so that you can opt in for all IAM users and roles on the account by default. Additionally, the root user can opt in for any single IAM user or role on the account, per Region. Individual IAM users and roles can still opt in or out separately.

Instance role access

In particular, getting the new format on your cluster instances involves opting in for the instance role that the instance assumes when it boots up. This role allows the instance to communicate with the ECS control plane and register itself to receive an ARN. For the instance to get a new-format ARN, the role that the instance uses must be opted in.

Access for other IAM roles

Depending on how your organization deploys clusters, you may need to opt in to the new format on other IAM users or roles that touch your ECS cluster. For example, you may have an IAM user that you use to access the console and create an ECS cluster.

You also may use AWS CodeDeploy, which has an IAM role that allows it to make deployments and update that service with the latest version of your application’s Docker container. Both roles should be opted in. Otherwise, CodeDeploy causes your service to deploy tasks with the old format.

Opt in for your IAM user

For the best results, create a new IAM user that you can use for testing the new ARN format. That way, you can continue to interact with your existing ECS deployments on your original IAM user without accidentally switching your live services to the new ARN format.

After your testing IAM user is created, access the opt-in page and turn on the new ARN format for this user.
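
The console opt-in applies to whichever principal you’re signed in as. If you prefer to do the same thing from code while running with the test user’s credentials, a minimal sketch with the AWS SDK for Java might look like the following (the PutAccountSetting call and setting names are assumptions to verify against the ECS API reference):

import com.amazonaws.services.ecs.AmazonECS;
import com.amazonaws.services.ecs.AmazonECSClientBuilder;
import com.amazonaws.services.ecs.model.PutAccountSettingRequest;

public class OptInTestUser {
    public static void main(String[] args) {
        // Uses the credentials of the calling principal, i.e. your test IAM user
        AmazonECS ecs = AmazonECSClientBuilder.defaultClient();

        // Opt this principal in to the new service and task ARN/ID formats
        ecs.putAccountSetting(new PutAccountSettingRequest()
                .withName("serviceLongArnFormat")
                .withValue("enabled"));
        ecs.putAccountSetting(new PutAccountSettingRequest()
                .withName("taskLongArnFormat")
                .withValue("enabled"));
    }
}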

At this point, any services or tasks that you now create in any cluster with this IAM user will have the new ARN format. However, you also need to update the ARN format on the instances in the cluster.

Create and opt in for an instance role

To get the new instance ARN format, create an instance role. This role is used for each instance in the ECS cluster. After you opt in for the role, any instance that registers itself with the ECS control plane using that role gets the new ARN format.

Open the IAM console and choose Roles, Create role. Specify the type of role that you are creating. For this example, choose AWS service as the type of trusted entity and EC2 as the service that will use the role.

The next step is to add permissions for this role. In the search box, type AmazonEC2ContainerServiceforEC2Role. Select the check box to attach this policy to the role. This policy document is managed by AWS and has all the permissions required for an EC2 instance to act as a host in an ECS cluster.

Give the role a name and create it.

Finally, opt in for this role so that when an EC2 instance uses the role to register itself in the ECS cluster, it receives the new ARN format. To do this, log in as the root user, and navigate to the opt-in page. This time, opt in for the instance role.

Launch a new cluster using an opted-in IAM user and instance role

Existing resources do not receive the new ARN format until they are re-created. To get the new ARNs, use the opted-in IAM user and opted-in instance role to create an opted-in cluster.

Open the cluster creation wizard, and create a new EC2 Linux + Networking cluster.

On the next screen, name your cluster and select the instance role that you previously opted in.

After the cluster launches, navigate to the ECS Instances list and verify that the instances have the new ID format.

Launch a new opted-in service

The last part is to launch a service in the cluster. Be sure to use the IAM user that you opted in earlier. Task definitions are not changed by this opt in, so you can create a clone of an existing service using the same task definitions that you are already using.

As a best practice, do a blue/green infrastructure rollout by creating a full parallel stack: a new cluster of EC2 instances, services, tasks, and any load balancers. Then, direct traffic from your previous cluster to the new cluster by switching the DNS CNAME of your website to point at the new stack’s load balancer.

This approach allows you to test the new stack without interrupting your existing stack. After you have verified that it works, you can switch traffic over to the new stack without any downtime.

Conclusion

You have now tested opting in to the new ARN and resource ID format for ECS container instances, services, and tasks. If everything worked properly, you can opt in across your full account today by using the root user.

Starting January 1, 2020, all newly created ECS resources receive the new ARN and resource ID format. I recommend testing the new ARNs now, even if you don’t intend to manually opt in your existing resources at this time.

Amazon Pinpoint’s Analytics Suite Now Includes Event Metric Visualizations

Post Syndicated from Zach Barbitta original https://aws.amazon.com/blogs/messaging-and-targeting/amazon-pinpoints-analytics-suite-now-includes-event-metric-visualizations/

Today, we’re enhancing Amazon Pinpoint’s analytics capabilities by giving you the ability to filter event data based on certain attributes and metrics. We’re also adding new analytics features that help you visualize event data over time. These new features help you quickly understand how your customers are using your app, and make it easy to keep track of behavioral trends over time.

For example, if you want to understand how much time a customer spends performing a certain action within your app, such as watching a video, you can now visualize this in as few as three clicks. Start by signing in to the console and choosing the project that contains the data that you want to view. Next, in the navigation pane, under Analytics, choose Events. On the Events chart, you’ll see two new menus: Event Attributes and Metrics, and Event Attribute Values. Use these menus to refine the data in the chart to include only the events and attributes that you want to see.

When you select a metric value, Amazon Pinpoint shows a new time-series chart. This chart includes summary filters, so you can quickly see the total time an action occurs, the average, the minimum and maximum values over the time period, and the number of events. You can use this data when considering how to improve the user experience of your app, or when creating new engagement campaigns for certain groups of customers.

For example, you could see if the average time spent on your app is increasing or decreasing, or if the total price of goods sold is trending in the right direction. You can then create engagement campaigns that specifically target segments of users who spend less time in your app, or those who are making fewer in-app purchases.

As always, the data in Pinpoint is yours. You can easily export it to CSV format for analysis in external applications. For deeper analysis, we recommend that you set up Amazon Pinpoint Event Streams, which let you stream your data to nearly any destination. To learn more about setting up Event Streams, see Streaming Events from Amazon Pinpoint to Redshift on the AWS Messaging and Targeting Blog.
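
As a rough sketch of what enabling an event stream can look like with the AWS SDK for Java (the project ID, Kinesis stream ARN, and IAM role ARN below are placeholders, and the PutEventStream model classes are assumptions to verify against your SDK version):

import com.amazonaws.services.pinpoint.AmazonPinpoint;
import com.amazonaws.services.pinpoint.AmazonPinpointClientBuilder;
import com.amazonaws.services.pinpoint.model.PutEventStreamRequest;
import com.amazonaws.services.pinpoint.model.WriteEventStream;

public class EnableEventStream {
    public static void main(String[] args) {
        AmazonPinpoint pinpoint = AmazonPinpointClientBuilder.defaultClient();

        // Stream all project events to a Kinesis stream for analysis in external tools
        pinpoint.putEventStream(new PutEventStreamRequest()
                .withApplicationId("YOUR_PINPOINT_PROJECT_ID")
                .withWriteEventStream(new WriteEventStream()
                        .withDestinationStreamArn(
                                "arn:aws:kinesis:us-east-1:123456789012:stream/pinpoint-events")
                        .withRoleArn(
                                "arn:aws:iam::123456789012:role/PinpointEventStreamRole")));
    }
}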

You can use these features now in all AWS Regions where Amazon Pinpoint is available. We hope that these additions help you to better understand your customers and find exciting new ways to use Amazon Pinpoint to connect and engage with them.

Encrypting messages published to Amazon SNS with AWS KMS

Post Syndicated from Michelle Mercier original https://aws.amazon.com/blogs/compute/encrypting-messages-published-to-amazon-sns-with-aws-kms/

Amazon Simple Notification Service (Amazon SNS) is a fully managed pub/sub messaging service for decoupling event-driven microservices, distributed systems, and serverless applications. Enterprises around the world use Amazon SNS to support applications that handle private and sensitive data. Many of these enterprises operate in regulated markets, and their systems are subject to stringent security and compliance standards, such as HIPAA for healthcare, PCI DSS for finance, and FedRAMP for government. To address the requirements of highly critical workloads, Amazon SNS provides message encryption in transit, based on Amazon Trust Services (ATS) certificates, as well as message encryption at rest, using AWS Key Management Service (AWS KMS) keys. 

Message encryption in transit

The Amazon SNS API is served through Secure HTTP (HTTPS) and encrypts all messages in transit with Transport Layer Security (TLS) certificates issued by ATS. These certificates verify the identity of the Amazon SNS API server whenever an encrypted connection is established. A certificate authority (CA) issues the certificate to a specific domain. Thus, when a domain presents a certificate issued by a trusted CA, the API client can determine that it’s safe to establish a connection.

Message encryption at rest

Amazon SNS supports encrypted topics. When you publish messages to encrypted topics, Amazon SNS uses customer master keys (CMK), powered by AWS KMS, to encrypt your messages. Amazon SNS supports customer-managed as well as AWS-managed CMKs. As soon as Amazon SNS receives your messages, the encryption takes place on the server, using a 256-bit AES-GCM algorithm. The messages are stored in encrypted form across multiple Availability Zones (AZs) for durability and are decrypted just before being delivered to subscribed endpoints, such as Amazon Simple Queue Service (Amazon SQS) queues, AWS Lambda functions, and HTTP and HTTPS webhooks.

 

Notes:

  • Amazon SNS encrypts only the body of the messages that you publish. It doesn’t encrypt message metadata (identifier, subject, timestamp, and attributes), topic metadata (name and attributes), or topic metrics. Thus, encryption doesn’t affect the operations of Amazon SNS, such as message fanout and message filtering.
  • Amazon SNS doesn’t retroactively encrypt messages that were published before server-side encryption (SSE) was enabled for the topic. In addition, any encrypted message retains its encryption even if you disable the encryption of its topic. It can take up to a minute for encryption to be effective after it’s enabled.
  • Amazon SNS uses envelope encryption internally. It uses your configured CMK to generate a short-lived data encryption key (DEK) and then reuses this DEK to encrypt your published messages for 5 minutes. When the DEK expires, Amazon SNS automatically rotates to generate a new DEK from AWS KMS.

Applying encrypted topics in a use case

You can use encrypted topics for a variety of scenarios, especially for processing sensitive data, such as personally identifiable information (PII) and protected health information (PHI).

The following example illustrates an electronic medical record (EMR) system deployed in several clinics and hospitals. The system manages patients’ medical histories, diagnoses, medications, immunizations, visits, lab results, and doctors’ notes.

The clinic’s EMR system is integrated with three auxiliary systems. Each system, hosted on an Amazon EC2 instance, polls incoming patient records from an Amazon SQS queue and takes action:

  • The Billing system manages the clinic’s revenue cycles and processes accounts receivable and reimbursements.
  • The Scheduling system keeps patients informed of their upcoming clinic and lab appointments and reminds them to take their medications.
  • The Prescription system transmits electronic prescriptions to authorized pharmacies and tracks medication-filling history.

When a physician inputs a new record into a patient’s file, the EMR system publishes a message to an Amazon SNS topic. The topic in turn fans out a copy of the message to each one of the three subscribing Amazon SQS queues for parallel processing. When the message is retrieved from the queue, the Billing system invoices the patient, the Scheduling system books the patient’s next clinic or lab appointment, and the Prescription system orders the required medication from an authorized pharmacy.

The Amazon SNS topic and all Amazon SQS queues described in this use case are encrypted using AWS KMS keys that the clinic creates. The communication among services is based on HTTPS API calls. This end-to-end encryption protects patients’ medical records by making their content unavailable to unauthorized or anonymous users while messages are either in transit or at rest.

Creating, subscribing, and publishing to encrypted topics

You can create an Amazon SNS encrypted topic or an Amazon SQS encrypted queue by setting its attribute KmsMasterKeyId, which expects an AWS KMS key identifier. The key identifier can be a key ID, key ARN, or key alias. You can use the identifier of either a customer-managed CMK, such as alias/MyKey, or the AWS-managed CMK in your account, whose alias is alias/aws/sns.

The following code snippets work with the AWS SDK for Java. You can use these code samples for the healthcare system scenario in the previous section.

First, the principal publishing messages to the Amazon SNS encrypted topic must have access permission to execute the AWS KMS operations GenerateDataKey and Decrypt, in addition to the Amazon SNS operation Publish. The principal can be either an IAM user or an IAM role. The following IAM policy grants the required access permission to the principal.

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": [
      "kms:GenerateDataKey",
      "kms:Decrypt",
      "sns:Publish"
    ],
    "Resource": "*"
  }
}

If you want to use a customer-managed CMK, a CMK needs to be created and secured by granting the publisher access to the same AWS KMS operations GenerateDataKey and Decrypt. The access permission is granted using KMS key policies. The following JSON document shows an example policy statement for the customer-managed CMK used by the healthcare system. For more information on creating and securing keys, see Creating Keys and Using Key Policies in the AWS Key Management Service Developer Guide.

{
  "Version": "2012-10-17",
  "Id": "EMR-System-KeyPolicy",
  "Statement": [
    {
      "Sid": "Allow access for Root User",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Allow access for Key Administrator",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/EMRAdmin"},
      "Action": [
        "kms:Create*",
        "kms:Describe*",
        "kms:Enable*",
        "kms:List*",
        "kms:Put*",
        "kms:Update*",
        "kms:Revoke*",
        "kms:Disable*",
        "kms:Get*",
        "kms:Delete*",
        "kms:TagResource",
        "kms:UntagResource",
        "kms:ScheduleKeyDeletion",
        "kms:CancelKeyDeletion"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Allow access for Key User (SNS Publisher)",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/EMRUser"},
      "Action": [
        "kms:GenerateDataKey*",
        "kms:Decrypt"
      ],
      "Resource": "*"
    }
  ]
}

The following snippet uses the CMK to create an encrypted topic, three encrypted queues, and their corresponding subscriptions.

// Create API clients

String userArn = "arn:aws:iam::123456789012:user/EMRUser";

AWSCredentialsProvider credentials = getCredentials(userArn);

AmazonSNS sns = new AmazonSNSClient(credentials);
AmazonSQS sqs = new AmazonSQSClient(credentials);

// Create an attributes collection for the topic and queues

String keyId = "arn:aws:kms:us-east-1:123456789012:alias/EMRKey"; 

Map<String, String> attributes = new HashMap<>();
attributes.put("KmsMasterKeyId", keyId);

// Create an encrypted topic

String topicArn = sns.createTopic(
    new CreateTopicRequest("Patient-Records")
        .withAttributes(attributes)).getTopicArn();

// Create encrypted queues

String billingQueueUrl = sqs.createQueue(
    new CreateQueueRequest("Billing-Integration")
        .withAttributes(attributes)).getQueueUrl();

String schedulingQueueUrl = sqs.createQueue(
    new CreateQueueRequest("Scheduling-Integration")
        .withAttributes(attributes)).getQueueUrl();

String prescriptionQueueUrl = sqs.createQueue(
    new CreateQueueRequest("Prescription-Integration")
        .withAttributes(attributes)).getQueueUrl();

// Create subscriptions

Topics.subscribeQueue(sns, sqs, topicArn, billingQueueUrl);
Topics.subscribeQueue(sns, sqs, topicArn, schedulingQueueUrl);
Topics.subscribeQueue(sns, sqs, topicArn, prescriptionQueueUrl);

Next, the following code composes a JSON message and publishes it to the encrypted topic.

// Publish message to encrypted topic

String messageBody = "{\"patient\": 2911, \"medication\": 151}";
String messageSubject = "Electronic Medical Record - 3472";

sns.publish(
    new PublishRequest()
        .withSubject(messageSubject)
        .withMessage(messageBody)
        .withTopicArn(topicArn));

Note:

Publishing messages to encrypted topics isn’t different from publishing messages to standard, unencrypted topics. Your publisher needs access to perform AWS KMS operations GenerateDataKey and Decrypt on the configured CMK. All of the encryption logic is offloaded to Amazon SNS, and the message is delivered to all subscribed endpoints.

A copy of the message is now available in each subscribing queue. The final code snippet retrieves the messages from the encrypted queues.

// Retrieve messages from encrypted queues

List<Message> messagesForBilling = sqs.receiveMessage(
    new ReceiveMessageRequest(billingQueueUrl)).getMessages();

List<Message> messagesForScheduling = sqs.receiveMessage(
    new ReceiveMessageRequest(schedulingQueueUrl)).getMessages();

List<Message> messagesForPrescription = sqs.receiveMessage(
    new ReceiveMessageRequest(prescriptionQueueUrl)).getMessages();

Note:

Retrieving messages from encrypted queues isn’t different from retrieving messages from standard, unencrypted queues. All of the decryption logic is offloaded to Amazon SQS.

Enabling compatibility between encrypted topics and event sources

Several AWS services publish events to Amazon SNS topics. To allow these event sources to work with encrypted topics, you must first create a customer-managed CMK and then add the following statement to the policy of the CMK.

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "service.amazonaws.com"},
        "Action": ["kms:GenerateDataKey*", "kms:Decrypt"],
        "Resource": "*"
    }]
}

You can use the following example service principals in the statement:

Other Amazon SNS event sources require you to provide an IAM role, as opposed to their service principal, in the KMS key policy. This set of event sources includes the following:

Once the CMK key policy has been configured, you can enable encryption on the topic using the CMK, and then provide the encrypted topic’s ARN to the event source.
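
For example, continuing with the AWS SDK for Java snippets shown earlier (and reusing the sns client and topicArn variables from them), you can enable server-side encryption on an existing topic by setting its KmsMasterKeyId attribute; the key alias below is only an illustration:

// Enable server-side encryption on an existing topic by pointing it at the CMK
sns.setTopicAttributes(new SetTopicAttributesRequest()
    .withTopicArn(topicArn)
    .withAttributeName("KmsMasterKeyId")
    .withAttributeValue("alias/MyEventSourceKey"));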

Note:

As of November 2018, Amazon CloudWatch alarms don’t yet work with Amazon SNS encrypted topics. For information on publishing alarms to standard, unencrypted topics, see Using Amazon CloudWatch Alarms in the Amazon CloudWatch User Guide.

Publishing messages privately to encrypted topics through VPC endpoints

In addition to encrypting messages in transit and at rest, you can further harden the security and privacy of your applications by publishing messages to encrypted topics privately, without traversing the public Internet. Amazon SNS supports VPC endpoints via AWS PrivateLink. You can use VPC endpoints to privately publish messages to both standard and encrypted topics from a virtual private cloud (VPC) subnet. When you use AWS PrivateLink, you don’t have to set up an internet gateway, network address translation (NAT) device, or virtual private network (VPN) connection. For more information, see Publishing to Amazon SNS topics from Amazon Virtual Private Cloud in the Amazon Simple Notification Service Developer Guide.

Auditing the usage of encrypted topics

You can use AWS CloudTrail to audit the usage of the AWS KMS keys applied to your Amazon SNS topics. AWS CloudTrail creates log files that contain a history of AWS API calls and related events for your account. These log files include all AWS KMS API requests made with the AWS Management Console, SDKs, and Command Line Tools, as well as those made through integrated AWS services. You can use these log files to get information about when your CMK was used, the operation that was requested, the identity of the requester, and the IP address that the request came from. For more information, see Logging AWS KMS API calls with AWS CloudTrail in the AWS Key Management Service Developer Guide.

Summary

Amazon SNS provides a full set of security features to protect your data from unauthorized and anonymous access, including message encryption in transit with Amazon ATS certificates, message encryption at rest with AWS KMS keys, message privacy with AWS PrivateLink, and auditing with AWS CloudTrail. Moreover, you can subscribe Amazon SQS encrypted queues to Amazon SNS encrypted topics to establish end-to-end encryption in your messaging use cases.

Amazon SNS encrypted topics are available in all AWS Regions where AWS KMS is available. For pricing details, see AWS KMS pricing and Amazon SNS pricing. There is no increase in Amazon SNS charges for using encrypted topics, beyond the AWS KMS request charges incurred. For more information, see Protecting Amazon SNS Data Using Server-Side Encryption (SSE) in the Amazon Simple Notification Service Developer Guide.

Get started today by creating your Amazon SNS encrypted topics via the AWS Management Console and AWS SDKs.

AWS Security Profiles: Brigid Johnson, Senior Manager of Product Management

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-brigid-johnson-senior-manager-of-product-management/

Amazon Spheres and author info
In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do in your current role?

I’ve been with AWS a little over four years. I’m currently the Senior Manager of Product Management in AWS Identity. I head a team of product managers that manage AWS Identity and Access Management (IAM) and AWS Secrets Manager.

How do you explain your job to non-tech friends?

I’m the voice of the customer. I work to simplify security for AWS customers.

What are you currently working on that you’re excited about?

A lot of my work is focused on how we can simplify permissions management at scale. Customers are moving an enormous number of workloads to AWS, and a lot of teams are onboarding to manage those workloads. And often, a lot of applications need access to resources within AWS. Companies rely on granular permissions to manage their resources and data securely. My team is working on the tools and experiences to make this easy.

What’s the most challenging part of your job?

AWS IAM and AWS Secrets Manager are services that every AWS customer will use, from small start-ups to really large enterprises. Balancing their needs, use cases, and requirements, and then creating a product vision based on this information is challenging—and also very rewarding.

What’s your favorite part of your job?

Talking with customers. I find it incredibly fascinating to hear what people are building on top of AWS, how they’re using our system, and what they need more of. And while I really love presenting AWS products on stage, it’s usually brainstorming meetings with a central security team or someone who manages permissions at a small startup that I love the most. I love real conversations about day-to-day routines and experiences. It helps me hone in on where we need to innovate.

How did you choose your particular topic for re:Invent this year?

The session, Become an IAM Policy Master in 60 Minutes or Less, has been going on for years. Before I took over, my previous manager ran it. Policies are very, very powerful. You can write granular permission rules, and there are some really cool things I’ve seen customers do with them. The session is a way for me to show people that power, and then show them how to think about permissions and use IAM policies. You can use policies to control access to both applications and users in a very granular way. It’s really rewarding to watch as people start to get it—and then get excited about new possibilities of their security posture.

So, how are you going to make someone an IAM Policy Master in 60 minutes?

There are three parts: First, I’ll go through some policy basics. This section’s fairly short, since this is a more advanced session. Next, I’ll explore how to reason and think about policy evaluation. When I explain this in a certain way with customers, I can often see them get it and the lightbulbs go on. This provides them with the foundation to go back and work with their teams to create better permissions rules. Finally, I go into some specific policy types and where you can use these, so people have a framework for understanding different policy types. This section includes use cases for how and why you might use different types. My favorite part of the sessions is at the end: I go through some of my favorite complex use cases and show the real power and granularity of the AWS permissions model.

What are you hoping that your audience will take away from your session?

I want them to have a deeper understanding of the power of our policies. And, honestly, I want them to go play and explore. So many times, when we’re setting up permissions, we’re trying to get something done, and so we go through the set-up really quickly—and I should stress that my team is working to make that set-up quicker and easier. But if you’re part of a central security team, and you spend a little time exploring the ninja moves that I’m going to explain in my session, you can more easily scale your permissions management. Spending just a little bit of up-front cost to explore and figure out how things work will make your permissions management much easier.

Any tips for first-time conference attendees?

Two tips: One, be sure to go to sessions that interest you, even if you know nothing about the topic. At the end of the day, the conference is a chance for you to explore. Two, try spending some time outside. Being indoors all day can be more draining than you think. A little outdoor time helps keep your energy level up.

What does cloud security mean to you, personally?

Permissions are critical to running workloads in the cloud. Here’s why I’m passionate about this topic. Permissions enable people to build without getting themselves into trouble accidentally. The easier that AWS makes permissions (and managing permission in the cloud), the easier and faster it is for customers to onboard workloads to AWS—and the easier and faster it is for builders to build on AWS. And that’s what really inspires me. If builders are blocked because they don’t have access to something that they should, it’s a frustrating experience. It also means they’re not building the next cool app I didn’t know I needed, solving healthcare challenges, or innovating as fast as they can. My goal, and what I love about my work, is getting to innovate in ways that make security easier by default. People can operate safely in the cloud, and they can still move fast to build exciting things.

In your opinion, what’s a challenge facing cloud security teams right now?

AWS innovates really quickly. We send out a lot of new features that continually change the game in terms of how a central security team can approach security, monitor security, or author their permissions. Keeping up with all of this game-changing information is really, really hard. I follow Twitter and the What’s New announcements for up-to-date information, and of course the AWS Security Blog.

Five years from now, what changes do you think we’ll see across the security/compliance landscape?

I think we’ll see more preventative security controls turned on by default, rather than controls that rely on you to go turn them on. If you have a use case where you need to turn things off, you’ll still be able to do so. But I think turned on will become the new default, and that more management tools and services will be able to “do” security for you. This might mean that the job of the central security team will move away from building out systems and toward consuming information that comes from these systems and using it to make judgment calls based on environment and workload.

If you had to pick any other job, what would you want to do with your life?

I’ve always thought it would be fun to build and run my own sleep-away camp for girls. I’m an avid horseback rider, so there would be a lot of horseback riding. If I had to do something else, I’d go buy a plot of land somewhere and make it happen.

The AWS Security team is hiring! Want to find out more? Check out our career page.

Want more AWS Security news? Follow us on Twitter.

Author

Brigid Johnson

Brigid manages the Product Management team for AWS IAM. Prior to her career at Amazon, she received her MBA from Carnegie Mellon’s Tepper School of Business. Brigid spent her early career as a consultant for Accenture and an engineer for JPMorgan Chase, after receiving an undergraduate degree in Computer Science from the University of Illinois. In her spare time, Brigid enjoys riding and training her new horse Pickles.

AWS Security Profiles: Eric Brandwine, VP and Distinguished Engineer

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-eric-brandwine-vp-and-distinguished-engineer/

Amazon Spheres and author info
In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long you have been at AWS, and what you do in your current role?

I’ve been at AWS for 11 years, and I report to Steve Schmidt, who’s our Chief Information Security Officer. I’m his Chief Engineer. I engage across the entire AWS Security Org to help keep the bar high. What we do all day, every day, is help AWS employees and customers figure out how to securely build what they need to build.

What part of your job do you enjoy the most?

It’s a trite answer, but it’s true: delivery. I enjoy impacting the customer experience. Often, the way security impacts the customer experience is by not impacting it, but we’re starting to launch a suite of really exciting security services for customers.

What’s the most challenging part of your job?

It’s a combination of two things. One is Amazon’s culture. I love the culture and would not change it, but it poses particular challenges for the Security team. The second challenge is people: I thought I had a computer job, but it turns out that like pretty much all of the Senior Engineers I know, I have a people job as much as or more than I have a computer job. Amazon has a culture of distributed ownership and empowerment. It’s what allows the company to move fast and deliver as much as it does, and it’s magnificent. I wouldn’t change it. But as the Security team, we’re often in a position where we have to say that X is no longer a best practice. Whatever the X is—there’s been a new research paper, there’s a patch that’s been published—everyone needs to stop doing X and start doing Y. But there’s no central lever we can pull to get every team and every individual to stop doing X and start doing Y. I can’t go to senior leaders and say, “Please inform your generals that Y needs to be done,” and have that message move down through the ranks in an orderly fashion. Instead, we spend a lot of our time trying to establish conditions, whether it’s by building tools, reporting on the right metrics, or offering the right incentives that drive the organization towards our desired state. It’s hacking people and groups of people at enormous scale and trying to influence the organization. It’s a tremendous amount of fun, and it can also be maddening.

Do you have any advice for people early in their careers about how to meet the challenge of influencing people?

I’ve got two lessons. The first is to take a structured approach to professional interactions such as meetings and email strings. Before you start, think through what your objective is. On what can you compromise? Where are you unwilling to compromise? What does success look like? Think through the engagement from the perspective of the other parties involved. This isn’t to say that you shouldn’t treat people like people. Much of my success is due to the amount of coffee and beer that I’ve bought. However, once you’re in the meeting or whatever, keep it on topic, drive towards that outcome, and hopefully end early so there’s time for coffee.

The other is to shift the discussion towards customers. As a security professional, it’s my job to tell you that you’ve done your job poorly. That thing that you sweated over for months? The one that you poured yourself into? Yeah, that one. It’s not good enough, it’s not ready to launch. This is always a hard discussion to have. By shifting the focus from my opinion versus yours to trying to delight customers, it becomes a much easier discussion. What should we do for customers? What is right for customers? How do we protect customers? If you take this approach, then you can have difficult conversations with peers and make tough decisions about product features and releases because the focus is always on the customer and not on social cohesion, peer pressure, or other concerns that aren’t customer-focused.

Tell us about your 2018 re:Invent topic. How did you choose it?

My talk is called 0x32 Shades of #7f7f7f: The Tension Between Absolutes and Ambiguity in Security. I chose the topic because a lot of security issues come down to judgment calls that result in very finely graduated shades of gray. I’ve seen a lot of people struggle with that ambiguity—but that ambiguity is actually central to security. And the ability to deal with the ambiguity is one of the things that enables a security team to be effective. At AWS, we don’t have a checklist that we go down when we engage with a product team. That lack of a checklist makes the teams more likely to talk to us and to bring us their problems, because they know we’re going to dig into the problem with them and come up with a reasoned recommendation based on our experience, not based on the rule book. This approach is excellent and absolutely necessary. But on the flipside, there are times when absolutes apply. There are times when you draw a bright line that none shall pass. One of the biggest things that can enable a security team to scale is knowing where to draw those bright lines and how to keep those lines immaculately clean. So my talk is about that tension: the dichotomy between absolute black and white and the pervading shades of gray.

What are some of the common misperceptions you encounter about cloud security?

I’ve got a couple. The first is that choosing a cloud provider is a long-term relationship. Make sure that your provider has a track record of security improvements and flexibility. The Internet, your applications, and your customers are not static. They’re going to change over time, sometimes quite quickly. Making it perfect now doesn’t mean that it will be perfect forever, or even for very long. At Amazon, we talk about one-way doors versus two-way doors. When going through a one-way door you have to be very sure that you’re making a good decision. We’ve not found a way to reliably and quickly make these decisions at scale, but we have found that you can often avoid the one-way door entirely. Make sure that as you’re moving your applications to the cloud, your provider gives you the flexibility to change your mind about configurations, policies, and other security mechanisms in the future.

The second is that you cannot allow the perfect to be the enemy of the good. Both within Amazon and with our customers, I’ve seen people use the migration to the cloud as an opportunity to fix all of the issues that their application has accreted over years or perhaps even decades. These projects very rarely succeed. The bar is so high, the project is so complex, that it’s basically impossible to successfully deliver. You have to be realistic about the security and availability that you have now, and you have to make sure that you get both better security when you launch in the cloud, and that you have the runway to incrementally improve over time. In 2016, Rob Joyce of the NSA gave a great talk about how the NSA thinks about zero-day vulnerabilities and gaining access to systems. It’s a good, clear articulation of a well-known security lesson, that adversaries are going to take the shortest easiest path to their objective. The news has been full of low-level side channel attacks like Spectre and Meltdown. While you absolutely have to address these, you also have to adopt MFA, minimize IAM policies, and the like. Customers should absolutely make sure that their cloud provider is someone they can trust, someone who takes security very seriously and with whom they can have a long-term relationship. But they also have to make sure that their own fundamentals are in order.

If you had to pick a different job, what would it be?

I would do something dealing with outer space. I’ve always read a lot of science fiction, so I’ve always had an interest, and it’s getting to the point where space is no longer the domain of large government agencies. There are these private companies that are doing amazing things in space, and the idea of one of my systems in orbit or further out is appealing.

The AWS Security team is hiring! Want to find out more? Check out our career page.

Want more AWS Security news? Follow us on Twitter.

Author

Eric Brandwine

By day, Eric helps teams figure out how to cloud. By night, Eric stalks the streets of Gotham, keeping it safe for customers. He is marginally competent at: AWS, Networking, Distributed Systems, Security, Photography, and Sarcasm. He is also an amateur parent and husband.

AWS Security Profiles: Ben Potter, Security Lead, Well-Architected

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profiles-ben-potter-security-lead-well-architected/

Amazon Spheres with author info

In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do in your current role?

I’ve been with AWS for four and a half years. I started as one of the first mid-market territory Solution Architects in Sydney, then I moved to professional services doing security, risk, and compliance. For the last year, I’ve been the security lead for Well-Architected, which is a global role.

What is Well-Architected?

It’s a framework that contains best practices, allowing you to measure your architecture and implement continuous improvements against those measurements. It’s designed to help your architecture evolve in alignment with five pillars: Operational Excellence, Security, Reliability, Performance Efficiency, and Cost Optimization. The framework is based on customer data that we’ve gathered, and learnings that our customers have shared. We want to share these learnings with everyone else.

How do you explain your job to non-tech friends?

Basically, I listen to customers a lot. I work with specialists and service teams around the world to help create security best practices for AWS that drive the Well-Architected framework. My work helps customers make better cloud security decisions.

What are you currently working on that you’re excited about?

I’ve been developing some in-depth, hands-on training material for Well-Architected, which you can find on GitHub. It’s all open-source, and the community is welcome to contribute. We’re just getting started with this sort of hands-on content, but we’ve run AWS-led sessions around the globe using this particular content, including at our AWS Security Lofts throughout the USA — plus Sydney, London, and Singapore — and we’ve gotten very positive feedback.

What’s the most challenging part of your job?

Everyone has different priorities and opinions on security. What a Singapore financial startup thinks is a priority is completely different from what an established bank in London thinks — which is completely different from the entertainment industry. The priorities of startups often center around short time-to-market and low cost, with less focus on security.

I’m trying to make it easy for everyone to be what we call Well-Architected in security from the start, so that the only way to do something is via automated, repeatable, secure mechanisms. AWS is great at providing building blocks, but if we can combine those building blocks into different solution sets and guidance, then we can help every customer be Well-Architected from the beginning. Most of the time, it doesn’t cost anything additional. People like me just need to spend the time developing examples, solutions, and labs, and getting them out there.

What does cloud security mean to you, personally?

Cloud security is an opportunity to rethink cybersecurity — to rethink the boundaries of what’s possible. It’s not just a security guard in front of a data center, with a big, old-fashioned firewall protecting the network. It’s a lot deeper than that. The cloud lets you influence security at every layer, from developers all the way to end users. Everyone needs to be thinking about it. I had a big presentation earlier this year, and I asked the audience, “Put your hand up if you’re responsible for your organization’s security.” Only about a quarter of the audience put their hands up. But that’s not true — it’s everyone’s responsibility. The cloud provides opportunities for businesses to innovate, improve their agility and ability to drive business value, but security needs to go hand-in-hand with all of that.

What’s the biggest issue that you see customers struggling with when it comes to cloud security?

A lot of customers don’t think about the need for incident response. They think: I don’t want to think about it. It’s never gonna happen to me. No, my access keys will never be lost. It’s fine. We’ve got processes in place, and our developers know what they’re doing. We’re never gonna lose any access keys or credentials. But it happens, people make mistakes. And it’s very important for anyone, regardless of whether or not they’re in the cloud, to be prepared for an incident, by investing in the tools that they need, by actually practicing responding to an incident, and by having run books. If X does happen, then where do I start? What do I need to do? Who do I need to communicate with? AWS can help with that, but it’s all very reactive. Incident response needs to be proactive because your organization’s reputation and business could be on the line.

In your opinion, what’s the biggest challenge facing the cloud security industry right now?

I think the biggest challenge is just staying up to date with what’s happening in the industry. Any company that develops software or tools or services is going to have a predefined plan of work. But often, security is forgotten about in that development process. Say you’re developing a mobile game: you’d probably have daily agile-style stand-ups, and you’d develop the game until you’ve got a minimum viable product. Then you’d put it out there for testing. But what if the underlying software libraries that you used to develop the game had vulnerabilities in them, and you didn’t realize this because you didn’t build in a process for hourly or daily checking of vulnerabilities in the external libraries you pulled in?

Keeping up-to-date is always a challenge, and this is where the cloud actually has a lot of power, because the cloud can drive the automated infrastructure combined with the actual code. It’s part of the whole DevOps thing — combining infrastructure code with the actual application code. You can take it all and run automated tools across it to verify your security posture and provide more granular control. In the old days, nearly everyone had keys to the data center to go in and reboot stuff. Now, you can isolate different application teams to different portions of their cloud environment. If something bad does happen, it’s much easier to contain the issue through the segmentation and micro-segmentation of services.

Five years from now, what changes do you think we’ll see across the security landscape?

I think we’re going to see a lot of change for the better. If you look at ransomware statistics that McAfee has published, new infection rates have actually gone down. More people are becoming aware of security, including end users and the general public. Cyber criminals go where the money is. This means organizations are under increasing pressure to do the right thing in terms of public safety and security.

For ransomware specifically, there’s also nomoreransom.org, a global project for which I was the “Chief Architect” — I worked with Europol, McAfee, and Kaspersky to create this website. It’s been around for a couple of years now, and I think it’s already helping drive awareness of security and best practices for the public, like, don’t click on this phishing email. I co-presented a re:Invent presentation on this project a few years ago, if you want more info about it.

Tell us about the chalk talk you’re giving at re:Invent this year.

The Well-Architected for Security chalk talk is meant to help customers get started by helping them identify which best practices they should follow. It’s an open QA. I’ll start by giving an overview of the Well-Architected framework, some best practices, and some design principles, and then I’ll do a live Q&A with whiteboarding. It’ll be really interactive. I like to question the audience about what they think their challenges are. Last year, I ran a session on advanced web application security that was really awesome because I actually got a lot of feedback, and I had some service team members in the room who were also able to use a lot of feedback from that session. So it’s not just about sharing, it’s also listening to customers’ challenges, which helps drive our content road map on what we need to do for customer enablement in the coming months.

Your second re:Invent session, the Security Framework Shakedown, says it will walk you through a complete security journey. What does that mean?

This session that Steve Laino and I are delivering is about where you should start in terms of design: How to know you’re designing a secure architecture, and how the Cloud Adoption and Well-Architected frameworks can help. As your company evolves, you’re going to have priorities, and you can’t do everything right the first time. So you’ll need to think about what your priorities are and create your own roadmap for an evolving architecture that becomes continually more secure. We’ve got National Australia Bank co-presenting with us. They’ll share their journey, including how they used the Cloud Adoption Framework to get started, and how they use Well-Architected daily to drive improvement across their platform.

Broadly, what are you hoping that your audience will take away from your sessions? What do you want them to do differently?

I want people to start prioritizing security in their day-to-day job roles. That prioritization means asking questions like, “What are some principles that I should include in my day to day work life? Are we using tools and automation to make security effective?” And if you’re not using automation and tools, then what’s out there that you can start using?

Any tips for first-time conference attendees?

Get out there and socialize. Talk to your peers, and try to find some mentors in the community. You’ll find that many people in the industry, both in AWS and among our customers and partners, are very willing to help you on a personal basis to develop your career.

Any tips for returning attendees?

Think about your goals, and go after them. You should be willing to give your honest feedback, too, and seek out the service team members and individuals who have influenced you in the past.

You’re from Adelaide. If somebody is visiting your hometown, what would you advise them to do?

The “Mad March” festivities should not be missed. If you like red wine, you should visit the wine regions of Barossa Valley or McLaren Vale — or both. My favorite is definitely Barossa Valley.

The AWS Security team is hiring! Want to find out more? Check out our career page.

Want more AWS Security news? Follow us on Twitter.

Author

Ben Potter

Ben is the global security leader for the AWS Well-Architected Framework and is responsible for sharing best practices in security with customers and partners. Ben is also an ambassador for the No More Ransom initiative helping fight cyber crime with Europol, McAfee and law enforcement across the globe.

Trick or (the ultimate) treat!

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/trick-or-treat/

I’ll keep today’s blog post short and sweet, because Liz, Helen, and I are all still under the weather.

Raspberry Pi 4!

Don’t tell Eben, Liz, or the rest of the team I showed you this, but here’s your Halloween ‘trick or treat’ gift: an exclusive sneak peek at the Raspberry Pi 4.

We’ll be back to our regularly scheduled programming from tomorrow.

The post Trick or (the ultimate) treat! appeared first on Raspberry Pi.

We have the plague

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/we-have-the-plague/

Apologies to our daily visitors (we love you guys); we don’t have a proper blog post for you today because we’re all really ill. (I have food poisoning, Helen is coughing up goo and can barely speak or breathe, and Alex is being sick.)

You’ve got a day until Halloween; if you’re looking for inspiration, we’ve got several years of archived spooky project posts for you to check out. And now, if you’ll excuse me, I’m going to go and have a little lie down.

The post We have the plague appeared first on Raspberry Pi.

First annual report from the British regulator on the BBC

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/10/27/ofcom_bbc-2/

The first annual report from the British regulator on the public service broadcaster, the BBC, covering April 2017 to March 2018, has been published. Each year Ofcom will publish a report assessing the BBC’s compliance with the requirements of the law, the operating licence, and the documents related to it. In this report the regulator assesses how the BBC is performing against a rapidly changing media landscape, and how it is delivering on the BBC’s mission and public purposes. According to this first report, the BBC is on the whole fulfilling its mission; at the same time, the regulator identifies four areas where more effort is needed.

This report shows a public service broadcaster from another world.

Here is how things stand with our own public service media: like this – for which the CEM (Council for Electronic Media) imposed a fine of 3,000 leva – taken from where? From the budget, of course.

A middle finger for which we are not compensated – instead, we pay whenever we see it.


*

The assessment of the BBC:

Letter to the BBC

Ofcom’s Annual Report on the BBC (PDF, 2.3 MB)

Annex 1: Compliance with regulatory requirements (PDF, 231.7 KB)

Annex 2: BBC Performance Report (PDF, 1.1 MB)

Annex 3: Methodology overview (PDF, 267.4 KB)

View an interactive report providing visualisations of key data

BBC Performance Report: data download (CSV, 39.7 KB) – Download of the data used in the charts from the BBC Performance Report.

BBC Performance Tracker data tables (PDF, 13.6 MB) and questionnaire (PDF, 405.9 KB) – The BBC performance tracker provides Ofcom with an evidence base to assess audience opinions on the BBC’s performance against its delivery of four public purposes. The data tables contain the results from the fieldwork period October 2017 to April 2018.

PSB Tracker (PDF, 10.1 MB) – The PSB Tracker monitors the satisfaction and attitudes towards public service broadcasting channels.

News Consumption in the UK – This report provides the findings of Ofcom’s 2018 research into news consumption across television, radio, print and online. The aim is to inform an understanding of news consumption across the UK and within each UK nation.

Children’s Media Literacy – The Children’s Media Literacy Tracker is a face-to-face survey run once a year between April and June. The objective of the survey is to provide detailed evidence on media use and understanding among children and young people aged 5-15, as well as in-depth information about media access and use among children aged 3-4.

 

Deploying a Burstable and Event-driven HPC Cluster on AWS Using SLURM, Part 2

Post Syndicated from Geoff Murase original https://aws.amazon.com/blogs/compute/deploy-a-burstable-and-event-driven-hpc-cluster-on-aws-using-slurm-part-2/

Contributed by Amr Ragab, HPC Application Consultant, AWS Professional Services

In part 1 of this series, you deployed the base components to create the HPC cluster. This unique deployment stands up the SLURM headnode. For every job submitted to the queue, the headnode provisions the needed compute resources to run the job, based on job submission parameters.

By provisioning the compute nodes dynamically, you can immediately see the benefit of elasticity, scale, and optimized operational compute costs. As new technologies are released, you can take advantage of heterogeneous deployments, such as scaling high, tightly coupled, CPU-bound workloads independently from high memory or distributed GPU-based workloads.

To further extend a cloud-native approach to designing HPC architectures, you can integrate with existing AWS services and provide additional benefits by abstracting the underlying compute resources. It is possible for the HPC cluster to be event-driven in response to requests from a web application or from direct API calls.

Additional frontend components can be added to take advantage of an API-instantiated execution of an HPC workload. The following reference architecture describes the pattern.

 

The difference from the reference architecture in Part 1 is that the user submits the job, described as JSON, through an HTTP call to Amazon API Gateway; an AWS Lambda function then processes the request and submits the job to the cluster.

Deployment

I recommend that you start this section after completing the deployment in Part 1. Write down the private IP address of the SLURM controller.

In the Amazon EC2 console, select the SLURM headnode and retrieve the private IPv4 address. In the Lambda console, create a new function based on Python 2.7 authored from scratch.

Under the environment variables, add entries for “HEADNODE”, “SLURM_BUCKET_S3”, and “SLURM_KEY_S3”. Set HEADNODE to the private IPv4 address of the SLURM controller noted earlier, and set the other two to the S3 bucket and the key for the SSH key pair. These values allow the Lambda function to connect to the instance using SSH.

In the AWS GitHub repo that you cloned in part 1, find the lambda/hpc_worker.zip file and upload the contents to the Function Code section of the Lambda function. A derivative of this function was referenced by Puneet Agarwal, in the Scheduling SSH jobs using AWS Lambda post.
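
The packaged function handles this connection for you, but as a rough sketch of the pattern it uses (the handler below, the paramiko dependency, and the “centos” user are illustrative assumptions, not the contents of hpc_worker.zip):

import os
import boto3
import paramiko  # assumed to be bundled with the deployment package

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Environment variables configured on the function (see above).
    headnode = os.environ['HEADNODE']
    bucket = os.environ['SLURM_BUCKET_S3']
    key = os.environ['SLURM_KEY_S3']

    # Fetch the SSH private key from S3 into Lambda's writable /tmp space.
    s3.download_file(bucket, key, '/tmp/slurm_key.pem')

    # Connect to the SLURM headnode and submit the job.
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=headnode, username='centos',
                key_filename='/tmp/slurm_key.pem')
    stdin, stdout, stderr = ssh.exec_command('sbatch /home/centos/job.sbatch')
    return {'jobid': stdout.read(), 'error': stderr.read()}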

The Lambda function needs to launch in the same VPC as the SLURM headnode and use the same security groups, because it connects to the SLURM controller using SSH. Ignore the error about creating the Lambda function across two Availability Zones for high availability (HA).

The default memory settings, with a timeout of 20 seconds, are sufficient. The Lambda execution role needs access to Amazon EC2, Amazon CloudWatch, and Amazon S3.

In the API Gateway console, create a new API from scratch and name it “hpc.” Under Resources, create a new resource as “hpc.” Then, create a new method under the “hpc” resource for POST.

Under the POST method, set the integration method to the Lambda function created earlier.

Under the “hpc” resource, deploy the API to a stage named “dev.” You get an endpoint that you can call:

curl -H "Content-Type: application/json" -X POST https://<endpoint>.execute-api.us-west-2.amazonaws.com/dev/hpc -d @test.json

Then, create a JSON file with the following code.

{
    "username": "awsuser", 
    "jobname": "hpc_test", 
    "nodes": 2, 
    "tasks-per-node": 1, 
    "cpus-per-task": 4, 
    "feature": "us-west-2a|us-west-2b|us-west-2c", 
    "io": 
        [{"workdir": "/home/centos/job123"},
         {"input": "s3://ar-job-input/test.input"},
         {"output": "s3://ar-job-output"}],
    "launch": "env && sleep 60"
}

Next, in the API Gateway console, watch the following four events happen:

  1. The API gateway passes the input JSON to the Lambda function.
  2. The Lambda function writes out a SLURM sbatch job submission file (a sketch of this step follows the list).
  3. The job is submitted and held until the instance is provisioned.
  4. After the instance is running, the job script executes, copies data from S3, and completes the job.
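
To make step 2 concrete, here is one way such a function could translate the submitted JSON into an sbatch script. This is only a sketch: the field-to-directive mapping follows the test.json above, and the script that hpc_worker.zip actually generates may differ.

def build_sbatch(job):
    # Map the JSON job description to SLURM sbatch directives.
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name={}".format(job["jobname"]),
        "#SBATCH --nodes={}".format(job["nodes"]),
        "#SBATCH --ntasks-per-node={}".format(job["tasks-per-node"]),
        "#SBATCH --cpus-per-task={}".format(job["cpus-per-task"]),
        "#SBATCH --constraint={}".format(job["feature"]),
    ]
    # Stage data in from S3, run the job, then stage results back out.
    io = {k: v for d in job["io"] for k, v in d.items()}
    lines.append("cd {}".format(io["workdir"]))
    lines.append("aws s3 cp {} .".format(io["input"]))
    lines.append(job["launch"])
    lines.append("aws s3 cp ./ {} --recursive".format(io["output"]))
    return "\n".join(lines) + "\n"

Called with the test.json payload, this would produce a script that requests two nodes, one task per node, and four CPUs per task, constrained to the listed Availability Zone features.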

The response body of the API call returns the job ID.

{
"body": "{\"error\": \"\", \"name\": \"awsuser\", \"jobid\": \"Submitted batch job 5\\n\"}",
"statusCode": 200
}

When the job completes, the instance is held for 60 seconds in case another job is submitted. If no jobs are submitted, the instance is terminated by the SLURM cluster.

Conclusion

End-to-end scalable job submission and instance provisioning is one way to execute your HPC workloads in a scalable and elastic fashion. Now, go power your HPC workloads on AWS!

Automating rollback of failed Amazon ECS deployments

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/

Contributed by Vinay Nadig, Associate Solutions Architect, AWS.

With more and more organizations moving toward Agile development, it’s not uncommon to deploy code to production multiple times a day. With the increased speed of deployments, it’s imperative to have a mechanism in place where you can detect errors and roll back problematic deployments early. In this blog post, we look at a solution that automates the process of monitoring Amazon Elastic Container Service (Amazon ECS) deployments and rolling back the deployment if the container health checks fail repeatedly.

The normal flow for a service deployment on Amazon ECS is to create a new task definition revision and update an Amazon ECS service with the new task definition. Based on the values of minimumHealthyPercent and maximumPercent, Amazon ECS replaces existing containers in batches to complete the deployment. After the deployment is complete, you typically monitor the service health for errors and decide whether to roll back the deployment.

In March 2018, AWS announced support for native Docker health checks on Amazon ECS. Amazon ECS also supports Application Load Balancer health checks for services that are integrated with a load balancer. Leveraging these two features, we can build a solution that automatically rolls back Amazon ECS deployments if health checks fail.

Solution overview

The solution consists of the following components:

·      An Amazon CloudWatch event to listen for the UpdateService event of an Amazon ECS cluster

·      An AWS Lambda function that listens for the Amazon ECS events generated from the cluster after the service update

·      A Lambda function that calculates the failure percentage based on the events in the Amazon ECS event stream

·      A Lambda function that triggers rollback of the deployment if there are high error rates in the event

·      An AWS Step Functions state machine to orchestrate the entire flow

 

The following diagram shows the solution’s components and workflow.

Assumptions

The following assumptions are important to understand before you implement the solution:

·      The solution assumes that with every revision of the task definition, you use a new Docker tag instead of using the default “latest” tag. As a best practice, we advise that you do every release with a different Docker image tag and a revision of the task definition.

·      If health checks continue to fail even after this setup automatically rolls back the deployment, another rollback is triggered, which might introduce a runaway deployment rollback loop. Make sure that you use the solution where you know that a one-step rollback will bring the Amazon ECS service into a stable state.

·      This blog post assumes deployment to the US West (Oregon) us-west-2 Region. If you want to deploy the solution to other Regions, you need to make minor modifications to the Lambda code.

·      The Amazon ECS cluster launches in a new VPC. Make sure that your VPC service limit allows for a new VPC.

Prerequisites

You need the following permissions in AWS Identity and Access Management (IAM) to implement the solution:

·      Create IAM Roles

·      Create ECS Cluster

·      Create CloudWatch Rule

·      Create Lambda Functions

·      Create Step Functions

Creating the Amazon ECS cluster

First, we create an Amazon ECS cluster using the AWS Management Console.

1. Sign in to the AWS Management Console and open the Amazon ECS console.

2. For Step 1: Select cluster template, choose EC2 Linux + Networking and then choose Next step.

3. For Step 2: Configure cluster, under Configure cluster, enter the Amazon ECS cluster name as AutoRollbackTestCluster.

 

4. Under Instance configuration, for EC2 instance type, choose t2.micro.

5. Keep the default values for the rest of the settings and choose Create.

 

This provisions an Amazon ECS cluster with a single Amazon ECS container instance.

Creating the task definition

Next, we create a new task definition using the Nginx Alpine image.

1. On the Amazon ECS console, choose Task Definitions in the navigation pane and then choose Create new Task Definition.

2. For Step 1: Select launch type compatibility, choose EC2 and then choose Next step.

3. For Task Definition Name, enter Web-Service-Definition.

4. Below Task size, under Container Definitions, choose Add container.

5.  On the Add container pane, under Standard, enter Web-Service-Container for Container name.

6.  For Image, enter nginx:alpine. This pulls the nginx:alpine Docker image from Docker Hub.

7.  For Memory Limits (MiB), choose Hard limit and enter 128.

8.  Under Advanced container configuration, enter the following information for Healthcheck:

 

·      Command:

CMD-SHELL, wget http://localhost/ && rm index.html || exit 1

·      Interval: 10

·      Timeout: 30

·      Start period: 10

·      Retries: 2

9.  Keep the default values for the rest of the settings on this pane and choose Add.

10. Choose Create.
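
If you would rather script this step than click through the console, the same container health check could be registered with boto3 along these lines (a sketch mirroring the console values above):

import boto3

ecs = boto3.client('ecs', region_name='us-west-2')

ecs.register_task_definition(
    family='Web-Service-Definition',
    requiresCompatibilities=['EC2'],
    containerDefinitions=[{
        'name': 'Web-Service-Container',
        'image': 'nginx:alpine',
        'memory': 128,  # hard limit in MiB
        'healthCheck': {
            'command': ['CMD-SHELL', 'wget http://localhost/ && rm index.html || exit 1'],
            'interval': 10,
            'timeout': 30,
            'startPeriod': 10,
            'retries': 2,
        },
    }],
)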

Creating the Amazon ECS service

Next, we create an Amazon ECS service that uses this task definition.

1.  On the Amazon ECS console, choose Clusters in the navigation pane and then choose AutoRollbackTestCluster.

2.  On the Services view, choose Create.

3.  For Step 1: Configure service, use the following settings:

·      Launch type: EC2.

·      Task Definition Family: Web-Service-Definition. This automatically selects the latest revision of the task definition.

·      Cluster: AutoRollbackTestCluster.

·      Service name: Nginx-Web-Service.

·      Number of tasks: 3.

4.  Keep the default values for the rest of the settings and choose Next Step.

5.  For Step 2: Configure network, keep the default value for Load balancer type and choose Next Step.

6. For Step 3: Set Auto Scaling (optional), keep the default value for Service Auto Scaling and choose Next Step.

7. For Step 4: Review, review the settings and choose Create Service.

After creating the service, you should have three tasks running in the cluster. You can verify this on the Tasks view in the service, as shown in the following image.
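
The equivalent service creation in boto3 would look roughly like the following sketch, mirroring the console settings above:

import boto3

ecs = boto3.client('ecs', region_name='us-west-2')

ecs.create_service(
    cluster='AutoRollbackTestCluster',
    serviceName='Nginx-Web-Service',
    taskDefinition='Web-Service-Definition',  # the latest ACTIVE revision is used by default
    desiredCount=3,
    launchType='EC2',
)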

Implementing the solution

With the Amazon ECS cluster set up, we can move on to implementing the solution.

Creating the IAM role

First, we create an IAM role for reading the event stream of the Amazon ECS service and rolling back any faulty deployments.

 

1.  Open the IAM console and choose Policies in the navigation pane.

2.  Choose Create policy.

3.  On the Visual editor view, for Service, choose EC2 Container Service.

4.  For Actions, under Access Level, select DescribeServices under Read and UpdateService under Write.

5.  Choose Review policy.

6.  For Name, enter ECSRollbackPolicy.

7.  For Description, enter an appropriate description.

8.  Choose Create policy.
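
The policy document that the visual editor produces should look roughly like the following; you can narrow the Resource element if you want to restrict the role to specific services.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeServices",
        "ecs:UpdateService"
      ],
      "Resource": "*"
    }
  ]
}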

Creating the Lambda service role

Next, we create a Lambda service role that uses the previously created IAM policy. The Lambda function to roll back faulty deployments uses this role.

 

1.  On the IAM console, choose Roles in the navigation pane and then choose Create role.

2.  For the type of trusted entity, choose AWS service.

3.  For the service that will use this role, choose Lambda.

4.  Choose Next: Permissions.

5.  Under Attach permissions policies, select the ECSRollbackPolicy policy that you created.

6. Choose Next: Review.

7.  For Role name, enter ECSRollbackLambdaRole and choose Create role.

Creating the Lambda function for the Step Functions workflow and Amazon ECS event stream

The next step is to create the Lambda function that will collect Amazon ECS events from the Amazon ECS event stream. This Lambda function will be part of the Step Functions state machine.

 

1.  Open the Lambda console and choose Create function.

2.  For Name, enter ECSEventCollector.

3.  For Runtime, choose Python 3.6.

4.  For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.

5. Choose Create function.

6.  On the Configuration view, under Function code, enter the following code.

import time
import boto3
from datetime import datetime

ecs = boto3.client('ecs', region_name='us-west-2')


def lambda_handler(event, context):
    # Pull the cluster and service names from the CloudTrail UpdateService event.
    service_name = event['detail']['requestParameters']['service']
    cluster_name = event['detail']['requestParameters']['cluster']

    # Work out how many seconds have passed since the service update started.
    _update_time = event['detail']['eventTime']
    _update_time = datetime.strptime(_update_time, "%Y-%m-%dT%H:%M:%SZ")
    start_time = _update_time.strftime("%s")
    seconds_from_start = time.time() - int(start_time)
    event.update({'seconds_from_start': seconds_from_start})

    # Collect the ECS service events generated after the deployment started.
    _services = ecs.describe_services(
        cluster=cluster_name, services=[service_name])
    service = _services['services'][0]
    service_events = service['events']
    events_since_update = [e for e in service_events if int(
        (e['createdAt']).strftime("%s")) > int(start_time)]

    # Drop the datetime objects so the payload stays JSON-serializable
    # when it is passed between Step Functions states.
    [e.pop('createdAt') for e in events_since_update]
    event.update({"events": events_since_update})
    return event

 

7. Under Basic Settings, set Timeout to 30 seconds.

 

8.  Choose Save.

Creating the Lambda function to calculate failure percentage

Next, we create a Lambda function that calculates the failure percentage based on the number of failed container health checks derived from the event stream.

 

1.     On the Lambda console, choose Create function.

2.     For Name, enter ECSFailureCalculator.

3.     For Runtime, choose Python 3.6.

4.     For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.

5.     Choose Create function.

6.     On the Configuration view, under Function code, enter the following code.

 

import re

# Message emitted when an Application Load Balancer target health check fails.
lb_hc_regex = re.compile(r"\(service (.*)?\) \(instance (i-[a-z0-9]{7,17})\) \(port ([0-9]{4,5})\) is unhealthy in \(target-group (.*)?\) due to \((.*)?: \[(.*)\]\)")
# Message emitted when a Docker container health check fails.
docker_hc_regex = re.compile(r"\(service (.*)?\) \(task ([a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})\) failed container health checks\.")
# Message emitted when the service starts new tasks; the second group is the task count.
task_registration_formats = [r"\(service (.*)?\) has started ([0-9]{1,9}) tasks: (.*)\."]


def lambda_handler(event, context):
    cluster_name = event['detail']['requestParameters']['cluster']
    service_name = event['detail']['requestParameters']['service']

    # Parse the ECS service event messages collected by the previous state.
    messages = [m['message'] for m in event['events']]
    failures = get_failure_messages(messages)
    registrations = get_registration_messages(messages)
    failure_percentage = get_failure_percentage(failures, registrations)
    print("Failure Percentage = {}".format(failure_percentage))
    return {"failure_percentage": failure_percentage, "service_name": service_name, "cluster_name": cluster_name}


def get_failure_percentage(failures, registrations):
    # Failures as a percentage of the number of tasks started since the deployment.
    no_of_failures = len(failures)
    no_of_registrations = sum([float(x[0][1]) for x in registrations])
    return no_of_failures / no_of_registrations * 100 if no_of_registrations > 0 else 0


def get_failure_messages(messages):
    # Keep any message that matches either the load balancer or the Docker health check pattern.
    failures = []
    for message in messages:
        failures.append(lb_hc_regex.findall(message)) if lb_hc_regex.findall(message) else None
        failures.append(docker_hc_regex.findall(message)) if docker_hc_regex.findall(message) else None
    return failures


def get_registration_messages(messages):
    # Keep any "has started N tasks" message so we can count how many tasks were launched.
    registrations = []
    for message in messages:
        for registration_format in task_registration_formats:
            if re.findall(registration_format, message):
                registrations.append(re.findall(registration_format, message))
    return registrations

7.     Under Basic Settings, set Timeout to 30 seconds.

8.     Choose Save.

Creating the Lambda function to roll back a deployment

Next, we create a Lambda function to roll back an Amazon ECS deployment.

 

1.     On the Lambda console, choose Create function.

2.     For Name, enter ECSRollbackfunction.

3.     For Runtime, choose Python 3.6.

4.     For Existing role, choose the ECSRollbackLambdaRole IAM role that you created.

5.     Choose Create function.

6.     On the Configuration view, under Function code, enter the following code.

 

import boto3

ecs = boto3.client('ecs', region_name='us-west-2')

def lambda_handler(event, context):
    service_name = event['service_name']
    cluster_name = event['cluster_name']

    # Look up the task definition the service is currently running.
    _services = ecs.describe_services(cluster=cluster_name, services=[service_name])
    task_definition = _services['services'][0][u'taskDefinition']
    previous_task_definition = get_previous_task_definition(task_definition)

    # Point the service back at the previous task definition revision.
    ecs.update_service(cluster=cluster_name, service=service_name, taskDefinition=previous_task_definition)
    print("Rollback Complete")
    return {"Rollback": True}

def get_previous_task_definition(task_definition):
    # Decrement the revision number at the end of the task definition ARN,
    # for example ".../Web-Service-Definition:5" becomes ".../Web-Service-Definition:4".
    previous_version_number = str(int(task_definition.split(':')[-1])-1)
    previous_task_definition = ':'.join(task_definition.split(':')[:-1]) + ':' + previous_version_number
    return previous_task_definition


7.     Under Basic Settings, set Timeout to 30 seconds.

8.     Choose Save.

Creating the Step Functions state machine

Next, we create a Step Functions state machine that performs the following steps:

 

1.     Collect events of a specified service for a specified duration from the event stream of the Amazon ECS cluster.

2.     Calculate the percentage of failures after the deployment.

3.     If the failure percentage is greater than a specified threshold, roll back the service to the previous task definition.

 

To create the state machine:

1.     Open the Step Functions console and choose Create state machine.

2.     For Name, enter ECSAutoRollback.

For IAM role, keep the default selection of Create a role for me and select the check box. This creates a new IAM role with the necessary permissions for the execution of the state machine.

Note
If you have already created a Step Functions state machine, IAM Role is populated.

3.     For State machine definition, enter the following code, replacing the Amazon Resource Name (ARN) placeholders with the ARNs of the three Lambda functions that you created.

{
    "StartAt": "VerifyClusterAndService",
    "States":
    {
        "VerifyClusterAndService":
        {
            "Type": "Choice",
            "Choices": [
            {
                "And": [
                {
                    "Variable": "$.detail.requestParameters.cluster",
                    "StringEquals": "AutoRollbackTestCluster"
                },
                {
                    "Variable": "$.detail.requestParameters.service",
                    "StringEquals": "Nginx-Web-Service"
                }],
                "Next": "GetTasksStatus"
            },
            {
                "Not":
                {
                    "And": [
                    {
                        "Variable": "$.detail.requestParameters.cluster",
                        "StringEquals": "AutoRollbackTestCluster"
                    },
                    {
                        "Variable": "$.detail.requestParameters.service",
                        "StringEquals": "Nginx-Web-Service"
                    }]
                },
                "Next": "EndState"
            }]
        },
        "GetTasksStatus":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSEventCollector-Lambda-Function>",
            "Next": "WaitForInterval"
        },
        "WaitForInterval":
        {
            "Type": "Wait",
            "Seconds": 5,
            "Next": "IntervalCheck"
        },
        "IntervalCheck":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.seconds_from_start",
                "NumericGreaterThan": 300,
                "Next": "FailureCalculator"
            },
            {
                "Variable": "$.seconds_from_start",
                "NumericLessThan": 300,
                "Next": "GetTasksStatus"
            }]
        },
        "FailureCalculator":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSFailureCalculator-Lambda-Function-here>",
            "Next": "RollbackDecider"
        },
        "RollbackDecider":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.failure_percentage",
                "NumericGreaterThan": 10,
                "Next": "RollBackDeployment"
            },
            {
                "Variable": "$.failure_percentage",
                "NumericLessThan": 10,
                "Next": "EndState"
            }]
        },
        "RollBackDeployment":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSRollbackFunction-Lambda-Function-here>",
            "Next": "EndState"
        },
        "EndState":
        {
            "Type": "Succeed"
        }
    }
}

4.     Choose Create state machine.

 

Now we have a mechanism that rolls back a deployment to a specific Amazon ECS service if the error rate after the deployment exceeds a configurable percentage.

(Optional) Monitoring and rolling back all services in the Amazon ECS cluster

The state machine definition hard-codes the Amazon ECS service name, so you monitor only a specific service in the cluster. The following image shows these lines in the state machine’s definition.

If you want to monitor all services and automatically roll back any Amazon ECS deployment in the cluster based on failures, modify the state machine definition to verify only the cluster name and not the service name. To do this, remove the service name check from the definition, as shown in the following image.

The following code verifies only the cluster name. It monitors any Amazon ECS service and performs a rollback if there are errors.

{
    "StartAt": "VerifyClusterAndService",
    "States":
    {
        "VerifyClusterAndService":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.detail.requestParameters.cluster",
                "StringEquals": "AutoRollbackTestCluster",
                "Next": "GetTasksStatus"
            },
            {
                "Not":
                {
                    "Variable": "$.detail.requestParameters.cluster",
                    "StringEquals": "AutoRollbackTestCluster"
                },
                "Next": "EndState"
            }]
        },
        "GetTasksStatus":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSEventCollector-Lambda-Function>",
            "Next": "WaitForInterval"
        },
        "WaitForInterval":
        {
            "Type": "Wait",
            "Seconds": 5,
            "Next": "IntervalCheck"
        },
        "IntervalCheck":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.seconds_from_start",
                "NumericGreaterThan": 300,
                "Next": "FailureCalculator"
            },
            {
                "Variable": "$.seconds_from_start",
                "NumericLessThan": 300,
                "Next": "GetTasksStatus"
            }]
        },
        "FailureCalculator":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSFailureCalculator-Lambda-Function-here>",
            "Next": "RollbackDecider"
        },
        "RollbackDecider":
        {
            "Type": "Choice",
            "Choices": [
            {
                "Variable": "$.failure_percentage",
                "NumericGreaterThan": 10,
                "Next": "RollBackDeployment"
            },
            {
                "Variable": "$.failure_percentage",
                "NumericLessThan": 10,
                "Next": "EndState"
            }]
        },
        "RollBackDeployment":
        {
            "Type": "Task",
            "Resource": "<ARN-of-ECSRollbackFunction-Lambda-Function-here>",
            "Next": "EndState"
        },
        "EndState":
        {
            "Type": "Succeed"
        }
    }
}

 

Configuring the state machine to execute automatically upon Amazon ECS deployment

Next, we configure a trigger for the state machine so that its execution automatically starts when there is an Amazon ECS deployment. We use Amazon CloudWatch to configure the trigger.

 

1.     Open the CloudWatch console and choose Rules in the navigation pane.

2.     Choose Create rule and use the following settings:

·      Event Source

o   Service Name: EC2 Container Service (ECS)

o   Event Type: AWS API Call via CloudTrail

o   Operations: choose Specific operation(s) and enter UpdateService

·      Targets

o   Step Functions state machine

o   State machine: ECSAutoRollback

3.     Choose Configure details.

4.     For Name, enter ECSServiceUpdateRule.

5.     For Description, enter an appropriate description.

6.     For State, make sure that Enabled is selected.

7.     Choose Create rule.

 

Setting up the CloudWatch trigger is the last step in linking the Amazon ECS UpdateService events to the Step Functions state machine that we set up. With this step complete, we can move on to testing the solution.
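
Behind the console settings, the rule matches an event pattern along these lines (a sketch; the exact pattern the console generates may include additional fields):

{
  "source": ["aws.ecs"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["ecs.amazonaws.com"],
    "eventName": ["UpdateService"]
  }
}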

Testing the solution

Let’s update the task definition and force a failure of the container health checks so that we can confirm that the deployment rollback occurs as expected.

 

To test the solution:

 

1.     Open the Amazon ECS console and choose Task Definitions in the navigation pane.

2.     Select the check box next to Web-Service-Definition and choose Create new revision.

3.     Under Container Definitions, choose Web-Service-Container.

4.     On the Edit container pane, under Healthcheck, update Command to

CMD-SHELL, wget http://localhost/does-not-exist.html && rm index.html || exit 1 

and choose Update.

5.     Choose Create. This creates the task definition revision.

6.     Open the Nginx-Web-Service page of the Amazon ECS console and choose Update.

7.     For Task Definition, select the latest revision.

8.    Keep the default values for the rest of the settings by choosing Next Step until you reach Review.

9.     Choose Update Service. This creates a new Amazon ECS deployment.

This service update triggers the CloudWatch rule, which in turn triggers the state machine. The state machine collects the Amazon ECS events for 300 seconds. If the percentage of errors due to health check failures is more than 10%, the deployment is automatically rolled back. You can verify this on the Step Functions console. On the Executions view, you should see a new execution triggered by the deployment, as shown in the following image.

Choose the execution to see the workflow in progress. After the workflow is complete, you can check the outcome of the workflow by choosing EndState in Visual Workflow. The output should show {“Rollback”: true}.

You can also verify in the service details that the service has been updated with the previous version of the task definition.

Conclusion

With this solution, you can detect issues with Amazon ECS deployments early on and automate failure responses. You can also integrate the solution into your existing systems by triggering an Amazon SNS notification to send email or SMS instead of rolling back the deployment automatically. Though this blog uses Amazon ECS, you can follow similar steps to have automatic rollback for AWS Fargate.
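
As a sketch of that variation, the RollBackDeployment state could instead invoke a small notification function like the one below; the topic ARN is a placeholder you would replace with your own.

import boto3

sns = boto3.client('sns', region_name='us-west-2')

def lambda_handler(event, context):
    # Notify operators instead of (or in addition to) rolling back automatically.
    sns.publish(
        TopicArn='arn:aws:sns:us-west-2:123456789012:ecs-deployment-alerts',  # placeholder topic
        Subject='ECS deployment health check failures',
        Message='Service {} in cluster {} exceeded the failure threshold ({}%).'.format(
            event['service_name'], event['cluster_name'], event['failure_percentage']),
    )
    return {'Notified': True}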

If you want to customize the duration for monitoring your deployments before deciding to roll back, or the error percentage threshold beyond which a rollback should be triggered, modify the values highlighted in the following image of the state machine definition.

New – How to better monitor your custom application metrics using Amazon CloudWatch Agent

Post Syndicated from Helen Lin original https://aws.amazon.com/blogs/devops/new-how-to-better-monitor-your-custom-application-metrics-using-amazon-cloudwatch-agent/

This blog was contributed by Zhou Fang, Sr. Software Development Engineer for Amazon CloudWatch and Helen Lin, Sr. Product Manager for Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data from both your AWS resources and on-premises servers, providing you with a unified view of your infrastructure and application health. By default, CloudWatch automatically collects and stores many of your AWS services’ metrics and enables you to monitor and alert on metrics such as high CPU utilization of your Amazon EC2 instances. With the CloudWatch Agent that launched last year, you can also deploy the agent to collect system metrics and application logs from both your Windows and Linux environments. Using this data collected by CloudWatch, you can build operational dashboards to monitor your service and application health, set high-resolution alarms to alert and take automated actions, and troubleshoot issues using Amazon CloudWatch Logs.

We recently introduced CloudWatch Agent support for collecting custom metrics using StatsD and collectd. It’s important to collect system metrics like available memory, but you might also want to monitor custom application metrics. You can use custom application metrics such as request count to understand the traffic going through your application, or latency so you can be alerted when requests take too long to process. StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. By combining the system metrics the agent already collects with the StatsD protocol for instrumenting your own metrics and collectd’s numerous plugins, you can better monitor, analyze, alert on, and troubleshoot the performance of your systems and applications.

Let’s dive into an example that demonstrates how to monitor your applications using the CloudWatch Agent.  I am operating a RESTful service that performs simple text encoding. I want to use CloudWatch to help monitor a few key metrics:

  • How many requests are coming into my service?
  • How many of these requests are unique?
  • What is the typical size of a request?
  • How long does it take to process a job?

These metrics help me understand my application performance and throughput, in addition to setting alarms on critical metrics that could indicate service degradation, such as request latency.

Step 1. Collecting StatsD metrics

My service is running on an EC2 instance, using Amazon Linux AMI 2018.03.0. Make sure to attach the CloudWatchAgentServerPolicy AWS managed policy so that the CloudWatch agent can collect and publish metrics from this instance:

Here is the service structure:

 

The “/encode” handler simply returns the base64 encoded string of an input text.  To monitor key metrics, such as total and unique request count as well as request size and method response time, I used StatsD to define these custom metrics.

@RestController

public class EncodeController {

    @RequestMapping("/encode")
    public String encode(@RequestParam(value = "text") String text) {
        long startTime = System.currentTimeMillis();
        statsd.incrementCounter("totalRequest.count", new String[]{"path:/encode"});
        statsd.recordSetValue("uniqueRequest.count", text, new String[]{"path:/encode"});
        statsd.recordHistogramValue("request.size", text.length(), new String[]{"path:/encode"});
        String encodedString = Base64.getEncoder().encodeToString(text.getBytes());
        statsd.recordExecutionTime("latency", System.currentTimeMillis() - startTime, new String[]{"path:/encode"});
        return encodedString;
    }
}

Note that I need to first choose a StatsD client from here.
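
The controller above uses a Java StatsD client. If your service happened to be written in Python instead, roughly equivalent instrumentation with the third-party statsd package might look like the following sketch; plain StatsD has no tag support, so the path dimension is folded into the metric names, and a timer stands in for the histogram.

import time
import base64
import statsd  # pip install statsd

client = statsd.StatsClient('localhost', 8125)

def encode(text):
    start = time.time()
    client.incr('totalRequest.count.encode')         # total request counter
    client.set('uniqueRequest.count.encode', text)   # count of unique values seen
    client.timing('request.size.encode', len(text))  # timer used as a simple histogram
    encoded = base64.b64encode(text.encode()).decode()
    client.timing('latency.encode', (time.time() - start) * 1000)  # milliseconds
    return encoded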

The “/status” handler responds with a health check ping.  Here I am monitoring my available JVM memory:

@RestController
public class StatusController {

    @RequestMapping("/status")
    public int status() {
        statsd.recordGaugeValue("memory.free", Runtime.getRuntime().freeMemory(), new String[]{"path:/status"});
        return 0;
    }
}

 

Step 2. Emit custom metrics using collectd (optional)

collectd is another popular, open-source daemon for collecting application metrics. If I want to use the hundreds of available collectd plugins to gather application metrics, I can also use the CloudWatch Agent to publish collectd metrics to CloudWatch, with 15-month retention. In practice, I might choose to use either StatsD or collectd to collect custom metrics, or I can use both. All of these use cases are supported by the CloudWatch agent.

Using the same demo RESTful service, I’ll show you how to monitor my service health using the collectd cURL plugin, which passes the collectd metrics to CloudWatch Agent via the network plugin.

For my RESTful service, the “/status” handler returns HTTP code 200 to signify that it’s up and running. This is important for monitoring the health of my service and triggering an alert when the application does not respond with an HTTP 200 success code. Additionally, I want to monitor the elapsed time for each health check request.

To collect these metrics using collectd, I have a collectd daemon installed on the EC2 instance, running version 5.8.0. Here is my collectd config:

LoadPlugin logfile
LoadPlugin curl
LoadPlugin network

<Plugin logfile>
  LogLevel "debug"
  File "/var/log/collectd.log"
  Timestamp true
</Plugin>

<Plugin curl>
    <Page "status">
        URL "http://localhost:8080/status";
        MeasureResponseTime true
        MeasureResponseCode true
    </Page>
</Plugin>

<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel Encrypt
        Username "user"
        Password "secret"
    </Server>
</Plugin>

 

For the cURL plugin, I configured it to measure response time (latency) and response code (HTTP status code) from the RESTful service.

Note that for the network plugin, I used Encrypt mode which requires an authentication file for the CloudWatch Agent to authenticate incoming collectd requests.  Click here for full details on the collectd installation script.

 

Step 3. Configure the CloudWatch agent

So far, I have shown you how to:

A.  Use StatsD to emit custom metrics to monitor my service health
B.  Optionally use collectd to collect metrics using plugins

Next, I will install and configure the CloudWatch agent to accept metrics from both the StatsD client and collectd plugins.

I installed the CloudWatch Agent following the instructions in the user guide, but here are the detailed steps:

Install CloudWatch Agent:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip -O AmazonCloudWatchAgent.zip && unzip -o AmazonCloudWatchAgent.zip && sudo ./install.sh

Configure CloudWatch Agent to receive metrics from StatsD and collectd:

{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": {},
      "statsd": {}
    }
  }
}

Pass the above config (config.json) to the CloudWatch Agent:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:config.json -s

In case you want to skip these steps and just execute my sample agent install script, you can find it here.

 

Step 4. Generate and monitor application traffic in CloudWatch

Now that I have the CloudWatch agent installed and configured to receive StatsD and collectd metrics, I’m going to generate traffic through the service:

echo "send 100 requests"
for i in {1..100}
do
   curl "localhost:8080/encode?text=TextToEncode_${i}"
   echo ""
   sleep 1
done

 

Next, I log in to the CloudWatch console and check that the service is up and running. Here’s a graph of the StatsD metrics:

 

Here is a graph of the collectd metrics:

 

Conclusion

With StatsD and collectd support, you can now use the CloudWatch Agent to collect and monitor your custom applications in addition to the system metrics and application logs it already collects. Furthermore, you can create operational dashboards with these metrics, set alarms to take automated actions when free memory is low, and troubleshoot issues by diving into the application logs.  Note that StatsD supports both Windows and Linux operating systems while collectd is Linux only.  For Windows, you can also continue to use Windows Performance Counters to collect custom metrics instead.

The CloudWatch Agent with custom metrics support (version 1.203420.0 or later) is available in all public AWS Regions and AWS GovCloud (US), with AWS China (Beijing) and AWS China (Ningxia) coming soon.

The agent is free to use; you pay the usual CloudWatch prices for logs and custom metrics.

For more details, head over to the CloudWatch user guide for StatsD and collectd.