Tag Archives: alerts

AWS Alarms for Application Errors

Post Syndicated from Bozho original https://techblog.bozho.net/aws-alarms-for-application-errors/

Monitoring is key for any real-world application. You have to know what’s happening and be alerted in real time if something goes wrong. AWS has CloudWatch for that, and gives you a lot of metrics automatically. But there are some that you have to define yourself. And then you need to define proper alarms.

Here I’ll focus on four:

  • High number of application errors
  • High number of application warnings
  • High number of 5xx errors on the load balancer
  • High number of 4xx errors on the load balancer

First, the prerequisites:

  • You need to be using CloudFormation to automate everything. You can create all of those things manually, but automation is a big plus
  • If using CloudFormation, you’d preferably have a sub-stack for configuring alarms
  • You need to be collecting your logs with CloudWatch logs

If you are not using CloudWatch logs, here’s a simple config file and script to enable them:

{
  "agent": {
    "metrics_collection_interval": 10,
    "region": "eu-west-1",
    "logfile": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "{{logPath}}",
            "log_group_name": "{{logGroupName}}",
            "log_stream_name": "{instance_id}",
            "timestamp_format": "%Y-%m-%d %H:%M:%S"
          }
        ]
      }
    }
  }
}
# install AWS CloudWatch monitor
mkdir cloud-watch-agent
cd cloud-watch-agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip
unzip AmazonCloudWatchAgent.zip
./install.sh

# fetch the agent config template from S3 and fill in the log path and log group placeholders
aws s3 cp s3://$BUCKET_NAME/cloudwatch-agent-config.json /var/config/cloudwatch-agent-config.json
sed -i -- 's|{{logPath}}|/var/log/application.log|g' /var/config/cloudwatch-agent-config.json
sed -i -- 's|{{logGroupName}}|app_node|g' /var/config/cloudwatch-agent-config.json

# load the generated configuration and (re)start the CloudWatch agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/var/config/cloudwatch-agent-config.json -s

Now you have to define two things: log metrics and alarms. The CloudFormation code below creates both:

    "HighAppErrorsNotification": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmActions": [
          {
            "Ref": "NotificationTopicId"
          }
        ],
        "InsufficientDataActions": [
          {
            "Ref": "NotificationTopicId"
          }
        ],
        "AlarmDescription": "Notify if there are too many application errors",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "EvaluationPeriods": "1",
        "MetricName": "ApplicationErrors",
        "Namespace": "LogMetrics",
        "Period": "900",
        "Statistic": "Sum",
        "Threshold": "5",
        "TreatMissingData": "ignore"
      }
    },
    "ErrorMetricFilter": {
      "Type": "AWS::Logs::MetricFilter",
      "Properties": {
        "LogGroupName": "app_node",
        "FilterPattern": "ERROR",
        "MetricTransformations": [
          {
            "DefaultValue": 0,
            "MetricValue": "1",
            "MetricNamespace": "LogMetrics",
            "MetricName": "ApplicationErrors"
          }
        ]
      }
    },

If you need to do that manually, go to the CloudWatch logs homepage, select the log group (app_node) and use the “Create metric filter” button on top. It lets you specify the pattern to look for (“ERROR” in this case). When you have that ready, you can create an alarm based on it, through Alarms -> Create alarm. Look up the metric by name and select it to trigger the alarm (in the example above, it gets triggered if there are 5 or more errors within 900 seconds).
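
If you prefer scripting to the console, here is a hedged boto3 sketch of the same metric filter and alarm; the filter name and SNS topic ARN are placeholders, not values from the post:

# Hedged boto3 sketch: same metric filter and alarm as the CloudFormation snippet above.
# The filter name and SNS topic ARN are placeholders.
import boto3

logs = boto3.client("logs", region_name="eu-west-1")
cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

# Metric filter: count every log line containing "ERROR"
logs.put_metric_filter(
    logGroupName="app_node",
    filterName="application-errors",
    filterPattern="ERROR",
    metricTransformations=[{
        "metricName": "ApplicationErrors",
        "metricNamespace": "LogMetrics",
        "metricValue": "1",
        "defaultValue": 0.0,
    }],
)

# Alarm: 5 or more errors within a 900-second period
cloudwatch.put_metric_alarm(
    AlarmName="HighAppErrorsNotification",
    AlarmDescription="Notify if there are too many application errors",
    Namespace="LogMetrics",
    MetricName="ApplicationErrors",
    Statistic="Sum",
    Period=900,
    EvaluationPeriods=1,
    Threshold=5,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="ignore",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:app-alerts"],  # placeholder topic ARN
)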

You can then create an identical alarm for warnings (pattern to look for: “WARN”). The threshold there might be higher, e.g. 10 or 20. But that depends on your application logging patterns.

Then there are the load balancer 5xx error alarms. In CloudFormation it would look like this:

   "TooMany5xxErrorsWebAppAlarmNotification": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmActions": [
          {
            "Ref": "NotificationTopicId"
          }
        ],
        "InsufficientDataActions": [
          {
            "Ref": "NotificationTopicId"
          }
        ],
        "AlarmDescription": "Notify if there are too many 5xx errors",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Dimensions": [
          {
            "Name": "LoadBalancer",
            "Value": {
              "Ref": "WebAppALBId"
            }
          }
        ],
        "TreatMissingData": "notBreaching",
        "EvaluationPeriods": "1",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Namespace": "AWS/ApplicationELB",
        "Period": "60",
        "Statistic": "Sum",
        "Threshold": "2"
      }
    }

You can again create that manually – look for the HTTPCode_Target_5XX_Count metric in the metric selection screen for the alarm. You have several options there; the most straightforward is to select the per-AppELB metric. And again, the same approach can be used for 4xx errors (HTTPCode_Target_4XX_Count).
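
For completeness, here is a hedged boto3 sketch of the 4xx counterpart. Note that the LoadBalancer dimension takes the load balancer’s “full name” (for example, app/my-alb/1234567890abcdef), not its ARN; the name, threshold, and topic ARN below are placeholders:

# Hedged boto3 sketch: ALB 4xx alarm analogous to the 5xx CloudFormation snippet above.
# The load balancer "full name", threshold, and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

cloudwatch.put_metric_alarm(
    AlarmName="TooMany4xxErrorsWebAppAlarmNotification",
    AlarmDescription="Notify if there are too many 4xx errors",
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_4XX_Count",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=20,  # 4xx responses are usually noisier than 5xx, so a higher threshold
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:app-alerts"],  # placeholder topic ARN
)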

Getting this running with CloudFormation (and even manually) is not as straightforward as it seems. The right combination of metric names, namespaces and values is not obvious, and the relevant documentation is not the first thing that pops up. So I decided to share something that works, as it may take some time experimenting before getting it to that state.

But even outside of CloudFormation or AWS context, monitoring and alerting in case of a high number of application errors, warnings and HTTP errors is a must. And automating the creation of those alarms is the recommended approach.

The post AWS Alarms for Application Errors appeared first on Bozho's tech blog.

The Benefits of Side Projects

Post Syndicated from Bozho original https://techblog.bozho.net/the-benefits-of-side-projects/

Side projects are the things you do at home, after work, for your own “entertainment”, or to satisfy your desire to learn new stuff, in case your workplace doesn’t give you that opportunity (or at least not enough of it). Side projects are also a way to build stuff that you think is valuable but not necessarily “commercialisable”. Many side projects are open-sourced sooner or later and some of them contribute to the pool of tools at other people’s disposal.

I’ve outlined one recommendation about side projects before – do them with technologies that are new to you, so that you learn important things that will keep you better positioned in the software world.

But there are more benefits than that – serendipitous benefits, for example. And I’d like to tell some personal stories about that. I’ll focus on a few examples from my list of side projects to show how, through a sort-of butterfly effect, they helped shape my career.

The computoser project, no matter how cool algorithmic music composition is, didn’t manage to have much of a long-term impact. But it did teach me something apart from niche musical theory – how to read a bulk of scientific papers (mostly computer science) and understand them without being formally trained in the particular field. We’ll see how that was useful later.

Then there was the “State alerts” project – a website that scraped content from public institutions in my country (legislation, legislation proposals, decisions by regulators, new tenders, etc.), made them searchable, and “subscribable” – so that you get notified when a keyword of interest is mentioned in newly proposed legislation, for example. (I obviously subscribed for “information technologies” and “electronic”).

And that project turned out to have a significant impact on the following years. First, I chose a new technology to write it with – Scala. Which turned out to be of great use when I started working at TomTom, and on the 3rd day I was transferred to a Scala project, which was way cooler and much more complex than the original one I was hired for. It was a bit ironic, as my colleagues had just read that “I don’t like Scala” a few weeks earlier, but nevertheless, that was one of the most interesting projects I’ve worked on, and it went on for two years. Had I not known Scala, I’d probably be gone from TomTom much earlier (as the other project was restructured a few times), and I would not have learned many of the scalability, architecture and AWS lessons that I did learn there.

But the very same project had an even more important follow-up. Because of its “civic hacking” flavour, I was invited to join an informal group of developers (later formalized as an NGO) who create tools that are useful for society (something like MySociety.org). That group gathered regularly, discussed both tools and policies, and at some point we put up a list of policy priorities that we wanted to lobby policy makers to adopt. One of them was open source for the government, the other one was open data. As a result of our interaction with an interim government, we donated the official open data portal of my country, which is functioning to this day.

As a result of that, a few months later we got a proposal from the deputy prime minister’s office to “elect” one of the group as an advisor to the cabinet. And we decided that could be me. So I went for it and became advisor to the deputy prime minister. The job was unlike anything one could imagine, and it was challenging and fascinating. We managed to pass legislation, including a law that requires open source for custom government projects, eID, and open data. And all of that would not have been possible without my little side project.

As for my latest side project, LogSentinel – it became my current startup company. And not without help from the previous two mentioned above – the computer science paper reading was of great use when I was navigating the crypto papers landscape, and from the government job I not only gained invaluable legal knowledge, but I also “got” a co-founder.

Some other side projects died without much fanfare, and that’s fine. But the ones above shaped my “story” in a way that would not have been possible otherwise.

And I agree that such a serendipitous chain of events could have happened without side projects – I could’ve gotten these opportunities by meeting someone at a bar (unlikely, but who knows). But we, as software engineers, are capable of tilting chance towards us by utilizing our skills. Side projects are our “extracurricular activities”, and they often lead to unpredictable, but rather positive chains of events. They would rarely be the only factor, but they are certainly great at unlocking potential.

The post The Benefits of Side Projects appeared first on Bozho's tech blog.

AWS IoT 1-Click – Use Simple Devices to Trigger Lambda Functions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-1-click-use-simple-devices-to-trigger-lambda-functions/

We announced a preview of AWS IoT 1-Click at AWS re:Invent 2017 and have been refining it ever since, focusing on simplicity and a clean out-of-box experience. Designed to make IoT available and accessible to a broad audience, AWS IoT 1-Click is now generally available, along with new IoT buttons from AWS and AT&T.

I sat down with the dev team a month or two ago to learn about the service so that I could start thinking about my blog post. During the meeting they gave me a pair of IoT buttons and I started to think about some creative ways to put them to use. Here are a few that I came up with:

Help Request – Earlier this month I spent a very pleasant weekend at the HackTillDawn hackathon in Los Angeles. As the participants were hacking away, they occasionally had questions about AWS, machine learning, Amazon SageMaker, and AWS DeepLens. While we had plenty of AWS Solution Architects on hand (decked out in fashionable & distinctive AWS shirts for easy identification), I imagined an IoT button for each team. Pressing the button would alert the SA crew via SMS and direct them to the proper table.

Camera Control – Tim Bray and I were in the AWS video studio, prepping for the first episode of Tim’s series on AWS Messaging. Minutes before we opened the Twitch stream I realized that we did not have a clean, unobtrusive way to ask the camera operator to switch to a closeup view. Again, I imagined that a couple of IoT buttons would allow us to make the request.

Remote Dog Treat Dispenser – My dog barks every time a stranger opens the gate in front of our house. While it is great to have confirmation that my Ring doorbell is working, I would like to be able to press a button and dispense a treat so that Luna stops barking!

Homes, offices, factories, schools, vehicles, and health care facilities can all benefit from IoT buttons and other simple IoT devices, all managed using AWS IoT 1-Click.

All About AWS IoT 1-Click
As I said earlier, we have been focusing on simplicity and a clean out-of-box experience. Here’s what that means:

Architects can dream up applications for inexpensive, low-powered devices.

Developers don’t need to write any device-level code. They can make use of pre-built actions, which send email or SMS messages, or write their own custom actions using AWS Lambda functions.

Installers don’t have to install certificates or configure cloud endpoints on newly acquired devices, and don’t have to worry about firmware updates.

Administrators can monitor the overall status and health of each device, and can arrange to receive alerts when a device nears the end of its useful life and needs to be replaced, using a single interface that spans device types and manufacturers.

I’ll show you how easy this is in just a moment. But first, let’s talk about the current set of devices that are supported by AWS IoT 1-Click.

Who’s Got the Button?
We’re launching with support for two types of buttons (both pictured above). Both types of buttons are pre-configured with X.509 certificates, communicate to the cloud over secure connections, and are ready to use.

The AWS IoT Enterprise Button communicates via Wi-Fi. It has a 2000-click lifetime, encrypts outbound data using TLS, and can be configured using BLE and our mobile app. It retails for $19.99 (shipping and handling not included) and can be used in the United States, Europe, and Japan.

The AT&T LTE-M Button communicates via the LTE-M cellular network. It has a 1500-click lifetime, and also encrypts outbound data using TLS. The device and the bundled data plan is available an an introductory price of $29.99 (shipping and handling not included), and can be used in the United States.

We are very interested in working with device manufacturers in order to make even more shapes, sizes, and types of devices (badge readers, asset trackers, motion detectors, and industrial sensors, to name a few) available to our customers. Our team will be happy to tell you about our provisioning tools and our facility for pushing OTA (over the air) updates to large fleets of devices; you can contact them at [email protected].

AWS IoT 1-Click Concepts
I’m eager to show you how to use AWS IoT 1-Click and the buttons, but need to introduce a few concepts first.

Device – A button or other item that can send messages. Each device is uniquely identified by a serial number.

Placement Template – Describes a like-minded collection of devices to be deployed. Specifies the action to be performed and lists the names of custom attributes for each device.

Placement – A device that has been deployed. Referring to placements instead of devices gives you the freedom to replace and upgrade devices with minimal disruption. Each placement can include values for custom attributes such as a location (“Building 8, 3rd Floor, Room 1337”) or a purpose (“Coffee Request Button”).

Action – The AWS Lambda function to invoke when the button is pressed. You can write a function from scratch, or you can make use of a pair of predefined functions that send an email or an SMS message. The actions have access to the attributes; you can, for example, send an SMS message with the text “Urgent need for coffee in Building 8, 3rd Floor, Room 1337.”
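
As a hedged illustration of a custom action, the Lambda function below reads the placement attributes from the incoming event and sends an SMS through Amazon SNS. The event layout (placementInfo and the attribute keys) is an assumption on my part and should be verified against a real button event:

# Hedged sketch of a custom action Lambda. The event shape (placementInfo/attributes)
# and the attribute keys used here are assumptions, not taken from the console walkthrough.
import boto3

sns = boto3.client("sns")

def handler(event, context):
    attributes = event.get("placementInfo", {}).get("attributes", {})
    message = "Urgent need for coffee in {}, {}, {}".format(
        attributes.get("Building", "?"),
        attributes.get("Floor", "?"),
        attributes.get("Room", "?"),
    )
    # phoneNumber is assumed to be one of the placement attributes
    sns.publish(PhoneNumber=attributes.get("phoneNumber", "+15555550100"), Message=message)
    return {"message": message}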

Getting Started with AWS IoT 1-Click
Let’s set up an IoT button using the AWS IoT 1-Click Console:

If I didn’t have any buttons I could click Buy devices to get some. But, I do have some, so I click Claim devices to move ahead. I enter the device ID or claim code for my AT&T button and click Claim (I can enter multiple claim codes or device IDs if I want):

The AWS buttons can be claimed using the console or the mobile app; the first step is to use the mobile app to configure the button to use my Wi-Fi:

Then I scan the barcode on the box and click the button to complete the process of claiming the device. Both of my buttons are now visible in the console:

I am now ready to put them to use. I click on Projects, and then Create a project:

I name and describe my project, and click Next to proceed:

Now I define a device template, along with names and default values for the placement attributes. Here’s how I set up a device template (projects can contain several, but I just need one):

The action has two mandatory parameters (phone number and SMS message) built in; I add three more (Building, Room, and Floor) and click Create project:

I’m almost ready to ask for some coffee! The next step is to associate my buttons with this project by creating a placement for each one. I click Create placements to proceed. I name each placement, select the device to associate with it, and then enter values for the attributes that I established for the project. I can also add additional attributes that are peculiar to this placement:

I can inspect my project and see that everything looks good:

I click on the buttons and the SMS messages appear:

I can monitor device activity in the AWS IoT 1-Click Console:

And also in the Lambda Console:

The Lambda function itself is also accessible, and can be used as-is or customized:

As you can see, this is the code that lets me use {{*}} to include all of the placement attributes in the message and {{Building}} (for example) to include a specific placement attribute.

Now Available
I’ve barely scratched the surface of this cool new service and I encourage you to give it a try (or a click) yourself. Buy a button or two, build something cool, and let me know all about it!

Pricing is based on the number of enabled devices in your account, measured monthly and pro-rated for partial months. Devices can be enabled or disabled at any time. See the AWS IoT 1-Click Pricing page for more info.

To learn more, visit the AWS IoT 1-Click home page or read the AWS IoT 1-Click documentation.

Jeff;

 

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each replica, you load the data from the corresponding system table at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter, in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It looks up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one. (A minimal sketch of this lookup-and-invoke flow appears after this list.)
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.
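
For illustration, here is a minimal, hedged sketch of steps 1 and 2: the Lambda function reads the enabled-cluster list from the parameter store and starts the per-cluster AWS Glue ETL job. The parameter name and job-naming convention are taken from this post; everything else is an assumption rather than the sample solution’s actual code.

# Hedged sketch of steps 1-2: read the enabled-cluster list from SSM and start the
# per-cluster AWS Glue ETL job. Job naming follows the convention described in this post.
import boto3

ssm = boto3.client("ssm")
glue = boto3.client("glue")

def handler(event, context):
    clusters = ssm.get_parameter(
        Name="redshift_query_logs.global.enabled_cluster_list"
    )["Parameter"]["Value"].split(",")

    for cluster in clusters:
        job_name = "{}_extract_rs_query_logs".format(cluster.strip())
        glue.start_job_run(JobName=job_name)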

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following list explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Global parameters (defined once and applied to all jobs):

  • redshift_query_logs.global.s3_prefix (String) – The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
  • redshift_query_logs.global.tempdir (String) – The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
  • redshift_query_logs.global.role (String) – The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.
  • redshift_query_logs.global.enabled_cluster_list (StringList) – A comma-separated list of cluster names for which system tables’ data export is enabled. This gives flexibility for a user to exclude certain clusters.

Cluster-specific parameters (repeated for each cluster specified in the enabled_cluster_list parameter):

  • redshift_query_logs.<<cluster_name>>.connection (String) – The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
  • redshift_query_logs.<<cluster_name>>.user (String) – The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
  • redshift_query_logs.<<cluster_name>>.password (SecureString) – The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted with a key managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following list describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

  • S3Bucket (default: mybucket) – The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data prefix is used for storing all the exported query logs for each system table, partitioned by cluster. The mybucket/extract_rs_logs/temp/ prefix is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code prefix is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
  • ExportEnabledRedshiftClusters (requires input) – A comma-separated list of cluster names from which the system table logs need to be exported.
  • DataStoreSecurityGroups (requires input) – A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter ExportEnabledRedshiftClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.
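
If you prefer to script this step, a hedged boto3 equivalent might look like the following (parameter names follow the convention above; the values are placeholders):

# Hedged sketch: replace the placeholder credentials for the product-warehouse cluster.
# The parameter names follow the convention above; the values shown are placeholders.
import boto3

ssm = boto3.client("ssm")

ssm.put_parameter(
    Name="redshift_query_logs.product-warehouse.user",
    Value="etl_user",
    Type="String",
    Overwrite=True,
)
ssm.put_parameter(
    Name="redshift_query_logs.product-warehouse.password",
    Value="REPLACE_WITH_REAL_PASSWORD",
    Type="SecureString",  # encrypted with the default SSM KMS key
    Overwrite=True,
)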

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that location, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_ddltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.
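
A hedged boto3 sketch of that update (the new cluster name is only an example):

# Hedged sketch: append a newly created cluster to the enabled-cluster list parameter.
import boto3

ssm = boto3.client("ssm")
name = "redshift_query_logs.global.enabled_cluster_list"

current = ssm.get_parameter(Name=name)["Parameter"]["Value"]
updated = current + ",category-management"  # example cluster name
ssm.put_parameter(Name=name, Value=updated, Type="StringList", Overwrite=True)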

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_ddltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry="mydb.sales_2000",job_configs=job_configs)

  2. Run the custom query with the time stamp.

returnDF=functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format(salesLastProcessTSValue),tableName="mydb.sales_2000",job_configs=job_configs)

  3. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF,s3Prefix=s3Prefix,tableName="mydb.sales_2000",partitionColumns=["sale_date"],job_configs=job_configs)

  4. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal=functions.getMaxValue(returnDF,"sale_timestamp",job_configs)

  5. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue("mydb.sales_2000",latestTimestampVal[0],job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.


Improve the Operational Efficiency of Amazon Elasticsearch Service Domains with Automated Alarms Using Amazon CloudWatch

Post Syndicated from Veronika Megler original https://aws.amazon.com/blogs/big-data/improve-the-operational-efficiency-of-amazon-elasticsearch-service-domains-with-automated-alarms-using-amazon-cloudwatch/

A customer has been successfully creating and running multiple Amazon Elasticsearch Service (Amazon ES) domains to support their business users’ search needs across products, orders, support documentation, and a growing suite of similar needs. The service has become heavily used across the organization.  This led to some domains running at 100% capacity during peak times, while others began to run low on storage space. Because of this increased usage, the technical teams were in danger of missing their service level agreements.  They contacted me for help.

This post shows how you can set up automated alarms to warn when domains need attention.

Solution overview

Amazon ES is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require.  The service offers built-in integrations with a number of other components and AWS services, enabling customers to go from raw data to actionable insights quickly and securely.

One of these other integrated services is Amazon CloudWatch. CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

CloudWatch collects metrics for Amazon ES. You can use these metrics to monitor the state of your Amazon ES domains, and set alarms to notify you about high utilization of system resources.  For more information, see Amazon Elasticsearch Service Metrics and Dimensions.

While the metrics are automatically collected, the missing piece is how to set alarms on these metrics at appropriate levels for each of your domains. This post includes sample Python code to evaluate the current state of your Amazon ES environment, and to set up alarms according to AWS recommendations and best practices.

There are two components to the sample solution:

  • es-check-cwalarms.py: This Python script checks the CloudWatch alarms that have been set, for all Amazon ES domains in a given account and region.
  • es-create-cwalarms.py: This Python script sets up a set of CloudWatch alarms for a single given domain.

The sample code can also be found in the amazon-es-check-cw-alarms GitHub repo. The scripts are easy to extend or combine, as described in the section “Extensions and Adaptations”.

Assessing the current state

The first script, es-check-cwalarms.py, is used to give an overview of the configurations and alarm settings for all the Amazon ES domains in the given region. The script takes the following parameters:

python es-checkcwalarms.py -h
usage: es-checkcwalarms.py [-h] [-e ESPREFIX] [-n NOTIFY] [-f FREE][-p PROFILE] [-r REGION]
Checks a set of recommended CloudWatch alarms for Amazon Elasticsearch Service domains (optionally, those beginning with a given prefix).
optional arguments:
  -h, --help   		show this help message and exit
  -e ESPREFIX, --esprefix ESPREFIX	Only check Amazon Elasticsearch Service domains that begin with this prefix.
  -n NOTIFY, --notify NOTIFY    List of CloudWatch alarm actions; e.g. ['arn:aws:sns:xxxx']
  -f FREE, --free FREE  Minimum free storage (MB) on which to alarm
  -p PROFILE, --profile PROFILE     IAM profile name to use
  -r REGION, --region REGION       AWS region for the domain. Default: us-east-1

The script first identifies all the domains in the given region (or, optionally, limits them to the subset that begins with a given prefix). It then starts running a set of checks against each one.

The script can be run from the command line or set up as a scheduled Lambda function. For example, for one customer, it was deemed appropriate to regularly run the script to check that alarms were correctly set for all domains. In addition, because configuration changes—cluster size increases to accommodate larger workloads being a common change—might require updates to alarms, this approach allowed the automatic identification of alarms no longer appropriately set as the domain configurations changed.

The output shown below is for one domain in my account.

Starting checks for Elasticsearch domain iotfleet , version is 53
Iotfleet Automated snapshot hour (UTC): 0
Iotfleet Instance configuration: 1 instances; type:m3.medium.elasticsearch
Iotfleet Instance storage definition is: 4 GB; free storage calced to: 819.2 MB
iotfleet Desired free storage set to (in MB): 819.2
iotfleet WARNING: Not using VPC Endpoint
iotfleet WARNING: Does not have Zone Awareness enabled
iotfleet WARNING: Instance count is ODD. Best practice is for an even number of data nodes and zone awareness.
iotfleet WARNING: Does not have Dedicated Masters.
iotfleet WARNING: Neither index nor search slow logs are enabled.
iotfleet WARNING: EBS not in use. Using instance storage only.
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.yellow-Alarm ClusterStatus.yellow
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.red-Alarm ClusterStatus.red
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-CPUUtilization-Alarm CPUUtilization
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-JVMMemoryPressure-Alarm JVMMemoryPressure
iotfleet WARNING: Missing alarm!! ('ClusterIndexWritesBlocked', 'Maximum', 60, 5, 'GreaterThanOrEqualToThreshold', 1.0)
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-AutomatedSnapshotFailure-Alarm AutomatedSnapshotFailure
iotfleet Alarm: Threshold does not match: Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm Should be:  819.2 ; is 3000.0

The output messages fall into the following categories:

  • System overview, Informational: The Amazon ES version and configuration, including instance type and number, storage, automated snapshot hour, etc.
  • Free storage: A calculation for the appropriate amount of free storage, based on the recommended 20% of total storage. (For the 4-GB domain above, that works out to 819.2 MB, i.e., 20% of 4,096 MB.)
  • Warnings: best practices that are not being followed for this domain. (For more about this, read on.)
  • Alarms: An assessment of the CloudWatch alarms currently set for this domain, against a recommended set.

The script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. Using the array allows alarm parameters (such as free space) to be updated within the code based on current domain statistics and configurations.

For a given domain, the script checks if each alarm has been set. If the alarm is set, it checks whether the values match those in the array esAlarms. In the output above, you can see three different situations being reported:

  • Alarm ok; definition matches. The alarm set for the domain matches the settings in the array.
  • Alarm: Threshold does not match. An alarm exists, but the threshold value at which the alarm is triggered does not match.
  • WARNING: Missing alarm!! The recommended alarm is missing.

All in all, the list above shows that this domain does not have a configuration that adheres to best practices, nor does it have all the recommended alarms.
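
Conceptually, each per-metric check in the script boils down to something like the following hedged boto3 sketch (the function and output format here are mine, not the script’s actual code):

# Conceptual sketch of the alarm check, not the actual es-check-cwalarms.py code:
# find the alarm attached to a domain metric and compare its threshold.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

def check_alarm(domain, account_id, metric, expected_threshold):
    alarms = cloudwatch.describe_alarms_for_metric(
        MetricName=metric,
        Namespace="AWS/ES",
        Dimensions=[
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": account_id},
        ],
    )["MetricAlarms"]
    if not alarms:
        print("{} WARNING: Missing alarm!! {}".format(domain, metric))
    elif alarms[0]["Threshold"] != expected_threshold:
        print("{} Alarm: Threshold does not match: should be {}; is {}".format(
            domain, expected_threshold, alarms[0]["Threshold"]))
    else:
        print("{} Alarm ok; definition matches. {}".format(domain, metric))

check_alarm("iotfleet", "123456789012", "FreeStorageSpace", 819.2)  # placeholder account ID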

Setting up alarms

Now that you know that the domains in their current state are missing critical alarms, you can correct the situation.

To demonstrate the script, set up a new domain named “ver”, in us-west-2. Specify 1 node, and a 10-GB EBS disk. Also, create an SNS topic in us-west-2 with a name of “sendnotification”, which sends you an email.

Run the second script, es-create-cwalarms.py, from the command line. This script creates (or updates) the desired CloudWatch alarms for the specified Amazon ES domain, “ver”.

python es-create-cwalarms.py -r us-west-2 -e test -c ver -n "['arn:aws:sns:us-west-2:xxxxxxxxxx:sendnotification']"
EBS enabled: True type: gp2 size (GB): 10 No Iops 10240  total storage (MB)
Desired free storage set to (in MB): 2048.0
Creating  Test-Elasticsearch-ver-ClusterStatus.yellow-Alarm
Creating  Test-Elasticsearch-ver-ClusterStatus.red-Alarm
Creating  Test-Elasticsearch-ver-CPUUtilization-Alarm
Creating  Test-Elasticsearch-ver-JVMMemoryPressure-Alarm
Creating  Test-Elasticsearch-ver-FreeStorageSpace-Alarm
Creating  Test-Elasticsearch-ver-ClusterIndexWritesBlocked-Alarm
Creating  Test-Elasticsearch-ver-AutomatedSnapshotFailure-Alarm
Successfully finished creating alarms!

As with the first script, this script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. This approach allows you to add or modify alarms based on your use case (more on that below).

After running the script, navigate to Alarms on the CloudWatch console. You can see the set of alarms set up on your domain.

Because the “ver” domain has only a single node, cluster status is yellow, and that alarm is in an “ALARM” state. It’s already sent a notification that the alarm has been triggered.

What to do when an alarm triggers

After alarms are set up, you need to identify the correct action to take for each alarm, which depends on the alarm triggered. For ideas, guidance, and additional pointers to supporting documentation, see Get Started with Amazon Elasticsearch Service: Set CloudWatch Alarms on Key Metrics. For information about common errors and recovery actions to take, see Handling AWS Service Errors.

In most cases, the alarm triggers due to an increased workload. The likely action is to reconfigure the system to handle the increased workload, rather than reducing the incoming workload. Reconfiguring any backend store—a category of systems that includes Elasticsearch—is best performed when the system is quiescent or lightly loaded. Reconfigurations such as setting zone awareness or modifying the disk type cause Amazon ES to enter a “processing” state, potentially disrupting client access.

Other changes, such as increasing the number of data nodes, may cause Elasticsearch to begin moving shards, potentially impacting search performance on these shards while this is happening. These actions should be considered in the context of your production usage. For the same reason I also do not recommend running a script that resets all domains to match best practices.

Avoid the need to reconfigure during heavy workload by setting alarms at a level that allows a considered approach to making the needed changes. For example, if you identify that each weekly peak is increasing, you can reconfigure during a weekly quiet period.

While Elasticsearch can be reconfigured without being quiesced, it is not a best practice to automatically scale it up and down based on usage patterns. Unlike some other AWS services, I recommend against setting a CloudWatch action that automatically reconfigures the system when alarms are triggered.

There are other situations where the planned reconfiguration approach may not work, such as low or zero free disk space causing the domain to reject writes. If the business is dependent on the domain continuing to accept incoming writes and deleting data is not an option, the team may choose to reconfigure immediately.

Extensions and adaptations

You may wish to modify the best practices encoded in the scripts for your own environment or workloads. It’s always better to avoid situations where alerts are generated but routinely ignored. All alerts should trigger a review and one or more actions, either immediately or at a planned date. The following is a list of common situations where you may wish to set different alarms for different domains:

  • Dev/test vs. production
    You may have a different set of configuration rules and alarms for your dev environment configurations than for test. For example, you may require zone awareness and dedicated masters for your production environment, but not for your development domains. Or, you may not have any alarms set in dev. For test environments that mirror your potential peak load, test to ensure that the alarms are appropriately triggered.
  • Differing workloads or SLAs for different domains
    You may have one domain with a requirement for superfast search performance, and another domain with a heavy ingest load that tolerates slower search response. Your reaction to slow response for these two workloads is likely to be different, so perhaps the thresholds for these two domains should be set at a different level. In this case, you might add a “max CPU utilization” alarm at 100% for 1 minute for the fast search domain, while the other domain only triggers an alarm when the average has been higher than 60% for 5 minutes. You might also add a “free space” rule with a higher threshold to reflect the need for more space for the heavy ingest load if there is danger that it could fill the available disk quickly. (A sketch of this pair of CPU alarms follows this list.)
  • “Normal” alarms versus “emergency” alarms
    If, for example, free disk space drops to 25% of total capacity, an alarm is triggered that indicates action should be taken as soon as possible, such as cleaning up old indexes or reconfiguring at the next quiet period for this domain. However, if free space drops below a critical level (20% free space), action must be taken immediately in order to prevent Amazon ES from setting the domain to read-only. Similarly, if the “ClusterIndexWritesBlocked” alarm triggers, the domain has already stopped accepting writes, so immediate action is needed. In this case, you may wish to set “laddered” alarms, where one threshold causes an alarm to be triggered to review the current workload for a planned reconfiguration, but a different threshold raises a “DefCon 3” alarm that immediate action is required.
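
To make the second example above concrete, here is a hedged sketch of such a pair of CPU alarms (the domain names, account ID, and topic ARN are placeholders):

# Hedged sketch: two CPU alarms with different thresholds for two hypothetical domains.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

def es_cpu_alarm(domain, statistic, threshold, period, suffix):
    cloudwatch.put_metric_alarm(
        AlarmName="Elasticsearch-{}-CPUUtilization-{}".format(domain, suffix),
        Namespace="AWS/ES",
        MetricName="CPUUtilization",
        Dimensions=[
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": "123456789012"},  # your account ID
        ],
        Statistic=statistic,
        Period=period,
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=["arn:aws:sns:us-west-2:123456789012:sendnotification"],
    )

# Latency-sensitive search domain: alarm if maximum CPU hits 100% within 1 minute.
es_cpu_alarm("fast-search", "Maximum", 100.0, 60, "Max")
# Heavy-ingest domain: alarm only if average CPU exceeds 60% over 5 minutes.
es_cpu_alarm("heavy-ingest", "Average", 60.0, 300, "Avg")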

The sample scripts provided here are a starting point, intended for you to adapt to your own environment and needs.

Running the scripts one time can identify how far your current state is from your desired state, and create an initial set of alarms. Regularly re-running these scripts can capture changes in your environment over time and adjust your alarms as your environment and configurations change. One customer has set them up to run nightly, and to automatically create and update alarms to match their preferred settings.

Removing unwanted alarms

Each CloudWatch alarm costs approximately $0.10 per month. You can remove unwanted alarms in the CloudWatch console, under Alarms. If you set up a “ver” domain above, remember to remove it to avoid continuing charges.

Conclusion

Setting CloudWatch alarms appropriately for your Amazon ES domains can help you avoid suboptimal performance and allow you to respond to workload growth or configuration issues well before they become urgent. This post gives you a starting point for doing so. The additional sleep you’ll get knowing you don’t need to be concerned about Elasticsearch domain performance will allow you to focus on building creative solutions for your business and solving problems for your customers.

Enjoy!


Additional Reading

If you found this post useful, be sure to check out Analyzing Amazon Elasticsearch Service Slow Logs Using Amazon CloudWatch Logs Streaming and Kibana and Get Started with Amazon Elasticsearch Service: How Many Shards Do I Need?

 


About the Author

Dr. Veronika Megler is a senior consultant at Amazon Web Services. She works with our customers to implement innovative big data, AI and ML projects, helping them accelerate their time-to-value when using AWS.


[$] Monitoring with Prometheus 2.0

Post Syndicated from corbet original https://lwn.net/Articles/744410/rss

Prometheus is a monitoring tool
built from scratch by SoundCloud in 2012. It works by pulling metrics from
monitored services and storing them in a time series database (TSDB). It
has a powerful query language to inspect that database, create alerts, and
plot basic graphs. Those graphs can then be used to detect anomalies or
trends for (possibly automated) resource provisioning. Prometheus also has
extensive service discovery features and supports high availability
configurations.

That’s what the brochure says, anyway; let’s see how it works in the hands
of an old grumpy system administrator. I’ll be drawing comparisons
with Munin and Nagios frequently because those are the tools I have
used for over a decade in monitoring Unix clusters.

12 B2 Power Tips for Experts and Developers

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/advanced-cloud-storage-tips/

B2 Tips for Pros
If you’ve been using B2 Cloud Storage for a while, you probably think you know all that you can do with it. But do you?

We’ve put together a list of blazing power tips for experts and developers that will take you to the next level. Take a look below.

If you’re new to B2, we have a list of power tips for you, too.
Visit 12 Power Tips for New B2 Users.

1    Manage File Versions

Use Lifecycle Rules on a Bucket to set how many days to keep files that are no longer the current version. This is a great way to manage the amount of space your B2 account is using.


2    Easily Stay on Top of Your B2 Account Limits

Set usage caps and get text/email alerts for your B2 account when you approach limits that you define.


3    Bring on Your Big Files

You can upload files as large as 10TB to B2.


4    You Can Use FedEx to Get Your Data into B2

If you have over 20TB of data, you can use Backblaze’s Fireball hard disk array to load large volumes of data directly into your B2 account. We ship a Fireball to you and you ship it back.


5    You Have Command-Line Control of All B2 Functions

You have complete control over B2 using our command line tool that is available for Macintosh, Windows, and Linux.


6    You Can Use Your Own Domain Name To Front a Public B2 Bucket

You can create a vanity URL for your B2 account.


7    See What’s Happening in Your Account with Graphical Reports

You can view graphical reports summarizing your B2 usage — transactions, downloads, averages, data stored — in your B2 account dashboard.


8    Create a B2 SDK

You can build your own B2 SDK for JVM-based or JVM-compatible languages using our B2 Java SDK on Github.


9    B2’s API is Easy to Use

B2’s API is similar to, but simpler than Amazon’s S3 API, making it super easy for developers to integrate with B2 Cloud Storage.


10    View Code Examples To Get Your B2 Project Started

The B2 API is well documented and has code examples for cURL, Java, Python, Swift, Ruby, C#, and PHP. For example, here’s how to create a B2 Bucket.
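
As a rough, hedged illustration (not taken verbatim from the documentation), creating a bucket with the native B2 API is an authorize call followed by a create call; the key ID, application key, and bucket name below are placeholders:

# Hedged sketch using the documented B2 native API endpoints (v2).
# The application key ID, application key, and bucket name are placeholders.
import requests

auth = requests.get(
    "https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
    auth=("YOUR_KEY_ID", "YOUR_APPLICATION_KEY"),
).json()

response = requests.post(
    auth["apiUrl"] + "/b2api/v2/b2_create_bucket",
    headers={"Authorization": auth["authorizationToken"]},
    json={
        "accountId": auth["accountId"],
        "bucketName": "my-new-bucket",
        "bucketType": "allPrivate",  # or "allPublic"
    },
)
print(response.json())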


11    Developers can set the B2 part size as low as 5 MB

When working with large files, the minimum file part size can be set as low as 5MB or as high as 5GB. This gives developers the ability to maximize the throughput of B2 data uploads and downloads. See Large Files and Downloading for more developer tips.


12    Your App or Device Can Work with B2, as well

Your B2 integration can be listed on Backblaze’s website. Visit Submit an Integration to get started.

Want to Learn More About B2?

You can find more information on B2 on our website and in our help pages.

The post 12 B2 Power Tips for Experts and Developers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Using Amazon CloudWatch and Amazon SNS to Notify when AWS X-Ray Detects Elevated Levels of Latency, Errors, and Faults in Your Application

Post Syndicated from Bharath Kumar original https://aws.amazon.com/blogs/devops/using-amazon-cloudwatch-and-amazon-sns-to-notify-when-aws-x-ray-detects-elevated-levels-of-latency-errors-and-faults-in-your-application/

AWS X-Ray helps developers analyze and debug production applications built using microservices or serverless architectures and quantify customer impact. With X-Ray, you can understand how your application and its underlying services are performing and identify and troubleshoot the root cause of performance issues and errors. You can use these insights to identify issues and opportunities for optimization.

In this blog post, I will show you how you can use Amazon CloudWatch and Amazon SNS to get notified when X-Ray detects high latency, errors, and faults in your application. Specifically, I will show you how to use this sample app to get notified through an email or SMS message when your end users observe high latencies or server-side errors when they use your application. You can customize the alarms and events by updating the sample app code.

Sample App Overview

The sample app uses the X-Ray GetServiceGraph API to get the following information (a minimal query sketch follows this list):

  • Aggregated response time.
  • Requests that failed with a 4xx status code (errors).
  • Requests that failed with a 429 status code (throttles).
  • Requests that failed with a 5xx status code (faults).
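
For reference, here is a minimal boto3 sketch of such a GetServiceGraph query (the five-minute window and the printed fields are my choices, not the sample app’s exact code):

# Minimal sketch: query GetServiceGraph for the last five minutes and print
# per-service error, throttle, and fault counts plus total response time.
import boto3
from datetime import datetime, timedelta

xray = boto3.client("xray")
end = datetime.utcnow()
start = end - timedelta(minutes=5)

graph = xray.get_service_graph(StartTime=start, EndTime=end)
for service in graph["Services"]:
    stats = service.get("SummaryStatistics", {})
    print(
        service.get("Name"),
        "errors:", stats.get("ErrorStatistics", {}).get("TotalCount", 0),
        "throttles:", stats.get("ErrorStatistics", {}).get("ThrottleCount", 0),
        "faults:", stats.get("FaultStatistics", {}).get("TotalCount", 0),
        "total response time:", stats.get("TotalResponseTime", 0),
    )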

Overview of sample app architecture

Getting started

The sample app uses AWS CloudFormation to deploy the required resources.
To install the sample app:

  1. Run git clone to get the sample app.
  2. Update the JSON file in the Setup folder with threshold limits and notification details.
  3. Run the install.py script to install the sample app.

For more information about the installation steps, see the readme file on GitHub.

You can update the app configuration to include your phone number or email to get notified when your application in X-Ray breaches the latency, error, and fault limits you set in the configuration. If you prefer to not provide your phone number and email, then you can use the CloudWatch alarm deployed by the sample app to monitor your application in X-Ray.

The sample app deploys resources with the sample app namespace you provided during setup. This enables you to have multiple sample apps in the same region.

CloudWatch rules

The sample app uses two CloudWatch rules (a boto3 sketch of creating similar rules follows this list):

  1. SCHEDULEDLAMBDAFOR-sample_app_name triggers, at regular intervals, the AWS Lambda function that queries the GetServiceGraph API.
  2. XRAYALERTSFOR-sample_app_name looks for published CloudWatch events that match the pattern defined in this rule.
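
The sample app's CloudFormation stack creates these rules for you; the sketch below only shows what roughly equivalent boto3 calls could look like, with the names, schedule interval, and sample app namespace as placeholders:

import json
import boto3

events = boto3.client('events')

# Scheduled rule that periodically triggers the Lambda function querying GetServiceGraph
events.put_rule(
    Name='SCHEDULEDLAMBDAFOR-myapp',
    ScheduleExpression='rate(5 minutes)',   # assumed interval
    State='ENABLED',
)

# Pattern rule that matches the custom events published by the Lambda function
events.put_rule(
    Name='XRAYALERTSFOR-myapp',
    EventPattern=json.dumps({
        'detail-type': ['XCW Notification for Alerts'],
        'source': ['myapp-xcw.alerts'],
    }),
    State='ENABLED',
)

# Targets (the Lambda function and the SNS topic) are attached separately with put_targets.
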
CloudWatch rules created for the sample app

CloudWatch alarms

If you did not provide your phone number or email in the JSON file, the sample app uses a CloudWatch alarm named XRayCloudWatchAlarm-sample_app_name in combination with the CloudWatch event that you can use for monitoring.

CloudWatch alarm created for the sample app

Amazon SNS messages

The sample app creates two SNS topics:

  • sample_app_name-cloudwatcheventsnstopic to send out an SMS message when the CloudWatch event matches a pattern published from the Lambda function.
  • sample_app_name-cloudwatchalarmsnstopic to send out an email message when the CloudWatch alarm goes into an ALARM state.
Amazon SNS topics created for the sample app

Getting notifications

The CloudWatch event looks for the following matching pattern:

{
  "detail-type": [
    "XCW Notification for Alerts"
  ],
  "source": [
    "<sample_app_name>-xcw.alerts"
  ]
}

The event then invokes an SNS topic that sends out an SMS message.
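
For illustration, this is roughly how a producer such as the sample app's Lambda function could publish an event that matches the pattern above; the detail payload is an assumption:

import json
import boto3

events = boto3.client('events')

events.put_events(
    Entries=[{
        'Source': 'myapp-xcw.alerts',                      # must match the rule's source
        'DetailType': 'XCW Notification for Alerts',       # must match the rule's detail-type
        'Detail': json.dumps({'message': 'Elevated latency detected for service my-service'}),
    }]
)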

SMS message sent when the CloudWatch event invokes the Amazon SNS topic

The CloudWatch alarm looks for the TriggeredRules metric that is published whenever the CloudWatch event matches the event pattern. It goes into the ALARM state whenever TriggeredRules > 0 for the specified evaluation period and invokes an SNS topic that sends an email message.
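
A minimal boto3 sketch of an equivalent alarm definition; the period, rule name, and SNS topic ARN are placeholders, and the sample app creates the real alarm for you:

import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='XRayCloudWatchAlarm-myapp',
    Namespace='AWS/Events',                  # CloudWatch Events publishes TriggeredRules here
    MetricName='TriggeredRules',
    Dimensions=[{'Name': 'RuleName', 'Value': 'XRAYALERTSFOR-myapp'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',   # ALARM when TriggeredRules > 0
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:myapp-cloudwatchalarmsnstopic'],
)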

Email sent when the CloudWatch alarm goes into the ALARM state

Stopping notifications

If you provided your phone number or email address, but would like to stop getting notified, change the SUBSCRIBE_TO_EMAIL_SMS environment variable in the Lambda function to No. Then, go to the Amazon SNS console and delete the subscriptions. You can still monitor your application for elevated levels of latency, errors, and faults by using the CloudWatch console.
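
If you prefer to script these two steps, here is a minimal boto3 sketch; the function name and subscription ARN are placeholders, and note that update_function_configuration replaces the whole environment variable map, so include any existing variables:

import boto3

lambda_client = boto3.client('lambda')
sns = boto3.client('sns')

# Turn off notifications in the sample app's Lambda function
lambda_client.update_function_configuration(
    FunctionName='myapp-function',                                   # placeholder name
    Environment={'Variables': {'SUBSCRIBE_TO_EMAIL_SMS': 'No'}},     # replaces all variables
)

# Remove an existing subscription (look up the ARN with list_subscriptions_by_topic)
sns.unsubscribe(
    SubscriptionArn='arn:aws:sns:us-east-1:123456789012:myapp-cloudwatcheventsnstopic:11111111-2222-3333-4444-555555555555'
)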

Change environment variable in Lambda

Delete subscriptions to stop getting notified

Uninstalling the sample app

To uninstall the sample app, run the uninstall.py script in the Setup folder.

Extending the sample app

The sample app notifies you when X-Ray detects high latency, errors, and faults in your application. You can extend it to provide more value for your use cases (for example, to perform an action on a resource when the state of a CloudWatch alarm changes).

To summarize, after this setup you will be notified through Amazon SNS when X-Ray detects high latency, errors, and faults in your application.

I hope you found this information about setting up alarms and alerts for your application in AWS X-Ray helpful. Feel free to leave questions or other feedback in the comments, and see the documentation to learn more about AWS X-Ray, Amazon SNS, and Amazon CloudWatch.

About the Author

Bharath Kumar is a Sr.Product Manager with AWS X-Ray. He has developed and launched mobile games, web applications on microservices and serverless architecture.

AWS Systems Manager – A Unified Interface for Managing Your Cloud and Hybrid Resources

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-systems-manager/

AWS Systems Manager is a new way to manage your cloud and hybrid IT environments. AWS Systems Manager provides a unified user interface that simplifies resource and application management, shortens the time to detect and resolve operational problems, and makes it easy to operate and manage your infrastructure securely at scale. This service is absolutely packed full of features. It defines a new experience around grouping, visualizing, and reacting to problems using features from products like Amazon EC2 Systems Manager (SSM) to enable rich operations across your resources.

As I said above, there are a lot of powerful features in this service and we won’t be able to dive deep on all of them but it’s easy to go to the console and get started with any of the tools.

Resource Groupings

Resource Groups allow you to create logical groupings of most resources that support tagging like: Amazon Elastic Compute Cloud (EC2) instances, Amazon Simple Storage Service (S3) buckets, Elastic Load Balancing load balancers, Amazon Relational Database Service (RDS) instances, Amazon Virtual Private Cloud, Amazon Kinesis streams, Amazon Route 53 zones, and more. Previously, you could use the AWS Console to define resource groupings, but AWS Systems Manager provides this new resource group experience via a new console and API. These groupings are a fundamental building block of Systems Manager in that they are frequently the target of various operations you may want to perform like: compliance management, software inventories, patching, and other automations.

You start by defining a group based on tag filters. From there you can view all of the resources in a centralized console. You would typically use these groupings to differentiate between applications, application layers, and environments like production or dev – but you can make your own rules about how to use them as well. If you imagine a typical 3 tier web-app you might have a few EC2 instances, an ELB, a few S3 buckets, and an RDS instance. You can define a grouping for that application and work with all of those different resources simultaneously.
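
As a sketch of what a tag-based group definition might look like through the new API, here is a minimal boto3 example; the group name and tag keys are placeholders, and the query format should be checked against the Resource Groups documentation:

import json
import boto3

rg = boto3.client('resource-groups')

rg.create_group(
    Name='my-web-app-production',                       # placeholder group name
    Description='All resources tagged for the production web app',
    ResourceQuery={
        'Type': 'TAG_FILTERS_1_0',
        'Query': json.dumps({
            'ResourceTypeFilters': ['AWS::AllSupported'],
            'TagFilters': [
                {'Key': 'Application', 'Values': ['my-web-app']},   # placeholder tags
                {'Key': 'Environment', 'Values': ['production']},
            ],
        }),
    },
)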

Insights

AWS Systems Manager automatically aggregates and displays operational data for each resource group through a dashboard. You no longer need to navigate through multiple AWS consoles to view all of your operational data. You can easily integrate your existing Amazon CloudWatch dashboards, AWS Config rules, AWS CloudTrail trails, AWS Trusted Advisor notifications, and AWS Personal Health Dashboard performance and availability alerts. You can also easily view your software inventories across your fleet. AWS Systems Manager also provides a compliance dashboard allowing you to see the state of various security controls and patching operations across your fleets.

Acting on Insights

Building on the success of EC2 Systems Manager (SSM), AWS Systems Manager takes all of the features of SSM and provides a central place to access them. These are all the same experiences you would have through SSM with a more accessible console and centralized interface. You can use the resource groups you’ve defined in Systems Manager to visualize and act on groups of resources.

Automation


Automations allow you to define common IT tasks as a JSON document that specifies a list of tasks. You can also use community-published documents. These documents can be executed through the Console, CLIs, SDKs, scheduled maintenance windows, or triggered based on changes in your infrastructure through CloudWatch events. You can track and log the execution of each step in the documents and prompt for additional approvals. It also allows you to incrementally roll out changes and automatically halt when errors occur. You can start executing an automation directly on a resource group and it will be able to apply itself to the resources that it understands within the group.
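
For example, starting an automation from the API might look like this minimal boto3 sketch; AWS-RestartEC2Instance is one of the AWS-managed automation documents, and the instance ID is a placeholder:

import boto3

ssm = boto3.client('ssm')

response = ssm.start_automation_execution(
    DocumentName='AWS-RestartEC2Instance',                 # AWS-managed automation document
    Parameters={'InstanceId': ['i-0123456789abcdef0']},    # placeholder instance
)
print(response['AutomationExecutionId'])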

Run Command

Run Command is a superior alternative to enabling SSH on your instances. It provides safe, secure remote management of your instances at scale without logging into your servers, replacing the need for SSH bastions or remote PowerShell. It has granular IAM permissions that allow you to restrict which roles or users can run certain commands.
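
A minimal boto3 sketch of running an ad hoc command this way; the instance ID and commands are placeholders, and in practice you may need to wait briefly before the invocation result is available:

import time
import boto3

ssm = boto3.client('ssm')

response = ssm.send_command(
    InstanceIds=['i-0123456789abcdef0'],            # placeholder instance
    DocumentName='AWS-RunShellScript',
    Parameters={'commands': ['uptime', 'df -h']},
    Comment='Ad hoc health check without SSH',
)
command_id = response['Command']['CommandId']

time.sleep(5)  # give the agent a moment to report back
result = ssm.get_command_invocation(CommandId=command_id, InstanceId='i-0123456789abcdef0')
print(result['Status'])
print(result['StandardOutputContent'])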

Patch Manager, Maintenance Windows, and State Manager

I’ve written about Patch Manager before and if you manage fleets of Windows and Linux instances it’s a great way to maintain a common baseline of security across your fleet.

Maintenance windows allow you to schedule instance maintenance and other disruptive tasks for a specific time window.

State Manager allows you to control various server configuration details like anti-virus definitions, firewall settings, and more. You can define policies in the console or run existing scripts, PowerShell modules, or even Ansible playbooks directly from S3 or GitHub. You can query State Manager at any time to view the status of your instance configurations.

Things To Know

There’s some interesting terminology here. We haven’t done the best job of naming things in the past so let’s take a moment to clarify. EC2 Systems Manager (sometimes called SSM) is what you used before today. You can still invoke aws ssm commands. However, AWS Systems Manager builds on and enhances many of the tools provided by EC2 Systems Manager and allows those same tools to be applied to more than just EC2. When you see the phrase “Systems Manager” in the future you should think of AWS Systems Manager and not EC2 Systems Manager.

AWS Systems Manager with all of this useful functionality is provided at no additional charge. It is immediately available in all public AWS regions.

The best part about these services is that even with their tight integrations each one is designed to be used in isolation as well. If you only need one component of these services it’s simple to get started with only that component.

There’s a lot more than I could ever document in this post so I encourage you all to jump into the console and documentation to figure out where you can start using AWS Systems Manager.

Randall

In the Works – AWS IoT Device Defender – Secure Your IoT Fleet

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-aws-sepio-secure-your-iot-fleet/

Scale takes on a whole new meaning when it comes to IoT. Last year I was lucky enough to tour a gigantic factory that had, on average, one environment sensor per square meter. The sensors measured temperature, humidity, and air purity several times per second, and served as an early warning system for contaminants. I’ve heard customers express interest in deploying IoT-enabled consumer devices in the millions or tens of millions.

With powerful, long-lived devices deployed in a geographically distributed fashion, managing security challenges is crucial. However, the limited amount of local compute power and memory can sometimes limit the ability to use encryption and other forms of data protection.

To address these challenges and to allow our customers to confidently deploy IoT devices at scale, we are working on IoT Device Defender. While the details might change before release, AWS IoT Device Defender is designed to offer these benefits:

Continuous Auditing – AWS IoT Device Defender monitors the policies related to your devices to ensure that the desired security settings are in place. It looks for drifts away from best practices and supports custom audit rules so that you can check for conditions that are specific to your deployment. For example, you could check to see if a compromised device has subscribed to sensor data from another device. You can run audits on a schedule or on an as-needed basis.

Real-Time Detection and Alerting – AWS IoT Device Defender looks for and quickly alerts you to unusual behavior that could be coming from a compromised device. It does this by monitoring the behavior of similar devices over time, looking for unauthorized access attempts, changes in connection patterns, and changes in traffic patterns (either inbound or outbound).

Fast Investigation and Mitigation – In the event that you get an alert that something unusual is happening, AWS IoT Device Defender gives you the tools, including contextual information, to help you to investigate and mitigate the problem. Device information, device statistics, diagnostic logs, and previous alerts are all at your fingertips. You have the option to reboot the device, revoke its permissions, reset it to factory defaults, or push a security fix.

Stay Tuned
I’ll have more info (and a hands-on post) as soon as possible, so stay tuned!

Jeff;

How to Patch, Inspect, and Protect Microsoft Windows Workloads on AWS—Part 2

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-inspect-and-protect-microsoft-windows-workloads-on-aws-part-2/

Yesterday in Part 1 of this blog post, I showed you how to:

  1. Launch an Amazon EC2 instance with an AWS Identity and Access Management (IAM) role, an Amazon Elastic Block Store (Amazon EBS) volume, and tags that Amazon EC2 Systems Manager (Systems Manager) and Amazon Inspector use.
  2. Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.

Today in Steps 3 and 4, I show you how to:

  1. Take Amazon EBS snapshots using Amazon EBS Snapshot Scheduler to automate snapshots based on instance tags.
  2. Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

To catch up on Steps 1 and 2, see yesterday’s blog post.

Step 3: Take EBS snapshots using EBS Snapshot Scheduler

In this section, I show you how to use EBS Snapshot Scheduler to take snapshots of your instances at specific intervals. To do this, I will show you how to:

  • Determine the schedule for EBS Snapshot Scheduler by providing you with best practices.
  • Deploy EBS Snapshot Scheduler by using AWS CloudFormation.
  • Tag your EC2 instances so that EBS Snapshot Scheduler backs up your instances when you want them backed up.

In addition to making sure your EC2 instances have all the available operating system patches applied on a regular schedule, you should take snapshots of the EBS storage volumes attached to your EC2 instances. Taking regular snapshots allows you to restore your data to a previous state quickly and cost effectively. With Amazon EBS snapshots, you pay only for the actual data you store, and snapshots save only the data that has changed since the previous snapshot, which minimizes your cost. You will use EBS Snapshot Scheduler to make regular snapshots of your EC2 instance. EBS Snapshot Scheduler takes advantage of other AWS services including CloudFormation, Amazon DynamoDB, and AWS Lambda to make backing up your EBS volumes simple.

Determine the schedule

As a best practice, you should back up your data frequently during the hours when your data changes the most. This reduces the amount of data you lose if you have to restore from a snapshot. For the purposes of this blog post, the data for my instances changes the most between the business hours of 9:00 A.M. to 5:00 P.M. Pacific Time. During these hours, I will make snapshots hourly to minimize data loss.

In addition to backing up frequently, another best practice is to establish a strategy for retention. This will vary based on how you need to use the snapshots. If you have compliance requirements to be able to restore for auditing, your needs may be different than if you are able to detect data corruption within three hours and simply need to restore to something that limits data loss to five hours. EBS Snapshot Scheduler enables you to specify the retention period for your snapshots. For this post, I only need to keep snapshots for recent business days. To account for weekends, I will set my retention period to three days, which is down from the default of 15 days when deploying EBS Snapshot Scheduler.

Deploy EBS Snapshot Scheduler

In Step 1 of Part 1 of this post, I showed how to configure an EC2 for Windows Server 2012 R2 instance with an EBS volume. You will use EBS Snapshot Scheduler to take eight snapshots each weekday of your EC2 instance’s EBS volumes:

  1. Navigate to the EBS Snapshot Scheduler deployment page and choose Launch Solution. This takes you to the CloudFormation console in your account. The Specify an Amazon S3 template URL option is already selected and prefilled. Choose Next on the Select Template page.
  2. On the Specify Details page, retain all default parameters except for AutoSnapshotDeletion. Set AutoSnapshotDeletion to Yes to ensure that old snapshots are periodically deleted. The default retention period is 15 days (you will specify a shorter value on your instance in the next subsection).
  3. Choose Next twice to move to the Review step, and start deployment by choosing the I acknowledge that AWS CloudFormation might create IAM resources check box and then choosing Create.

Tag your EC2 instances

EBS Snapshot Scheduler takes a few minutes to deploy. While waiting for its deployment, you can start to tag your instance to define its schedule. EBS Snapshot Scheduler reads tag values and looks for four possible custom parameters in the following order:

  • <snapshot time> – Time in 24-hour format with no colon.
  • <retention days> – The number of days (a positive integer) to retain the snapshot before deletion, if set to automatically delete snapshots.
  • <time zone> – The time zone of the times specified in <snapshot time>.
  • <active day(s)> – all, weekdays, or mon, tue, wed, thu, fri, sat, and/or sun.

Because you want hourly backups on weekdays between 9:00 A.M. and 5:00 P.M. Pacific Time, you need to configure eight tags—one for each hour of the day. You will add the eight tags shown in the following table to your EC2 instance.

Tag                              Value
scheduler:ebs-snapshot:0900      0900;3;utc;weekdays
scheduler:ebs-snapshot:1000      1000;3;utc;weekdays
scheduler:ebs-snapshot:1100      1100;3;utc;weekdays
scheduler:ebs-snapshot:1200      1200;3;utc;weekdays
scheduler:ebs-snapshot:1300      1300;3;utc;weekdays
scheduler:ebs-snapshot:1400      1400;3;utc;weekdays
scheduler:ebs-snapshot:1500      1500;3;utc;weekdays
scheduler:ebs-snapshot:1600      1600;3;utc;weekdays
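
If you prefer to add these tags with a script rather than through the console steps that follow, here is a minimal boto3 sketch; the instance ID is a placeholder:

import boto3

ec2 = boto3.client('ec2')

hours = ['0900', '1000', '1100', '1200', '1300', '1400', '1500', '1600']
tags = [
    {'Key': 'scheduler:ebs-snapshot:{}'.format(hour), 'Value': '{};3;utc;weekdays'.format(hour)}
    for hour in hours
]

ec2.create_tags(
    Resources=['i-0123456789abcdef0'],   # placeholder instance ID
    Tags=tags,
)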

Next, you will add these tags to your instance. If you want to tag multiple instances at once, you can use Tag Editor instead. To add the tags in the preceding table to your EC2 instance:

  1. Navigate to your EC2 instance in the EC2 console and choose Tags in the navigation pane.
  2. Choose Add/Edit Tags and then choose Create Tag to add all the tags specified in the preceding table.
  3. Confirm you have added the tags by choosing Save. After adding these tags, navigate to your EC2 instance in the EC2 console. Your EC2 instance should look similar to the following screenshot.
    Screenshot of how your EC2 instance should look in the console
  4. After waiting a couple of hours, you can see snapshots beginning to populate on the Snapshots page of the EC2 console.
    Screenshot of snapshots beginning to populate on the Snapshots page of the EC2 console
  5. To check if EBS Snapshot Scheduler is active, you can check the CloudWatch rule that runs the Lambda function. If the clock icon shown in the following screenshot is green, the scheduler is active. If the clock icon is gray, the rule is disabled and does not run. You can enable or disable the rule by selecting it, choosing Actions, and choosing Enable or Disable. This also allows you to temporarily disable EBS Snapshot Scheduler.
    Screenshot of checking to see if EBS Snapshot Scheduler is active
  6. You can also monitor when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule as shown in the previous screenshot and choosing Show metrics for the rule.
    Screenshot of monitoring when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule

If you want to restore and attach an EBS volume, see Restoring an Amazon EBS Volume from a Snapshot and Attaching an Amazon EBS Volume to an Instance.

Step 4: Use Amazon Inspector

In this section, I show you how to use Amazon Inspector to scan your EC2 instance for common vulnerabilities and exposures (CVEs) and set up Amazon SNS notifications. To do this, I will show you how to:

  • Install the Amazon Inspector agent by using EC2 Run Command.
  • Set up notifications using Amazon SNS to notify you of any findings.
  • Define an Amazon Inspector target and template to define what assessment to perform on your EC2 instance.
  • Schedule Amazon Inspector assessment runs to assess your EC2 instance on a regular interval.

Amazon Inspector can help you scan your EC2 instance using prebuilt rules packages, which are built and maintained by AWS. These prebuilt rules packages tell Amazon Inspector what to scan for on the EC2 instances you select. Amazon Inspector provides the following prebuilt packages for Microsoft Windows Server 2012 R2:

  • Common Vulnerabilities and Exposures
  • Center for Internet Security Benchmarks
  • Runtime Behavior Analysis

In this post, I’m focused on how to make sure you keep your EC2 instances patched, backed up, and inspected for common vulnerabilities and exposures (CVEs). As a result, I will focus on how to use the CVE rules package and use your instance tags to identify the instances on which to run the CVE rules. If your EC2 instance is fully patched using Systems Manager, as described earlier, you should not have any findings with the CVE rules package. Regardless, as a best practice I recommend that you use Amazon Inspector as an additional layer for identifying any unexpected failures. This involves using Amazon CloudWatch to set up weekly Amazon Inspector scans, and configuring Amazon Inspector to notify you of any findings through SNS topics. By acting on the notifications you receive, you can respond quickly to any CVEs on any of your EC2 instances to help ensure that malware using known CVEs does not affect your EC2 instances. In a previous blog post, Eric Fitzgerald showed how to remediate Amazon Inspector security findings automatically.

Install the Amazon Inspector agent

To install the Amazon Inspector agent, you will use EC2 Run Command, which allows you to run any command on any of your EC2 instances that have the Systems Manager agent with an attached IAM role that allows access to Systems Manager.
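
The console steps below do this through the UI; for reference, here is a minimal boto3 sketch of an equivalent call. The document name and tag values come from the steps that follow, the Operation parameter name is an assumption you should verify against the document's schema, and the concurrency settings mirror the example:

import boto3

ssm = boto3.client('ssm')

ssm.send_command(
    Targets=[{'Key': 'tag:Patch Group', 'Values': ['Windows Servers']}],
    DocumentName='AmazonInspector-ManageAWSAgent',
    Parameters={'Operation': ['Install']},   # assumed parameter; check the document's schema
    MaxConcurrency='1',
    MaxErrors='5',
)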

  1. Choose Run Command under Systems Manager Services in the navigation pane of the EC2 console. Then choose Run a command.
    Screenshot of choosing "Run a command"
  2. To install the Amazon Inspector agent, you will use an AWS managed and provided command document that downloads and installs the agent for you on the selected EC2 instance. Choose AmazonInspector-ManageAWSAgent. To choose the target EC2 instance where this command will be run, use the tag you previously assigned to your EC2 instance, Patch Group, with a value of Windows Servers. For this example, set the concurrent installations to 1 and tell Systems Manager to stop after 5 errors.
    Screenshot of installing the Amazon Inspector agent
  3. Retain the default values for all other settings on the Run a command page and choose Run. Back on the Run Command page, you can see if the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances.
    Screenshot showing that the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances

Set up notifications using Amazon SNS

Now that you have installed the Amazon Inspector agent, you will set up an SNS topic that will notify you of any findings after an Amazon Inspector run.

To set up an SNS topic:

  1. In the AWS Management Console, choose Simple Notification Service under Messaging in the Services menu.
  2. Choose Create topic, name your topic (only alphanumeric characters, hyphens, and underscores are allowed) and give it a display name to ensure you know what this topic does (I’ve named mine Inspector). Choose Create topic.
    "Create new topic" page
  3. To allow Amazon Inspector to publish messages to your new topic, choose Other topic actions and choose Edit topic policy.
  4. For Allow these users to publish messages to this topic and Allow these users to subscribe to this topic, choose Only these AWS users. Type the following ARN for the US East (N. Virginia) Region in which you are deploying the solution in this post: arn:aws:iam::316112463485:root. This is the ARN of Amazon Inspector itself. For the ARNs of Amazon Inspector in other AWS Regions, see Setting Up an SNS Topic for Amazon Inspector Notifications (Console). Amazon Resource Names (ARNs) uniquely identify AWS resources across all of AWS.
    Screenshot of editing the topic policy
  5. To receive notifications from Amazon Inspector, subscribe to your new topic by choosing Create subscription and adding your email address. After confirming your subscription by clicking the link in the email, the topic should display your email address as a subscriber. Later, you will configure the Amazon Inspector template to publish to this topic.
    Screenshot of subscribing to the new topic

Define an Amazon Inspector target and template

Now that you have set up the notification topic by which Amazon Inspector can notify you of findings, you can create an Amazon Inspector target and template. A target defines which EC2 instances are in scope for Amazon Inspector. A template defines which packages to run, for how long, and on which target.

To create an Amazon Inspector target:

  1. Navigate to the Amazon Inspector console and choose Get started. At the time of writing this blog post, Amazon Inspector is available in the US East (N. Virginia), US West (N. California), US West (Oregon), EU (Ireland), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
  2. For Amazon Inspector to be able to collect the necessary data from your EC2 instance, you must create an IAM service role for Amazon Inspector. Amazon Inspector can create this role for you if you choose Choose or create role and confirm the role creation by choosing Allow.
    Screenshot of creating an IAM service role for Amazon Inspector
  3. Amazon Inspector also asks you to tag your EC2 instance and install the Amazon Inspector agent. You already performed these steps in Part 1 of this post, so you can proceed by choosing Next. To define the Amazon Inspector target, choose the previously used Patch Group tag with a Value of Windows Servers. This is the same tag that you used to define the targets for patching. Then choose Next.
    Screenshot of defining the Amazon Inspector target
  4. Now, define your Amazon Inspector template, and choose a name and the package you want to run. For this post, use the Common Vulnerabilities and Exposures package and choose the default duration of 1 hour. As you can see, the package has a version number, so always select the latest version of the rules package if multiple versions are available.
    Screenshot of defining an assessment template
  5. Configure Amazon Inspector to publish to your SNS topic when findings are reported. You can also choose to receive a notification of a started run, a finished run, or changes in the state of a run. For this blog post, you want to receive notifications if there are any findings. To start, choose Assessment Templates from the Amazon Inspector console and choose your newly created Amazon Inspector assessment template. Choose the icon below SNS topics (see the following screenshot).
    Screenshot of choosing an assessment template
  6. A pop-up appears in which you can choose the previously created topic and the events about which you want SNS to notify you (choose Finding reported).
    Screenshot of choosing the previously created topic and the events about which you want SNS to notify you

Schedule Amazon Inspector assessment runs

The last step in using Amazon Inspector to assess for CVEs is to schedule the Amazon Inspector template to run using Amazon CloudWatch Events. This will make sure that Amazon Inspector assesses your EC2 instance on a regular basis. To do this, you need the Amazon Inspector template ARN, which you can find under Assessment templates in the Amazon Inspector console. CloudWatch Events can run your Amazon Inspector assessment at an interval you define using a Cron-based schedule. Cron is a well-known scheduling agent that is widely used on UNIX-like operating systems and uses the following syntax for CloudWatch Events.

Image of Cron schedule

All scheduled events use a UTC time zone, and the minimum precision for schedules is one minute. For more information about scheduling CloudWatch Events, see Schedule Expressions for Rules.
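
For reference, the console steps that follow can also be expressed as two API calls; here is a minimal boto3 sketch, with the rule name, assessment template ARN, and role ARN as placeholders (the role must allow CloudWatch Events to start the assessment run):

import boto3

events = boto3.client('events')

# Fire every Sunday at 22:00 UTC (six fields: minutes hours day-of-month month day-of-week year)
events.put_rule(
    Name='weekly-inspector-assessment',
    ScheduleExpression='cron(0 22 ? * SUN *)',
    State='ENABLED',
)

events.put_targets(
    Rule='weekly-inspector-assessment',
    Targets=[{
        'Id': 'inspector-assessment-template',
        'Arn': 'arn:aws:inspector:us-east-1:123456789012:target/0-EXAMPLE1/template/0-EXAMPLE2',
        'RoleArn': 'arn:aws:iam::123456789012:role/cloudwatch-events-inspector-role',
    }],
)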

To create the CloudWatch Events rule:

  1. Navigate to the CloudWatch console, choose Events, and choose Create rule.
    Screenshot of starting to create a rule in the CloudWatch Events console
  2. On the next page, specify if you want to invoke your rule based on an event pattern or a schedule. For this blog post, you will select a schedule based on a Cron expression.
  3. You can schedule the Amazon Inspector assessment any time you want using the Cron expression, or you can use the Cron expression I used in the following screenshot, which will run the Amazon Inspector assessment every Sunday at 10:00 P.M. GMT.
    Screenshot of scheduling an Amazon Inspector assessment with a Cron expression
  4. Choose Add target and choose Inspector assessment template from the drop-down menu. Paste the ARN of the Amazon Inspector template you previously created in the Amazon Inspector console in the Assessment template box and choose Create a new role for this specific resource. This new role is necessary so that CloudWatch Events has the necessary permissions to start the Amazon Inspector assessment. CloudWatch Events will automatically create the new role and grant the minimum set of permissions needed to run the Amazon Inspector assessment. To proceed, choose Configure details.
    Screenshot of adding a target
  5. Next, give your rule a name and a description. I suggest using a name that describes what the rule does, as shown in the following screenshot.
  6. Finish the wizard by choosing Create rule. The rule should appear in the Events – Rules section of the CloudWatch console.
    Screenshot of completing the creation of the rule
  7. To confirm your CloudWatch Events rule works, wait for the next time your CloudWatch Events rule is scheduled to run. For testing purposes, you can choose your CloudWatch Events rule and choose Edit to change the schedule to run it sooner than scheduled.
    Screenshot of confirming the CloudWatch Events rule works
  8. Now navigate to the Amazon Inspector console to confirm the launch of your first assessment run. The Start time column shows you the time each assessment started and the Status column the status of your assessment. In the following screenshot, you can see Amazon Inspector is busy Collecting data from the selected assessment targets.
    Screenshot of confirming the launch of the first assessment run

You have concluded the last step of this blog post by setting up a regular scan of your EC2 instance with Amazon Inspector and a notification that will let you know if your EC2 instance is vulnerable to any known CVEs. In a previous Security Blog post, Eric Fitzgerald explained How to Remediate Amazon Inspector Security Findings Automatically. Although that blog post is for Linux-based EC2 instances, the post shows that you can learn about Amazon Inspector findings in other ways than email alerts.

Conclusion

In this two-part blog post, I showed how to make sure you keep your EC2 instances up to date with patching, how to back up your instances with snapshots, and how to monitor your instances for CVEs. Collectively these measures help to protect your instances against common attack vectors that attempt to exploit known vulnerabilities. In Part 1, I showed how to configure your EC2 instances to make it easy to use Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also showed how to use Systems Manager to schedule automatic patches to keep your instances current in a timely fashion. In Part 2, I showed you how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

If you have comments about today’s or yesterday’s post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or the Amazon Inspector forum, or contact AWS Support.

– Koen

Me on the Equifax Breach

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/me_on_the_equif.html

Testimony and Statement for the Record of Bruce Schneier
Fellow and Lecturer, Belfer Center for Science and International Affairs, Harvard Kennedy School
Fellow, Berkman Center for Internet and Society at Harvard Law School

Hearing on “Securing Consumers’ Credit Data in the Age of Digital Commerce”

Before the

Subcommittee on Digital Commerce and Consumer Protection
Committee on Energy and Commerce
United States House of Representatives

1 November 2017
2125 Rayburn House Office Building
Washington, DC 20515

Mister Chairman and Members of the Committee, thank you for the opportunity to testify today concerning the security of credit data. My name is Bruce Schneier, and I am a security technologist. For over 30 years I have studied the technologies of security and privacy. I have authored 13 books on these subjects, including Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World (Norton, 2015). My popular newsletter CryptoGram and my blog Schneier on Security are read by over 250,000 people.

Additionally, I am a Fellow and Lecturer at the Harvard Kennedy School of Government –where I teach Internet security policy — and a Fellow at the Berkman-Klein Center for Internet and Society at Harvard Law School. I am a board member of the Electronic Frontier Foundation, AccessNow, and the Tor Project; and an advisory board member of Electronic Privacy Information Center and VerifiedVoting.org. I am also a special advisor to IBM Security and the Chief Technology Officer of IBM Resilient.

I am here representing none of those organizations, and speak only for myself based on my own expertise and experience.

I have eleven main points:

1. The Equifax breach was a serious security breach that puts millions of Americans at risk.

Equifax reported that 145.5 million US customers, about 44% of the population, were impacted by the breach. (That’s the original 143 million plus the additional 2.5 million disclosed a month later.) The attackers got access to full names, Social Security numbers, birth dates, addresses, and driver’s license numbers.

This is exactly the sort of information criminals can use to impersonate victims to banks, credit card companies, insurance companies, cell phone companies and other businesses vulnerable to fraud. As a result, all 143 million US victims are at greater risk of identity theft, and will remain at risk for years to come. And those who suffer identity theft will have problems for months, if not years, as they work to clean up their name and credit rating.

2. Equifax was solely at fault.

This was not a sophisticated attack. The security breach was a result of a vulnerability in the software for their websites: a program called Apache Struts. The particular vulnerability was fixed by Apache in a security patch that was made available on March 6, 2017. This was not a minor vulnerability; the computer press at the time called it “critical.” Within days, it was being used by attackers to break into web servers. Equifax was notified by Apache, US CERT, and the Department of Homeland Security about the vulnerability, and was provided instructions to make the fix.

Two months later, Equifax had still failed to patch its systems. It eventually got around to it on July 29. The attackers used the vulnerability to access the company’s databases and steal consumer information on May 13, over two months after Equifax should have patched the vulnerability.

The company’s incident response after the breach was similarly damaging. It waited nearly six weeks before informing victims that their personal information had been stolen and they were at increased risk of identity theft. Equifax opened a website to help aid customers, but the poor security around that — the site was at a domain separate from the Equifax domain — invited fraudulent imitators and even more damage to victims. At one point, the official Equifax communications even directed people to that fraudulent site.

This is not the first time Equifax failed to take computer security seriously. It confessed to another data leak in January 2017. In May 2016, one of its websites was hacked, resulting in 430,000 people having their personal information stolen. Also in 2016, a security researcher found and reported a basic security vulnerability in its main website. And in 2014, the company reported yet another security breach of consumer information. There are more.

3. There are thousands of data brokers with similarly intimate information, similarly at risk.

Equifax is more than a credit reporting agency. It’s a data broker. It collects information about all of us, analyzes it all, and then sells those insights. It might be one of the biggest, but there are 2,500 to 4,000 other data brokers that are collecting, storing, and selling information about us — almost all of them companies you’ve never heard of and have no business relationship with.

The breadth and depth of information that data brokers have is astonishing. Data brokers collect and store billions of data elements covering nearly every US consumer. Just one of the data brokers studied holds information on more than 1.4 billion consumer transactions and 700 billion data elements, and another adds more than 3 billion new data points to its database each month.

These brokers collect demographic information: names, addresses, telephone numbers, e-mail addresses, gender, age, marital status, presence and ages of children in household, education level, profession, income level, political affiliation, cars driven, and information about homes and other property. They collect lists of things we’ve purchased, when we’ve purchased them, and how we paid for them. They keep track of deaths, divorces, and diseases in our families. They collect everything about what we do on the Internet.

4. These data brokers deliberately hide their actions, and make it difficult for consumers to learn about or control their data.

If there were a dozen people who stood behind us and took notes of everything we purchased, read, searched for, or said, we would be alarmed at the privacy invasion. But because these companies operate in secret, inside our browsers and financial transactions, we don’t see them and we don’t know they’re there.

Regarding Equifax, few consumers have any idea what the company knows about them, who they sell personal data to or why. If anyone knows about them at all, it’s about their business as a credit bureau, not their business as a data broker. Their website lists 57 different offerings for business: products for industries like automotive, education, health care, insurance, and restaurants.

In general, options to “opt-out” don’t work with data brokers. It’s a confusing process, and doesn’t result in your data being deleted. Data brokers will still collect data about consumers who opt out. It will still be in those companies’ databases, and will still be vulnerable. It just won’t be included individually when they sell data to their customers.

5. The existing regulatory structure is inadequate.

Right now, there is no way for consumers to protect themselves. Their data has been harvested and analyzed by these companies without their knowledge or consent. They cannot improve the security of their personal data, and have no control over how vulnerable it is. They only learn about data breaches when the companies announce them — which can be months after the breaches occur — and at that point the onus is on them to obtain credit monitoring services or credit freezes. And even those only protect consumers from some of the harms, and only those suffered after Equifax admitted to the breach.

Right now, the press is reporting “dozens” of lawsuits against Equifax from shareholders, consumers, and banks. Massachusetts has sued Equifax for violating state consumer protection and privacy laws. Other states may follow suit.

If any of these plaintiffs win in court, it will be a rare victory for victims of privacy breaches against the companies that have our personal information. Current law is too narrowly focused on people who have suffered financial losses directly traceable to a specific breach. Proving this is difficult. If you are the victim of identity theft in the next month, is it because of Equifax or does the blame belong to another of the thousands of companies who have your personal data? As long as one can’t prove it one way or the other, data brokers remain blameless and liability free.

Additionally, much of this market in our personal data falls outside the protections of the Fair Credit Reporting Act. And in order for the Federal Trade Commission to levy a fine against Equifax, it needs to have a consent order and then a subsequent violation. Any fines will be limited to credit information, which is a small portion of the enormous amount of information these companies know about us. In reality, this is not an effective enforcement regime.

Although the FTC is investigating Equifax, it is unclear if it has a viable case.

6. The market cannot fix this because we are not the customers of data brokers.

The customers of these companies are people and organizations who want to buy information: banks looking to lend you money, landlords deciding whether to rent you an apartment, employers deciding whether to hire you, companies trying to figure out whether you’d be a profitable customer — everyone who wants to sell you something, even governments.

Markets work because buyers choose from a choice of sellers, and sellers compete for buyers. None of us are Equifax’s customers. None of us are the customers of any of these data brokers. We can’t refuse to do business with the companies. We can’t remove our data from their databases. With few limited exceptions, we can’t even see what data these companies have about us or correct any mistakes.

We are the product that these companies sell to their customers: those who want to use our personal information to understand us, categorize us, make decisions about us, and persuade us.

Worse, the financial markets reward bad security. Given the choice between increasing their cybersecurity budget by 5%, or saving that money and taking the chance, a rational CEO chooses to save the money. Wall Street rewards those whose balance sheets look good, not those who are secure. And if senior management gets unlucky and a public breach happens, they end up okay. Equifax’s CEO didn’t get his $5.2 million severance pay, but he did keep his $18.4 million pension. Any company that spends more on security than absolutely necessary is immediately penalized by shareholders when its profits decrease.

Even the negative PR that Equifax is currently suffering will fade. Unless we expect data brokers to put public interest ahead of profits, the security of this industry will never improve without government regulation.

7. We need effective regulation of data brokers.

In 2014, the Federal Trade Commission recommended that Congress require data brokers be more transparent and give consumers more control over their personal information. That report contains good suggestions on how to regulate this industry.

First, Congress should help plaintiffs in data breach cases by authorizing and funding empirical research on the harm individuals receive from these breaches.

Specifically, Congress should move forward with legislative proposals that establish a nationwide “credit freeze” — which is better described as changing the default for disclosure from opt-out to opt-in — and free lifetime credit monitoring services. By this I do not mean giving customers free credit-freeze options, a proposal by Senators Warren and Schatz, but that the default should be a credit freeze.

The credit card industry routinely notifies consumers when there are suspicious charges. It is obvious that credit reporting agencies should have a similar obligation to notify consumers when there is suspicious activity concerning their credit report.

On the technology side, more could be done to limit the amount of personal data companies are allowed to collect. Increasingly, privacy safeguards impose “data minimization” requirements to ensure that only the data that is actually needed is collected. On the other hand, Congress should not create a new national identifier to replace the Social Security Numbers. That would make the system of identification even more brittle. Better is to reduce dependence on systems of identification and to create contextual identification where necessary.

Finally, Congress needs to give the Federal Trade Commission the authority to set minimum security standards for data brokers and to give consumers more control over their personal information. This is essential as long as consumers are these companies’ products and not their customers.

8. Resist complaints from the industry that this is “too hard.”

The credit bureaus and data brokers, and their lobbyists and trade-association representatives, will claim that many of these measures are too hard. They’re not telling you the truth.

Take one example: credit freezes. This is an effective security measure that protects consumers, but the process of getting one and of temporarily unfreezing credit is made deliberately onerous by the credit bureaus. Why isn’t there a smartphone app that alerts me when someone wants to access my credit rating, and lets me freeze and unfreeze my credit at the touch of the screen? Too hard? Today, you can have an app on your phone that does something similar if you try to log into a computer network, or if someone tries to use your credit card at a physical location different from where you are.

Moreover, any credit bureau or data broker operating in Europe is already obligated to follow the more rigorous EU privacy laws. The EU General Data Protection Regulation will come into force in May 2018, requiring even more security and privacy controls for companies collecting and storing the personal data of EU citizens. Those companies have already demonstrated that they can comply with those more stringent regulations.

Credit bureaus, and data brokers in general, are deliberately not implementing these 21st-century security solutions, because they want their services to be as easy and useful as possible for their actual customers: those who are buying your information. Similarly, companies that use this personal information to open accounts are not implementing more stringent security because they want their services to be as easy-to-use and convenient as possible.

9. This has foreign trade implications.

The Canadian Broadcast Corporation reported that 100,000 Canadians had their data stolen in the Equifax breach. The British Broadcasting Corporation originally reported that 400,000 UK consumers were affected; Equifax has since revised that to 15.2 million.

Many American Internet companies have significant numbers of European users and customers, and rely on negotiated safe harbor agreements to legally collect and store personal data of EU citizens.

The European Union is in the middle of a massive regulatory shift in its privacy laws, and those agreements are coming under renewed scrutiny. Breaches such as Equifax give these European regulators a powerful argument that US privacy regulations are inadequate to protect their citizens’ data, and that they should require that data to remain in Europe. This could significantly harm American Internet companies.

10. This has national security implications.

Although it is still unknown who compromised the Equifax database, it could easily have been a foreign adversary that routinely attacks the servers of US companies and US federal agencies with the goal of exploiting security vulnerabilities and obtaining personal data.

When the Fair Credit Reporting Act was passed in 1970, the concern was that the credit bureaus might misuse our data. That is still a concern, but the world has changed since then. Credit bureaus and data brokers have far more intimate data about all of us. And it is valuable not only to companies wanting to advertise to us, but foreign governments as well. In 2015, the Chinese breached the database of the Office of Personnel Management and stole the detailed security clearance information of 21 million Americans. North Korea routinely engages in cybercrime as a way to fund its other activities. In a world where foreign governments use cyber capabilities to attack US assets, requiring data brokers to limit collection of personal data, securely store the data they collect, and delete data about consumers when it is no longer needed is a matter of national security.

11. We need to do something about it.

Yes, this breach is a huge black eye and a temporary stock dip for Equifax — this month. Soon, another company will have suffered a massive data breach and few will remember Equifax’s problem. Does anyone remember last year when Yahoo admitted that it exposed personal information of a billion users in 2013 and another half billion in 2014?

Unless Congress acts to protect consumer information in the digital age, these breaches will continue.

Thank you for the opportunity to testify today. I will be pleased to answer your questions.

Impersonating iOS Password Prompts

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/10/impersonating_i.html

This is an interesting security vulnerability: because it is so easy to impersonate iOS password prompts, a malicious app can steal your password just by asking.

Why does this work?

iOS asks the user for their iTunes password for many reasons, the most common ones are recently installed iOS operating system updates, or iOS apps that are stuck during installation.

As a result, users are trained to just enter their Apple ID password whenever iOS prompts you to do so. However, those popups are not only shown on the lock screen, and the home screen, but also inside random apps, e.g. when they want to access iCloud, GameCenter or In-App-Purchases.

This could easily be abused by any app, just by showing an UIAlertController, that looks exactly like the system dialog.

Even users who know a lot about technology have a hard time detecting that those alerts are phishing attacks.

The essay proposes some solutions, but I’m not sure they’ll work. We’re all trained to trust our computers and the applications running on them.

How to Query Personally Identifiable Information with Amazon Macie

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/how-to-query-personally-identifiable-information-with-amazon-macie/


In August 2017 at the AWS Summit New York, AWS launched a new security and compliance service called Amazon Macie. Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS. In this blog post, I demonstrate how you can use Macie to help enable compliance with applicable regulations, starting with data retention.

How to query retained PII with Macie

Data retention and mandatory data deletion are common topics across compliance frameworks, so knowing what is stored and how long it has been or needs to be stored is of critical importance. For example, you can use Macie for Payment Card Industry Data Security Standard (PCI DSS) 3.2, requirement 3, “Protect stored cardholder data,” which mandates a “quarterly process for identifying and securely deleting stored cardholder data that exceeds defined retention.” You also can use Macie for ISO 27017 requirement 12.3.1, which calls for “retention periods for backup data.” In each of these cases, you can use Macie’s built-in queries to identify the age of data in your Amazon S3 buckets and to help meet your compliance needs.

To get started with Macie and run your first queries of personally identifiable information (PII) and sensitive data, follow the initial setup as described in the launch post on the AWS Blog. After you have set up Macie, walk through the following steps to start running queries. Start by focusing on the S3 buckets that you want to inventory and on capturing important compliance-related activity and data.

To start running Macie queries:

  1. In the AWS Management Console, launch the Macie console (you can type macie to find the console).
  2. Click Dashboard in the navigation pane. This shows you an overview of the risk level and data classification type of all inventoried S3 buckets, categorized by date and type.
    Screenshot of "Dashboard" in the navigation pane
  3. Choose S3 objects by PII priority. This dashboard lets you sort by PII priority and PII types.
    Screenshot of "S3 objects by PII priority"
  4. In this case, I want to find information about credit card numbers. I choose the magnifying glass for the type cc_number (note that PII types can be used for custom queries). This view shows the events where PII classified data has been uploaded to S3. When I scroll down, I see the individual files that have been identified.
    Screenshot showing the events where PII classified data has been uploaded to S3
  5. Before looking at the files, I want to continue to build the query by only showing items with high priority. To do so, I choose the row called Object PII Priority and then the magnifying glass icon next to High.
    Screenshot of refining the query for high priority events
  6. To view the results matching these queries, I scroll down and choose any file listed. This shows vital information such as creation date, location, and object access control list (ACL).
  7. What I am most interested in, in this case, is the Object PII details line, which helps me understand more about what was found in the file. Here I see name and credit card information, which is what caused the high priority. Scrolling up again, I also see that the query fields have updated as I interacted with the UI.
    Screenshot showing "Object PII details"

Let’s say that I want to get an alert every time Macie finds new data matching this query. This alert can be used to automate response actions by using AWS Lambda and Amazon CloudWatch Events.

  1. I choose the left green icon called Save query as alert.
    Screenshot of "Save query as alert" button
  2. I can customize the alert and change things like category or severity to fit my needs based on the alert data.
  3. Another way to find the information I am looking for is to run custom queries. To start using custom queries, I choose Research in the navigation pane.
    1. To learn more about custom Macie queries and what you can do on the Research tab, see Using the Macie Research Tab.
  4. I change the type of query I want to run from CloudTrail data to S3 objects in the drop-down list menu.
    Screenshot of choosing "S3 objects" from the drop-down list menu
  5. Because I want PII data, I start typing in the query box, which has an autocomplete feature. I choose the pii_types: query. I can now type the data I want to look for. In this case, I want to see all files matching the credit card filter so I type cc_number and press Enter. The query box now says, pii_types:cc_number. I press Enter again to enable autocomplete, and then I type AND pii_types:email to require both a credit card number and email address in a single object.
    The query looks for all files matching the credit card filter ("cc_number")
  6. I choose the magnifying glass to search and Macie shows me all S3 objects that are tagged as PII of type Credit Cards. I can further specify that I only want to see PII of type Credit Card that is classified as High priority by adding AND pii_impact:high to the query.
    Screenshot showing narrowing the query results further
As before, I can save this new query as an alert by clicking Save query as alert, which will be triggered by data matching the query going forward.

Advanced tip

Try the following advanced queries using Lucene query syntax and save the queries as alerts in Macie.

  • Use a regular-expression based query to search for a minimum of 10 credit card numbers and 10 email addresses in a single object:
    • pii_explain.cc_number:/([1-9][0-9]|[0-9]{3,}) distinct Credit Card Numbers.*/ AND pii_explain.email:/([1-9][0-9]|[0-9]{3,}) distinct Email Addresses.*/
  • Search for objects containing at least one credit card, name, and email address that have an object policy enabling global access (searching for S3 AllUsers or AuthenticatedUsers permissions):
    • (object_acl.Grants.Grantee.URI:"http\://acs.amazonaws.com/groups/global/AllUsers" OR object_acl.Grants.Grantee.URI:"http\://acs.amazonaws.com/groups/global/AuthenticatedUsers") AND (pii_types.cc_number AND pii_types.email AND pii_types.name)

These are two ways to identify and be alerted about PII by using Macie. In a similar way, you can create custom alerts for various AWS CloudTrail events by choosing a different data set on which to run the queries. In the examples in this post, I identified credit cards stored in plain text (all data in this post is example data only), determined how long they had been stored in S3 by viewing the result details, and set up alerts to notify or trigger actions when new sensitive data is stored. With queries like these, you can build a reliable data validation program.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about how to use Macie, start a new thread on the Macie forum or contact AWS Support.

-Chad

Manage Kubernetes Clusters on AWS Using CoreOS Tectonic

Post Syndicated from Arun Gupta original https://aws.amazon.com/blogs/compute/kubernetes-clusters-aws-coreos-tectonic/

There are multiple ways to run a Kubernetes cluster on Amazon Web Services (AWS). The first post in this series explained how to manage a Kubernetes cluster on AWS using kops. This second post explains how to manage a Kubernetes cluster on AWS using CoreOS Tectonic.

Tectonic overview

Tectonic delivers the most current upstream version of Kubernetes with additional features. It is a commercial offering from CoreOS and adds the following features over the upstream:

  • Installer
    Comes with a graphical installer that installs a highly available Kubernetes cluster. Alternatively, the cluster can be installed using AWS CloudFormation templates or Terraform scripts.
  • Operators
    An operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. This release includes an etcd operator for rolling upgrades and a Prometheus operator for monitoring capabilities.
  • Console
    A web console provides a full view of applications running in the cluster. It also allows you to deploy applications to the cluster and start the rolling upgrade of the cluster.
  • Monitoring
    Node CPU and memory metrics are powered by the Prometheus operator. The graphs are available in the console. A large set of preconfigured Prometheus alerts are also available.
  • Security
    Tectonic ensures that the cluster is always up to date with the most recent patches and fixes. Tectonic clusters also enable role-based access control (RBAC). Different roles can be mapped to an LDAP service.
  • Support
    CoreOS provides commercial support for clusters created using Tectonic.

Tectonic can be installed on AWS using a GUI installer or Terraform scripts. The installer prompts you for the information needed to boot the Kubernetes cluster, such as AWS access and secret key, number of master and worker nodes, and instance size for the master and worker nodes. The cluster can be created after all the options are specified. Alternatively, Terraform assets can be downloaded and the cluster can be created later. This post shows using the installer.

CoreOS License and Pull Secret

Even though Tectonic is a commercial offering, a cluster of up to 10 nodes can be created with a free account at Get Tectonic for Kubernetes. After signup, a CoreOS License file and a Pull Secret file are provided on your CoreOS account page. Download these files, as they are needed by the installer to boot the cluster.

IAM user permission

The IAM user to create the Kubernetes cluster must have access to the following services and features:

  • Amazon Route 53
  • Amazon EC2
  • Elastic Load Balancing
  • Amazon S3
  • Amazon VPC
  • Security groups

Use the aws-policy policy to grant the required permissions for the IAM user.
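
If you prefer the command line, the policy can be attached with the AWS CLI. This is only a sketch: it assumes the policy JSON has been saved locally as aws-policy.json and that the installer user is named tectonic-installer (both names are examples).

# Attach the Tectonic installer policy to the IAM user
aws iam put-user-policy \
  --user-name tectonic-installer \
  --policy-name tectonic-aws-policy \
  --policy-document file://aws-policy.json

# Create an access key for the installer to use
aws iam create-access-key --user-name tectonic-installer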

DNS configuration

A subdomain is required to create the cluster, and it must be registered as a public Route 53 hosted zone. The zone is used to host and expose the console web application. It is also used as the static namespace for the Kubernetes API server. This allows kubectl to be able to talk directly with the master.

The domain may be registered using Route 53. Alternatively, a domain may be registered at a third-party registrar. This post uses a kubernetes-aws.io domain registered at a third-party registrar and a tectonic subdomain within it.

Generate a Route 53 hosted zone using the AWS CLI. Download jq to run this command:

ID=$(uuidgen) && \
aws route53 create-hosted-zone \
--name tectonic.kubernetes-aws.io \
--caller-reference $ID \
| jq .DelegationSet.NameServers

The command shows an output such as the following:

[
  "ns-1924.awsdns-48.co.uk",
  "ns-501.awsdns-62.com",
  "ns-1259.awsdns-29.org",
  "ns-749.awsdns-29.net"
]

Create NS records for the domain with your registrar. Make sure that the NS records can be resolved using a utility such as the dig command or a web-based dig interface. A sample output would look like the following:

The bottom of the screenshot shows NS records configured for the subdomain.
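
If you prefer the command line over a web interface, the delegation can be checked with dig directly; the output should list the four awsdns name servers returned by the create-hosted-zone command above.

# Query public DNS for the subdomain's NS records
dig +short NS tectonic.kubernetes-aws.io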

Download and run the Tectonic installer

Download the Tectonic installer (version 1.7.1) and extract it. The latest installer can always be found at coreos.com/tectonic. Start the installer:

./tectonic/tectonic-installer/$PLATFORM/installer

Replace $PLATFORM with either darwin or linux. The installer opens your default browser and prompts you to select the cloud provider. Choose Amazon Web Services as the platform. Choose Next Step.
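
If you don't want to set $PLATFORM by hand, a small shell snippet can derive it from uname; this sketch assumes the directory layout of the extracted installer shown above.

# uname -s prints "Darwin" or "Linux"; lowercase it to match the installer directories
PLATFORM=$(uname -s | tr '[:upper:]' '[:lower:]')
./tectonic/tectonic-installer/$PLATFORM/installer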

Specify the Access Key ID and Secret Access Key for the IAM user that you created earlier. This allows the installer to create the resources required for the Kubernetes cluster, and it also gives the installer full access to your AWS account. Alternatively, to protect the integrity of your main AWS credentials, use a temporary session token to generate temporary credentials, as sketched below.
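
One way to obtain such temporary credentials is through AWS STS. A minimal sketch, assuming the AWS CLI is already configured with your long-lived credentials:

# Request temporary credentials valid for one hour; paste the returned
# AccessKeyId, SecretAccessKey, and SessionToken into the installer
aws sts get-session-token --duration-seconds 3600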

You also need to choose a region in which to install the cluster. For the purpose of this post, I chose a region close to where I live, Northern California. Choose Next Step.

Give your cluster a name. This name is part of the static namespace for the master and the address of the console.

To enable in-place updates to the Kubernetes cluster, select the checkbox next to Automated Updates. This also enables updates to the etcd and Prometheus operators. The feature may become a default in future releases.

Choose Upload “tectonic-license.txt” and upload the previously downloaded license file.

Choose Upload “config.json” and upload the previously downloaded pull secret file. Choose Next Step.

Let the installer generate a CA certificate and key. In this case, the browser may not recognize this certificate, which I discuss later in the post. Alternatively, you can provide a CA certificate and a key in PEM format issued by an authorized certificate authority. Choose Next Step.

Use the SSH key for the region specified earlier. You also have an option to generate a new key. This allows you to later connect using SSH into the Amazon EC2 instances provisioned by the cluster. Here is the command that can be used to log in:

ssh -i <key> core@<ec2-instance-ip>

Choose Next Step.

Define the number and instance type of master and worker nodes. In this case, create a six-node cluster. Make sure that the worker nodes have enough processing power and memory to run the containers.

An etcd cluster is used as persistent storage for all Kubernetes API objects. This cluster is required for the Kubernetes cluster to operate. There are three ways to provide the etcd cluster as part of the Tectonic installer:

  • (Default) Provision the cluster using EC2 instances. Additional EC2 instances are used in this case.
  • Use alpha support for cluster provisioning using the etcd operator. The etcd operator automates operation of the etcd nodes for the cluster itself, as well as for etcd instances created for application usage. The etcd cluster is provisioned within the Tectonic installer.
  • Bring your own pre-provisioned etcd cluster.

Use the first option in this case.

For more information about choosing the appropriate instance type, see the etcd hardware recommendation. Choose Next Step.

Specify the networking options. The installer can create a new public VPC or use a pre-existing public or private VPC. Make sure that the VPC requirements are met for an existing VPC.

Give a DNS name for the cluster. Choose the domain for which the Route 53 hosted zone was configured earlier, such as tectonic.kubernetes-aws.io. Multiple clusters may be created under a single domain. The cluster name and the DNS name would typically match each other.

To select the CIDR range, choose Show Advanced Settings. You can also choose the Availability Zones for the master and worker nodes. By default, the master and worker nodes are spread across multiple Availability Zones in the chosen region. This makes the cluster highly available.

Leave the other values as default. Choose Next Step.

Specify an email address and password to be used as credentials to log in to the console. Choose Next Step.

At any point during the installation, you can choose Save progress. This allows you to save configurations specified in the installer. This configuration file can then be used to restore progress in the installer at a later point.

To start the cluster installation, choose Submit. Alternatively, you can download the Terraform assets by choosing Manually boot, which allows you to boot the cluster at a later time.

The logs from the Terraform scripts are shown in the installer. When the installation is complete, the console shows that the Terraform scripts were applied successfully, that the domain name resolved successfully, and that the console has started. The domain resolves only if the DNS delegation configured earlier is working, and it is the address where the console is accessible.

Choose Download assets to download assets related to your cluster. It contains your generated CA, kubectl configuration file, and the Terraform state. This download is an important step as it allows you to delete the cluster later.

Choose Next Step for the final installation screen. It allows you to access the Tectonic console, gives you instructions about how to configure kubectl to manage this cluster, and finally deploys an application using kubectl.

Choose Go to my Tectonic Console. In our case, it is also accessible at http://cluster.tectonic.kubernetes-aws.io/.

As I mentioned earlier, the browser does not recognize the self-generated CA certificate. Choose Advanced and connect to the console. Enter the login credentials specified earlier in the installer and choose Login.

The Kubernetes upstream and console versions are shown under Software Details. Cluster health shows All systems go, which means that the API server and the backend API can be reached.

To view different Kubernetes resources in the cluster, choose the resource in the left navigation bar. For example, all deployments can be seen by choosing Deployments.

By default, resources across all namespaces are shown. A specific namespace may be chosen from the menu at the top of the screen. Administration tasks such as managing namespaces, listing nodes, and configuring RBAC can be performed as well.

Download and run Kubectl

Kubectl is required to manage the Kubernetes cluster. The latest version of kubectl can be downloaded using the following command:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl

It can also be conveniently installed using the Homebrew package manager, as shown below. To find and access a cluster, kubectl needs a kubeconfig file. By default, this configuration file is at ~/.kube/config. This file is created when a Kubernetes cluster is created from your machine. However, in this case, download this file from the console.
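
The curl command above only downloads the binary (the darwin build in this case), so it still needs to be made executable and placed on your PATH. A short sketch:

# Make the downloaded binary executable and move it onto the PATH
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl

# Alternatively, install it through Homebrew on macOS
brew install kubernetes-cli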

In the console, choose admin, My Account, Download Configuration and follow the steps to download the kubectl configuration file. Move this file to ~/.kube/config. If kubectl has been used on your machine before, this file already exists; make sure to back it up first.
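
For example (the name of the downloaded file is a placeholder and may differ in your browser's download directory):

# Keep a backup of any existing kubectl configuration
cp ~/.kube/config ~/.kube/config.backup

# Replace it with the configuration downloaded from the Tectonic console
mv ~/Downloads/kubectl-config ~/.kube/config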

Now you can run the commands to view the list of deployments:

~ $ kubectl get deployments --all-namespaces
NAMESPACE         NAME                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system       etcd-operator                           1         1         1            1           43m
kube-system       heapster                                1         1         1            1           40m
kube-system       kube-controller-manager                 3         3         3            3           43m
kube-system       kube-dns                                1         1         1            1           43m
kube-system       kube-scheduler                          3         3         3            3           43m
tectonic-system   container-linux-update-operator         1         1         1            1           40m
tectonic-system   default-http-backend                    1         1         1            1           40m
tectonic-system   kube-state-metrics                      1         1         1            1           40m
tectonic-system   kube-version-operator                   1         1         1            1           40m
tectonic-system   prometheus-operator                     1         1         1            1           40m
tectonic-system   tectonic-channel-operator               1         1         1            1           40m
tectonic-system   tectonic-console                        2         2         2            2           40m
tectonic-system   tectonic-identity                       2         2         2            2           40m
tectonic-system   tectonic-ingress-controller             1         1         1            1           40m
tectonic-system   tectonic-monitoring-auth-alertmanager   1         1         1            1           40m
tectonic-system   tectonic-monitoring-auth-prometheus     1         1         1            1           40m
tectonic-system   tectonic-prometheus-operator            1         1         1            1           40m
tectonic-system   tectonic-stats-emitter                  1         1         1            1           40m

This output is similar to the one shown in the console earlier. Now, this kubectl can be used to manage your resources.
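
For example, a couple of standard kubectl commands (these are generic kubectl commands, not Tectonic-specific) confirm that the cluster is healthy:

# List the master and worker nodes provisioned by the installer
kubectl get nodes -o wide

# List every pod in every namespace
kubectl get pods --all-namespaces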

Upgrade the Kubernetes cluster

Tectonic allows the in-place upgrade of the cluster. This is an experimental feature as of this release. The clusters can be updated either automatically, or with manual approval.

To perform the update, choose Administration, Cluster Settings. If an earlier Tectonic installer, version 1.6.2 in this case, is used to install the cluster, then this screen would look like the following:

Choose Check for Updates. If any updates are available, choose Start Upgrade. After the upgrade is completed, the screen is refreshed.

This is an experimental feature in this release, so it should only be used on clusters that can be easily replaced. The feature may become fully supported in a future release. For more information about the upgrade process, see Upgrading Tectonic & Kubernetes.

Delete the Kubernetes cluster

Typically, the Kubernetes cluster is a long-running cluster to serve your applications. After its purpose is served, you may delete it. It is important to delete the cluster as this ensures that all resources created by the cluster are appropriately cleaned up.

The easiest way to delete the cluster is using the assets downloaded in the last step of the installer. Extract the downloaded zip file. This creates a directory like <cluster-name>_TIMESTAMP. In that directory, give the following command to delete the cluster:

TERRAFORM_CONFIG=$(pwd)/.terraformrc terraform destroy --force

This destroys the cluster and all associated resources.

If you forgot to download the assets, a copy is kept in the directory tectonic/tectonic-installer/darwin/clusters. In that directory, another directory named <cluster-name>_TIMESTAMP contains your assets.

Conclusion

This post explained how to manage Kubernetes clusters using the CoreOS Tectonic graphical installer. For more details, see Graphical Installer with AWS. If the installation does not succeed, see the helpful Troubleshooting tips. After the cluster is created, see the Tectonic tutorials to learn how to deploy, scale, version, and delete an application.

Future posts in this series will explain other ways of creating and running a Kubernetes cluster on AWS.

Arun

A Hardware Privacy Monitor for iPhones

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/a_hardware_priv.html

Andrew “bunnie” Huang and Edward Snowden have designed a hardware device that attaches to an iPhone and monitors it for malicious surveillance activities, even in instances where the phone’s operating system has been compromised. They call it an Introspection Engine, and their use model is a journalist who is concerned about government surveillance:

Our introspection engine is designed with the following goals in mind:

  1. Completely open source and user-inspectable (“You don’t have to trust us”)
  2. Introspection operations are performed by an execution domain completely separated from the phone’s CPU (“don’t rely on those with impaired judgment to fairly judge their state”)

  3. Proper operation of introspection system can be field-verified (guard against “evil maid” attacks and hardware failures)

  4. Difficult to trigger a false positive (users ignore or disable security alerts when there are too many positives)

  5. Difficult to induce a false negative, even with signed firmware updates (“don’t trust the system vendor” — state-level adversaries with full cooperation of system vendors should not be able to craft signed firmware updates that spoof or bypass the introspection engine)

  6. As much as possible, the introspection system should be passive and difficult to detect by the phone’s operating system (prevent black-listing/targeting of users based on introspection engine signatures)

  7. Simple, intuitive user interface requiring no specialized knowledge to interpret or operate (avoid user error leading to false negatives; “journalists shouldn’t have to be cryptographers to be safe”)

  8. Final solution should be usable on a daily basis, with minimal impact on workflow (avoid forcing field reporters into the choice between their personal security and being an effective journalist)

This looks like fantastic work, and they have a working prototype.

Of course, this does nothing to stop all the legitimate surveillance that happens over a cell phone: location tracking, records of who you talk to, and so on.

BoingBoing post.