Tag Archives: telemetry

Sharing Secrets with AWS Lambda Using AWS Systems Manager Parameter Store

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/sharing-secrets-with-aws-lambda-using-aws-systems-manager-parameter-store/

This post courtesy of Roberto Iturralde, Sr. Application Developer- AWS Professional Services

Application architects are faced with key decisions throughout the process of designing and implementing their systems. One decision common to nearly all solutions is how to manage the storage and access rights of application configuration. Shared configuration should be stored centrally and securely with each system component having access only to the properties that it needs for functioning.

With AWS Systems Manager Parameter Store, developers have access to central, secure, durable, and highly available storage for application configuration and secrets. Parameter Store also integrates with AWS Identity and Access Management (IAM), allowing fine-grained access control to individual parameters or branches of a hierarchical tree.

This post demonstrates how to create and access shared configurations in Parameter Store from AWS Lambda. Both encrypted and plaintext parameter values are stored with only the Lambda function having permissions to decrypt the secrets. You also use AWS X-Ray to profile the function.

Solution overview

This example is made up of the following components:

  • An AWS SAM template that defines:
    • A Lambda function and its permissions
    • An unencrypted Parameter Store parameter that the Lambda function loads
    • A KMS key that only the Lambda function can access. You use this key to create an encrypted parameter later.
  • Lambda function code in Python 3.6 that demonstrates how to load values from Parameter Store at function initialization for reuse across invocations.

Launch the AWS SAM template

To create the resources shown in this post, you can download the SAM template or choose the button to launch the stack. The template requires one parameter, an IAM user name, which is the name of the IAM user to be the admin of the KMS key that you create. To perform the steps listed in this post, this IAM user needs permissions to execute Lambda functions, create Parameter Store parameters, administer keys in KMS, and view the X-Ray console. If your IAM user account has these privileges, you can use your own account to complete the walkthrough. You cannot use the root user to administer the KMS keys.

SAM template resources

The following sections show the code for the resources defined in the template.
Lambda function

  ParameterStoreBlogFunctionDev:
    Type: 'AWS::Serverless::Function'
    Properties:
      FunctionName: 'ParameterStoreBlogFunctionDev'
      Description: 'Integrating lambda with Parameter Store'
      Handler: 'lambda_function.lambda_handler'
      Role: !GetAtt ParameterStoreBlogFunctionRoleDev.Arn
      CodeUri: './code'
      Environment:
        Variables:
          ENV: 'dev'
          APP_CONFIG_PATH: 'parameterStoreBlog'
          AWS_XRAY_TRACING_NAME: 'ParameterStoreBlogFunctionDev'
      Runtime: 'python3.6'
      Timeout: 5
      Tracing: 'Active'

  ParameterStoreBlogFunctionRoleDev:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          -
            Effect: Allow
            Principal:
              Service:
                - 'lambda.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
      Policies:
        -
          PolicyName: 'ParameterStoreBlogDevParameterAccess'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              -
                Effect: Allow
                Action:
                  - 'ssm:GetParameter*'
                Resource: !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/dev/parameterStoreBlog*'
        -
          PolicyName: 'ParameterStoreBlogDevXRayAccess'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              -
                Effect: Allow
                Action:
                  - 'xray:PutTraceSegments'
                  - 'xray:PutTelemetryRecords'
                Resource: '*'

In this YAML code, you define a Lambda function named ParameterStoreBlogFunctionDev using the SAM AWS::Serverless::Function type. The environment variables for this function include ENV (dev) and APP_CONFIG_PATH, which is where the function finds its configuration in Parameter Store. X-Ray tracing is also enabled for profiling later.

The IAM role for this function extends the AWSLambdaBasicExecutionRole by adding IAM policies that grant the function permissions to write to X-Ray and get parameters from Parameter Store, limited to paths under /dev/parameterStoreBlog*.
Parameter Store parameter

  SimpleParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: '/dev/parameterStoreBlog/appConfig'
      Description: 'Sample dev config values for my app'
      Type: String
      Value: '{"key1": "value1","key2": "value2","key3": "value3"}'

This YAML code creates a plaintext string parameter in Parameter Store in a path that your Lambda function can access.
KMS encryption key

  ParameterStoreBlogDevEncryptionKeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: 'alias/ParameterStoreBlogKeyDev'
      TargetKeyId: !Ref ParameterStoreBlogDevEncryptionKey

  ParameterStoreBlogDevEncryptionKey:
    Type: AWS::KMS::Key
    Properties:
      Description: 'Encryption key for secret config values for the Parameter Store blog post'
      Enabled: True
      EnableKeyRotation: False
      KeyPolicy:
        Version: '2012-10-17'
        Id: 'key-default-1'
        Statement:
          -
            Sid: 'Allow administration of the key & encryption of new values'
            Effect: Allow
            Principal:
              AWS:
                - !Sub 'arn:aws:iam::${AWS::AccountId}:user/${IAMUsername}'
            Action:
              - 'kms:Create*'
              - 'kms:Encrypt'
              - 'kms:Describe*'
              - 'kms:Enable*'
              - 'kms:List*'
              - 'kms:Put*'
              - 'kms:Update*'
              - 'kms:Revoke*'
              - 'kms:Disable*'
              - 'kms:Get*'
              - 'kms:Delete*'
              - 'kms:ScheduleKeyDeletion'
              - 'kms:CancelKeyDeletion'
            Resource: '*'
          -
            Sid: 'Allow use of the key'
            Effect: Allow
            Principal:
              AWS: !GetAtt ParameterStoreBlogFunctionRoleDev.Arn
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'

This YAML code creates an encryption key with a key policy with two statements.

The first statement allows a given user (${IAMUsername}) to administer the key. Importantly, this includes the ability to encrypt values using this key and disable or delete this key, but does not allow the administrator to decrypt values that were encrypted with this key.

The second statement grants your Lambda function permission to encrypt and decrypt values using this key. The alias for this key in KMS is ParameterStoreBlogKeyDev, which is how you reference it later.

Lambda function

Here I walk you through the Lambda function code.

import os, traceback, json, configparser, boto3
from aws_xray_sdk.core import patch_all
patch_all()

# Initialize boto3 client at global scope for connection reuse
client = boto3.client('ssm')
env = os.environ['ENV']
app_config_path = os.environ['APP_CONFIG_PATH']
full_config_path = '/' + env + '/' + app_config_path
# Initialize app at global scope for reuse across invocations
app = None

class MyApp:
    def __init__(self, config):
        """
        Construct new MyApp with configuration
        :param config: application configuration
        """
        self.config = config

    def get_config(self):
        return self.config

def load_config(ssm_parameter_path):
    """
    Load configparser from config stored in SSM Parameter Store
    :param ssm_parameter_path: Path to app config in SSM Parameter Store
    :return: ConfigParser holding loaded config
    """
    configuration = configparser.ConfigParser()
    try:
        # Get all parameters for this app
        param_details = client.get_parameters_by_path(
            Path=ssm_parameter_path,
            Recursive=False,
            WithDecryption=True
        )

        # Loop through the returned parameters and populate the ConfigParser
        if 'Parameters' in param_details and len(param_details.get('Parameters')) > 0:
            for param in param_details.get('Parameters'):
                param_path_array = param.get('Name').split("/")
                section_position = len(param_path_array) - 1
                section_name = param_path_array[section_position]
                config_values = json.loads(param.get('Value'))
                config_dict = {section_name: config_values}
                print("Found configuration: " + str(config_dict))
                configuration.read_dict(config_dict)

    except:
        print("Encountered an error loading config from SSM.")
        traceback.print_exc()
    finally:
        return configuration

def lambda_handler(event, context):
    global app
    # Initialize app if it doesn't yet exist
    if app is None:
        print("Loading config and creating new MyApp...")
        config = load_config(full_config_path)
        app = MyApp(config)

    return "MyApp config is " + str(app.get_config()._sections)

Beneath the import statements, you call the patch_all function from the AWS X-Ray SDK, which patches boto3 so that X-Ray subsegments are created for all of your boto3 operations.

Next, you create a boto3 SSM client at the global scope for reuse across function invocations, following Lambda best practices. Using the function environment variables, you assemble the path where you expect to find your configuration in Parameter Store. The class MyApp is meant to serve as an example of an application that would need its configuration injected at construction. In this example, you create an instance of ConfigParser, a class in Python’s standard library for handling basic configurations, to give to MyApp.

The load_config function loads all the parameters from Parameter Store at the level immediately beneath the path provided in the Lambda function environment variables. Each parameter found is put into a new section in ConfigParser. The name of the section is the name of the parameter, less the base path. In this example, the full parameter name is /dev/parameterStoreBlog/appConfig, which is put in a section named appConfig.

Finally, the lambda_handler function initializes an instance of MyApp if it doesn’t already exist, constructing it with the loaded configuration from Parameter Store. Then it simply returns the currently loaded configuration in MyApp. The impact of this design is that the configuration is only loaded from Parameter Store the first time that the Lambda function execution environment is initialized. Subsequent invocations reuse the existing instance of MyApp, resulting in improved performance. You see this in the X-Ray traces later in this post. For more advanced use cases where configuration changes need to be received immediately, you could implement an expiry policy for your configuration entries or push notifications to your function.
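For example, here is a minimal sketch of such an expiry policy, reusing the module's existing load_config, full_config_path, and MyApp; the TTL value and helper name are hypothetical and not part of the sample code:

import time

CONFIG_TTL_SECONDS = 300  # hypothetical refresh interval
config_loaded_at = 0.0    # epoch time of the last successful load

def get_app():
    """Return the cached MyApp, reloading its config when the TTL expires."""
    global app, config_loaded_at
    if app is None or (time.time() - config_loaded_at) > CONFIG_TTL_SECONDS:
        app = MyApp(load_config(full_config_path))
        config_loaded_at = time.time()
    return app

The handler would then call get_app() instead of checking the app global itself, so a long-lived execution environment still picks up parameter changes within a few minutes.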

To confirm that everything was created successfully, test the function in the Lambda console.

  1. Open the Lambda console.
  2. In the navigation pane, choose Functions.
  3. In the Functions pane, filter to ParameterStoreBlogFunctionDev to find the function created by the SAM template earlier. Open the function name to view its details.
  4. On the top right of the function detail page, choose Test. You may need to create a new test event. The input JSON doesn’t matter as this function ignores the input.

After running the test, you should see output similar to the following. This demonstrates that the function successfully fetched the unencrypted configuration from Parameter Store.

Create an encrypted parameter

You currently have a simple, unencrypted parameter and a Lambda function that can access it.

Next, you create an encrypted parameter that only your Lambda function has permission to use for decryption. This limits read access for this parameter to only this Lambda function.

To follow along with this section, deploy the SAM template for this post in your account and make your IAM user name the KMS key admin mentioned earlier.

  1. In the Systems Manager console, under Shared Resources, choose Parameter Store.
  2. Choose Create Parameter.
    • For Name, enter /dev/parameterStoreBlog/appSecrets.
    • For Type, select Secure String.
    • For KMS Key ID, choose alias/ParameterStoreBlogKeyDev, which is the key that your SAM template created.
    • For Value, enter {"secretKey": "secretValue"}.
    • Choose Create Parameter.
  3. If you now try to view the value of this parameter by choosing the name of the parameter in the parameters list and then choosing Show next to the Value field, you won’t see the value appear. This is because, even though you have permission to encrypt values using this KMS key, you do not have permissions to decrypt values.
  4. In the Lambda console, run another test of your function. You now also see the secret parameter that you created and its decrypted value.
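If you prefer to script the parameter creation rather than use the console, the same parameter can be created with a few lines of boto3. This is a sketch only, and assumes your credentials have ssm:PutParameter on this path and kms:Encrypt on the key:

import json, boto3

ssm = boto3.client('ssm')

# Create a SecureString parameter under the app's config path, encrypted
# with the KMS key alias created by the SAM template.
ssm.put_parameter(
    Name='/dev/parameterStoreBlog/appSecrets',
    Description='Sample dev secret values for my app',
    Type='SecureString',
    KeyId='alias/ParameterStoreBlogKeyDev',
    Value=json.dumps({'secretKey': 'secretValue'})
)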

If you do not see the new parameter in the Lambda output, this may be because the Lambda execution environment is still warm from the previous test. Because the parameters are loaded at Lambda startup, you need a fresh execution environment to refresh the values.

Adjust the function timeout to a different value in the Advanced Settings at the bottom of the Lambda Configuration tab. Choose Save and test to trigger the creation of a new Lambda execution environment.
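If you would rather script this than click through the console, here is a minimal sketch using boto3; any configuration change that causes Lambda to create fresh execution environments works equally well:

import boto3

lambda_client = boto3.client('lambda')

# Changing the timeout forces new execution environments, so the next
# invocation re-runs the global initialization and reloads Parameter Store.
lambda_client.update_function_configuration(
    FunctionName='ParameterStoreBlogFunctionDev',
    Timeout=6
)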

Profiling the impact of querying Parameter Store using AWS X-Ray

By using the AWS X-Ray SDK to patch boto3 in your Lambda function code, each invocation of the function creates traces in X-Ray. In this example, you can use these traces to validate the performance impact of your design decision to only load configuration from Parameter Store on the first invocation of the function in a new execution environment.
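Beyond the subsegments that patch_all generates for boto3 calls, you can also surface your own code paths as named subsegments. The following is a hypothetical sketch using the X-Ray SDK's capture decorator; the sample code in this post does not include it:

from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture('load_config')
def load_config_traced(ssm_parameter_path):
    # Delegates to the existing load_config so the Parameter Store call
    # shows up inside a named 'load_config' subsegment in the trace.
    return load_config(ssm_parameter_path)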

From the Lambda function details page where you tested the function earlier, under the function name, choose Monitoring. Choose View traces in X-Ray.

This opens the X-Ray console in a new window filtered to your function. Be aware of the time range field next to the search bar if you don’t see any search results.
In this screenshot, I’ve invoked the Lambda function twice, one time 10.3 minutes ago with a response time of 1.1 seconds and again 9.8 minutes ago with a response time of 8 milliseconds.

Looking at the details of the longer running trace by clicking the trace ID, you can see that the Lambda function spent the first ~350 ms of the full 1.1 sec routing the request through Lambda and creating a new execution environment for this function, as this was the first invocation with this code. This is the portion of time before the initialization subsegment.

Next, it took 725 ms to initialize the function, which includes executing the code at the global scope (including creating the boto3 client). This is also a one-time cost for a fresh execution environment.

Finally, the function executed for 65 ms, of which 63.5 ms was the GetParametersByPath call to Parameter Store.

Looking at the trace for the second, much faster function invocation, you see that the majority of the 8 ms execution time was Lambda routing the request to the function and returning the response. Only 1 ms of the overall execution time was attributed to the execution of the function, which makes sense given that after the first invocation you’re simply returning the config stored in MyApp.

While the Traces screen allows you to view the details of individual traces, the X-Ray Service Map screen allows you to view aggregate performance data for all traced services over a period of time.

In the X-Ray console navigation pane, choose Service map. Selecting a service node shows the metrics for node-specific requests. Selecting an edge between two nodes shows the metrics for requests that traveled that connection. Again, be aware of the time range field next to the search bar if you don’t see any search results.

After invoking your Lambda function several more times by testing it from the Lambda console, you can view some aggregate performance metrics. Look at the following:

  • From the client perspective, requests to the Lambda service for the function are taking an average of 50 ms to respond. The function is generating ~1 trace per minute.
  • The function itself is responding in an average of 3 ms. In the following screenshot, I’ve clicked on this node, which reveals a latency histogram of the traced requests showing that over 95% of requests return in under 5 ms.
  • Parameter Store is responding to requests in an average of 64 ms, but note the much lower trace rate in the node. This is because you only fetch data from Parameter Store on the initialization of the Lambda execution environment.

Conclusion

Deduplication, encryption, and restricted access to shared configuration and secrets are key components of any mature architecture. Serverless architectures designed using event-driven, on-demand, compute services like Lambda are no different.

In this post, I walked you through a sample application accessing unencrypted and encrypted values in Parameter Store. These values were created in a hierarchy by application environment and component name, with the permissions to decrypt secret values restricted to only the function needing access. The techniques used here can become the foundation of secure, robust configuration management in your enterprise serverless applications.

Skygofree: New Government Malware for Android

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/skygofree_new_g.html

Kaspersky Labs is reporting on a new piece of sophisticated malware:

We observed many web landing pages that mimic the sites of mobile operators and which are used to spread the Android implants. These domains have been registered by the attackers since 2015. According to our telemetry, that was the year the distribution campaign was at its most active. The activities continue: the most recently observed domain was registered on October 31, 2017. Based on our KSN statistics, there are several infected individuals, exclusively in Italy.

Moreover, as we dived deeper into the investigation, we discovered several spyware tools for Windows that form an implant for exfiltrating sensitive data on a targeted machine. The version we found was built at the beginning of 2017, and at the moment we are not sure whether this implant has been used in the wild.

It seems to be Italian. Ars Technica speculates that it is related to Hacking Team:

That’s not to say the malware is perfect. The various versions examined by Kaspersky Lab contained several artifacts that provide valuable clues about the people who may have developed and maintained the code. Traces include the domain name h3g.co, which was registered by Italian IT firm Negg International. Negg officials didn’t respond to an email requesting comment for this post. The malware may be filling a void left after the epic hack in 2015 of Hacking Team, another Italy-based developer of spyware.

BoingBoing post.

Glenn’s Take on re:Invent 2017 Part 1

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-1/

GREETINGS FROM LAS VEGAS

Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We have a lot of exciting announcements this week. I’m going to post to the AWS Architecture blog each day with my take on what’s interesting about some of the announcements from a cloud architectural perspective.

Why not start at the beginning? At the Midnight Madness launch on Sunday night, we announced Amazon Sumerian, our platform for VR, AR, and mixed reality. The hype around VR/AR has existed for many years, though for me, it is a perfect example of how a working end-to-end solution often requires innovation from multiple sources. For AR/VR to be successful, we need many components to come together in a coherent manner to provide a great experience.

First, we need lightweight, high-definition goggles with motion tracking that are comfortable to wear. Second, we need to track movement of our body and hands in a 3-D space so that we can interact with virtual objects in the virtual world. Third, we need to build the virtual world itself and populate it with assets and define how the interactions will work and connect with various other systems.

There has been rapid development of the physical devices for AR/VR, ranging from iOS devices to Oculus Rift and HTC Vive, which provide excellent capabilities for the first and second components defined above. With the launch of Amazon Sumerian we are solving for the third area, which will help developers easily build their own virtual worlds and start experimenting and innovating with how to apply AR/VR in new ways.

Already, within 48 hours of Amazon Sumerian being announced, I have had multiple discussions with customers and partners around some cool use cases where VR can help in training simulations, remote-operator controls, or with new ideas around interacting with complex visual data sets, which starts bringing concepts straight out of sci-fi movies into the real (virtual) world. I am really excited to see how Sumerian will unlock the creative potential of developers and where this will lead.

Amazon MQ
I am a huge fan of distributed architectures where asynchronous messaging is the backbone of connecting the discrete components together. Amazon Simple Queue Service (Amazon SQS) is one of my favorite services due to its simplicity, scalability, performance, and the incredible flexibility of how you can use Amazon SQS in so many different ways to solve complex queuing scenarios.

While Amazon SQS is easy to use when building cloud-native applications on AWS, many of our customers running existing applications on-premises required support for different messaging protocols such as: Java Message Service (JMS), .Net Messaging Service (NMS), Advanced Message Queuing Protocol (AMQP), MQ Telemetry Transport (MQTT), Simple (or Streaming) Text Oriented Messaging Protocol (STOMP), and WebSockets. One of the most popular applications for on-premises message brokers is Apache ActiveMQ. With the release of Amazon MQ, you can now run Apache ActiveMQ on AWS as a managed service, similar to what we did with Amazon ElastiCache back in 2012. For me, there are two compelling, major benefits that Amazon MQ provides:

  • Integrate existing applications with cloud-native applications without having to change a line of application code if using one of the supported messaging protocols. This removes one of the biggest blockers for integration between the old and the new.
  • Remove the complexity of configuring Multi-AZ resilient message broker services as Amazon MQ provides out-of-the-box redundancy by always storing messages redundantly across Availability Zones. Protection is provided against failure of a broker through to complete failure of an Availability Zone.
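As a rough illustration of the first point, an existing STOMP-based application can usually be repointed at an Amazon MQ broker endpoint with nothing more than a configuration change. The sketch below uses the third-party stomp.py library; the broker host, port, and credentials are placeholders, and the port shown is the usual ActiveMQ STOMP+SSL listener rather than anything specific to your broker:

import stomp

BROKER_HOST = 'b-1234-example.mq.us-east-1.amazonaws.com'  # placeholder endpoint

conn = stomp.Connection(host_and_ports=[(BROKER_HOST, 61614)])
conn.set_ssl(for_hosts=[(BROKER_HOST, 61614)])
conn.connect('brokerUser', 'brokerPassword', wait=True)

# Send a message to a queue exactly as the application would on-premises.
conn.send(destination='/queue/orders', body='hello from AWS')
conn.disconnect()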

I believe that Amazon MQ is a major component in the tools required to help you migrate your existing applications to AWS. Having set up cross-data center Apache ActiveMQ clusters myself in the past, and having tested them to ensure they work as expected during critical failure scenarios, I know that technical staff working on migrations to AWS will benefit from the ease of deploying a fully redundant, managed Apache ActiveMQ cluster within minutes.

Who would have thought I would have been so excited to revisit Apache ActiveMQ in 2017 after using SQS for many, many years? Choice is a wonderful thing.

Amazon GuardDuty
Maintaining application and information security in the modern world is increasingly complex and is constantly evolving and changing as new threats emerge. This is due to the scale, variety, and distribution of services required in a competitive online world.

At Amazon, security is our number one priority. Thus, we are always looking at how we can increase security detection and protection while simplifying the implementation of advanced security practices for our customers. As a result, we released Amazon GuardDuty, which provides intelligent threat detection by using a combination of multiple information sources, transactional telemetry, and the application of machine learning models developed by AWS. One of the biggest benefits of Amazon GuardDuty that I appreciate is that enabling this service requires zero software, agents, sensors, or network choke points, which can all impact performance or reliability of the service you are trying to protect. Amazon GuardDuty works by monitoring your VPC flow logs, AWS CloudTrail events, and DNS logs, as well as combining other sources of security threats that AWS is aggregating from our own internal and external sources.

The use of machine learning in Amazon GuardDuty allows it to identify changes in behavior, which could be suspicious and require additional investigation. Amazon GuardDuty works across all of your AWS accounts allowing for an aggregated analysis and ensuring centralized management of detected threats across accounts. This is important for our larger customers who can be running many hundreds of AWS accounts across their organization, as providing a single common threat detection of their organizational use of AWS is critical to ensuring they are protecting themselves.

Detection, though, is only the beginning of what Amazon GuardDuty enables. When a threat is identified, you can run remediation scripts or trigger Lambda functions with custom responses, which lets you start building automated responses to a variety of common threats. Speed of response matters when a security incident may be taking place. For example, Amazon GuardDuty detects that an Amazon Elastic Compute Cloud (Amazon EC2) instance might be compromised due to traffic from a known set of malicious IP addresses. Upon detection of a compromised EC2 instance, you could apply an access control entry restricting outbound traffic for that instance, which stops loss of data until a security engineer can assess what has occurred.
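Here is a hedged sketch of that kind of automated response; the quarantine security group is a hypothetical placeholder, and in practice the function would be wired to a CloudWatch Events rule matching GuardDuty findings:

import boto3

ec2 = boto3.client('ec2')
QUARANTINE_SG = 'sg-0123456789abcdef0'  # hypothetical security group with no outbound rules

def handler(event, context):
    """Move a possibly compromised instance into a quarantine security group."""
    detail = event.get('detail', {})
    instance_id = (detail.get('resource', {})
                         .get('instanceDetails', {})
                         .get('instanceId'))
    if instance_id:
        # Replacing the instance's security groups stops outbound traffic
        # until a security engineer can assess what has occurred.
        ec2.modify_instance_attribute(InstanceId=instance_id,
                                      Groups=[QUARANTINE_SG])
    return instance_id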

Whether you are a customer running a single service in a single account, a global customer with hundreds of accounts and thousands of applications, or a startup with hundreds of microservices and an hourly release cycle in a DevOps world, I recommend enabling Amazon GuardDuty. We have a 30-day free trial available for all new customers of this service. Because GuardDuty only monitors events, no change to your architecture within AWS is required.

Stay tuned for tomorrow’s post on AWS Media Services and Amazon Neptune.

 

Glenn during the Tour du Mont Blanc

New – AWS IoT Device Management

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-device-management/

AWS IoT and AWS Greengrass give you a solid foundation and programming environment for your IoT devices and applications.

The nature of IoT means that an at-scale device deployment often encompasses millions or even tens of millions of devices deployed at hundreds or thousands of locations. At that scale, treating each device individually is impossible. You need to be able to set up, monitor, update, and eventually retire devices in bulk, collective fashion while also retaining the flexibility to accommodate varying deployment configurations, device models, and so forth.

New AWS IoT Device Management
Today we are launching AWS IoT Device Management to help address this challenge. It will help you through each phase of the device lifecycle, from manufacturing to retirement. Here’s what you get:

Onboarding – Starting with devices in their as-manufactured state, you can control the provisioning workflow. You can use IoT Device Management templates to quickly onboard entire fleets of devices with a few clicks. The templates can include information about device certificates and access policies.

Organization – In order to deal with massive numbers of devices, AWS IoT Device Management extends the existing IoT Device Registry and allows you to create a hierarchical model of your fleet and to set policies on a hierarchical basis. You can drill down through the hierarchy to locate individual devices. You can also query your fleet on attributes such as device type or firmware version.

Monitoring – Telemetry from the devices is used to gather real-time connection, authentication, and status metrics, which are published to Amazon CloudWatch. You can examine the metrics and locate outliers for further investigation. IoT Device Management lets you configure the log level for each device group, and you can also publish change events for the Registry and Jobs for monitoring purposes.

Remote Management – AWS IoT Device Management lets you remotely manage your devices. You can push new software and firmware to them, reset to factory defaults, reboot, and set up bulk updates at the desired velocity.
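To make the query and job concepts concrete, here is a hedged boto3 sketch; the thing type, attribute, thing group ARN, and job document location are hypothetical, and search_index requires fleet indexing to be enabled:

import boto3

iot = boto3.client('iot')

# Query the fleet index for devices of a given type on an old firmware version.
results = iot.search_index(
    queryString='thingTypeName:pressureGauge AND attributes.firmwareVersion:1.0.0'
)
things = [t['thingName'] for t in results.get('things', [])]

# Create a continuous job that rolls a firmware update document out to a group.
iot.create_job(
    jobId='pressure-gauge-firmware-1-1-0',
    targets=['arn:aws:iot:us-east-1:123456789012:thinggroup/Colorado'],
    documentSource='https://s3.amazonaws.com/example-bucket/firmware-job.json',
    targetSelection='CONTINUOUS'
)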

Exploring AWS IoT Device Management
The AWS IoT Device Management Console took me on a tour and pointed out how to access each of the features of the service:

I already have a large set of devices (pressure gauges):

These gauges were created using the new template-driven bulk registration feature. Here’s how I create a template:

The gauges are organized into groups (by US state in this case):

Here are the gauges in Colorado:

AWS IoT group policies allow you to control access to specific IoT resources and actions for all members of a group. The policies are structured very much like IAM policies, and can be created in the console:

Jobs are used to selectively update devices. Here’s how I create one:

As indicated by the Job type above, jobs can run either once or continuously. Here’s how I choose the devices to be updated:

I can create custom authorizers that make use of a Lambda function:

I’ve shown you a medium-sized subset of AWS IoT Device Management in this post. Check it out for yourself to learn more!

Jeff;

 

Roundup of AWS HIPAA Eligible Service Announcements

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/roundup-of-aws-hipaa-eligible-service-announcements/

At AWS, we have had a number of HIPAA-eligible service announcements. Patrick Combes, the Healthcare and Life Sciences Global Technical Leader at AWS, and Aaron Friedman, a Healthcare and Life Sciences Partner Solutions Architect at AWS, have written this post to tell you all about it.

-Ana


We are pleased to announce that the following AWS services have been added to the BAA in recent weeks: Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon SQS. All four of these services facilitate moving data into and through AWS, and we are excited to see how customers will be using these services to advance their solutions in healthcare. While we know the use cases for each of these services are vast, we wanted to highlight some ways that customers might use these services with Protected Health Information (PHI).

As with all HIPAA-eligible services covered under the AWS Business Associate Addendum (BAA), PHI must be encrypted while at rest or in transit. We encourage you to reference our HIPAA whitepaper, which details how you might configure each of AWS’ HIPAA-eligible services to store, process, and transmit PHI. And of course, for any portion of your application that does not touch PHI, you can use any of our 90+ services to deliver the best possible experience to your users. You can find some ideas on architecting for HIPAA on our website.

Amazon API Gateway
Amazon API Gateway is a web service that makes it easy for developers to create, publish, monitor, and secure APIs at any scale. With PHI now able to securely transit API Gateway, applications such as patient/provider directories, patient dashboards, medical device reports/telemetry, HL7 message processing and more can securely accept and deliver information to any number and type of applications running within AWS or client presentation layers.

One particular area we are excited to see how our customers leverage Amazon API Gateway is with the exchange of healthcare information. The Fast Healthcare Interoperability Resources (FHIR) specification will likely become the next-generation standard for how health information is shared between entities. With strong support for RESTful architectures, FHIR can be easily codified within an API on Amazon API Gateway. For more information on FHIR, our AWS Healthcare Competency partner, Datica, has an excellent primer.

AWS Direct Connect
Some of our healthcare and life sciences customers, such as Johnson & Johnson, leverage hybrid architectures and need to connect their on-premises infrastructure to the AWS Cloud. Using AWS Direct Connect, you can establish private connectivity between AWS and your datacenter, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.

In addition to a hybrid-architecture strategy, AWS Direct Connect can assist with the secure migration of data to AWS, which is the first step to using the wide array of our HIPAA-eligible services to store and process PHI, such as Amazon S3 and Amazon EMR. Additionally, you can connect to third-party/externally-hosted applications or partner-provided solutions as well as securely and reliably connect end users to those same healthcare applications, such as a cloud-based Electronic Medical Record system.

AWS Database Migration Service (DMS)
To date, customers have migrated over 20,000 databases to AWS through the AWS Database Migration Service. Customers often use DMS as part of their cloud migration strategy, and now it can be used to securely and easily migrate your core databases containing PHI to the AWS Cloud. As your source database remains fully operational during the migration with DMS, you minimize downtime for these business-critical applications as you migrate your databases to AWS. This service can now be utilized to securely transfer such items as patient directories, payment/transaction record databases, revenue management databases and more into AWS.

Amazon Simple Queue Service (SQS)
Amazon Simple Queue Service (SQS) is a message queueing service for reliably communicating among distributed software components and microservices at any scale. One way that we envision customers using SQS with PHI is to buffer requests between application components that pass HL7 or FHIR messages to other parts of their application. You can leverage features like SQS FIFO queues to ensure that messages containing PHI are delivered in the order they are sent and remain available until a consumer processes and deletes them. This is important for applications that handle patient record updates or process payment information in a hospital.
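For instance, here is a minimal sketch of publishing an HL7 message to a FIFO queue with boto3; the queue URL, group ID, and message body are placeholders:

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/patient-updates.fifo'

# Messages that share a MessageGroupId are delivered in order, and the
# deduplication ID guards against processing the same HL7 message twice.
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody='MSH|^~\\&|ADT1|HOSPITAL|...',  # truncated HL7 payload placeholder
    MessageGroupId='patient-12345',
    MessageDeduplicationId='adt-0001'
)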

Let’s get building!
We are beyond excited to see how our customers will use our newly HIPAA-eligible services as part of their healthcare applications. What are you most excited for? Leave a comment below.

Let’s Encrypt 2016 In Review

Post Syndicated from Let's Encrypt - Free SSL/TLS Certificates original https://letsencrypt.org//2017/01/06/le-2016-in-review.html

Our first full year as a live CA was an exciting one. I’m incredibly proud of what our team and community accomplished during 2016. I’d like to share some thoughts about how we’ve changed, what we’ve accomplished, and what we’ve learned.

At the start of 2016, Let’s Encrypt certificates had been available to the public for less than a month and we were supporting approximately 240,000 active (unexpired) certificates. That seemed like a lot at the time! Now we’re frequently issuing that many new certificates in a single day while supporting more than 20,000,000 active certificates in total. We’ve issued more than a million certificates in a single day a few times recently. We’re currently serving an average of 6,700 OCSP responses per second. We’ve done a lot of optimization work, we’ve had to add some hardware, and there have been some long nights for our staff, but we’ve been able to keep up and we’re ready for another year of strong growth.

Let's Encrypt certificate issuance statistics.

We added a number of new features during the past year, including support for the ACME DNS challenge, ECDSA signing, IPv6, and Internationalized Domain Names.

When 2016 started, our root certificate had not been accepted into any major root programs. Today we’ve been accepted into the Mozilla, Apple, and Google root programs. We’re close to announcing acceptance into another major root program. These are major steps towards being able to operate as an independent CA. You can read more about why here.

The ACME protocol for issuing and managing certificates is at the heart of how Let’s Encrypt works. Having a well-defined and heavily audited specification developed in public on a standards track has been a major contributor to our growth and the growth of our client ecosystem. Great progress was made in 2016 towards standardizing ACME in the IETF ACME working group. We’re hoping for a final document around the end of Q2 2017, and we’ll announce plans for implementation of the updated protocol around that time as well.

Supporting the kind of growth we saw in 2016 meant adding staff, and during the past year Internet Security Research Group (ISRG), the non-profit entity behind Let’s Encrypt, went from four full-time employees to nine. We’re still a pretty small crew given that we’re now one of the largest CAs in the world (if not the largest), but it works because of our intense focus on automation, the fact that we’ve been able to hire great people, and because of the incredible support we receive from the Let’s Encrypt community.

Let’s Encrypt exists in order to help create a 100% encrypted Web. Our own metrics can be interesting, but they’re only really meaningful in terms of the impact they have on progress towards a more secure and privacy-respecting Web. The metric we use to track progress towards that goal is the percentage of page loads using HTTPS, as seen by browsers. According to Firefox Telemetry, the Web has gone from approximately 39% of page loads using HTTPS each day to just about 49% during the past year. We’re incredibly close to a Web that is more encrypted than not. We’re proud to have been a big part of that, but we can’t take credit for all of it. Many people and organizations around the globe have come to realize that we need to invest in a more secure and privacy-respecting Web, and have taken steps to secure their own sites as well as their customers’. Thank you to everyone that has advocated for HTTPS this year, or helped to make it easier for people to make the switch.

We learned some lessons this year. When we had service interruptions they were usually related to managing the rapidly growing database backing our CA. Also, while most of our code had proper tests, some small pieces didn’t and that led to incidents that shouldn’t have happened. That said, I’m proud of the way we handle incidents promptly, including quick and transparent public disclosure.

We also learned a lot about our client ecosystem. At the beginning of 2016, ISRG / Let’s Encrypt provided client software called letsencrypt. We’ve always known that we would never be able to produce software that would work for every Web server/stack, but we felt that we needed to offer a client that would work well for a large number of people and that could act as a reference client. By March of 2016, earlier than we had foreseen, it had become clear that our community was up to the task of creating a wide range of quality clients, and that our energy would be better spent fostering that community than producing our own client. That’s when we made the decision to hand off development of our client to the Electronic Frontier Foundation (EFF). EFF renamed the client to Certbot and has been doing an excellent job maintaining and improving it as one of many client options.

As exciting as 2016 was for Let’s Encrypt and encryption on the Web, 2017 seems set to be an even more incredible year. Much of the infrastructure and many of the plans necessary for a 100% encrypted Web came into being or solidified in 2016. More and more hosting providers and CDNs are supporting HTTPS with one click or by default, often without additional fees. It has never been easier for people and organizations running their own sites to find the tools, services, and information they need to move to HTTPS. Browsers are planning to update their user interfaces to better reflect the risks associated with non-secure connections.

We’d like to thank our community, including our sponsors, for making everything we did this past year possible. Please consider getting involved or making a donation, and if your company or organization would like to sponsor Let’s Encrypt please email us at [email protected].

Month in Review: December 2016

Post Syndicated from Derek Young original https://aws.amazon.com/blogs/big-data/month-in-review-december-2016/

Another month of big data solutions on the Big Data Blog.

Take a look at our summaries below and learn, comment, and share. Thank you for reading!

Implementing Authorization and Auditing using Apache Ranger on Amazon EMR
Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Features include centralized security administration, fine-grained authorization across many Hadoop components (Hadoop, Hive, HBase, Storm, Knox, Solr, Kafka, and YARN) and central auditing. In this post, walk through the steps to enable authorization and audit for Amazon EMR clusters using Apache Ranger.

Amazon Redshift Engineering’s Advanced Table Design Playbook
Amazon Redshift is a fully managed, petabyte scale, massively parallel data warehouse that offers simple operations and high performance. In practice, the best way to improve query performance by orders of magnitude is by tuning Amazon Redshift tables to better meet your workload requirements. This five-part blog series will guide you through applying distribution styles, sort keys, and compression encodings and configuring tables for data durability and recovery purposes.

Interactive Analysis of Genomic Datasets Using Amazon Athena
In this post, learn to prepare genomic data for analysis with Amazon Athena. We’ll demonstrate how Athena is well-adapted to address common genomics query paradigms using the Thousand Genomes dataset, a seminal genomics study, hosted on Amazon S3. Although this post is focused on genomic analysis, similar approaches can be applied to any discipline where large-scale, interactive analysis is required.

Joining and Enriching Streaming Data on Amazon Kinesis
In this blog post, learn three approaches for joining and enriching streaming data on Amazon Kinesis Streams by using Amazon Kinesis Analytics, AWS Lambda, and Amazon DynamoDB.

Using SaltStack to Run Commands in Parallel on Amazon EMR
SaltStack is an open source project for automation and configuration management. It started as a remote execution engine designed to scale to many machines while delivering high-speed execution. You can now use the new bootstrap action that installs SaltStack on Amazon EMR. It provides a basic configuration that enables selective targeting of the nodes based on instance roles, instance groups, and other parameters.

Building an Event-Based Analytics Pipeline for Amazon Game Studios’ Breakaway
Amazon Game Studios’ new title Breakaway is an online 4v4 team battle sport that delivers fast action, teamwork, and competition. In this post, learn the technical details of how the Breakaway team uses AWS to collect, process, and analyze gameplay telemetry to answer questions about arena design.

Respond to State Changes on Amazon EMR Clusters with Amazon CloudWatch Events
With new support for Amazon EMR in Amazon CloudWatch Events, you can be quickly notified of, and programmatically respond to, state changes in your EMR clusters. Additionally, these events are also displayed in the Amazon EMR console. CloudWatch Events allows you to create filters and rules to match these events and route them to Amazon SNS topics, AWS Lambda functions, Amazon SQS queues, streams in Amazon Kinesis Streams, or built-in targets.

Run Jupyter Notebook and JupyterHub on Amazon EMR
Data scientists who run Jupyter and JupyterHub on Amazon EMR can use Python, R, Julia, and Scala to process, analyze, and visualize big data stored in Amazon S3. Jupyter notebooks can be saved to S3 automatically, so users can shut down and launch new EMR clusters, as needed. See how EMR makes it easy to spin up clusters with different sizes and CPU/memory configurations to suit different workloads and budgets.

Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
In this post, see how you can build a business intelligence capability for streaming IoT device data using AWS serverless and managed services. You can be up and running in minutes―starting small, but able to easily grow to millions of devices and billions of messages.

Serving Real-Time Machine Learning Predictions on Amazon EMR
The typical progression for creating and using a trained model for recommendations falls into two general areas: training the model and hosting the model. Model training has become a well-known standard practice. In this post, we highlight one way to host those recommendations using Amazon EMR with JobServer.

Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning
In this post, learn to generate a predictive model for flight delays that can be used to help pick the flight least likely to add to your travel stress. To accomplish this, you’ll use Apache Spark running on Amazon EMR for extracting, transforming, and loading (ETL) the data, Amazon Redshift for analysis, and Amazon Machine Learning for creating predictive models.

FROM THE ARCHIVE

Running sparklyr – RStudio’s R Interface to Spark on Amazon EMR
Sparklyr is an R interface to Spark that allows users to use Spark as the backend for dplyr, one of the most popular data manipulation packages. Sparklyr provides interfaces to Spark packages and also allows users to query data in Spark using SQL and develop extensions for the full Spark API. This short post shows you how to run RStudio and sparklyr on EMR.


Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

IoT saves lives but infosec wants to change that

Post Syndicated from Robert Graham original http://blog.erratasec.com/2016/12/iot-saves-lives.html

The cybersecurity industry mocks/criticizes IoT. That’s because they are evil and wrong. IoT saves lives. This was demonstrated a couple weeks ago when a terrorist attempted to drive a truck through a Christmas market in Germany. The truck had an Internet-connected braking system (firmware updates, configuration, telemetry). When it detected the collision, it deployed the brakes, bringing the truck to a stop. Injuries and deaths were a tenth of those in the similar Nice truck attack earlier in the year.

All the trucks shipped by Scania in the last five years have had mobile phone connectivity to the Internet. Scania pulls telemetry back from its trucks, both to improve drivers and to help improve the computerized features of the trucks. They put everything under the microscope, such as how to improve air conditioning to make the trucks more environmentally friendly.

Among their features is the “Autonomous Emergency Braking” system. This is the system that saved lives in Germany.

You can read up on these features on their website, or in their annual report [*].

My point is this: the cybersecurity industry is a bunch of police-state fetishists that want to stop innovation, to solve the “security” problem first before allowing innovation to continue. This will only cost lives. Yes, we desperately need to solve the problem. Almost certainly, the Scania system can trivially be hacked by mediocre hackers. But if Scania had waited first to secure its system before rolling it out in trucks, many more people would now be dead in Germany. Don’t listen to cybersecurity professionals who want to stop the IoT revolution — they just don’t care if people die.


Update: Many, such as the first comment, point out that the emergency brakes operate independently of the Internet connection, thus disproving this post.

That’s silly. That’s the case of all IoT devices. The toaster still toasts without Internet. The surveillance camera still records video without Internet. My car, which also has emergency brakes, still stops. In almost no IoT is the Internet connectivity integral to the day-to-day operation. Instead, Internet connectivity is for things like configuration, telemetry, and downloading firmware updates — as in the case of Scania.

While the brakes don’t make their decision based on the current connectivity, connectivity is nonetheless essential to the equation. Scania monitors its fleet of 170,000 trucks and uses that information to make trucks, including braking systems, better.

My car is no more or less Internet connected than the Scania truck, yet hackers have released exploits at hacking conferences for it, and it’s listed as a classic example of an IoT device. Before you say a Scania truck isn’t an IoT device, you first have to get all those other hackers to stop calling my car an IoT device.

Monetize your APIs in AWS Marketplace using API Gateway

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/monetize-your-apis-in-aws-marketplace-using-api-gateway/


Shiva Krishnamurthy, Sr. Product Manager

Amazon API Gateway helps you quickly build highly scalable, secure, and robust APIs. Today, we are announcing an integration of API Gateway with AWS Marketplace. You can now easily monetize your APIs built with API Gateway, market them directly to AWS customers, and reuse AWS bill calculation and collection mechanisms.

AWS Marketplace offers over 3,500 software listings across 35 product categories and has over 100K active customers. With the recent announcement of SaaS Subscriptions, API sellers can, for the first time, take advantage of the full suite of Marketplace features, including customer acquisition, unified billing, and reporting. For AWS customers, this means that they can now subscribe to API products through AWS Marketplace and pay on an existing AWS bill. This gives you direct access to the AWS customer base.

To get started, identify the API on API Gateway that you want to sell on AWS Marketplace. Next, package that API into usage plans. Usage plans let you set throttling limits and quotas for your APIs so that you can control third-party usage. You can create multiple usage plans with different limits (e.g., Silver, Gold, Platinum) and offer them as different API products on AWS Marketplace.
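If you prefer to script this step, a usage plan can also be created with the API Gateway API. The following is a rough sketch using Python and boto3; the plan name, API ID, stage, and limits are placeholders you would replace with your own values.

# Hypothetical sketch: create a "Gold" usage plan for an existing API stage.
import boto3

apigw = boto3.client("apigateway")

usage_plan = apigw.create_usage_plan(
    name="PetStore-Gold",                                  # placeholder plan name
    description="Gold tier for the Pet Store API",
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],  # replace with your API ID and stage
    throttle={"rateLimit": 100.0, "burstLimit": 200},      # steady-state and burst request rates
    quota={"limit": 1000000, "period": "MONTH"},           # monthly request quota
)
print(usage_plan["id"])  # you'll need this usage plan ID later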

Let's suppose that you offer a Pet Store API managed by API Gateway and want to start selling it through AWS Marketplace. To do so, you must offer a developer portal: a website that you maintain, where new customers can register an account using AWS-provided billing identifiers, and where registered customers have access to your APIs during and after purchase.

To help you get started, we have created a reference implementation of a developer portal application. You can use this implementation to create a developer portal from scratch, or use it as a reference guide to integrate API Gateway into an existing developer portal that you already operate. For a detailed walkthrough on setting up a developer portal using our reference application, see Generate Your Own API Gateway Developer Portal.

After you have set up your developer portal, register as a seller with the AWS Marketplace. After registration, submit a product load form to list your product for sale. In this step, you describe your API product, define pricing, and submit AWS account IDs to be used to test the subscription. You also submit the URL of your developer portal. Currently, API Gateway only supports “per request” pricing models.

After you have registered as a seller, you are given an AWS Marketplace product code. Log in to the API Gateway console to associate this product code with the corresponding usage plan on API Gateway. This tells API Gateway to send telemetry records to AWS Marketplace when your API is used.
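If you would rather script this association than use the console, something like the following boto3 sketch should work; the usage plan ID and product code shown are placeholders, and the /productCode patch path is an assumption.

# Hypothetical sketch: associate an AWS Marketplace product code with a usage plan.
import boto3

apigw = boto3.client("apigateway")

apigw.update_usage_plan(
    usagePlanId="abcd12",   # placeholder: the usage plan you created earlier
    patchOperations=[
        # assumption: /productCode is the patch path used for the Marketplace product code
        {"op": "replace", "path": "/productCode", "value": "1a2b3c4d5e6f"},
    ],
)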


After the product code is associated, test the "end user" flow by subscribing to your API products with the AWS account IDs that you submitted via the Marketplace, and verify that everything works as expected. When you've finished verifying, submit your product for final approval using the instructions provided in the Seller Guide.

Visit here to learn more about this feature.

IT Governance in a Dynamic DevOps Environment

Post Syndicated from Shashi Prabhakar original https://aws.amazon.com/blogs/devops/it-governance-in-a-dynamic-devops-environment/

IT Governance in a Dynamic DevOps Environment
Governance involves the alignment of security and operations with productivity to ensure a company achieves its business goals. Customers who are migrating to the cloud might be in various stages of implementing governance. Each stage poses its own challenges. In this blog post, the first in a series, I will discuss a four-step approach to automating governance with AWS services.

Governance and the DevOps Environment
Developers with a DevOps and agile mindset are responsible for building and operating services. They often rely on a central security team to develop and apply policies, seek security reviews and approvals, or implement best practices.

These policies and rules are not strictly enforced by the security team. They are treated as guidelines that developers can follow to get the much-desired flexibility from using AWS. However, due to time constraints or lack of awareness, developers may not always follow best practices and standards. If these best practices and rules were strictly enforced, the security team could become a bottleneck.

For customers migrating to AWS, the automated governance mechanisms described in this post will preserve flexibility for developers while providing controls for the security team.

These are some common challenges in a dynamic development environment:

·      Shortcuts taken to finish tasks quickly, such as hardcoding credentials in code.

·      Cost management (for example, controlling the type of instance launched).

·      Knowledge transfer.

·      Manual processes.

Steps to Governance
Here is a four-step approach to automating governance:

At initial setup, you want to implement some (1) controls for high-risk actions. After they are in place, you need to (2) monitor your environment to make sure you have configured resources correctly. Monitoring will help you discover issues you want to (3) fix as soon as possible. You’ll also want to regularly produce an (4) audit report that shows everything is compliant.

The example in this post helps illustrate the four-step approach: A central IT team allows its Big Data team to run a test environment of Amazon EMR clusters. The team runs the EMR job with 100 t2.medium instances, but when a team member spins up 100 r3.8xlarge instances to complete the job more quickly, the business incurs an unexpected expense.

The central IT team cares about governance and implements a few measures to prevent this from happening again:

·      Control elements: The team uses CloudFormation to restrict the number and type of instances and AWS Identity and Access Management to allow only a certain group to modify the EMR cluster.

·      Monitor elements: The team uses tagging, AWS Config, and AWS Trusted Advisor to monitor the instance limit and determine if anyone exceeded the number of allowed instances.

·      Fix: The team creates a custom Config rule to terminate instances that are not of the type specified.

·      Audit: The team reviews the lifecycle of the EMR instance in AWS Config.

 

Control

You can prevent mistakes by standardizing configurations (through AWS CloudFormation), restricting configuration options (through AWS Service Catalog), and controlling permissions (through IAM).

AWS CloudFormation helps you control the workflow environment in a single package. In this example, we use a CloudFormation template to restrict the number and type of instances, and we use tagging to control the environment.

For example, the team can prevent the choice of r3.8xlarge instances by using CloudFormation with a fixed instance type and a fixed number of instances (100).

CloudFormation Template Sample

EMR cluster with tag:

{
  "Type" : "AWS::EMR::Cluster",
  "Properties" : {
    "AdditionalInfo" : JSON object,
    "Applications" : [ Applications, ... ],
    "BootstrapActions" : [ Bootstrap Actions, ... ],
    "Configurations" : [ Configurations, ... ],
    "Instances" : JobFlowInstancesConfig,
    "JobFlowRole" : String,
    "LogUri" : String,
    "Name" : String,
    "ReleaseLabel" : String,
    "ServiceRole" : String,
    "Tags" : [ Resource Tag, ... ],
    "VisibleToAllUsers" : Boolean
  }
}
EMR cluster JobFlowInstancesConfig InstanceGroupConfig with fixed instance type and number:
{
  "BidPrice" : String,
  "Configurations" : [ Configuration, ... ],
  "EbsConfiguration" : EBSConfiguration,
  "InstanceCount" : Integer,
  "InstanceType" : String,
  "Market" : String,
  "Name" : String
}
AWS Service Catalog can be used to distribute approved products (servers, databases, websites) in AWS. This gives IT administrators control over which users can access which products. It also gives them the ability to enforce compliance based on business standards.

AWS IAM is used to control which users can access which AWS services and resources. By using IAM roles, you can avoid the use of root credentials in your code to access AWS resources.

In this example, we give the team lead full EMR access, including console and API access (not covered here), and give developers read-only access with no console access. If a developer wants to run the job, the developer just needs PEM files.

IAM Policy
This policy is for the team lead with full EMR access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:*",
        "cloudformation:CreateStack",
        "cloudformation:DescribeStackEvents",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:CancelSpotInstanceRequests",
        "ec2:CreateRoute",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:DeleteRoute",
        "ec2:DeleteTags",
        "ec2:DeleteSecurityGroup",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeInstances",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcAttribute",
        "ec2:DescribeVpcs",
        "ec2:DescribeRouteTables",
        "ec2:DescribeNetworkAcls",
        "ec2:CreateVpcEndpoint",
        "ec2:ModifyImageAttribute",
        "ec2:ModifyInstanceAttribute",
        "ec2:RequestSpotInstances",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "elasticmapreduce:*",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:ListRoles",
        "iam:PassRole",
        "kms:List*",
        "s3:*",
        "sdb:*",
        "support:CreateCase",
        "support:DescribeServices",
        "support:DescribeSeverityLevels"
      ],
      "Resource": "*"
    }
  ]
}
This policy is for developers with read-only access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:Describe*",
        "elasticmapreduce:List*",
        "s3:GetObject",
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "sdb:Select",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}
These are IAM managed policies. If you want to change the permissions, you can create your own IAM custom policy.

 

Monitor

Use logs available from AWS CloudTrail, Amazon CloudWatch, Amazon VPC, Amazon S3, and Elastic Load Balancing as much as possible. You can use AWS Config, Trusted Advisor, and CloudWatch events and alarms to monitor these logs.

AWS CloudTrail can be used to log API calls in AWS. It helps you fix problems, secure your environment, and produce audit reports. For example, you could use CloudTrail logs to identify who launched those r3.8xlarge instances.
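As an illustration, the following boto3 sketch queries CloudTrail for recent RunInstances calls and prints who made them; it assumes the instance type appears under requestParameters in the event payload.

# Sketch: find recent RunInstances calls and who made them.
import json
from datetime import datetime, timedelta

import boto3

cloudtrail = boto3.client("cloudtrail")

response = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "RunInstances"}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
)

for event in response["Events"]:
    detail = json.loads(event["CloudTrailEvent"])  # full event payload as JSON
    instance_type = detail.get("requestParameters", {}).get("instanceType")
    print(event.get("Username"), instance_type, event["EventTime"])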


AWS Config can be used to track resource configurations and act on rules. Config rules check the configuration of your AWS resources for compliance. You also get, at a glance, the compliance status of your environment based on the rules you configured.

Amazon CloudWatch can be used to monitor and alarm on incorrectly configured resources. CloudWatch entities (metrics, alarms, logs, and events) help you monitor your AWS resources. Using metrics (including custom metrics), you can monitor resources and get a dashboard with customizable widgets. CloudWatch Logs can be used to stream data from AWS-provided logs in addition to your system logs, which is helpful for fixing and auditing.
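For example, you could alarm when the cluster grows beyond the agreed size. The sketch below assumes the EMR CoreNodesRunning metric and an existing SNS topic for notifications; the cluster ID and topic ARN are placeholders.

# Sketch: alarm when an EMR cluster runs more core nodes than expected.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="emr-core-nodes-over-limit",
    Namespace="AWS/ElasticMapReduce",
    MetricName="CoreNodesRunning",                                   # assumption: EMR publishes this metric
    Dimensions=[{"Name": "JobFlowId", "Value": "j-EXAMPLE12345"}],   # placeholder cluster ID
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:governance-alerts"],  # placeholder SNS topic
)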

CloudWatch Events helps you take action on changes. VPC Flow Logs, S3 logs, and ELB logs provide you with data to make smarter decisions when fixing problems or optimizing your environment.

AWS Trusted Advisor analyzes your AWS environment and provides best practice recommendations in four categories: cost, performance, security, and fault tolerance. This online resource optimization tool also includes AWS limit warnings.
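If you want to read the same information programmatically, Trusted Advisor checks are exposed through the AWS Support API (a Business or Enterprise support plan is required). A rough sketch:

# Sketch: read the Trusted Advisor "Service Limits" check through the Support API.
import boto3

support = boto3.client("support", region_name="us-east-1")  # the Support API lives in us-east-1

checks = support.describe_trusted_advisor_checks(language="en")["checks"]
limits_check = next(check for check in checks if check["name"] == "Service Limits")

result = support.describe_trusted_advisor_check_result(checkId=limits_check["id"])
for resource in result["result"]["flaggedResources"]:
    print(resource["metadata"])  # region, service, limit name, current usage, and status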

We will use Trusted Advisor to make sure a service limit doesn't become a bottleneck when launching 100 instances:

Trusted Advisor


Fix
Depending on the violation and your ability to monitor and view the resource configuration, you might want to take action when you find an incorrectly configured resource that will lead to a security violation. It's important that the fix doesn't result in unwanted consequences and that you maintain an auditable record of the actions you performed.


You can use AWS Lambda to automate the fixes. When you use Lambda with Amazon CloudWatch Events, you can take action on events such as an instance termination or the addition of a new instance to an Auto Scaling group; in fact, you can act on any AWS API call by selecting it as the event source. You can also use AWS Config managed rules and custom rules with remediation: while AWS Config rules keep you informed about the state of your environment, Lambda functions can take action on top of those rules, which automates the fixes.

AWS Config to Find Running Instance Type

To fix the problem in our use case, you can implement a custom Config rule with a trigger (for example, shutting down the instances if the instance type is larger than .xlarge, or tearing down the EMR cluster).
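A minimal sketch of such a rule follows. It assumes a Lambda-backed custom Config rule triggered by EC2 instance configuration changes; the approved instance type is a placeholder, and in practice you would scope the rule to the EMR cluster's instances before terminating anything.

# Sketch: Lambda handler for a custom Config rule that flags (and terminates)
# instances that are not of the approved type.
import json

import boto3

APPROVED_TYPES = {"t2.medium"}   # placeholder: the only type the team allows
ec2 = boto3.client("ec2")
config = boto3.client("config")

def lambda_handler(event, context):
    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]
    instance_id = item["resourceId"]
    instance_type = item["configuration"]["instanceType"]

    compliance = "COMPLIANT" if instance_type in APPROVED_TYPES else "NON_COMPLIANT"
    if compliance == "NON_COMPLIANT":
        ec2.terminate_instances(InstanceIds=[instance_id])   # the "fix" step

    # Report the evaluation result back to AWS Config.
    config.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": instance_id,
            "ComplianceType": compliance,
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )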

 

Audit

You’ll want to have a report ready for the auditor at the end of the year or quarter. You can automate your reporting system using AWS Config resources.

You can view AWS resource configurations and history so you can see when the r3.8xlarge instance cluster was launched or which security group was attached. You can even search for deleted or terminated instances.

AWS Config Resources


More Control, Monitor, and Fix Examples
Armando Leite from AWS Professional Services has created a sample governance framework that leverages CloudWatch Events and AWS Lambda to enforce a set of controls (flows between layers, no OS root access, no remote logins). When a deviation is noted (monitoring), automated action is taken to respond to the event and, if necessary, recover to a known good state (fix).

·      Remediate (for example, shut down the instance) through custom Config rules or a CloudWatch event to trigger the workflow.

·      Monitor a user’s OS activity and escalation to root access. As events unfold, new Lambda functions dynamically enable more logs and subscribe to log data for further live analysis.

·      If the telemetry indicates it’s appropriate, restore the system to a known good state.

Vote for the top 20 Raspberry Pi projects in The MagPi!

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/vote-top-20-raspberry-pi-projects-magpi/

Although this Thursday will see the release of issue 49 of The MagPi, we’re already hard at work putting together our 50th issue spectacular. As part of this issue we’re going to be covering 50 of the best Raspberry Pi projects ever and we want you, the community, to vote for the top 20.

Below we have listed the 30 projects that we think represent the best of the best. All we ask is that you vote for your favourite. We will have a few special categories with some other amazing projects in the final article, but if you think we’ve missed out something truly excellent, let us know in the comments. Here’s the list so you can remind yourselves of the projects, with the poll posted at the bottom.

From paper boats to hybrid sports cars

  1. SeeMore – a huge sculpture of 256 Raspberry Pis connected as a cluster
  2. BeetBox – beets (vegetable) you can use to play sick beats (music)
  3. Voyage – 300 paper boats (actually polypropylene) span a river, and you control how they light up
  4. Aquarium – a huge aquarium with Pi-powered weather control simulating the environment of the Cayman Islands
  5. ramanPi – a Raman spectrometer used to identify different types of molecules
  6. Joytone – an electronic musical instrument operated by 72 back-lit joysticks
  7. Internet of LEGO – a city of LEGO, connected to and controlled by the internet
  8. McMaster Formula Hybrid – a Raspberry Pi provides telemetry on this hybrid racing car
  9. PiGRRL – Adafruit show us how to make an upgraded, 3D-printed Game Boy
  10. Magic Mirror – check out how you look while getting some at-a-glance info about your day
Dinosaurs, space, and modern art

  1. 4bot – play a game of Connect 4 with a Raspberry Pi robot
  2. Blackgang Chine dinosaurs – these theme park attractions use the diminutive Pi to make them larger than life
  3. Sound Fighter – challenge your friend to the ultimate Street Fight, controlled by pianos
  4. Astro Pi – Raspberry Pis go to space with code written by school kids
  5. Pi in the Sky – Raspberry Pis go to near space and send back live images
  6. BrewPi – a microbrewery controlled by a micro-computer
  7. LED Mirror – a sci-fi effect comes to life as you’re represented on a wall of lights
  8. Raspberry Pi VCR – a retro VCR-player is turned into a pink media playing machine
  9. #OZWall – Contemporary art in the form of many TVs from throughout the ages
  10. #HiutMusic – you choose the music for a Welsh denim factory through Twitter
Robots and arcade machines make the cut

  1. CandyPi – control a jelly bean dispenser from your browser without the need to twist the dial
  2. Digital Zoetrope – still images rotated to create animation, updated for the 21st century
  3. LifeBox – create virtual life inside this box and watch it adapt and survive
  4. Coffee Table Pi – classy coffee table by name, arcade cabinet by nature. Tea and Pac-Man, anyone?
  5. Raspberry Pi Notebook – this handheld Raspberry Pi is many people’s dream machine
  6. Pip-Boy 3000A – turn life into a Bethesda RPG with this custom Pip-Boy
  7. Mason Jar Preserve – Mason jars are used to preserve things, so this one is a beautiful backup server to preserve your data
  8. Pi glass – Google Glass may be gone but you can still make your own amazing Raspberry Pi facsimile
  9. DoodleBorg – a powerful PiBorg robot that can tow a caravan
  10. BigHak – a Big Trak that is truly big: it’s large enough for you to ride in

Now you’ve refreshed your memory of all these amazing projects, it’s time to vote for the one you think is best!

Note: There is a poll embedded within this post; please visit the site to participate in this post's poll.

The vote is running over the next two weeks, and the results will be in The MagPi 50. We’ll see you again on Thursday for the release of the excellent MagPi 49: don’t miss it!

The post Vote for the top 20 Raspberry Pi projects in The MagPi! appeared first on Raspberry Pi.

Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1

Post Syndicated from Ryan Nienhuis original https://blogs.aws.amazon.com/bigdata/post/Tx2D4GLDJXPKHOY/Writing-SQL-on-Streaming-Data-with-Amazon-Kinesis-Analytics-Part-1

Ryan Nienhuis is a Senior Product Manager for Amazon Kinesis

This is the first of two AWS Big Data blog posts on Writing SQL on Streaming Data with Amazon Kinesis Analytics. In this post, I provide an overview of streaming data and key concepts like the basics of streaming SQL, and complete a walkthrough using a simple example. In the next post, I will cover more advanced stream processing concepts using Amazon Kinesis Analytics.

Most organizations use batch data processing to perform their analytics in daily or hourly intervals to inform their business decisions and improve their customer experiences. However, you can derive significantly more value from your data if you are able to process and react in real time. Indeed, the value of insights in your data can decline rapidly over time – the faster you react, the better. For example:

  • Analyzing your company’s key performance indicators over the last 24 hours is a better reflection of your current business than analyzing last month’s metrics.
  • Reacting to an operational event as it is happening is far more valuable than discovering a week later that the event occurred. 
  • Identifying that a customer is unable to complete a purchase on your ecommerce site so you can assist them in completing the order is much better than finding out next week that they were unable to complete the transaction.

Real-time insights are extremely valuable, but difficult to extract from streaming data. Processing data in real time can be difficult because it needs to be done quickly and continuously to keep up with the speed at which the data is produced. In addition, the analysis may require data to be processed in the same order in which it was generated for accurate results, which can be hard due to the distributed nature of the data.

Because of these complexities, people start by implementing simple applications that perform streaming ETL, such as collecting, validating, and normalizing log data across different applications. Some then progress to basic processing like rolling min-max computations, while a select few implement sophisticated processing such as anomaly detection or correlating events by user sessions.  With each step, more and more value is extracted from the data but the difficulty level also increases.

With the launch of Amazon Kinesis Analytics, you can now easily write SQL on streaming data, providing a powerful way to build a stream processing application in minutes. The service allows you to connect to streaming data sources, process the data with sub-second latencies, and continuously emit results to downstream destinations for use in real-time alerts, dashboards, or further analysis.

This post introduces you to Amazon Kinesis Analytics, the fundamentals of writing ANSI-Standard SQL over streaming data, and works through a simple example application that continuously generates metrics over time windows.

What is streaming data?

Today, data is generated continuously from a large variety of sources, including clickstream data from mobile and web applications, ecommerce transactions, application logs from servers, telemetry from connected devices, and many other sources.

Typically, hundreds to millions of these sources create data that is usually small (order of kilobytes) and occurs in a sequence. For example, your ecommerce site has thousands of individuals concurrently interacting with the site, each generating a sequence of events based upon their activity (click product, add to cart, purchase product, etc.). When these sequences are captured continuously from these sources as events occur, the data is categorized as streaming data.

Amazon Kinesis Streams

Capturing event data with low latency and durably storing it in a highly available, scalable data store, such as Amazon Kinesis Streams, is the foundation for streaming data. Streams enables you to capture and store data for ordered, replayable, real-time processing using a streaming application. You configure your data sources to emit data into the stream, then build applications that read and process data from that stream in real-time. To build your applications, you can use the Amazon Kinesis Client Library (KCL), AWS Lambda, Apache Storm, and a number of other solutions, including Amazon Kinesis Analytics.
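As a simple illustration, a producer can be as small as a few lines of Python with boto3; the stream name and record shape below are placeholders, not something defined in this post.

# Sketch: write a single JSON event into a Kinesis stream.
import json

import boto3

kinesis = boto3.client("kinesis")

event = {"ticker_symbol": "AMZN", "sector": "TECHNOLOGY", "price": 94.12, "change": 0.45}

kinesis.put_record(
    StreamName="my-example-stream",            # placeholder: your stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["ticker_symbol"],       # records with the same key land on the same shard
)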

Amazon Kinesis Firehose

One of the more common use cases for streaming data is to capture it and then load it to a cloud storage service, a database, or another analytics service. Amazon Kinesis Firehose is a fully managed service that offers an easy-to-use solution to collect and deliver streaming data to Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.

With Firehose, you create delivery streams using the AWS Management Console to specify your destinations of choice and choose from configuration options that allow you to batch, compress, and encrypt your data before it is loaded into the destination. From there, you set up your data sources to start sending data to the Firehose delivery stream, which loads it continuously to your destinations with no ongoing administration.
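Sending data to an existing delivery stream is similarly small. In this sketch the delivery stream name is a placeholder and the destination is assumed to be S3, so a newline is appended to keep the delivered objects line-delimited.

# Sketch: send one record to a Firehose delivery stream.
import json

import boto3

firehose = boto3.client("firehose")

record = {"user_id": "u-123", "action": "add_to_cart", "ts": "2016-08-11T12:00:00Z"}

firehose.put_record(
    DeliveryStreamName="my-delivery-stream",   # placeholder: your delivery stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)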

Amazon Kinesis Analytics

Amazon Kinesis Analytics provides an easy and powerful way to process and analyze streaming data with standard SQL. Using Analytics, you build applications that continuously read data from streaming sources, process it in real-time using SQL code, and emit the results downstream to your configured destinations.

An Analytics application can ingest data from Streams and Firehose. The service detects a schema associated with the data in your source for common formats, which you can further refine using an interactive schema editor. Your application’s SQL code can be anything from a simple count or average, to more advanced analytics like correlating events over time windows. You author your SQL using an interactive editor, and then test it with live streaming data.

Finally, you configure your application to emit SQL results to up to four destinations, including S3, Amazon Redshift, and Amazon Elasticsearch Service (through a Firehose delivery stream); or to an Amazon Kinesis stream. After setup, the service scales your application to handle your query complexity and streaming data throughput – you don’t have to provision or manage any servers.

Walkthrough (part 1): Run your first SQL query using Amazon Kinesis Analytics

The easiest way to understand Amazon Kinesis Analytics is to try it out; you need an AWS account to get started. Partway through, I pause the walkthrough to discuss streaming time windows in more detail, and then you create a second SQL query with more metrics and an additional step for your application.

A streaming application consists of three components:

  • Streaming data sources
  • Analytics written in SQL
  • Destinations for the results

The application continuously reads data from a streaming source, generates analytics using your SQL code, and emits those results to up to four destinations. This walkthrough will cover the first two steps and point you in the right direction for completing an end-to-end application by adding a destination for your SQL results.

Create an Amazon Kinesis Analytics application

  1. Open the Amazon Kinesis Analytics console and choose Create a new application.

  2. Provide a name and (optional) description for your application and choose Continue.

You are taken to the application hub page.

Create a streaming data source

For input, Analytics supports Amazon Kinesis Streams and Amazon Kinesis Firehose as streaming data sources, and S3 for reference data. The primary difference between the two kinds of sources is that streaming data sources are read continuously, while reference data sources are read once. Reference data sources are used for joining against the incoming stream to enrich the data.

In Amazon Kinesis Analytics, choose Connect to a source.

If you have existing Amazon Kinesis streams or Firehose delivery streams, they are shown here.

For the purposes of this post, you will be using a demo stream, which creates and populates a stream with sample data on your behalf. The demo stream is created under your account with a single shard, which supports up to 1 MB/sec of write throughput and 2 MB/sec of read throughput. Analytics will write simulated stock ticker data to the demo stream directly from your browser. Your application will read data from the stream in real time.

Next, choose Configure a new stream and Create demo stream.

Later, you will refer to the demo stream in your SQL code as "SOURCE_SQL_STREAM_001". Analytics calls the DiscoverInputSchema API action, which infers a schema by sampling records from your selected input data stream. You can see the applied schema on your data in the formatted sample shown in the browser, as well as the original sample taken from the raw stream. You can then edit the schema to fine-tune it to your needs.

Feel free to explore; when you are ready, choose Save and continue. You are taken back to the streaming application hub.

Create a SQL query for analyzing data

On the streaming application hub, choose Go to SQL Editor and Run Application.

This SQL editor is the development environment for Amazon Kinesis Analytics. On the top portion of the screen, there is a text editor with syntax highlighting and intelligent auto-complete, as well as a number of SQL templates to help you get started. On the bottom portion of the screen, there is an area for you to explore your source data, your intermediate SQL results, and the data you are writing to your destinations. You can view the entire flow of your application here, end-to-end.

Next, choose Add SQL from Template.

Amazon Kinesis Analytics provides a number of SQL templates that work with the demo stream. Feel free to explore; when you’re ready, choose the COUNT, AVG, etc. (aggregate functions) + Tumbling time window template and choose Add SQL to Editor.

The SELECT statement in this SQL template performs a count over a 10-second tumbling window. A window is used to group rows together relative to the current row that the Amazon Kinesis Analytics application is processing.

Choose Save and run SQL. Congratulations, you just wrote your first SQL query on streaming data!

Streaming SQL with Amazon Kinesis Analytics

In a relational database, you work with tables of data, using INSERT statements to add records and SELECT statements to query the data in a table. In Amazon Kinesis Analytics, you work with in-application streams, which are similar to tables in that you can CREATE, INSERT, and SELECT from them. However, unlike a table, data is continuously inserted into an in-application stream, even while you are executing a SQL statement against it. The data in an in-application stream is therefore unbounded.

In your application code, you interact primarily with in-application streams. For instance, a source in-application stream represents your configured Amazon Kinesis stream or Firehose delivery stream in the application, which by default is named “SOURCE_SQL_STREAM_001”. A destination in-application stream represents your configured destinations, which by default is named “DESTINATION_SQL_STREAM”. When interacting with in-application streams, the following is true:

  • The SELECT statement is used in the context of an INSERT statement. That is, when you select rows from one in-application stream, you insert results into another in-application stream.
  • The INSERT statement is always used in the context of a pump. That is, you use pumps to write to an in-application stream. A pump is the mechanism used to make an INSERT statement continuous.

There are two separate SQL statements in the template you selected in the first walkthrough. The first statement creates a target in-application stream for the SQL results; the second statement creates a PUMP for inserting into that stream and includes the SELECT statement.

Generating real-time analytics using windows

In the console, look at the SQL results from the walkthrough, which are sampled and continuously streamed to the console.

In the example application you just built, you used a 10-second tumbling time window to perform an aggregation of records. Notice the special column called ROWTIME, which represents the time a row was inserted into the first in-application stream. The ROWTIME value is incrementing every 10 seconds with each new set of SQL results. (Some 10 second windows may not be shown in the console because we sample results on the high speed stream.) You use this special column in your tumbling time window to help define the start and end of each result set.

Windows are important because they define the bounds for which you want your query to operate. The starting bound is usually the current row that Amazon Kinesis Analytics is processing, and the window defines the ending bound. Windows are required with any query that works across rows, because the in-application stream is unbounded and windows provide a mechanism to bind the result set and make the query deterministic. Analytics supports three types of windows: tumbling, sliding, and custom windows. These concepts will be covered in depth in our next blog post.

Tumbling windows, like the one you selected in your template, are useful for periodic reports. You can use a tumbling window to compute an average number of visitors to your website in the last 5 minutes, or the maximum over the past hour. A single result is emitted for each key in the GROUP BY clause at the end of each defined window.

In streaming data, there are different types of time, and how they are used is important to the analytics. Our example uses ROWTIME, or processing time, which is great for some use cases. However, in many scenarios you want a time that more accurately reflects when the event occurred, such as the event or ingest time. Amazon Kinesis Analytics supports all three time semantics for processing data: processing, event, and ingest time. These concepts will be covered in depth in our next blog post.

Part 2: Run your second SQL query using Amazon Kinesis Analytics

The next part of the walkthrough adds some additional metrics to your first SQL query and adds a second step to your application.

Add metrics to the SQL statement

In the SQL editor, add some additional SQL code.

First, add some metrics including the average price, average change, maximum price, and minimum price over the same window. Note that you need to add these in your SELECT statement as well as the in-application stream you are inserting into, DESTINATION_SQL_STREAM.

Second, add the sector to the query so you have additional information about the stock ticker. Note that the sector must be added to both the SELECT and GROUP BY clauses.

When you are finished, your SQL code should look like the following:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    sector VARCHAR(16), 
    ticker_symbol_count INTEGER,
    avg_price REAL,
    avg_change REAL,
    max_price REAL,
    min_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM   ticker_symbol,
                sector,
                COUNT(*) AS ticker_symbol_count,
                AVG(price) as avg_price,
                AVG(change) as avg_change,
                MAX(price) as max_price,
                MIN(price) as min_price
FROM "SOURCE_SQL_STREAM_001"
GROUP BY ticker_symbol, sector, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);

Choose Save and run SQL.

Add a second step to your SQL code

Next, add a second step to your SQL code. You can use in-application streams to store intermediate SQL results, which can then be used as input for additional SQL statements. This allows you to build applications with multiple serial steps before sending the results to the destination of your choice. You can also use in-application streams to perform multiple steps in parallel and send the results to multiple destinations.

First, change the DESTINATION_SQL_STREAM name in your two SQL statements to be INTERMEDIATE_SQL_STREAM.

Next, add a second SQL step that selects from INTERMEDIATE_SQL_STREAM and INSERTS into a DESTINATION_SQL_STREAM. The SELECT statement should filter only for companies in the TECHNOLOGY sector using a simple WHERE clause. You must also create the DESTINATION_SQL_STREAM to insert SQL results into. Your final application code should look like the following:

CREATE OR REPLACE STREAM "INTERMEDIATE_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    sector VARCHAR(16), 
    ticker_symbol_count INTEGER,
    avg_price REAL,
    avg_change REAL,
    max_price REAL,
    min_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "INTERMEDIATE_SQL_STREAM"
SELECT STREAM   ticker_symbol,
                sector,
                COUNT(*) AS ticker_symbol_count,
                AVG(price) as avg_price,
                AVG(change) as avg_change,
                MAX(price) as max_price,
                MIN(price) as min_price
FROM "SOURCE_SQL_STREAM_001"
GROUP BY ticker_symbol, sector, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    sector VARCHAR(16), 
    ticker_symbol_count INTEGER,
    avg_price REAL,
    avg_change REAL,
    max_price REAL,
    min_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP_02" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM   ticker_symbol, sector, ticker_symbol_count, avg_price, avg_change, max_price, min_price
FROM "INTERMEDIATE_SQL_STREAM"
WHERE sector = 'TECHNOLOGY';

Choose Save and run SQL.

You can see both of the in-application streams on the left side of the Real-time analytics tab, and select either to see each step in your application for end-to-end visibility.

From here, you can add a destination for your SQL results, such as an Amazon S3 bucket. After setup, your application continuously reads data from the streaming source, processes it using your SQL code, and emits the results to your configured destination.
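You can configure the destination in the console, or script it with the AddApplicationOutput API. The boto3 sketch below assumes an existing Firehose delivery stream and an IAM role that the application can assume to write to it; the names and ARNs are placeholders.

# Sketch: attach a Firehose delivery stream as the application's destination.
import boto3

analytics = boto3.client("kinesisanalytics")

app = analytics.describe_application(ApplicationName="my-analytics-app")   # placeholder name
version_id = app["ApplicationDetail"]["ApplicationVersionId"]

analytics.add_application_output(
    ApplicationName="my-analytics-app",
    CurrentApplicationVersionId=version_id,
    Output={
        "Name": "DESTINATION_SQL_STREAM",      # the in-application stream to export
        "KinesisFirehoseOutput": {
            "ResourceARN": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-delivery-stream",
            "RoleARN": "arn:aws:iam::123456789012:role/analytics-output-role",
        },
        "DestinationSchema": {"RecordFormatType": "CSV"},
    },
)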

Clean up

The final step is to clean up. Take the following steps to avoid incurring charges.

  1. Delete the Streams demo stream.
  2. Stop the Analytics application.

Summary

Previously, real-time stream data processing was only accessible to those with the technical skills to build and manage a complex application. With Amazon Kinesis Analytics, anyone familiar with the ANSI SQL standard can build and deploy a stream data processing application in minutes.

The application you just built provides a managed and elastic data processing pipeline using Analytics that calculates useful results over streaming data. Results are calculated as they arrive, and you can configure a destination to deliver them to a persistent store like Amazon S3.

It’s simple to get this solution working for your use case. All that is required is to replace the Amazon Kinesis demo stream with your own, and then set up data producers. From there, configure the analytics and you have an end-to-end solution for capturing, processing, and durably storing streaming data.

If you have questions or suggestions, please comment below.