New – Parallel Query for Amazon Aurora

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/

Amazon Aurora is a relational database that was designed to take full advantage of the abundance of networking, processing, and storage resources available in the cloud. While maintaining compatibility with MySQL and PostgreSQL on the user-visible side, Aurora makes use of a modern, purpose-built distributed storage system under the covers. Your data is striped across hundreds of storage nodes distributed over three distinct AWS Availability Zones, with two copies per zone, on fast SSD storage. Here’s what this looks like (extracted from Getting Started with Amazon Aurora):

New Parallel Query
When we launched Aurora we also hinted at our plans to apply the same scale-out design principle to other layers of the database stack. Today I would like to tell you about our next step along that path.

Each node in the storage layer pictured above also includes plenty of processing power. Aurora is now able to make great use of that processing power by taking your analytical queries (generally those that process all or a large part of a good-sized table) and running them in parallel across hundreds or thousands of storage nodes, with speed benefits approaching two orders of magnitude. Because this new model reduces network, CPU, and buffer pool contention, you can run a mix of analytical and transactional queries simultaneously on the same table while maintaining high throughput for both types of queries.

The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions

You can use the aurora_pq parameter to enable and disable the use of parallel queries at the global and the session level.

Parallel queries enhance the performance of over 200 types of single-table predicates and hash joins. The Aurora query optimizer will automatically decide whether to use Parallel Query based on the size of the table and the amount of table data that is already in memory; you can also use the aurora_pq_force session variable to override the optimizer for testing purposes.

Parallel Query in Action
You will need to create a fresh cluster in order to make use of the Parallel Query feature. You can create one from scratch, or you can restore a snapshot.

To create a cluster that supports Parallel Query, I simply choose Provisioned with Aurora parallel query enabled as the Capacity type:

I used the CLI to restore a 100 GB snapshot for testing, and then explored one of the queries from the TPC-H benchmark. Here’s the basic query:

SELECT
  l_orderkey,
  SUM(l_extendedprice * (1-l_discount)) AS revenue,
  o_orderdate,
  o_shippriority

FROM customer, orders, lineitem

WHERE
  c_mktsegment='AUTOMOBILE'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < date '1995-03-13'
  AND l_shipdate > date '1995-03-13'

GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority

ORDER BY
  revenue DESC,
  o_orderdate LIMIT 15;

The EXPLAIN command shows the query plan, including the use of Parallel Query:

+----+-------------+----------+------+-------------------------------+------+---------+------+-----------+--------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table    | type | possible_keys                 | key  | key_len | ref  | rows      | Extra                                                                                                                          |
+----+-------------+----------+------+-------------------------------+------+---------+------+-----------+--------------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY                       | NULL | NULL    | NULL |  14354602 | Using where; Using temporary; Using filesort                                                                                   |
|  1 | SIMPLE      | orders   | ALL  | PRIMARY,o_custkey,o_orderdate | NULL | NULL    | NULL | 154545408 | Using where; Using join buffer (Hash Join Outer table orders); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)   |
|  1 | SIMPLE      | lineitem | ALL  | PRIMARY,l_shipdate            | NULL | NULL    | NULL | 606119300 | Using where; Using join buffer (Hash Join Outer table lineitem); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra) |
+----+-------------+----------+------+-------------------------------+------+---------+------+-----------+--------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

Here is the relevant part of the Extras column:

Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)

The query runs in less than 2 minutes when Parallel Query is used:

+------------+-------------+-------------+----------------+
| l_orderkey | revenue     | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
|   92511430 | 514726.4896 | 1995-03-06  |              0 |
|  593851010 | 475390.6058 | 1994-12-21  |              0 |
|  188390981 | 458617.4703 | 1995-03-11  |              0 |
|  241099140 | 457910.6038 | 1995-03-12  |              0 |
|  520521156 | 457157.6905 | 1995-03-07  |              0 |
|  160196293 | 456996.1155 | 1995-02-13  |              0 |
|  324814597 | 456802.9011 | 1995-03-12  |              0 |
|   81011334 | 455300.0146 | 1995-03-07  |              0 |
|   88281862 | 454961.1142 | 1995-03-03  |              0 |
|   28840519 | 454748.2485 | 1995-03-08  |              0 |
|  113920609 | 453897.2223 | 1995-02-06  |              0 |
|  377389669 | 453438.2989 | 1995-03-07  |              0 |
|  367200517 | 453067.7130 | 1995-02-26  |              0 |
|  232404000 | 452010.6506 | 1995-03-08  |              0 |
|   16384100 | 450935.1906 | 1995-03-02  |              0 |
+------------+-------------+-------------+----------------+
15 rows in set (1 min 53.36 sec)

I can disable Parallel Query for the session (I can use an RDS custom cluster parameter group for a longer-lasting effect):

set SESSION aurora_pq=OFF;

The query runs considerably slower without it:

+------------+-------------+-------------+----------------+
| l_orderkey | o_orderdate | revenue     | o_shippriority |
+------------+-------------+-------------+----------------+
|   92511430 | 1995-03-06  | 514726.4896 |              0 |
...
|   16384100 | 1995-03-02  | 450935.1906 |              0 |
+------------+-------------+-------------+----------------+
15 rows in set (1 hour 25 min 51.89 sec)

This was on a db.r4.2xlarge instance; other instance sizes, data sets, access patterns, and queries will perform differently. I can also override the query optimizer and insist on the use of Parallel Query for testing purposes:

set SESSION aurora_pq_force=ON;

Things to Know
Here are a couple of things to keep in mind when you start to explore Amazon Aurora Parallel Query:

Engine Support – We are launching with support for MySQL 5.6, and are working on support for MySQL 5.7 and PostgreSQL.

Table Formats – The table row format must be COMPACT; partitioned tables are not supported.

Data Types – The TEXT, BLOB, and GEOMETRY data types are not supported.

DDL – The table cannot have any pending fast online DDL operations.

Cost – You can make use of Parallel Query at no extra charge. However, because it makes direct access to storage, there is a possibility that your IO cost will increase.

Give it a Shot
This feature is available now and you can start using it today!

Jeff;

 

Developing .NET Core AWS Lambda functions

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/developing-net-core-aws-lambda-functions/

This post is courtesy of Mark Easton, Senior Solutions Architect – AWS

One of the biggest benefits of Lambda functions is that they isolate you from the underlying infrastructure. While that makes it easy to deploy and manage your code, it’s critical to have a clearly defined approach for testing, debugging, and diagnosing problems.

There’s a variety of best practices and AWS services to help you out. When developing Lambda functions in .NET, you can follow a four-pronged approach:

This post demonstrates the approach by creating a simple Lambda function that can be called from a gateway created by Amazon API Gateway and which returns the current UTC time. The post shows you how to design your code to allow for easy debugging, logging and tracing.

If you haven’t created Lambda functions with .NET Core before, then the following posts can help you get started:

Unit testing Lambda functions

One of the easiest ways to create a .NET Core Lambda function is to use the .NET Core CLI and create a solution using the Lambda Empty Serverless template.

If you haven’t already installed the Lambda templates, run the following command:

dotnet new -i Amazon.Lambda.Templates::*

You can now use the template to create a serverless project and unit test project, and then add them to a .NET Core solution by running the following commands:

dotnet new serverless.EmptyServerless -n DebuggingExample
cd DebuggingExample
dotnet new sln -n DebuggingExample\
dotnet sln DebuggingExample.sln add */*/*.csproj

Although you haven’t added any code yet, you can validate that everything’s working by executing the unit tests. Run the following commands:

cd test/DebuggingExample.Tests/
dotnet test

One of the key principles to effective unit testing is ensuring that units of functionality can be tested in isolation. It’s good practice to de-couple the Lambda function’s actual business logic from the plumbing code that handles the actual Lambda requests.

Using your favorite editor, create a new file, ITimeProcessor.cs, in the src/DebuggingExample folder, and create the following basic interface:

using System;

namespace DebuggingExample
{
    public interface ITimeProcessor
    {
        DateTime CurrentTimeUTC();
    }
}

Then, create a new TimeProcessor.cs file in the src/DebuggingExample folder. The file contains a concrete class implementing the interface.

using System;

namespace DebuggingExample
{
    public class TimeProcessor : ITimeProcessor
    {
        public DateTime CurrentTimeUTC()
        {
            return DateTime.UtcNow;
        }
    }
} 

Now add a TimeProcessorTest.cs file to the src/DebuggingExample.Tests folder. The file should contain the following code:

using System;
using Xunit;

namespace DebuggingExample.Tests
{
    public class TimeProcessorTest
    {
        [Fact]
        public void TestCurrentTimeUTC()
        {
            // Arrange
            var processor = new TimeProcessor();
            var preTestTimeUtc = DateTime.UtcNow;

            // Act
            var result = processor.CurrentTimeUTC();

            // Assert time moves forwards 
            var postTestTimeUtc = DateTime.UtcNow;
            Assert.True(result >= preTestTimeUtc);
            Assert.True(result <= postTestTimeUtc);
        }
    }
}

You can then execute all the tests. From the test/DebuggingExample.Tests folder, run the following command:

dotnet test

Surfacing business logic in a Lambda function

Now that you have your business logic written and tested, you can surface it as a Lambda function. Edit the src/DebuggingExample/Function.cs file so that it calls the CurrentTimeUTC method:

using System;
using System.Collections.Generic;
using System.Net;
using Amazon.Lambda.Core;
using Amazon.Lambda.APIGatewayEvents;
using Newtonsoft.Json;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(
typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))] 

namespace DebuggingExample
{
    public class Functions
    {
        ITimeProcessor processor = new TimeProcessor();

        public APIGatewayProxyResponse Get(
APIGatewayProxyRequest request, ILambdaContext context)
        {
            var result = processor.CurrentTimeUTC();

            return CreateResponse(result);
        }

APIGatewayProxyResponse CreateResponse(DateTime? result)
{
    int statusCode = (result != null) ? 
        (int)HttpStatusCode.OK : 
        (int)HttpStatusCode.InternalServerError;

    string body = (result != null) ? 
        JsonConvert.SerializeObject(result) : string.Empty;

    var response = new APIGatewayProxyResponse
    {
        StatusCode = statusCode,
        Body = body,
        Headers = new Dictionary<string, string>
        { 
            { "Content-Type", "application/json" }, 
            { "Access-Control-Allow-Origin", "*" } 
        }
    };
    
    return response;
}
    }
}

First, an instance of the TimeProcessor class is instantiated, and a Get() method is then defined to act as the entry point to the Lambda function.

By default, .NET Core Lambda function handlers expect their input in a Stream. This can be overridden by declaring a customer serializer, and then defining the handler’s method signature using a custom request and response type.

Because the project was created using the serverless.EmptyServerless template, it already overrides the default behavior. It does this by including a using reference to Amazon.Lambda.APIGatewayEvents and then declaring a custom serializer. For more information about using custom serializers in .NET, see the AWS Lambda for .NET Core repository on GitHub.

Get() takes a couple of parameters:

  • The APIGatewayProxyRequest parameter contains the request from the API Gateway fronting the Lambda function
  • The optional ILambdaContext parameter contains details of the execution context.

The Get() method calls CurrentTimeUTC() to retrieve the time from the business logic.

Finally, the result from CurrentTimeUTC() is passed to the CreateResponse() method, which converts the result into an APIGatewayResponse object to be returned to the caller.

Because the updated Lambda function no longer passes the unit tests, update the TestGetMethod in test/DebuggingExample.Tests/FunctionTest.cs file. Update the test by removing the following line:

Assert.Equal("Hello AWS Serverless", response.Body);

This leaves your FunctionTest.cs file as follows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Amazon.Lambda.Core;
using Amazon.Lambda.TestUtilities;
using Amazon.Lambda.APIGatewayEvents;
using DebuggingExample;

namespace DebuggingExample.Tests
{
    public class FunctionTest
    {
        public FunctionTest()
        {
        }

        [Fact]
        public void TetGetMethod()
        {
            TestLambdaContext context;
            APIGatewayProxyRequest request;
            APIGatewayProxyResponse response;

            Functions functions = new Functions();

            request = new APIGatewayProxyRequest();
            context = new TestLambdaContext();
            response = functions.Get(request, context);
            Assert.Equal(200, response.StatusCode);
        }
    }
}

Again, you can check that everything is still working. From the test/DebuggingExample.Tests folder, run the following command:

dotnet test

Local integration testing with the AWS SAM CLI

Unit testing is a great start for testing thin slices of functionality. But to test that your API Gateway and Lambda function integrate with each other, you can test locally by using the AWS SAM CLI, installed as described in the AWS Lambda Developer Guide.

Unlike unit testing, which allows you to test functions in isolation outside of their runtime environment, the AWS SAM CLI executes your code in a locally hosted Docker container. It can also simulate a locally hosted API gateway proxy, allowing you to run component integration tests.

After you’ve installed the AWS SAM CLI, you can start using it by creating a template that describes your Lambda function by saving a file named template.yaml in the DebuggingExample directory with the following contents:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Sample SAM Template for DebuggingExample

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
    Function:
        Timeout: 10

Resources:

    DebuggingExampleFunction:
        Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
        Properties:
            FunctionName: DebuggingExample
			CodeUri: src/DebuggingExample/bin/Release/netcoreapp2.1/publish
            Handler: DebuggingExample::DebuggingExample.Functions::Get
            Runtime: dotnetcore2.1
            Environment: # More info about Env Vars: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#environment-object
                Variables:
                    PARAM1: VALUE
            Events:
                DebuggingExample:
                    Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
                    Properties:
                        Path: /
                        Method: get

Outputs:

    DebuggingExampleApi:
      Description: "API Gateway endpoint URL for Prod stage for Debugging Example function"
      Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/DebuggingExample/"

    DebuggingExampleFunction:
      Description: "Debugging Example Lambda Function ARN"
      Value: !GetAtt DebuggingExampleFunction.Arn

    DebuggingExampleFunctionIamRole:
      Description: "Implicit IAM Role created for Debugging Example function"
      Value: !GetAtt DebuggingExampleFunctionRole.Arn

Now that you have an AWS SAM CLI template, you can test your code locally. Because the Lambda function expects a request from API Gateway, create a sample API Gateway request. Run the following command:

sam local generate-event api > testApiRequest.json

You can now publish your DebuggingExample code locally and invoke it by passing in the sample request as follows:

dotnet publish -c Release
sam local invoke "DebuggingExampleFunction" --event testApiRequest.json

The first time that you run it, it might take some time to pull down the container image in which to host the Lambda function. After you’ve invoked it one time, the container image is cached locally, and execution speeds up.

Finally, rather than testing your function by sending it a sample request, test it with a real API gateway request by running API Gateway locally:

sam local start-api

If you now navigate to http://127.0.0.1:3000/ in your browser, you can get the API gateway to send a request to your locally hosted Lambda function. See the results in your browser.

Logging events with CloudWatch

Having a test strategy allows you to execute, test, and debug Lambda functions. After you’ve deployed your functions to AWS, you must still log what the functions are doing so that you can monitor their behavior.

The easiest way to add logging to your Lambda functions is to add code that writes events to CloudWatch. To do this, add a new method, LogMessage(), to the src/DebuggingExample/Function.cs file.

void LogMessage(ILambdaContext ctx, string msg)
{
    ctx.Logger.LogLine(
        string.Format("{0}:{1} - {2}", 
            ctx.AwsRequestId, 
            ctx.FunctionName,
            msg));
}

This takes in the context object from the Lambda function’s Get() method, and sends a message to CloudWatch by calling the context object’s Logger.Logline() method.

You can now add calls to LogMessage in the Get() method to log events in CloudWatch. It’s also a good idea to add a Try… Catch… block to ensure that exceptions are logged as well.

        public APIGatewayProxyResponse Get(APIGatewayProxyRequest request, ILambdaContext context)
        {
            LogMessage(context, "Processing request started");

            APIGatewayProxyResponse response;
            try
            {
                var result = processor.CurrentTimeUTC();
                response = CreateResponse(result);

                LogMessage(context, "Processing request succeeded.");
            }
            catch (Exception ex)
            {
                LogMessage(context, string.Format("Processing request failed - {0}", ex.Message));
                response = CreateResponse(null);
            }

            return response;
        }

To validate that the changes haven’t broken anything, you can now execute the unit tests again. Run the following commands:

cd test/DebuggingExample.Tests/
dotnet test

Tracing execution with X-Ray

Your code now logs events in CloudWatch, which provides a solid mechanism to help monitor and diagnose problems.

However, it can also be useful to trace your Lambda function’s execution to help diagnose performance or connectivity issues, especially if it’s called by or calling other services. X-Ray provides a variety of features to help analyze and trace code execution.

To enable active tracing on your function you need to modify the SAM template we created earlier to add a new attribute to the function resource definition. With SAM this is as easy as adding the Tracing attribute and specifying it as Active below the Timeout attribute in the Globals section of the template.yaml file:

Globals:
    Function:
        Timeout: 10
        Tracing: Active

To call X-Ray from within your .NET Core code, you must add the AWSSDKXRayRecoder to your solution by running the following command in the src/DebuggingExample folder:

dotnet add package AWSXRayRecorder –-version 2.2.1-beta

Then, add the following using statement at the top of the src/DebuggingExample/Function.cs file:

using Amazon.XRay.Recorder.Core;

Add a new method to the Function class, which takes a function and name and then records an X-Ray subsegment to trace the execution of the function.

        private T TraceFunction<T>(Func<T> func, string subSegmentName)
        {
            AWSXRayRecorder.Instance.BeginSubsegment(subSegmentName);
            T result = func();
            AWSXRayRecorder.Instance.EndSubsegment();

            return result;
        } 

You can now update the Get() method by replacing the following line:

var result = processor.CurrentTimeUTC();

Replace it with this line:

var result = TraceFunction(processor.CurrentTimeUTC, "GetTime");

The final version of Function.cs, in all its glory, is now:

using System;
using System.Collections.Generic;
using System.Net;
using Amazon.Lambda.Core;
using Amazon.Lambda.APIGatewayEvents;
using Newtonsoft.Json;
using Amazon.XRay.Recorder.Core;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(
typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace DebuggingExample
{
    public class Functions
    {
        ITimeProcessor processor = new TimeProcessor();

        public APIGatewayProxyResponse Get(APIGatewayProxyRequest request, ILambdaContext context)
        {
            LogMessage(context, "Processing request started");

            APIGatewayProxyResponse response;
            try
            {
                var result = TraceFunction(processor.CurrentTimeUTC, "GetTime");
                response = CreateResponse(result);

                LogMessage(context, "Processing request succeeded.");
            }
            catch (Exception ex)
            {
                LogMessage(context, string.Format("Processing request failed - {0}", ex.Message));
                response = CreateResponse(null);
            }

            return response;
        }

        APIGatewayProxyResponse CreateResponse(DateTime? result)
        {
            int statusCode = (result != null) ?
                (int)HttpStatusCode.OK :
                (int)HttpStatusCode.InternalServerError;

            string body = (result != null) ?
                JsonConvert.SerializeObject(result) : string.Empty;

            var response = new APIGatewayProxyResponse
            {
                StatusCode = statusCode,
                Body = body,
                Headers = new Dictionary<string, string>
        {
            { "Content-Type", "application/json" },
            { "Access-Control-Allow-Origin", "*" }
        }
            };

            return response;
        }

        private void LogMessage(ILambdaContext context, string message)
        {
            context.Logger.LogLine(string.Format("{0}:{1} - {2}", context.AwsRequestId, context.FunctionName, message));
        }

        private T TraceFunction<T>(Func<T> func, string actionName)
        {
            AWSXRayRecorder.Instance.BeginSubsegment(actionName);
            T result = func();
            AWSXRayRecorder.Instance.EndSubsegment();

            return result;
        }
    }
}

Since AWS X-Ray requires an agent to collect trace information, if you want to test the code locally you should now install the AWS X-Ray agent. Once it’s installed, confirm the changes haven’t broken anything by running the unit tests again:

cd test/DebuggingExample.Tests/
dotnet test

For more information about using X-Ray from .NET Core, see the AWS X-Ray Developer Guide. For information about adding support for X-Ray in Visual Studio, see the New AWS X-Ray .NET Core Support post.

Deploying and testing the Lambda function remotely

Having created your Lambda function and tested it locally, you’re now ready to package and deploy your code.

First of all you need an Amazon S3 bucket to deploy the code into. If you don’t already have one, create a suitable S3 bucket.

You can now package the .NET Lambda Function and copy it to Amazon S3.

sam package \
  --template-file template.yaml \
  --output-template debugging-example.yaml \
  --s3-bucket debugging-example-deploy

Finally, deploy the Lambda function by running the following command:

sam deploy \
   --template-file debugging-example.yaml \
   --stack-name DebuggingExample \
   --capabilities CAPABILITY_IAM \
   --region eu-west-1

After your code has deployed successfully, test it from your local machine by running the following command:

dotnet lambda invoke-function DebuggingExample -–region eu-west-1

Diagnosing the Lambda function

Having run the Lambda function, you can now monitor its behavior by logging in to the AWS Management Console and then navigating to CloudWatch LogsCloudWatch Logs Console

You can now click on the /aws/lambda/DebuggingExample log group to view all the recorded log streams for your Lambda function.

If you open one of the log streams, you see the various messages recorded for the Lambda function, including the two events explicitly logged from within the Get() method.Lambda CloudWatch Logs

To review the logs locally, you can also use the AWS SAM CLI to retrieve CloudWatch logs and then display them in your terminal.

sam logs -n DebuggingExample --region eu-west-1

As a final alternative, you can also execute the Lambda function by choosing Test on the Lambda console. The execution results are displayed in the Log output section. Lambda Console Execution

In the X-Ray console, the Service Map page shows a map of the Lambda function’s connections.

Your Lambda function is essentially standalone. However, the Service Map page can be critical in helping to understand performance issues when a Lambda function is connected with a number of other services.X-Ray Service Map

If you open the Traces screen, the trace list showing all the trace results that it’s recorded. Open one of the traces to see a breakdown of the Lambda function performance.

X-Ray Traces UI

Conclusion

In this post, I showed you how to develop Lambda functions in .NET Core, how unit tests can be used, how to use the AWS SAM CLI for local integration tests, how CloudWatch can be used for logging and monitoring events, and finally how to use X-Ray to trace Lambda function execution.

Put together, these techniques provide a solid foundation to help you debug and diagnose your Lambda functions effectively. Explore each of the services further, because when it comes to production workloads, great diagnosis is key to providing a great and uninterrupted customer experience.

Hard Disk Drive (HDD) vs Solid State Drive (SSD): What’s the Diff?

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/hdd-versus-ssd-whats-the-diff/

whats the diff? SSD vs. HDD

HDDs and SSDs have changed in the two years since Peter Cohen wrote the original version of this post on March 8 of 2016. We thought it was time for an update. We hope you enjoy it.

— Editor

In This Corner: The Hard Disk Drive (HDD)

The traditional spinning hard drive has been a standard for many generations of personal computers. Constantly improving technology has enabled hard drive makers to pack more storage capacity than ever, at a cost per gigabyte that still makes hard drives the best bang for the buck.

IBM RamacAs sophisticated as they’ve become, hard drives have been around since 1956. The ones back then were two feet across and could store only a few megabytes of information, but technology has improved to the point where you can cram 10 terabytes into something about the same size as a kitchen sponge.

Inside a hard drive is something that looks more than a bit like an old record player: There’s a platter, or stacked platters, which spin around a central axis — a spindle — typically at about 5,400 to 7,200 revolutions per minute. Some hard drives built for performance work faster.

Hard Drive exploded viewInformation is written to and read from the drive by changing the magnetic fields on those spinning platters using an armature called a read-write head. Visually, it looks a bit like the arm of a record player, but instead of being equipped with a needle that runs in a physical groove on the record, the read-write head hovers slightly above the physical surface of the disk.

The two most common form factors for hard drives are 2.5-inch, common for laptops, and 3.5-inch, common for desktop machines. The size is standardized, which makes for easier repair and replacement when things go wrong.

The vast majority of drives in use today connect through a standard interface called Serial ATA (or SATA). Specialized storage systems sometimes use Serial Attached SCSI (SAS), Fibre Channel, or other exotic interfaces designed for special purposes.

Hard Disk Drives Cost Advantage

Proven technology that’s been in use for decades makes hard disk drives cheap — much cheaper, per gigabyte than solid state drives. HDD storage can run as low as three cents per gigabyte. You don’t spend a lot but you get lots of space. HDD makers continue to improve storage capacity while keeping costs low, so HDDs remain the choice of anyone looking for a lot of storage without spending a lot of money.

The downside is that HDDs can be power-hungry, generate noise, produce heat, and don’t work nearly as fast as SSDs. Perhaps the biggest difference is that HDDs, with all their similarities to record players, are ultimately mechanical devices. Over time, mechanical devices will wear out. It’s not a question of if, it’s a question of when.

HDD technology isn’t standing still, and price per unit stored has decreased dramatically. As we said in our post, HDD vs SSD: What Does the Future for Storage Hold? — Part 2, the cost per gigabyte for HDDs has decreased by two billion times in about 60 years.

HDD manufacturers have made dramatic advances in technology to keep storing more and more information on HD platters — referred to as areal density. As HDD manufacturers try to outdo each other, consumers have benefited from larger and larger drive sizes. One technique is to replace the air in drives with helium, which reduces reduces friction and supports greater areal density. Another technology that should be available soon uses heat-assisted magnetic recording (HAMR). HAMR records magnetically using laser-thermal assistance that ultimately could lead to a 20 terabyte drive by 2019. See our post on HAMR by Seagate’s CTO Mark Re, What is HAMR and How Does It Enable the High-Capacity Needs of the Future?

The continued competition and race to put more and more storage in the same familiar 3.5” HDD form factor means that it will be a relatively small, very high capacity choice for storage for many years to come.

In the Opposite Corner: The Solid State Drive (SSD)

Solid State Drives (SSDs) have become much more common in recent years. They’re standard issue across Apple’s laptop line, for example the MacBook, MacBook Pro and MacBook Air all come standard with SSDs. So does the Mac Pro.

Inside an SSDSolid state is industry shorthand for an integrated circuit, and that’s the key difference between an SSD and a HDD: there are no moving parts inside an SSD. Rather than using disks, motors and read/write heads, SSDs use flash memory instead — that is, computer chips that retain their information even when the power is turned off.

SSDs work in principle the same way the storage on your smartphone or tablet works. But the SSDs you find in today’s Macs and PCs work faster than the storage in your mobile device.

The mechanical nature of HDDs limits their overall performance. Hard drive makers work tirelessly to improve data transfer speeds and reduce latency and idle time, but there’s a finite amount they can do. SSDs provide a huge performance advantage over hard drives — they’re faster to start up, faster to shut down, and faster to transfer data.

A Range of SSD Form Factors

SSDs can be made smaller and use less power than hard drives. They also don’t make noise, and can be more reliable because they’re not mechanical. As a result, computers designed to use SSDs can be smaller, thinner, lighter and last much longer on a single battery charge than computers that use hard drives.

SSD Conversion KitMany SSD makers produce SSD mechanisms that are designed to be plug-and-play drop-in replacements for 2.5-inch and 3.5-inch hard disk drives because there are millions of existing computers (and many new computers still made with hard drives) that can benefit from the change. They’re equipped with the same SATA interface and power connector you might find on a hard drive.


Intel SSD DC P4500A wide range of SSD form factors are now available. Memory Sticks, once limited to 128MB maximum, now come in versions as large as 2 TB. They are used primarily in mobile devices where size and density are primary factor, such as cameras, phones, drones, and so forth. Other high density form factors are designed for data center applications, such as Intel’s 32 TB P4500. Resembling a standard 12-inch ruler, the Intel SSD DC P4500 has a 32 terabyte capacity. Stacking 64 extremely thin layers of 3D NAND, the P4500 is currently the world’s densest solid state drive. The price is not yet available, but given that the DC P4500 SSD requires only one-tenth the power and just one-twentieth the space of traditional hard disk storage, once the price comes out of the stratosphere you can be sure that there will be a market for it.

Nimbus ExaDrive 100TB SSDEarlier this year, Nimbus Data announced the ExaDrive D100 100TB SSD. This SSD by itself holds over twice as much data as Backblaze’s first Storage Pods. Nimbus Data has said that the drive will have pricing comparable to other business-grade SSDs “on a per terabyte basis.” That likely means a price in the tens of thousands of dollars.

SSD drive manufacturers also are chasing ways to store more data in ever smaller form factors and at greater speeds. The familiar SSD drive that looks like a 2.5” HDD drive is starting to become less common. Given the very high speeds that data can be read and copied to the memory chips inside SSDs, it’s natural that computer and storage designers want to take full advantage of that capability. Increasingly, storage is plugging directly into the computer’s system board, and in the process taking on new shapes.

Anand Lal Shimpi, anandtech.com -- http://www.anandtech.com/show/6293/ngff-ssds-putting-an-end-to-proprietary-ultrabook-ssd-form-factors

A size comparison of an mSATA SSD (left) and an M.2 2242 SSD (right)

Laptop makers adopted the mSATA, and then the M.2 standard, which can be as small as a few squares of chocolate but have the same capacity as any 2.5” SATA SSD.

Another interface technology called NvM Express or NVMe may start to move from servers in the data center to consumer laptops in the next few years. NVMe will push storage speeds in laptops and workstations even higher.

SSDs Fail Too

Just like hard drives, SSDs can wear out, though for different reasons. With hard drives, it’s often just the mechanical reality of a spinning motor that wears down over time. Although there are no moving parts inside an SSD, each memory bank has a finite life expectancy — a limit on the number of times it can be written to and read from before it stops working. Logic built into the drives tries to dynamically manage these operations to minimize problems and extend its life.

For practical purposes, most of us don’t need to worry about SSD longevity. An SSD you put in your computer today will likely outlast the computer. But it’s sobering to remember that even though SSDs are inherently more rugged than hard drives, they’re still prone to the same laws of entropy as everything else in the universe.

Planning for the Future of Storage

If you’re still using a computer with a SATA hard drive, you can see a huge performance increase by switching to an SSD. What’s more, the cost of SSDs has dropped dramatically over the course of the past couple of years, so it’s less expensive than ever to do this sort of upgrade.

Whether you’re using a HDD or an SSD, a good backup plan is essential because eventually any drive will fail. You should have a local backup combined with secure cloud-based backup like Backblaze, which satisfies the 3-2-1 backup strategy. To help get started, make sure to check out our Backup Guide.

Hopefully, we’ve given you some insight about HDDs and SSDs. And as always, we encourage your questions and comments, so fire away!


Editor’s note:  You might enjoy reading more about the future of HDDs and SSDs in our two-part series, HDD vs SSD: What Does the Future for Storage Hold?

The post Hard Disk Drive (HDD) vs Solid State Drive (SSD): What’s the Diff? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/765814/rss

Security updates have been issued by Debian (glusterfs, php5, reportbug, and suricata), openSUSE (chromium and exempi), Red Hat (openstack-rabbitmq-container), SUSE (couchdb, crowbar, crowbar-core, crowbar-ha, crowbar-init, crowbar-openstack, crowbar-ui, gdm, OpenStack, pango, and webkit2gtk3), and Ubuntu (bind9, lcms, lcms2, and lcms2).

HackSpace magazine issue 11: best maker hardware

Post Syndicated from Andrew Gregory original https://www.raspberrypi.org/blog/hackspace-magazine-11-best-maker-hardware/

Today is that glorious day of the month when a new issue of HackSpace magazine comes out!

HackSpace magazine #11: All you can hardware

The cream of this year’s hardware crop

You’re on safe and solid ground with an Arduino, or one of Adafruit’s boards — so much so that many makers get comfortable and never again look at the other options that are out there. With the help of Hackster’s chief hardware nerd Alex Glow, we’re here to open your eyes to the new devices and boards that could really kick your making into gear. We know it’s easy to stick with what you know, but trust us — hacker tech is getting better all the time. So try something new!

Hackspace magazine hardware feature spread

One man and his shed shack

If you want to learn stuff like how to build a workbench that includes a voice-activated beer dispenser, then check out Al’s Hack Shack on Youtube.

Al's Hack Shack

We went to see the man inside the shack to learn about the maker community’s love of sharing, why being grown-up means you get more time to play, and why making is good for your mental health.

Hacky Racers

Maker culture shows itself in all sorts of quirky forms. The one we’re portraying in issue 11 is the Hacky Racers: motorsport meets Robot Wars meets mud. Lots of mud. If you feel the need, the need for speed (or mud), then get involved!

Hacky Racers

Laser harp

Yes, you read that right! At HackSpace magazine, we get a lot of gear coming in for us to test, but few items have given us more joy than this laser harp.

It’s easy to build, it’s affordable, and it poses only a very small risk of burning out your retinas. It’s the most fun you can have for £8.59 including postage. Promise. Read our full review in this month’s issue!

And there’s more!

We demystify PAT testing, help you make sense of circuit design with a beginners’ guide to Tinkercad, tell you why you need an angle grinder, and show you the easiest way we’ve ever seen of keeping knives sharp. All this and more, in your latest issue of HackSpace magazine!

Get your copy of HackSpace magazine

If you like the sound of this month’s content, you can find HackSpace magazine in WHSmith, Tesco, Sainsbury’s, and independent newsagents in the UK. If you live in the US, check out your local Barnes & Noble, Fry’s, or Micro Center next week. We’re also shipping to stores in Australia, Hong Kong, Canada, Singapore, Belgium, and Brazil, so be sure to ask your local newsagent whether they’ll be getting HackSpace magazine. And if you’d rather try before you buy, you can always download the free PDF.

Subscribe now

Subscribe now” may not be subtle as a marketing message, but we really think you should. You’ll get the magazine early, plus a lovely physical paper copy, which has really good battery life.

Oh, and twelve-month print subscribers get an Adafruit Circuit Playground Express loaded with inputs and sensors and ready for your next project. Tempted?

The post HackSpace magazine issue 11: best maker hardware appeared first on Raspberry Pi.

Security Vulnerability in ESS ExpressVote Touchscreen Voting Computer

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/09/security_vulner_16.html

Of course the ESS ExpressVote voting computer will have lots of security vulnerabilities. It’s a computer, and computers have lots of vulnerabilities. This particular vulnerability is particularly interesting because it’s the result of a security mistake in the design process. Someone didn’t think the security through, and the result is a voter-verifiable paper audit trail that doesn’t provide the security it promises.

Here are the details:

Now there’s an even worse option than “DRE with paper trail”; I call it “;press this button if it’s OK for the machine to cheat” option. The country’s biggest vendor of voting machines, ES&S, has a line of voting machines called ExpressVote. Some of these are optical scanners (which are fine), and others are “combination” machines, basically a ballot-marking device and an optical scanner all rolled into one.

This video shows a demonstration of ExpressVote all-in-one touchscreens purchased by Johnson County, Kansas. The voter brings a blank ballot to the machine, inserts it into a slot, chooses candidates. Then the machine prints those choices onto the blank ballot and spits it out for the voter to inspect. If the voter is satisfied, she inserts it back into the slot, where it is counted (and dropped into a sealed ballot box for possible recount or audit).

So far this seems OK, except that the process is a bit cumbersome and not completely intuitive (watch the video for yourself). It still suffers from the problems I describe above: voter may not carefully review all the choices, especially in down-ballot races; counties need to buy a lot more voting machines, because voters occupy the machine for a long time (in contrast to op-scan ballots, where they occupy a cheap cardboard privacy screen).

But here’s the amazingly bad feature: “The version that we have has an option for both ways,” [Johnson County Election Commissioner Ronnie] Metsker said. “We instruct the voters to print their ballots so that they can review their paper ballots, but they’re not required to do so. If they want to press the button ‘cast ballot,’ it will cast the ballot, but if they do so they are doing so with full knowledge that they will not see their ballot card, it will instead be cast, scanned, tabulated and dropped in the secure ballot container at the backside of the machine.” [TYT Investigates, article by Jennifer Cohn, September 6, 2018]

Now it’s easy for a hacked machine to cheat undetectably! All the fraudulent vote-counting program has to do is wait until the voter chooses between “cast ballot without inspecting” and “inspect ballot before casting.” If the latter, then don’t cheat on this ballot. If the former, then change votes how it likes, and print those fraudulent votes on the paper ballot, knowing that the voter has already given up the right to look at it.

A voter-verifiable paper audit trail does not require every voter to verify the paper ballot. But it does require that every voter be able to verify the paper ballot. I am continuously amazed by how bad electronic voting machines are. Yes, they’re computers. But they also seem to be designed by people who don’t understand computer (or any) security.

After Years of Abusive E-mails, the Creator of Linux Steps Aside (The New Yorker)

Post Syndicated from jake original https://lwn.net/Articles/765674/rss

A story in The New Yorker magazine may help explain some of the timing of the recent upheavals in kernel-land. Longtime followers of kernel development will find the article to be a mixed bag—over the top in spots, fairly accurate elsewhere. “Torvalds’s decision to step aside came after The New Yorker asked him a series of questions about his conduct for a story on complaints about his abusive behavior discouraging women from working as Linux-kernel programmers. In a response to The New Yorker, Torvalds said, ‘I am very proud of the Linux code that I invented and the impact it has had on the world. I am not, however, always proud of my inability to communicate well with others—this is a lifelong struggle for me. To anyone whose feelings I have hurt, I am deeply sorry.’

Narrativ is helping producers monetize their digital content with Amazon Redshift

Post Syndicated from Colin Nichols original https://aws.amazon.com/blogs/big-data/narrativ-is-helping-producers-monetize-their-digital-content-with-amazon-redshift/

Narrativ, in their own words: Narrativ is building monetization technology for the next generation of digital content producers. Our product portfolio includes a real-time bidding platform and visual storytelling tools that together generate millions of dollars of advertiser value and billions of data points each month.

At Narrativ, we have seen massive growth in our platform usage with a similar order-of-magnitude increase in data generated by our products over the past 15 months. In this blog post, we share our evolution to a robust, scalable, performant, and cost-effective analytics environment with AWS. We also discuss the best practices that we learned along the way in data warehousing and data lake analytics.

Anticipating Narrativ’s continued growth acceleration, we began planning late last year for the next order of magnitude. We have been using Amazon Redshift as our data warehouse, which has served us very well. As our data continued to grow, we utilized Amazon S3 as a data lake and used external tables in Amazon Redshift Spectrum to query data directly in S3. We were excited that this allowed us to easily scale storage and compute resources separately to meet our needs without trading off against cost or complexity.

In the process, we created Spectrify, which simplifies working with Redshift Spectrum and encapsulates a number of best practices. Spectrify accomplishes three things in one easy command. First, it exports an Amazon Redshift table to S3 in comma-separated value (CSV) format. Second, it converts exported CSVs to Apache Parquet files in parallel. Third, it creates the external tables in the Amazon Redshift cluster. Queries can now span massive volumes of Parquet data in S3 with data in Amazon Redshift, and return results in just moments.

 

The preceding diagram shows how Spectrify simplifies querying across Parquet data in Amazon S3 and data in Amazon Redshift, returning results in just moments.

Amazon Redshift at Narrativ

Amazon Redshift has been the foundation of our data warehouse since Narrativ’s inception. We use it to produce traffic statistics, to feed the data-driven algorithms in our programmatic systems, and as a window to explore data insights. Amazon Redshift has scaled well to support the company’s needs. Our cluster has grown from 3 nodes to 36 nodes, and queries per hour have increased significantly without losing performance. As our workload moved from statistics and accounting toward data-driven insights, Amazon Redshift kept pace with increasing query complexity.

When planning the next iteration of our data strategy, our requirements were as follows:

  • Be maintainable by a small engineering team (mostly hands-off)
  • Be able to query all historic data at any time
  • Be able to query the previous three months of data very quickly
  • Maintain performance at 10x current data volume

We considered a few ideas to meet these requirements:

  • Upgrade to a DC2 cluster and a DS2 cluster
  • Upgrade to a very large DC2 cluster
  • Move to an open-source alternative (difficult to maintain)
  • Move to another big data provider (high barrier to entry, costly, risky)

Then we discovered Redshift Spectrum, which changes the game by separating compute costs from storage costs. With Redshift Spectrum, we can offload older, infrequently used data to S3 while maintaining a modestly sized, yet fast, Amazon Redshift cluster. Compared to other solutions, Redshift Spectrum reduces complexity by allowing us to join to the same set of dimension tables for all of our queries. And finally, it even preserves, if not improves, performance (depending on the query). All of our requirements were met with Redshift Spectrum, and it was easy, fast, and cost-effective to implement.

Moving forward, we view Redshift Spectrum as a critical tool to scaling our data capabilities on AWS to the next order of magnitude. Redshift Spectrum allows us to define Amazon Redshift tables backed by data stored in Amazon S3. In this way, Redshift Spectrum turns Amazon Redshift into a hybrid data warehouse/data lake. It provides the best of both worlds—unlimited storage separated from compute costs, plus the performance and converged workflow of Amazon Redshift itself.

Unlocking Redshift Spectrum

Using best practices can provide massive performance improvements in Redshift Spectrum. We shared our Spectrify implementation best practices in an open-source Python library in GitHub to help others reap the same rewards that we have. Following are further details on our top suggestions for optimizing Redshift Spectrum, which you will find incorporated into Spectrify.

Use Apache Parquet

Apache Parquet is a columnar file format. When you store your data in Parquet, Redshift Spectrum can scan only the column data relevant to your query. In contrast, row-oriented formats like CSV or JSON require a scan of every column, regardless of the query. For Narrativ’s workload, using Parquet led to large performance gains and drastically lower bills.

Columnar formats also enable more efficient data compression. Data compresses better when similar data is grouped together. In this case, we saw about an 80 percent reduction in gzipped Parquet files vs. gzipped CSVs. This reduction means data can be ingested, and thus scanned, much faster.

When we first investigated converting our data to Parquet, we found no options that fit our needs. We don’t use Spark or any other project from the Hadoop ecosystem, which is the most common toolset for creating Parquet files. A few Python projects had write support for Parquet. However, we had reservations about data consistency caveats, especially in relation to nullable columns. Additionally, we didn’t find a library that supported the correct timestamp format.

We built Spectrify to solve all these problems. We worked with the Apache Arrow project to ensure full compatibility between Redshift Spectrum and Parquet files generated by Spectrify.

Partition your data

Narrativ stores timestamped event data in Redshift Spectrum. We partition our data on event creation date, allowing Redshift Spectrum to skip inspecting most of our data files for many of our queries. Generally, our queries are concerned with a small subset of the time domain, such as a month or even just a day (for example, answering a specific question from Black Friday 2017). Queries are rarely concerned with to the entire years-long history.

This drastically reduces the amount of data scanned, and we see most queries complete in one-tenth the time. Also, our bill is much lower. Partitioning alone reduced our average cost per query by a factor of 20.

Summary

As we prepared for an order-of-magnitude increase in data volumes, we turned to a data lake strategy built on Amazon S3 using Amazon Redshift and Redshift Spectrum for analytics. Redshift Spectrum enabled us to query data across S3 and Amazon Redshift, maintain or improve query performance, and do it easily and cost-effectively. Along the way, we learned best practices and built Spectrify to streamline data preparation and analytics even further. We hope that sharing our code examples and what we learned will also help you achieve similar results with your growing data.

For more information, see our Spectrify Usage documentation and code samples, and our full example on GitHub.

 


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and How I built a data warehouse using Amazon Redshift and AWS services in record time.

 


About the Author

Colin Nichols leads big data initiatives at Narrativ. He enjoys working at the intersection of systems, and has a strong record of shaping engineering culture around the tenets of quality, automation, and continuous learning. In his spare time, Colin enjoys teaching his dog, MacGyver, new tricks.

[$] Resource control at Facebook

Post Syndicated from jake original https://lwn.net/Articles/764761/rss

Facebook runs a lot of programs and it tries to pack as many as it can onto
each machine. That means running close to—and sometimes beyond—the
resource limits on any given machine. How the system reacts when, for example,
memory is exhausted, makes a big difference in Facebook getting its work
done. Tejun Heo came to 2018
Open Source Summit North America
to describe the resource control
work that has been done by the team he works on at Facebook.

AWS Data Transfer Price Reductions – Up to 34% (Japan) and 28% (Australia)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-data-transfer-price-reductions-up-to-34-japan-and-28-australia/

I’ve got good good news for AWS customers who make use of our Asia Pacific (Tokyo) and Asia Pacific (Sydney) Regions. Effective September 1, 2018 we are reducing prices for data transfer from Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Amazon CloudFront by up to 34% in Japan and 28% in Australia.

EC2 and S3 Data Transfer
Here are the new prices for data transfer from EC2 and S3 to the Internet:

EC2 & S3 Data Transfer Out to Internet Japan Australia
Old Rate New Rate Change Old Rate New Rate Change
Up to 1 GB / Month $0.000 $0.000 0% $0.000 $0.000 0%
Next 9.999 TB / Month $0.140 $0.114 -19% $0.140 $0.114 -19%
Next 40 TB / Month $0.135 $0.089 -34% $0.135 $0.098 -27%
Next 100 TB / Month $0.130 $0.086 -34% $0.130 $0.094 -28%
Greater than 150 TB / Month $0.120 $0.084 -30% $0.120 $0.092 -23%

You can consult the EC2 Pricing and S3 Pricing pages for more information.

CloudFront Data Transfer
Here are the new prices for data transfer from CloudFront edge nodes to the Internet

CloudFront Data Transfer Out to Internet Japan Australia
Old Rate New Rate Change Old Rate New Rate Change
Up to 10 TB / Month $0.140 $0.114 -19% $0.140 $0.114 -19%
Next 40 TB / Month $0.135 $0.089 -34% $0.135 $0.098 -27%
Next 100 TB / Month $0.120 $0.086 -28% $0.120 $0.094 -22%
Next 350 TB / Month $0.100 $0.084 -16% $0.100 $0.092 -8%
Next 524 TB / Month $0.080 $0.080 0% $0.095 $0.090 -5%
Next 4 PB / Month $0.070 $0.070 0% $0.090 $0.085 -6%
Over 5 PB / Month $0.060 $0.060 0% $0.085 $0.080 -6%

Visit the CloudFront Pricing page for more information.

We have also reduced the price of data transfer from CloudFront to your Origin. The price for CloudFront Data Transfer to Origin from edge locations in Australia has been reduced 20% to $0.080 per GB. This represents content uploads via POST and PUT.

Things to Know
Here are a couple of interesting things that you should know about AWS and data transfer:

AWS Free Tier – You can use the AWS Free Tier to get started with, and to learn more about, EC2, S3, CloudFront, and many other AWS services. The AWS Getting Started page contains lots of resources to help you with your first project.

Data Transfer from AWS Origins to CloudFront – There is no charge for data transfers from an AWS origin (S3, EC2, Elastic Load Balancing, and so forth) to any CloudFront edge location.

CloudFront Reserved Capacity Pricing – If you routinely use CloudFront to deliver 10 TB or more content per month, you should investigate our Reserved Capacity pricing. You can receive a significant discount by committing to transfer 10 TB or more content from a single region, with additional discounts at higher levels of usage. To learn more or to sign up, simply Contact Us.

Jeff;

 

LLVM 7.0.0 released

Post Syndicated from corbet original https://lwn.net/Articles/765536/rss

Version 7.0.0 of the LLVM compiler suite is out.
It is the result of the community’s work
over the past six months, including: function multiversioning in Clang
with the ‘target’ attribute for ELF-based x86/x86_64 targets, improved
PCH support in clang-cl, preliminary DWARF v5 support, basic support
for OpenMP 4.5 offloading to NVPTX, OpenCL C++ support, MSan, X-Ray
and libFuzzer support for FreeBSD, early UBSan, X-Ray and libFuzzer
support for OpenBSD, UBSan checks for implicit conversions, many
long-tail compatibility issues fixed in lld which is now production
ready for ELF, COFF and MinGW, new tools llvm-exegesis, llvm-mca and
diagtool
“.
The list of new features
is long; see
the
overall release notes
,
the
Clang release notes
,
the
Clang tools release notes
, and
the
LLD linker release notes
for more information.

Watching VinylVideo with a Raspberry Pi A+

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/vinylvideo-with-raspberry-pi/

Play back video and sound on your television using your turntable and the VinylVideo converter, as demonstrated by YouTuber TechMoan.

VinylVideo – Playing video from a 45rpm record

With a VinylVideo convertor you can play video from a vinyl record played on a standard record player. Curiosity, tech-demo or art?

A brief history of VinylVideo

When demand for vinyl dipped in the early nineties, Austrian artist Gebhard Sengmüller introduced the world to his latest creation: VinylVideo. With VinylVideo you can play audio and visuals from an LP vinyl record using a standard turntable and a converter box plugged into a television set.

Gebhard Sengmüller original VinylVideo

While the project saw some interest throughout the nineties and early noughties, in the end only 20 conversion sets were ever produced.

However, when fellow YouTuber Randy Riddle (great name) got in touch with UK-based tech enthusiast TechMoan to tell him about a VinylVideo revival device becoming available, TechMoan had no choice but to invest.

Where the Pi comes in

After getting the VinylVideo converter box to work with an old Sony CRT unit, TechMoan decided to take apart the box to better understand how it works

You’ll notice a familiar logo at the top right there. Yes, it’s using a Raspberry Pi, a model A+ to be precise, to do the video decoding and output. It makes sense in a low-volume operation — use something that’s ready-made rather than getting a custom-made board done that you probably have to buy in batches of a thousand from China.

There’s very little else inside the sturdy steel casing, but what TechMoan’s investigation shows is that the Pi is connected to a custom-made phono preamp via USB and runs software written specifically for the VinylVideo conversion and playback.

Using Raspberry Pi for VinylVideo playback

For more information on the original project, visit the extremely dated VinylVideo website. And for more on the new product, you can visit the revival converter’s website.

Be sure to subscribe to TechMoan’s YouTube channel for more videos, and see how you can support him on Patreon.

And a huge thank you to David Ferguson for the heads-up! You can watch David talk about his own Raspberry Pi project, PiBakery, on our YouTube channel.

The post Watching VinylVideo with a Raspberry Pi A+ appeared first on Raspberry Pi.

Pegasus Spyware Used in 45 Countries

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/09/pegasus_spyware.html

Citizen Lab has published a new report about the Pegasus spyware. From a ZDNet article:

The malware, known as Pegasus (or Trident), was created by Israeli cyber-security firm NSO Group and has been around for at least three years — when it was first detailed in a report over the summer of 2016.

The malware can operate on both Android and iOS devices, albeit it’s been mostly spotted in campaigns targeting iPhone users primarily. On infected devices, Pegasus is a powerful spyware that can do many things, such as record conversations, steal private messages, exfiltrate photos, and much much more.

From the report:

We found suspected NSO Pegasus infections associated with 33 of the 36 Pegasus operators we identified in 45 countries: Algeria, Bahrain, Bangladesh, Brazil, Canada, Cote d’Ivoire, Egypt, France, Greece, India, Iraq, Israel, Jordan, Kazakhstan, Kenya, Kuwait, Kyrgyzstan, Latvia, Lebanon, Libya, Mexico, Morocco, the Netherlands, Oman, Pakistan, Palestine, Poland, Qatar, Rwanda, Saudi Arabia, Singapore, South Africa, Switzerland, Tajikistan, Thailand, Togo, Tunisia, Turkey, the UAE, Uganda, the United Kingdom, the United States, Uzbekistan, Yemen, and Zambia. As our findings are based on country-level geolocation of DNS servers, factors such as VPNs and satellite Internet teleport locations can introduce inaccuracies.

Six of those countries are known to deploy spyware against political opposition: Bahrain, Kazakhstan, Mexico, Morocco, Saudi Arabia, and the United Arab Emirates.

Also note:

On 17 September 2018, we then received a public statement from NSO Group. The statement mentions that “the list of countries in which NSO is alleged to operate is simply inaccurate. NSO does not operate in many of the countries listed.” This statement is a misunderstanding of our investigation: the list in our report is of suspected locations of NSO infections, it is not a list of suspected NSO customers. As we describe in Section 3, we observed DNS cache hits from what appear to be 33 distinct operators, some of whom appeared to be conducting operations in multiple countries. Thus, our list of 45 countries necessarily includes countries that are not NSO Group customers. We describe additional limitations of our method in Section 4, including factors such as VPNs and satellite connections, which can cause targets to appear in other countries.

Motherboard article. Slashdot and Boing Boing posts.

How to Run Headless Front-End Tests with AWS Cloud9 and AWS CodeBuild

Post Syndicated from Eric Z. Beard original https://aws.amazon.com/blogs/devops/how-to-run-headless-front-end-tests-with-aws-cloud9-and-aws-codebuild/

Automated testing is a critical component to a well-designed software development lifecycle. When you test front-end applications, you often use a browser in combination with testing frameworks. A headless browser is one that is used on a server that does not normally need to run visual applications. In this blog post, I will show you how to configure AWS Cloud9 and AWS CodeBuild to support testing an Angular application with the headless version of Chrome. AWS Cloud9 has deep integration with services such as AWS Lambda, and the environment is easily accessible anywhere, from any internet-connected device.

AWS Cloud9

By default, Cloud9 runs on an Amazon EC2 instance that is managed for you. You can also run it on any Linux machine that is accessible through SSH.

First, create a Cloud9 environment.

  1. Sign in to the AWS Management Console, scroll down to Developer Tools, and choose Cloud9.
  2. On the following page, choose Create Environment.
  3. Enter a name for your environment and then choose Next Step.
  4. On the following page, leave the defaults for the time being and click Next Step.
  5. On the following page, choose Create Environment.

It might take a few minutes for your environment to initialize. Behind the scenes, an EC2 instance is created for you in the region you have currently selected in the console. In the environment, press Alt-T to bring up a bash terminal tab. For the remaining steps in this post, you will enter commands into this tab.

There is a lot to take in if this is your first time using Cloud9. If you need help getting set up or want to learn more, see the Cloud9 User Guide.

Install and configure Angular

The first thing we will do in our new environment is to install and configure an Angular application.

  1. Upgrade Node to the latest version supported by AWS Lambda. (At the time of this writing, that’s 8.10.)
    nvm install 8.10
  2. Install the Angular CLI using npm, the Node Package Manager. Install it as a global package with the –g option so that it is available to run from anywhere in your environment.
    npm install -g @angular/cli
  3. Use the Angular CLI to create an Angular application.
    ng new my-app
    cd my-app/
  4. Run the application to make sure everything is working as expected. To preview a running application in Cloud9, the app must run on a specific port. With Angular, you must disable the default host header check.
    ng serve --port 8080 --host localhost --disable-host-check

     

    On the toolbar, next to Run, choose Preview and then choose Preview Running Application. You should see something like this:

  5. Press Ctrl-C to stop serving and then in the my-app directory, try to test your application.
    cd ..
    ng test --watch=false

    That obviously doesn’t work the way you would expect it to on a regular workstation. The testing framework can’t find Chrome because we are running on a headless EC2 instance. To start addressing the problem, first install a package called Puppeteer as a development dependency in your application.

    I’d like to give credit here to Alex Bainter, a software developer who wrote a comprehensive blog post about replacing PhantomJS with headless Chromium and Karma. His post was extremely helpful to me when I had to figure this out for the first time.

  6. Install Puppeteer and its dependencies.
    npm i -D puppeteer
    npm i –D @angular-devkit/build-angular
  7. You can get a good look at the missing Chrome libraries by running the ldd command on the binary that comes with Puppeteer.
    cd node_modules/puppeteer/.local-chromium/linux-564778/chrome-linux/

    (By the time you read this post, the version number in that path will probably be different. Look in the puppeteer/.local-chromium directory to see what it is for your installation.)

    ldd chrome | grep not

    You should see output that looks like this:

    libXcursor.so.1 => not found
    libXdamage.so.1 => not found
    libXfixes.so.3 => not found
    libcups.so.2 => not found
    libXss.so.1 => not found
    libXrandr.so.2 => not found
    libpangocairo-1.0.so.0 => not found
    libpango-1.0.so.0 => not found
    libcairo.so.2 => not found
    libatk-1.0.so.0 => not found
    libatk-bridge-2.0.so.0 => not found
    libgtk-3.so.0 => not found
    libgdk-3.so.0 => not found
    libgdk_pixbuf-2.0.so.0 => not found

 

Install headless Chrome

Now comes the tricky part. Installing headless Chrome on an Amazon Linux EC2 instance is no simple task. One strategy is to install the various dependencies by compiling from source, but the chain of dependencies for Chrome, which includes gtk+ and glib, soon gets out of hand. I found another blogger who solved the problem by borrowing from the CentOS and Fedora package repositories. Thanks to Yuanyi for this part of the solution.

  1. Install yum packages to cover basic dependencies.
    sudo yum install -y libXcursor libXdamage libcups libXss libXrandr \
        cups-libs dbus-glib libXinerama cairo cairo-gobject pango
  2. Borrow packages from CentOS and Fedora.
    sudo rpm -ivh --nodeps http://mirror.centos.org/centos/7/os/x86_64/Packages/atk-2.22.0-3.el7.x86_64.rpm
    sudo rpm -ivh --nodeps http://mirror.centos.org/centos/7/os/x86_64/Packages/at-spi2-atk-2.22.0-2.el7.x86_64.rpm
    sudo rpm -ivh --nodeps http://mirror.centos.org/centos/7/os/x86_64/Packages/at-spi2-core-2.22.0-1.el7.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/g/GConf2-3.2.6-7.fc20.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libXScrnSaver-1.2.2-6.fc20.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libxkbcommon-0.3.1-1.fc20.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libwayland-client-1.2.0-3.fc20.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/l/libwayland-cursor-1.2.0-3.fc20.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/20/Fedora/x86_64/os/Packages/g/gtk3-3.10.4-1.fc20.x86_64.rpm
    sudo rpm -ivh --nodeps http://dl.fedoraproject.org/pub/archive/fedora/linux/releases/16/Fedora/x86_64/os/Packages/gdk-pixbuf2-2.24.0-1.fc16.x86_64.rpm
  3. Edit src/karma.conf.js to require Puppeteer and set the CHROME_BIN environment variable. Here is the full content of that file after the changes.
    const puppeteer = require("puppeteer");
    process.env.CHROME_BIN = puppeteer.executablePath();
    
    module.exports = function (config) {
        config.set({
            basePath: '',
            frameworks: ['jasmine', ' @angular-devkit/build-angular'],
            plugins: [
                require('karma-jasmine'),
                require('karma-chrome-launcher'),
                require('karma-jasmine-html-reporter'),
                require('karma-coverage-istanbul-reporter'),
               require('@angular-devkit/build-angular/plugins/karma')
            ],
            client:{
                clearContext: false // leave Jasmine Spec Runner output visible in browser
            },
        coverageIstanbulReporter: {
            reports: [ 'html', 'lcovonly' ],
            fixWebpackSourcePaths: true
        },
        angularCli: {
            environment: 'dev'
        },
        reporters: ['progress', 'kjhtml'],
        port: 8080,
        colors: true,
        logLevel: config.LOG_INFO,
        autoWatch: true,
        browsers: ['ChromeHeadlessNoSandbox'],
        customLaunchers: {
            ChromeHeadlessNoSandbox: {
                base: 'ChromeHeadless',
                flags: ['--no-sandbox']
            }
        },
        singleRun: false
    
    });
    
    };
  4. Make a small adjustment to your test specification in src/app/app.component.spec.ts so that it is checking for the title in the test called "should render title in a h1 tag". Run ng test again.
    ng test --watch=false

If you see that green SUCCESS indicator, then you have done it! You installed Angular and created an application, installed Puppeteer, and by filling in the missing libraries for Chrome, you made it possible to run headless Chrome tests in Cloud9!

AWS CodeBuild

The next piece of the puzzle is your CI/CD pipeline. When a developer checks in new code, you want to test that code with a continuous integration tool like AWS CodeBuild. With CodeBuild, the problem related to headless Chrome is slightly different than it was with Cloud9, because the default build environment for Node apps is an Ubuntu image. You still need to install Chromium and its dependencies, but Ubuntu packages make it easier.

  1. Navigate to the CodeBuild console and create a new build project. Give it a name and configure the source repository. You will need to store your code for this exercise with one of the providers listed later so that CodeBuild knows where to find it when you start a build. Since you are already logged in to the AWS console, AWS CodeCommit is a good option, but you could also choose Amazon S3, Bitbucket, or GitHub.
  2. Configure the build environment. For Operating system, choose Ubuntu. For Runtime, choose Node.js. You can specify your own container image for the build, but the buildspec.yml described in step 3 works out of the box with the default image.
  3. For the build specification, provide the following buildspec.yml file in the root directory of the source code repository.
    
    version: 0.1
    phases:
      install:
        commands:
    
          # Install the Angular CLI
          - npm install -g @angular/cli
    
          # Install puppeteer as a dev dependency
          - npm i -D puppeteer
          - npm i –D @angular-devkit/build-angular
    
          # Print out missing libs
          - echo "Missing Libs" || ldd ./node_modules/puppeteer/.local-chromium/linux-549031/chrome-linux/chrome | grep not
    
          # Upgrade apt
          - apt-get upgrade
    
          # Update libs
          - apt-get update
    
          # Install apt-transport-https
          - apt-get install -y apt-transport-https
    
          # Use apt to install the Chrome dependencies
          - apt-get install -y libxcursor1
          - apt-get install -y libgtk-3-dev
          - apt-get install -y libxss1
          - apt-get install -y libasound2
          - apt-get install -y libnspr4
          - apt-get install -y libnss3
          - apt-get install -y libx11-xcb1
    
          # Print out missing libs
          - echo "Missing Libs" || ldd ./node_modules/puppeteer/.local-chromium/linux-549031/chrome-linux/chrome | grep not
    
          # Install project dependencies
          - npm install
    
      pre_build:
        commands
    	  - echo "Nothing to pre_build"
    
      build:
        commands:
    
          - printenv 
    
          # Build the project
          - ng build
    
          # Run headless Chrome tests
          - ng test --watch=false
          - printenv
    
      post_build:
        commands:
    
          - printenv
    
          # Deploy the project to S3
    
          - if [ ${CODEBUILD_BUILD_SUCCEEDING}=1 ]; then aws s3 sync --delete dist/ "s3://${BUCKET_NAME}"; else echo "Skipping aws sync"; fi
    
    artifacts:
      files:
        - src/*
    
    

    Feel free to remove those ldd and printenv statements, but it is worth taking a look at the output to get a better understanding of what is going on with the build.

  4. Specify the location for artifacts. The following step isn’t required, but it makes it possible to incorporate the build project into AWS CodePipeline.
  5. Expand Advanced Settings and configure an environment variable for the website bucket name.
  6. Configure the buckets. CodeBuild can’t write to the S3 buckets unless you give the service explicit permissions to do so. This is one of the most common causes of build failures for projects that involve S3. Attach the following policy to the CodeBuild service role to give it access to those buckets. Choose Continue and Save to create the build project, and then navigate to the IAM console and search for the CodeBuild service role that was just created for you. Add this as an inline policy.
    
    {
    	"Version": "2012-10-17",
    	"Statement": [
    		{
    			"Sid": "VisualEditor0",
    			"Effect": "Allow",
    			"Action": "s3:*",
    			"Resource": [
    				"arn:aws:s3:::YOUR_BUCKET_FOR_ARTIFACTS",
    				"arn:aws:s3:::YOUR_BUCKET_FOR_ARTIFACTS /*"
    			]
    		},
    		{
    			"Sid": "VisualEditor1",
    			"Effect": "Allow",
    			"Action": "s3:*",
    			"Resource": [
    				"arn:aws:s3:::YOUR_BUCKET_FOR_THE_WEBSITE",
    				"arn:aws:s3:::YOUR_BUCKET_FOR_THE_WEBSITE /*"
    			]
    		}
    	]
    }
    
    
  7. You should now be able to start the build and see that the compiled website has been copied to your S3 bucket after the build is complete.

 

Alternative Cloud9 installation using SSH and Ubuntu

You can run the Cloud9 IDE from a Linux machine that you create, rather than letting Cloud9 provision it for you. Create a Cloud9 environment and choose Connect and run in remote server. For more information about this type of setup, see Creating an SSH Environment in the AWS Cloud9 User Guide.

After you have configured the environment, the work you have to do is much simpler than on the Amazon Linux instance, because there are Ubuntu packages that install the required dependencies. Follow the instructions earlier in this post until you get to the “Install headless Chrome” section. Issue this command:

sudo apt install -y libxcursor1 libgtk-3-dev libxss1 libasound2 libnspr4 libnss3

You don’t need to borrow from any of the CentOS or Fedora repositories.

Make changes to karma.conf.js as described earlier and you should then be ready to test your application.

 

Summary

You are now able to run headless integration tests using Cloud9 by installing Puppeteer and filling in the required Chrome dependencies. You can also extend this to the container image used to test your application with CodeBuild. Automated testing is vital to a trustworthy DevOps pipeline, and Cloud9 opens up new possibilities for developers of all types, including front-end developers.

Happy coding! –EZB

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close