Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/technical-university-denmark-speakers/
Students taking Design of Mechatronics at the Technical University of Denmark have created some seriously elegant and striking Raspberry Pi speakers. Their builds are part of a project asking them to “explore, design and build a 3D printed speaker, around readily available electronics and components”.
The students have been uploading their designs, incorporating Raspberry Pis and HiFiBerry HATs, to Thingiverse throughout April. The task is a collaboration with luxury brand Bang & Olufsen’s Create initiative, and the results wouldn’t look out of place in a high-end showroom; I’d happily take any of these home.
Søren Qvist’s wall-mounted kitchen sphere uses 3D-printed and laser-cut parts, along with the HiFiBerry HAT and B&O speakers to create a sleek-looking design.
Otto Ømann’s group have designed the Hex One – a work-in-progress wireless 360° speaker. A particular objective for their project is to create a speaker using as many 3D-printed parts as possible.
“The design is supposed to resemble that of a B&O speaker, and from a handful of categories we chose to create a portable and wearable speaker,” explain Gustav Larsen and his team.
Oliver Repholtz Behrens and team have housed a Raspberry Pi and HiFiBerry HAT inside this this stylish airplay speaker. You can follow their design progress on their team blog.
Tue Thomsen’s six-person team Mechatastic have produced the B&O TILE. “The speaker consists of four 3D-printed cabinet and top parts, where the top should be covered by fabric,” they explain. “The speaker insides consists of laser-cut wood to hold the tweeter and driver and encase the Raspberry Pi.”
The team aimed to design a speaker that would be at home in a kitchen. With a removable upper casing allowing for a choice of colour, the TILE can be customised to fit particular tastes and colour schemes.
Build your own speakers with Raspberry Pis
Raspberry Pi’s onboard audio jack, along with third-party HATs such as the HiFiBerry and Pimoroni Speaker pHAT, make speaker design and fabrication with the Pi an interesting alternative to pre-made tech. These builds don’t tend to be technically complex, and they provide some lovely examples of tech-based projects that reflect makers’ own particular aesthetic style.
If you have access to a 3D printer or a laser cutter, perhaps at a nearby maker space, then those can be excellent resources, but fancy kit isn’t a requirement. Basic joinery and crafting with card or paper are just a couple of ways you can build things that are all your own, using familiar tools and materials. We think more people would enjoy getting hands-on with this sort of thing if they gave it a whirl, and we publish a free magazine to help.
Looking for a new project to build around the Raspberry Pi Zero, I came across the pHAT DAC from Pimoroni. This little add-on board adds audio playback capabilities to the Pi Zero. Because the pHAT uses the GPIO pins, the USB OTG port remains available for a wifi dongle.
This video by Frederick Vandenbosch is a great example of building AirPlay speakers using a Pi and HAT, and a quick search will find you lots more relevant tutorials and ideas.
Have you built your own? Share your speaker-based Pi builds with us in the comments.
The post 3D-printed speakers from the Technical University of Denmark appeared first on Raspberry Pi.
Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/
Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.
For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses where operations are tracked) as easy as with stateless (such as with web front ends where they are often not).
Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.
In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.
Stateful Pipelines: Need for Portable Volumes
As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.
Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.
Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.
Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.
As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here
AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx
In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.
The continuous deployment pipeline runs as follows:
- Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
- Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
- AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
- AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
- Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
- AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB• Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB
• The MongoDB instance mounts the snapshot.
At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.
This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.
This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.
The reference architecture and code are available on GitHub:
● Reference architecture: https://github.com/portworx/aws-kube-codesuite
● Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py
Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/iot-analytics-now-generally-available/
Today, I’m pleased to announce that, as of April 24th 2018, the AWS IoT Analytics service is generally available. Customers can use IoT Analytics to clean, process, encrich, store, and analyze their connected device data at scale. AWS IoT Analytics is now available in US East (N. Virginia), US West (Oregon), US East (Ohio), and EU (Ireland). In November of last year, my colleague Tara Walker wrote an excellent post that walks through some of the features of the AWS IoT Analytics service and Ben Kehoe (an AWS Community Hero and Research Scientist at iRobot) spoke at AWS Re:Invent about replacing iRobot’s existing “rube goldberg machine” for forwarding data into an elasticsearch cluster with AWS IoT Analytics.
Iterating on customer feedback received during the service preview the AWS IoT Analytics team has added a number of new features including the ability to ingest data from external souces using the BatchPutMessage API, the ability to set a data retention policy on stored data, the ability to reprocess existing data, preview pipeline results, and preview messages from channels with the SampleChannelData API.
Let’s cover the core concepts of IoT Analytics and then walk through an example.
AWS IoT Analytics Concepts
AWS IoT Analytics can be broken down into a few simple concepts. For data preparation customers have: Channels, Pipelines, and Data Stores. For analyzing data customers have: Datasets and Notebooks.
- Channels are the entry point into IoT Analytics and they collect data from an existing IoT Core MQTT topic or from external sources that send messages to the channel using the Ingestion API. Channels are elastically scalable and consume messages in Binary or JSON format. Channels also immutably store raw device data for easily reprocessing using different logic if your needs change.
- Pipelines consume messages from channels and allow you to process messages with steps, called activities, such as filtering on attributes, transforming the content of the message by adding or remvoing fields, invoking lambda functions for complex transformations and adding data from external data sources, or even enriching the messages with data from IoT Core. Pipelines output their data to a Data Store.
- Data Stores are a queryable IoT-optimized data storage solution for the output of your pipelines. Data stores support custom retention periods to optimize costs. When a customer queries a Data Store the result is put into a Dataset.
- Datasets are similar to a view in a SQL database. Customers create a dataset by running a query against a data store. Data sets can be generated manually or on a recurring schedule.
- Notebooks are Amazon SageMaker hosted Jupyter notebooks that let customers analyze their data with custom code and even build or train ML models on the data. IoT Analytics offers several notebook templates with pre-authored models for common IoT use cases such as Predictive Maintenance, Anomaly Detection, Fleet Segmentation, and Forecasting.
Additionally, you can use IoT analytics as a data source for Amazon QuickSight for easy visualizations of your data. You can find pricing information for each of these services on the AWS IoT Analytics Pricing Page.
IoT Analytics Walkthrough
While this walkthrough uses the console everything shown here is equally easy to do with the CLI. When we first navigate to the console we have a helpful guide telling us to build a channel, pipeline, and a data store:
Our first step is to create a channel. I already have some data into an MQTT channel with IoT core so I’ll select that channel. First we’ll name the channel and select a retention period.
Now, I’ll select my IoT Core topic and grab the data. I can also post messages directly into the channel with the PutMessages APIs.
Now that I have a channel my next step is to create a pipeline. To do this I’ll select “Create a pipeline from this channel” from the “Actions” drop down.
Now, I’ll walk through the pipeline wizard giving my pipeline a name and a source.
I’ll select which of the message attributes the pipeline should expect. This can draw from the channel with the sampling API and guess at which attributes are needed or I could upload a specification in JSON.
Next I define the pipeline activities. If I’m dealing with binary data I need a lambda function to first deserialize the message into JSON so the other filter functions can operate on it. I can create filters, calculate attributes based on other attributes, and I can also enrich the message with metadata from IoT core registry.
For now I just want to filter out some messages and make a small transform with a Lambda function.
Finally, I choose or create a data store to output the results of my pipeline.
Now that I have a data store, I can create a view of that data by creating a data set.
I’ll just select all the data from the data store for this dataset but I could also select individual attributes as needed.
I have a data set! I can adjust the cron expression in the schedule to re-run this as frequently or infrequently as I wish.
If I want to create a model from my data I can create a SageMaker powered Jupyter notebook. There are a few templates that are great starting points like anomaly detection or output forecasting.
Here you can see an example of the anomaly detection notebook.
Finally, if I want to create simple visualizations of my data I can use QuickSight to bring in an IoT Analytics data set.
Let Us Know
I’m excited to see what customers build with AWS IoT Analytics. My colleagues on the IoT teams are eager to hear your feedback about the service so please let us know in the comments or on Twitter what features you want to see.
Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/deep-learning-pokedex/
Squeal with delight as your inner Pokémon trainer witnesses the wonder of Adrian Rosebrock’s deep learning Pokédex.
This video demos a real-like Pokedex, complete with visual recognition, that I created using a Raspberry Pi, Python, and Deep Learning. You can find the entire blog post, including code, using this link: https://www.pyimagesearch.com/2018/04/30/a-fun-hands-on-deep-learning-project-for-beginners-students-and-hobbyists/ Music credit to YouTube user “No Copyright” for providing royalty free music: https://www.youtube.com/watch?v=PXpjqURczn8
The history of Pokémon in 30 seconds
The Pokémon franchise was created by video game designer Satoshi Tajiri in 1995. In the fictional world of Pokémon, Pokémon Trainers explore the vast landscape, catching and training small creatures called Pokémon. To date, there are 802 different types of Pokémon. They range from the ever recognisable Pikachu, a bright yellow electric Pokémon, to the highly sought-after Shiny Charizard, a metallic, playing-card-shaped Pokémon that your mate Alex claims she has in mint condition, but refuses to show you.
In the world of Pokémon, children as young as ten-year-old protagonist and all-round annoyance Ash Ketchum are allowed to leave home and wander the wilderness. There, they hunt vicious, deadly creatures in the hope of becoming a Pokémon Master.
Adrian’s deep learning Pokédex
Adrian is a bit of a deep learning pro, as demonstrated by his Santa/Not Santa detector, which we wrote about last year. For that project, he also provided a great explanation of what deep learning actually is. In a nutshell:
…a subfield of machine learning, which is, in turn, a subfield of artificial intelligence (AI).While AI embodies a large, diverse set of techniques and algorithms related to automatic reasoning (inference, planning, heuristics, etc), the machine learning subfields are specifically interested in pattern recognition and learning from data.
As with his earlier Raspberry Pi project, Adrian uses the Keras deep learning model and the TensorFlow backend, plus a few other packages such as Adrian’s own imutils functions and OpenCV.
Adrian trained a Convolutional Neural Network using Keras on a dataset of 1191 Pokémon images, obtaining 96.84% accuracy. As Adrian explains, this model is able to identify Pokémon via still image and video. It’s perfect for creating a Pokédex – an interactive Pokémon catalogue that should, according to the franchise, be able to identify and read out information on any known Pokémon when captured by camera. More information on model training can be found on Adrian’s blog.
For the physical build, a Raspberry Pi 3 with camera module is paired with the Raspberry Pi 7″ touch display to create a portable Pokédex. And while Adrian comments that the same result can be achieved using your home computer and a webcam, that’s not how Adrian rolls as a Raspberry Pi fan.
Plus, the smaller size of the Pi is perfect for one of you to incorporate this deep learning model into a 3D-printed Pokédex for ultimate Pokémon glory, pretty please, thank you.
Adrian has gone into impressive detail about how the project works and how you can create your own on his blog, pyimagesearch. So if you’re interested in learning more about deep learning, and making your own Pokédex, be sure to visit.
AWS Config enables continuous monitoring of your AWS resources, making it simple to assess, audit, and record resource configurations and changes. AWS Config does this through the use of rules that define the desired configuration state of your AWS resources. AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking if you encrypted your Amazon Elastic Block Store (Amazon EBS) volumes, tagged your resources appropriately, and enabled multi-factor authentication (MFA) for root accounts. You can also create custom rules to codify your compliance requirements through the use of AWS Lambda functions.
In this post we’ll show you how to use AWS Config to monitor our Amazon Simple Storage Service (S3) bucket ACLs and policies for violations which allow public read or public write access. If AWS Config finds a policy violation, we’ll have it trigger an Amazon CloudWatch Event rule to trigger an AWS Lambda function which either corrects the S3 bucket ACL, or notifies you via Amazon Simple Notification Service (Amazon SNS) that the policy is in violation and allows public read or public write access. We’ll show you how to do this in five main steps.
- Enable AWS Config to monitor Amazon S3 bucket ACLs and policies for compliance violations.
- Create an IAM Role and Policy that grants a Lambda function permissions to read S3 bucket policies and send alerts through SNS.
- Create and configure a CloudWatch Events rule that triggers the Lambda function when AWS Config detects an S3 bucket ACL or policy violation.
- Create a Lambda function that uses the IAM role to review S3 bucket ACLs and policies, correct the ACLs, and notify your team of out-of-compliance policies.
- Verify the monitoring solution.
Note: This post assumes your compliance policies require the buckets you monitor not allow public read or write access. If you have intentionally open buckets serving static content, for example, you can use this post as a jumping-off point for a solution tailored to your needs.
At the end of this post, we provide an AWS CloudFormation template that implements the solution outlined. The template enables you to deploy the solution in multiple regions quickly.
Important: The use of some of the resources deployed, including those deployed using the provided CloudFormation template, will incur costs as long as they are in use. AWS Config Rules incur costs in each region they are active.
Here’s an architecture diagram of what we’ll implement:
Step 1: Enable AWS Config and Amazon S3 Bucket monitoring
The following steps demonstrate how to set up AWS Config to monitor Amazon S3 buckets.
- Sign into the AWS Management Console and open the AWS Config console.
- If this is your first time using AWS Config, select Get started. If you’ve already used AWS Config, select Settings.
- In the Settings page, under Resource types to record, clear the All resources checkbox. In the Specific types list, select Bucket under S3.
- Choose the Amazon S3 bucket for storing configuration history and snapshots. We’ll create a new Amazon S3 bucket.
- If you prefer to use an existing Amazon S3 bucket in your account, select the Choose a bucket from your account radio button and, using the dropdown, select an existing bucket.
- If you prefer to use an existing Amazon S3 bucket in your account, select the Choose a bucket from your account radio button and, using the dropdown, select an existing bucket.
- Under Amazon SNS topic, check the box next to Stream configuration changes and notifications to an Amazon SNS topic, and then select the radio button to Create a topic.
- Alternatively, you can choose a topic that you have previously created and subscribed to.
- If you created a new SNS topic you need to subscribe to it to receive notifications. We’ll cover this in a later step.
- Alternatively, you can choose a topic that you have previously created and subscribed to.
- Under AWS Config role, choose Create a role (unless you already have a role you want to use). We’re using the auto-suggested role name.
- Select Next.
- Configure Amazon S3 bucket monitoring rules:
- On the AWS Config rules page, search for S3 and choose the s3-bucket-publice-read-prohibited and s3-bucket-public-write-prohibited rules, then click Next.
- On the Review page, select Confirm. AWS Config is now analyzing your Amazon S3 buckets, capturing their current configurations, and evaluating the configurations against the rules we selected.
- On the AWS Config rules page, search for S3 and choose the s3-bucket-publice-read-prohibited and s3-bucket-public-write-prohibited rules, then click Next.
- If you created a new Amazon SNS topic, open the Amazon SNS Management Console and locate the topic you created:
- Copy the ARN of the topic (the string that begins with arn:) because you’ll need it in a later step.
- Select the checkbox next to the topic, and then, under the Actions menu, select Subscribe to topic.
- Select Email as the protocol, enter your email address, and then select Create subscription.
- After several minutes, you’ll receive an email asking you to confirm your subscription for notifications for this topic. Select the link to confirm the subscription.
Step 2: Create a Role for Lambda
Our Lambda will need permissions that enable it to inspect and modify Amazon S3 bucket ACLs and policies, log to CloudWatch Logs, and publishing to an Amazon SNS topic. We’ll now set up a custom AWS Identity and Access Management (IAM) policy and role to support these actions and assign them to the Lambda function we’ll create in the next section.
- In the AWS Management Console, under Services, select IAM to access the IAM Console.
- Create a policy with the following permissions, or copy the following policy:
- Create a role for your Lambda function:
- Select Lambda from the list of services that will use this role.
- Select the check box next to the policy you created previously, and then select Next: Review
- Name your role, give it a description, and then select Create Role. In this example, we’re naming the role LambdaS3PolicySecuringRole.
Step 3: Create and Configure a CloudWatch Rule
In this section, we’ll create a CloudWatch Rule to trigger the Lambda function when AWS Config determines that your Amazon S3 buckets are non-compliant.
- In the AWS Management Console, under Services, select CloudWatch.
- On the left-hand side, under Events, select Rules.
- Click Create rule.
- In Step 1: Create rule, under Event Source, select the dropdown list and select Build custom event pattern.
- Copy the following pattern and paste it into the text box:
The pattern matches events generated by AWS Config when it checks the Amazon S3 bucket for public accessibility.
- We’ll add a Lambda target later. For now, select your Amazon SNS topic created earlier, and then select Configure details.
- Give your rule a name and description. For this example, we’ll name ours AWSConfigFoundOpenBucket
- Click Create rule.
Step 4: Create a Lambda Function
In this section, we’ll create a new Lambda function to examine an Amazon S3 bucket’s ACL and bucket policy. If the bucket ACL is found to allow public access, the Lambda function overwrites it to be private. If a bucket policy is found, the Lambda function creates an SNS message, puts the policy in the message body, and publishes it to the Amazon SNS topic we created. Bucket policies can be complex, and overwriting your policy may cause unexpected loss of access, so this Lambda function doesn’t attempt to alter your policy in any way.
- Get the ARN of the Amazon SNS topic created earlier.
- In the AWS Management Console, under Services, select Lambda to go to the Lambda Console.
- From the Dashboard, select Create Function. Or, if you were taken directly to the Functions page, select the Create Function button in the upper-right.
- On the Create function page:
- Choose Author from scratch.
- Provide a name for the function. We’re using AWSConfigOpenAccessResponder.
- The Lambda function we’ve written is Python 3.6 compatible, so in the Runtime dropdown list, select Python 3.6.
- Under Role, select Choose an existing role. Select the role you created in the previous section, and then select Create function.
- We’ll now add a CloudWatch Event based on the rule we created earlier.
- In the Add triggers section, select CloudWatch Events. A CloudWatch Events box should appear connected to the left side of the Lambda Function and have a note that states Configuration required.
- From the Rule dropdown box, choose the rule you created earlier, and then select Add.
- In the Add triggers section, select CloudWatch Events. A CloudWatch Events box should appear connected to the left side of the Lambda Function and have a note that states Configuration required.
- Scroll up to the Designer section and select the name of your Lambda function.
- Delete the default code and paste in the following code:
- Scroll down to the Environment variables section. This code uses an environment variable to store the Amazon SNS topic ARN.
- For the key, enter TOPIC_ARN.
- For the value, enter the ARN of the Amazon SNS topic created earlier.
- Under Execution role, select Choose an existing role, and then select the role created earlier from the dropdown.
- Leave everything else as-is, and then, at the top, select Save.
Step 5: Verify it Works
We now have the Lambda function, an Amazon SNS topic, AWS Config watching our Amazon S3 buckets, and a CloudWatch Rule to trigger the Lambda function if a bucket is found to be non-compliant. Let’s test them to make sure they work.
We have an Amazon S3 bucket, myconfigtestbucket that’s been created in the region monitored by AWS Config, as well as the associated Lambda function. This bucket has no public read or write access set in an ACL or a policy, so it’s compliant.
Let’s change the bucket’s ACL to allow public listing of objects:
After saving, the bucket now has public access. After several minutes, the AWS Config Dashboard notes that there is one non-compliant resource:
In the Amazon S3 Console, we can see that the bucket no longer has public listing of objects enabled after the invocation of the Lambda function triggered by the CloudWatch Rule created earlier.
Notice that the AWS Config Dashboard now shows that there are no longer any non-compliant resources:
Now, let’s try out the Amazon S3 bucket policy check by configuring a bucket policy that allows list access:
A few minutes after setting this bucket policy on the
myconfigtestbucket bucket, AWS Config recognizes the bucket is no longer compliant. Because this is a bucket policy rather than an ACL, we publish a notification to the SNS topic we created earlier that lets us know about the potential policy violation:
Knowing that the policy allows open listing of the bucket, we can now modify or delete the policy, after which AWS Config will recognize that the resource is compliant.
In this post, we demonstrated how you can use AWS Config to monitor for Amazon S3 buckets with open read and write access ACLs and policies. We also showed how to use Amazon CloudWatch, Amazon SNS, and Lambda to overwrite a public bucket ACL, or to alert you should a bucket have a suspicious policy. You can use the CloudFormation template to deploy this solution in multiple regions quickly. With this approach, you will be able to easily identify and secure open Amazon S3 bucket ACLs and policies. Once you have deployed this solution to multiple regions you can aggregate the results using an AWS Config aggregator. See this post to learn more.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Config forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/
As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
Hard Drive Reliability Statistics for Q1 2018
At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.
Notes and Observations
If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.
The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.
There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.
Welcome Toshiba 8TB drives, almost…
We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.
In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.
So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.
Lifetime Hard Drive Reliability Statistics
While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.
Notes and Observations
The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.
The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.
Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.
We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.
New SMART Attributes
If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:
- 177 – Wear Range Delta
- 179 – Used Reserved Block Count Total
- 181- Program Fail Count Total or Non-4K Aligned Access Count
- 182 – Erase Fail Count
- 235 – Good Block Count AND System(Free) Block Count
The 5 values are all related to SSD drives.
Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.
Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.
[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/iot_inspector_t.html
Researchers at Princeton University have released IoT Inspector, a tool that analyzes the security and privacy of IoT devices by examining the data they send across the Internet. They’ve already used the tool to study a bunch of different IoT devices. From their blog post:
Finding #3: Many IoT Devices Contact a Large and Diverse Set of Third Parties
In many cases, consumers expect that their devices contact manufacturers’ servers, but communication with other third-party destinations may not be a behavior that consumers expect.
We have found that many IoT devices communicate with third-party services, of which consumers are typically unaware. We have found many instances of third-party communications in our analyses of IoT device network traffic. Some examples include:
- Samsung Smart TV. During the first minute after power-on, the TV talks to Google Play, Double Click, Netflix, FandangoNOW, Spotify, CBS, MSNBC, NFL, Deezer, and Facebookeven though we did not sign in or create accounts with any of them.
- Amcrest WiFi Security Camera. The camera actively communicates with cellphonepush.quickddns.com using HTTPS. QuickDDNS is a Dynamic DNS service provider operated by Dahua. Dahua is also a security camera manufacturer, although Amcrest’s website makes no references to Dahua. Amcrest customer service informed us that Dahua was the original equipment manufacturer.
- Halo Smoke Detector. The smart smoke detector communicates with broker.xively.com. Xively offers an MQTT service, which allows manufacturers to communicate with their devices.
- Geeni Light Bulb. The Geeni smart bulb communicates with gw.tuyaus.com, which is operated by TuYa, a China-based company that also offers an MQTT service.
We also looked at a number of other devices, such as Samsung Smart Camera and TP-Link Smart Plug, and found communications with third parties ranging from NTP pools (time servers) to video storage services.
Their first two findings are that “Many IoT devices lack basic encryption and authentication” and that “User behavior can be inferred from encrypted IoT device traffic.” No surprises there.
Related: IoT Hall of Shame.
Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/scanning-snacks-to-your-wunderlist-shopping-list/
Brian Carrigan found the remains of a $500 supermarket barcode scanner at a Scrap Exchange for $6.25, and decided to put it to use as a shopping list builder for his pantry.
Upcycling from scraps
Brian wasn’t planning to build the Wunderscan. But when he stumbled upon the remains of a $500 Cubit barcode scanner at his local reuse center, his inner maker took hold of the situation.
It had been ripped from its connectors and had unlabeled wires hanging from it; a bit of hardware gore if such a thing exists. It was labeled on sale for $6.25, and a quick search revealed that it originally retailed at over $500… I figured I would try to reverse engineer it, and if all else fails, scrap it for the laser and motor.
Brian decided that the scanner, once refurbished with a Raspberry Pi Zero W and new wiring, would make a great addition to his home pantry as a shopping list builder using Wunderlist. “I thought a great use of this would be to keep near our pantry so that when we are out of a spice or snack, we could just scan the item and it would get posted to our shopping list.”
The datasheet for the Cubit scanner was available online, and Brian was able to discover the missing pieces required to bring the unit back to working order.
However, no wiring diagram was provided with the datasheet, so he was forced to figure out the power connections and signal output for himself using a bit of luck and an oscilloscope.
Now that the part was powered and working, all that was left was finding the RS232 transmit line. I used my oscilloscope to do this part and found it by scanning items and looking for the signal. It was not long before this wire was found and I was able to receive UPC codes.
Scanning codes and building (Wunder)lists
When the scanner reads a barcode, it sends the ASCII representation of a UPC code to the attached Raspberry Pi Zero W. Brian used the free UPC Database to convert each code to the name of the corresponding grocery item. Next, he needed to add it to the Wunderlist shopping list that his wife uses for grocery shopping.
Wunderlist provides an API token so users can incorporate list-making into their projects. With a little extra coding, Brian was able to convert the scanning of a pantry item’s barcode into a new addition to the family shopping list.
Curious as to how it all came together? You can find information on the project, including code and hardware configurations, on Brian’s blog. If you’ve built something similar, we’d love to see it in the comments below.
The post Scanning snacks to your Wunderlist shopping list with Wunderscan appeared first on Raspberry Pi.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/security_vulner_14.html
With a $300 Proxmark RFID card reading and writing tool, any expired keycard pulled from the trash of a target hotel, and a set of cryptographic tricks developed over close to 15 years of on-and-off analysis of the codes Vingcard electronically writes to its keycards, they found a method to vastly narrow down a hotel’s possible master key code. They can use that handheld Proxmark device to cycle through all the remaining possible codes on any lock at the hotel, identify the correct one in about 20 tries, and then write that master code to a card that gives the hacker free reign to roam any room in the building. The whole process takes about a minute.
The two researchers say that their attack works only on Vingcard’s previous-generation Vision locks, not the company’s newer Visionline product. But they estimate that it nonetheless affects 140,000 hotels in more than 160 countries around the world; the researchers say that Vingcard’s Swedish parent company, Assa Abloy, admitted to them that the problem affects millions of locks in total. When WIRED reached out to Assa Abloy, however, the company put the total number of vulnerable locks somewhat lower, between 500,000 and a million.
Patching is a nightmare. It requires updating the firmware on every lock individually.
And the researchers speculate whether or not others knew of this hack:
The F-Secure researchers admit they don’t know if their Vinguard attack has occurred in the real world. But the American firm LSI, which trains law enforcement agencies in bypassing locks, advertises Vingcard’s products among those it promises to teach students to unlock. And the F-Secure researchers point to a 2010 assassination of a Palestinian Hamas official in a Dubai hotel, widely believed to have been carried out by the Israeli intelligence agency Mossad. The assassins in that case seemingly used a vulnerability in Vingcard locks to enter their target’s room, albeit one that required re-programming the lock. “Most probably Mossad has a capability to do something like this,” Tuominen says.
Post Syndicated from Karthik Kumar Odapally original https://aws.amazon.com/blogs/big-data/10-visualizations-to-try-in-amazon-quicksight-with-sample-data/
If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect yuor data, perform advanced analysis and access the results from any web browser or mobile device.
The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.
Which data sources does Amazon QuickSight support?
At the time of publication, you can use the following data methods:
- Connect to AWS data sources, including:
- Amazon RDS
- Amazon Aurora
- Amazon Redshift
- Amazon Athena
- Amazon S3
- Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
- Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
- Import data from SaaS applications like Salesforce and Snowflake
- Use big data processing engines like Spark and Presto
This list is constantly growing. For more information, see Supported Data Sources.
Answers in instants
SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.
Typical Amazon QuickSight workflow
When you create an analysis, the typical workflow is as follows:
- Connect to a data source, and then create a new dataset or choose an existing dataset.
- (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
- Create a new analysis.
- Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
- (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
- (Optional) Add more visuals to the analysis.
- (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
- (Optional) Publish the analysis as a dashboard to share insights with other users.
The following graphic illustrates a typical Amazon QuickSight workflow.
Visualizations created in Amazon QuickSight with sample datasets
Visualizations for a data analyst
Download and Resources: https://datacatalog.worldbank.org/dataset/world-development-indicators
Data catalog: The World Bank invests into multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.
The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.
The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). Also, you can maneuver over the graph to get detailed statistics at a glance.
Visualizations for a trading analyst
Source: Deutsche Börse Public Dataset (DBG PDS)
Download and resources: https://aws.amazon.com/public-datasets/deutsche-boerse-pds/
Data catalog: The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.
The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.
The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.
Visualizations for a data scientist
Download and resources: https://catalog.data.gov/dataset/road-weather-information-stations-788f8
Data catalog: Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.
The following graph shows the present max air temperature in Seattle from different RWI station sensors.
The following graph shows the minimum temperature of the road surface at different times, which helps predicts road conditions at a particular time of the year.
Visualizations for a data engineer
Download and resources: https://www.kaggle.com/datasnaek/youtube-new/data
Data catalog: Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.
The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.
The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.
Visualizations for a business user
Source: New York Taxi Data
Download and resources: https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb
Data catalog: NYC Open data hosts some very popular open data sets for all New Yorkers. This platform allows you to get involved in dive deep into the data set to pull some useful visualizations. 2016 Green taxi trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
The following graph presents maximum fare amount grouped by the passenger count during a period of time during a day. This can be further expanded to follow through different day of the month based on the business need.
The following graph shows the NewYork taxi data from January 2016, showing the dip in the number of taxis ridden on January 23, 2016 across all types of taxis.
A quick search for that date and location shows you the following news report:
Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!
If you found this post useful, be sure to check out Amazon QuickSight Adds Support for Combo Charts and Row-Level Security and Visualize AWS Cloudtrail Logs Using AWS Glue and Amazon QuickSight.
Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.
Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wild life of most divine national parks around the United States alongside his wife.
Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/day-in-life-human-resources-coordinator/
What’s a Typical Day for Michele at Backblaze?
After I’ve had a yummy cup of coffee — maybe with a honey and splash of half and half, I’ll generally start my day reviewing resumes and contacting potential candidates to set up an initial phone screen.
When I start the process of filling a position, I’ll spend a lot of time on the phone speaking with potential candidates. During a phone screen call we’ll chat about their experience, background and what they are ideally looking for in their next position. I also ask about what they like to do outside of work, and most importantly, how they feel about office dogs. A candidate may not always look great on paper, but could turn out to be a great cultural fit after speaking with them about their previous experience and what they’re passionate about.
Next, I push strong candidates to the subsequent steps with the hiring managers, which range from setting up a second phone screen, to setting up a Google hangout for completing coding tasks, to scheduling in-person interviews with the team.
At the end of the day after an in-person interview, I’ll check in with all the interviewers to debrief and decide how to proceed with the candidate. Everyone that interviewed the candidate will get together to give feedback. Is there a good cultural fit? Are they someone we’d like to work with? Keeping in contact with the candidates throughout the process and making sure they are organized and informed is a big part of my job. No one likes to wait around and wonder where they are in the process.
In between all the madness, I’ll put together offer letters, send out onboarding paperwork and links, and get all the necessary signatures to move forward.
On the candidate’s first day, I’ll go over benefits and the handbook and make sure everything is going smoothly in their overall orientation as they transition into their new role here at Backblaze!
What Makes Your Job Exciting?
- I get to speak with many different types of people and see what makes them tick and if they’d be a good fit at Backblaze
- The fast pace of the job
- Being constantly kept busy with different tasks including supporting the FUN committee by researching venues and ideas for family day and the holiday party
- I work on enjoyable projects like creating a people wall for new hires so we are able to put a face to the name
- Getting to take a mini road trip up to Sacramento each month to check in with the data center employees
- Constantly learning more and more about the job, the people, and the company
We’re growing rapidly and always looking for great people to join our team at Backblaze. Our team places a premium on open communications, being cleverly unconventional, and helping each other out.
Oh! We also offer competitive salaries, stock options, and amazing benefits.
Which Job Openings are You Currently Trying to Fill?
- Engineering Director
- Senior Java Engineer
- Senior Software Engineer
- Desktop and Laptop Windows Client Programmer
- Senior Systems Administrator
- Sales Development Representative
The post A Day in the Life of Michele, Human Resources Coordinator at Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.
Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/04/26/grafana-v5.1-released/
v5.1 Stable Release
The recent 5.0 major release contained a lot of new features so the Grafana 5.1 release is focused on smoothing out the rough edges and iterating over some of the new features.
There are two new features included, Heatmap Support for Prometheus and a new core data source for Microsoft SQL Server.
Another highlight is the revamp of the Grafana docker container that makes it easier to run and control but be aware there is a breaking change to file permissions that will affect existing containers with data volumes.
We got tons of useful improvement suggestions, bug reports and Pull Requests from our amazing community. Thank you all! See the full changelog for more details.
- Improved scrolling experience
- Improved docker image with a breaking change!
- Heatmap support for Prometheus
- Microsoft SQL Server as metric & table datasource!
- Copy and paste for panels and other improvements when adding panels to dashboards
- Align Zero-Line for Right and Left Y-axes in the Graph Panel
Improved Scrolling Experience
In Grafana v5.0 we introduced a new scrollbar component. Unfortunately this introduced a lot of issues and in some scenarios removed
the native scrolling functionality. Grafana v5.1 ships with a native scrollbar for all pages together with a scrollbar component for
the dashboard grid and panels that does not override the native scrolling functionality. We hope that these changes and improvements should
make the Grafana user experience much better!
Improved Docker Image
Grafana v5.1 brings an improved official docker image which should make it easier to run and use the Grafana docker image and at the same time give more control to the user how to use/run it.
We have switched the id of the grafana user running Grafana inside a docker container. Unfortunately this means that files created prior to 5.1 will not have the correct permissions for later versions and thereby introduces a breaking change. We made this change so that it would be easier for you to control what user Grafana is executed as.
Please read the updated documentation which includes migration instructions and more information.
Heatmap Support for Prometheus
The Prometheus datasource now supports transforming Prometheus histograms to the heatmap panel. The Prometheus histogram is a powerful feature, and we’re
really happy to finally allow our users to render those as heatmaps. The Heatmap panel documentation
contains more information on how to use it.
Another improvement is that the Prometheus query editor now supports autocomplete for template variables. More information in the Prometheus data source documentation.
Microsoft SQL Server
Grafana v5.1 now ships with a built-in Microsoft SQL Server (MSSQL) data source plugin that allows you to query and visualize data from any
Microsoft SQL Server 2005 or newer, including Microsoft Azure SQL Database. Do you have metric or log data in MSSQL? You can now visualize
that data and define alert rules on it as with any of Grafana’s other core datasources.
The using Microsoft SQL Server in Grafana documentation has more detailed information on how to get started.
Adding New Panels to Dashboards
The control for adding new panels to dashboards now includes panel search and it is also now possible to copy and paste panels between dashboards.
By copying a panel in a dashboard it will be displayed in the
Paste tab. When you switch to a new dashboard you can paste the
Align Zero-Line for Right and Left Y-axes
- Table Panel: New enhancements includes support for mapping a numeric value/range to text and additional units. More information in the Table panel documentation.
- New variable interpolation syntax: We now support a new option for rendering variables that gives the user full control of how the value(s) should be rendered. More details in the in the Variables documentation.
- Improved workflow for provisioned dashboards. More details here.
Checkout the CHANGELOG.md file for a complete list
of new features, changes, and bug fixes.
Post Syndicated from Stephen Jepsen original https://aws.amazon.com/blogs/architecture/store-protect-optimize-your-healthcare-data-with-aws/
This blog post was co-authored by Ujjwal Ratan, a senior AI/ML solutions architect on the global life sciences team.
Healthcare data is generated at an ever-increasing rate and is predicted to reach 35 zettabytes by 2020. Being able to cost-effectively and securely manage this data whether for patient care, research or legal reasons is increasingly important for healthcare providers.
Healthcare providers must have the ability to ingest, store and protect large volumes of data including clinical, genomic, device, financial, supply chain, and claims. AWS is well-suited to this data deluge with a wide variety of ingestion, storage and security services (e.g. AWS Direct Connect, Amazon Kinesis Streams, Amazon S3, Amazon Macie) for customers to handle their healthcare data. In a recent Healthcare IT News article, healthcare thought-leader, John Halamka, noted, “I predict that five years from now none of us will have datacenters. We’re going to go out to the cloud to find EHRs, clinical decision support, analytics.”
I realize simply storing this data is challenging enough. Magnifying the problem is the fact that healthcare data is increasingly attractive to cyber attackers, making security a top priority. According to Mariya Yao in her Forbes column, it is estimated that individual medical records can be worth hundreds or even thousands of dollars on the black market.
In this first of a 2-part post, I will address the value that AWS can bring to customers for ingesting, storing and protecting provider’s healthcare data. I will describe key components of any cloud-based healthcare workload and the services AWS provides to meet these requirements. In part 2 of this post we will dive deep into the AWS services used for advanced analytics, artificial intelligence and machine learning.
The data tsunami is upon us
So where is this data coming from? In addition to the ubiquitous electronic health record (EHR), the sources of this data include:
- genomic sequencers
- devices such as MRIs, x-rays and ultrasounds
- sensors and wearables for patients
- medical equipment telemetry
- mobile applications
Additional sources of data come from non-clinical, operational systems such as:
- human resources
- supply chain
- claims and billing
Data from these sources can be structured (e.g., claims data) as well as unstructured (e.g., clinician notes). Some data comes across in streams such as that taken from patient monitors, while some comes in batch form. Still other data comes in near-real time such as HL7 messages. All of this data has retention policies dictating how long it must be stored. Much of this data is stored in perpetuity as many systems in use today have no purge mechanism. AWS has services to manage all these data types as well as their retention, security and access policies.
Imaging is a significant contributor to this data tsunami. Increasing demand for early-stage diagnoses along with aging populations drive increasing demand for images from CT, PET, MRI, ultrasound, digital pathology, X-ray and fluoroscopy. For example, a thin-slice CT image can be hundreds of megabytes. Increasing demand and strict retention policies make storage costly.
Due to the plummeting cost of gene sequencing, molecular diagnostics (including liquid biopsy) is a large contributor to this data deluge. Many predict that as the value of molecular testing becomes more identifiable, the reimbursement models will change and it will increasingly become the standard of care. According to the Washington Post article “Sequencing the Genome Creates so Much Data We Don’t Know What to do with It,”
“Some researchers predict that up to one billion people will have their genome sequenced by 2025 generating up to 40 exabytes of data per year.”
Although genomics is primarily used for oncology diagnostics today, it’s also used for other purposes, pharmacogenomics — used to understand how an individual will metabolize a medication.
It is increasingly challenging for the typical hospital, clinic or physician practice to securely store, process and manage this data without cloud adoption.
Amazon has a variety of ingestion techniques depending on the nature of the data including size, frequency and structure. AWS Snowball and AWS Snowmachine are appropriate for extremely-large, secure data transfers whether one time or episodic. AWS Glue is a fully-managed ETL service for securely moving data from on-premise to AWS and Amazon Kinesis can be used for ingesting streaming data.
Amazon S3, Amazon S3 IA, and Amazon Glacier are economical, data-storage services with a pay-as-you-go pricing model that expand (or shrink) with the customer’s requirements.
The above architecture has four distinct components – ingestion, storage, security, and analytics. In this post I will dive deeper into the first three components, namely ingestion, storage and security. In part 2, I will look at how to use AWS’ analytics services to draw value on, and optimize, your healthcare data.
A typical provider data center will consist of many systems with varied datasets. AWS provides multiple tools and services to effectively and securely connect to these data sources and ingest data in various formats. The customers can choose from a range of services and use them in accordance with the use case.
For use cases involving one-time (or periodic), very large data migrations into AWS, customers can take advantage of AWS Snowball devices. These devices come in two sizes, 50 TB and 80 TB and can be combined together to create a petabyte scale data transfer solution.
The devices are easy to connect and load and they are shipped to AWS avoiding the network bottlenecks associated with such large-scale data migrations. The devices are extremely secure supporting 256-bit encryption and come in a tamper-resistant enclosure. AWS Snowball imports data in Amazon S3 which can then interface with other AWS compute services to process that data in a scalable manner.
For use cases involving a need to store a portion of datasets on premises for active use and offload the rest on AWS, the Amazon storage gateway service can be used. The service allows you to seamlessly integrate on premises applications via standard storage protocols like iSCSI or NFS mounted on a gateway appliance. It supports a file interface, a volume interface and a tape interface which can be utilized for a range of use cases like disaster recovery, backup and archiving, cloud bursting, storage tiering and migration.
Specific Industry Use Cases
By using the AWS proposed reference architecture for disaster recovery, healthcare providers can ensure their data assets are securely stored on the cloud and are easily accessible in the event of a disaster. The “AWS Disaster Recovery” whitepaper includes details on options available to customers based on their desired recovery time objective (RTO) and recovery point objective (RPO).
AWS is an ideal destination for offloading large volumes of less-frequently-accessed data. These datasets are rarely used in active compute operations but are exceedingly important to retain for reasons like compliance. By storing these datasets on AWS, customers can take advantage of the highly-durable platform to securely store their data and also retrieve them easily when they need to. For more details on how AWS enables customers to run back and archival use cases on AWS, please refer to the following set of whitepapers.
A healthcare provider may have a variety of databases spread throughout the hospital system supporting critical applications such as EHR, PACS, finance and many more. These datasets often need to be aggregated to derive information and calculate metrics to optimize business processes. AWS Glue is a fully-managed Extract, Transform and Load (ETL) service that can read data from a JDBC-enabled, on-premise database and transfer the datasets into AWS services like Amazon S3, Amazon Redshift and Amazon RDS. This allows customers to create transformation workflows that integrate smaller datasets from multiple sources and aggregates them on AWS.
Healthcare providers deal with a variety of streaming datasets which often have to be analyzed in near real time. These datasets come from a variety of sources such as sensors, messaging buses and social media, and often do not adhere to an industry standard. The Amazon Kinesis suite of services, that includes Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics, are the ideal set of services to accomplish the task of deriving value from streaming data.
Example: Using AWS Glue to de-identify and ingest healthcare data into S3
Let’s consider a scenario in which a provider maintains patient records in a database they want to ingest into S3. The provider also wants to de-identify the data by stripping personally- identifiable attributes and store the non-identifiable information in an S3 bucket. This bucket is different from the one that contains identifiable information. Doing this allows the healthcare provider to separate sensitive information with more restrictions set up via S3 bucket policies.
To ingest records into S3, we create a Glue job that reads from the source database using a Glue connection. The connection is also used by a Glue crawler to populate the Glue data catalog with the schema of the source database. We will use the Glue development endpoint and a zeppelin notebook server on EC2 to develop and execute the job.
Step 1: Import the necessary libraries and also set a glue context which is a wrapper on the spark context:
Step 2: Create a dataframe from the source data. I call the dataframe “readmissionsdata”. Here is what the schema would look like:
Step 3: Now select the columns that contains indentifiable information and store it in a new dataframe. Call the new dataframe “phi”.
Step 4: Non-PHI columns are stored in a separate dataframe. Call this dataframe “nonphi”.
Step 5: Write the two dataframes into two separate S3 buckets
Once successfully executed, the PHI and non-PHI attributes are stored in two separate files in two separate buckets that can be individually maintained.
In 2016, 327 healthcare providers reported a protected health information (PHI) breach, affecting 16.4m patient records. There have been 342 data breaches reported in 2017 — involving 3.2 million patient records.
To date, AWS has released 51 HIPAA-eligible services to help customers address security challenges and is in the process of making many more services HIPAA-eligible. These HIPAA-eligible services (along with all other AWS services) help customers build solutions that comply with HIPAA security and auditing requirements. A catalogue of HIPAA-enabled services can be found at AWS HIPAA-eligible services. It is important to note that AWS manages physical and logical access controls for the AWS boundary. However, the overall security of your workloads is a shared responsibility, where you are responsible for controlling user access to content on your AWS accounts.
AWS storage services allow you to store data efficiently while maintaining high durability and scalability. By using Amazon S3 as the central storage layer, you can take advantage of the Amazon S3 storage management features to get operational metrics on your data sets and transition them between various storage classes to save costs. By tagging objects on Amazon S3, you can build a governance layer on Amazon S3 to grant role based access to objects using Amazon IAM and Amazon S3 bucket policies.
To learn more about the Amazon S3 storage management features, see the following link.
In the example above, we are storing the PHI information in a bucket named “phi.” Now, we want to protect this information to make sure its encrypted, does not have unauthorized access, and all access requests to the data are logged.
Encryption: S3 provides settings to enable default encryption on a bucket. This ensures any object in the bucket is encrypted by default.
Logging: S3 provides object level logging that can be used to capture all API calls to the object. The API calls are logged in cloudtrail for easy access and consolidation. Moreover, it also supports events to proactively alert customers of read and write operations.
Access control: Customers can use S3 bucket policies and IAM policies to restrict access to the phi bucket. It can also put a restriction to enforce multi-factor authentication on the bucket. For example, the following policy enforces multi-factor authentication on the phi bucket:
In Part 1 of this blog, we detailed the ingestion, storage, security and management of healthcare data on AWS. Stay tuned for part two where we are going to dive deep into optimizing the data for analytics and machine learning.
Please visit Healthcare Providers and Insurers in the Cloud for additional information.
 “Largest Healthcare Data Breaches of 2016.” HIPAA Journal, 29 Aug. 2017
 “Largest Healthcare Data Breaches of 2017.” HIPAA Journal, 8 Mar. 2018
About the Author
Stephen Jepsen is a Global HCLS Practice Manager in AWS Professional Services.
Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/ransomware-update-viruses-targeting-business-it-servers/
As ransomware attacks have grown in number in recent months, the tactics and attack vectors also have evolved. While the primary method of attack used to be to target individual computer users within organizations with phishing emails and infected attachments, we’re increasingly seeing attacks that target weaknesses in businesses’ IT infrastructure.
How Ransomware Attacks Typically Work
In our previous posts on ransomware, we described the common vehicles used by hackers to infect organizations with ransomware viruses. Most often, downloaders distribute trojan horses through malicious downloads and spam emails. The emails contain a variety of file attachments, which if opened, will download and run one of the many ransomware variants. Once a user’s computer is infected with a malicious downloader, it will retrieve additional malware, which frequently includes crypto-ransomware. After the files have been encrypted, a ransom payment is demanded of the victim in order to decrypt the files.
What’s Changed With the Latest Ransomware Attacks?
In 2016, a customized ransomware strain called SamSam began attacking the servers in primarily health care institutions. SamSam, unlike more conventional ransomware, is not delivered through downloads or phishing emails. Instead, the attackers behind SamSam use tools to identify unpatched servers running Red Hat’s JBoss enterprise products. Once the attackers have successfully gained entry into one of these servers by exploiting vulnerabilities in JBoss, they use other freely available tools and scripts to collect credentials and gather information on networked computers. Then they deploy their ransomware to encrypt files on these systems before demanding a ransom. Gaining entry to an organization through its IT center rather than its endpoints makes this approach scalable and especially unsettling.
SamSam’s methodology is to scour the Internet searching for accessible and vulnerable JBoss application servers, especially ones used by hospitals. It’s not unlike a burglar rattling doorknobs in a neighborhood to find unlocked homes. When SamSam finds an unlocked home (unpatched server), the software infiltrates the system. It is then free to spread across the company’s network by stealing passwords. As it transverses the network and systems, it encrypts files, preventing access until the victims pay the hackers a ransom, typically between $10,000 and $15,000. The low ransom amount has encouraged some victimized organizations to pay the ransom rather than incur the downtime required to wipe and reinitialize their IT systems.
The success of SamSam is due to its effectiveness rather than its sophistication. SamSam can enter and transverse a network without human intervention. Some organizations are learning too late that securing internet-facing services in their data center from attack is just as important as securing endpoints.
The typical steps in a SamSam ransomware attack are:
Attackers gain access to vulnerable server
|Attackers exploit vulnerable software or weak/stolen credentials.|
Attack spreads via remote access tools
|Attackers harvest credentials, create SOCKS proxies to tunnel traffic, and abuse RDP to install SamSam on more computers in the network.|
Ransomware payload deployed
|Attackers run batch scripts to execute ransomware on compromised machines.|
Ransomware demand delivered requiring payment to decrypt files
|Demand amounts vary from victim to victim. Relatively low ransom amounts appear to be designed to encourage quick payment decisions.|
What all the organizations successfully exploited by SamSam have in common is that they were running unpatched servers that made them vulnerable to SamSam. Some organizations had their endpoints and servers backed up, while others did not. Some of those without backups they could use to recover their systems chose to pay the ransom money.
Timeline of SamSam History and Exploits
Since its appearance in 2016, SamSam has been in the news with many successful incursions into healthcare, business, and government institutions.
SamSam campaign targets vulnerable JBoss servers
Attackers hone in on healthcare organizations specifically, as they’re more likely to have unpatched JBoss machines.
SamSam finds new targets
SamSam begins targeting schools and government.
After initial success targeting healthcare, attackers branch out to other sectors.
SamSam shuts down Atlanta
• Second attack on Colorado Department of Transportation.
• City of Atlanta suffers a devastating attack by SamSam.
The attack has far-reaching impacts — crippling the court system, keeping residents from paying their water bills, limiting vital communications like sewer infrastructure requests, and pushing the Atlanta Police Department to file paper reports.
• SamSam campaign nets $325,000 in 4 weeks.
Infections spike as attackers launch new campaigns. Healthcare and government organizations are once again the primary targets.
How to Defend Against SamSam and Other Ransomware Attacks
The best way to respond to a ransomware attack is to avoid having one in the first place. If you are attacked, making sure your valuable data is backed up and unreachable by ransomware infection will ensure that your downtime and data loss will be minimal or none if you ever suffer an attack.
In our previous post, How to Recover From Ransomware, we listed the ten ways to protect your organization from ransomware.
- Use anti-virus and anti-malware software or other security policies to block known payloads from launching.
- Make frequent, comprehensive backups of all important files and isolate them from local and open networks. Cybersecurity professionals view data backup and recovery (74% in a recent survey) by far as the most effective solution to respond to a successful ransomware attack.
- Keep offline backups of data stored in locations inaccessible from any potentially infected computer, such as disconnected external storage drives or the cloud, which prevents them from being accessed by the ransomware.
- Install the latest security updates issued by software vendors of your OS and applications. Remember to patch early and patch often to close known vulnerabilities in operating systems, server software, browsers, and web plugins.
- Consider deploying security software to protect endpoints, email servers, and network systems from infection.
- Exercise cyber hygiene, such as using caution when opening email attachments and links.
- Segment your networks to keep critical computers isolated and to prevent the spread of malware in case of attack. Turn off unneeded network shares.
- Turn off admin rights for users who don’t require them. Give users the lowest system permissions they need to do their work.
- Restrict write permissions on file servers as much as possible.
- Educate yourself, your employees, and your family in best practices to keep malware out of your systems. Update everyone on the latest email phishing scams and human engineering aimed at turning victims into abettors.
Please Tell Us About Your Experiences with Ransomware
Have you endured a ransomware attack or have a strategy to avoid becoming a victim? Please tell us of your experiences in the comments.
The post Ransomware Update: Viruses Targeting Business IT Servers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.
Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/google-nsynth-super/
Discover new sounds and explore the role of machine learning in music production and sound research with the NSynth Super, an ongoing project from Google’s Magenta research team that you can build at home.
Uploaded by AB Open on 2018-04-17.
What is the NSynth Super?
Part of the ongoing Magenta research project within Google, NSynth Super explores the ways in which machine learning tools help artists and musicians be creative.
“Technology has always played a role in creating new types of sounds that inspire musicians — from the sounds of distortion to the electronic sounds of synths,” explains the team behind the NSynth Super. “Today, advances in machine learning and neural networks have opened up new possibilities for sound generation.”
Using TensorFlow, the Magenta team builds tools and interfaces that let artists and musicians use machine learning in their work. The NSynth Super AI algorithm uses deep neural networking to investigate the character of sounds. It then builds new sounds based on these characteristics instead of simply mixing sounds together.
Using an autoencoder, it extracts 16 defining temporal features from each input. These features are then interpolated linearly to create new embeddings (mathematical representations of each sound). These new embeddings are then decoded into new sounds, which have the acoustic qualities of both inputs.
The team publishes all hardware designs and software that are part of their ongoing research under open-source licences, allowing you to build your own synth.
Build your own NSynth Super
Using these open-source tools, Andrew Black has produced his own NSynth Super, demoed in the video above. Andrew’s list of build materials includes a Raspberry Pi 3, potentiometers, rotary encoders, and the Adafruit 1.3″ OLED display. Magenta also provides Gerber files for you to fabricate your own PCB.
The build isn’t easy — it requires soldering skills or access to someone who can assemble PCBs. Take a look at Andrew’s blog post and the official NSynth GitHub repo to see whether you’re up to the challenge.
Music and Raspberry Pi
The Raspberry Pi has been widely used for music production and music builds. Be it retrofitting a boombox, distributing music atop Table Mountain, or coding tracks with Sonic Pi, the Pi offers endless opportunities for musicians and music lovers to expand their repertoire of builds and instruments.
If you’d like to try more music-based projects using the Raspberry Pi, you can check out our free resources. And if you’ve used a Raspberry Pi in your own musical project, please share it with us in the comments or via our social network accounts.
Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-support-the-first-decade/
We launched AWS Support a full decade ago, with Gold and Silver plans focused on Amazon EC2, Amazon S3, and Amazon SQS. Starting from that initial offering, backed by a small team in Seattle, AWS Support now encompasses thousands of people working from more than 60 locations.
A Quick Look Back
Over the years, that offering has matured and evolved in order to meet the needs of an increasingly diverse base of AWS customers. We aim to support you at every step of your cloud adoption journey, from your initial experiments to the time you deploy mission-critical workloads and applications.
We have worked hard to make our support model helpful and proactive. We do our best to provide you with the tools, alerts, and knowledge that will help you to build systems that are secure, robust, and dependable. Here are some of our most recent efforts toward that goal:
Trusted Advisor S3 Bucket Policy Check – AWS Trusted Advisor provides you with five categories of checks and makes recommendations that are designed to improve security and performance. Earlier this year we announced that the S3 Bucket Permissions Check is now free, and available to all AWS users. If you are signed up for the Business or Professional level of AWS Support, you can also monitor this check (and many others) using Amazon CloudWatch Events. You can use this to monitor and secure your buckets without human intervention.
Personal Health Dashboard – This tool provides you with alerts and guidance when AWS is experiencing events that may affect you. You get a personalized view into the performance and availability of the AWS services that underlie your AWS resources. It also generates Amazon CloudWatch Events so that you can initiate automated failover and remediation if necessary.
Well Architected / Cloud Ops Review – We’ve learned a lot about how to architect AWS-powered systems over the years and we want to share everything we know with you! The AWS Well-Architected Framework provide proven, detailed guidance in critical areas including operational excellence, security, reliability, performance efficiency, and cost optimization. You can read the materials online and you can also sign up for the online training course. If you are signed up for Enterprise support, you can also benefit from our Cloud Ops review.
Infrastructure Event Management – If you are launching a new app, kicking off a big migration, or hosting a large-scale event similar to Prime Day, we are ready with guidance and real-time support. Our Infrastructure Event Management team will help you to assess the readiness of your environment and work with you to identify and mitigate risks ahead of time.
Partner-Led Support – The new AWS Solution Provider Program for APN Consulting Partners allows partners to manage, service, support, and bill AWS accounts for end customers.
To learn more about how AWS customers have used AWS support to realize all of the benefits that I noted above, watch these videos (and find more on the Customer Testmonials page):
- Sprinklr Leverages AWS Support to Reduce Operating Costs.
- Skyscanner Migrates its Full Infrastructure to the Cloud With the Help of AWS Support.
The Amazon retail site makes heavy use of AWS. You can read my post, Prime Day 2017 – Powered by AWS, to learn more about the process of preparing to sustain a record-setting amount of traffic and to accept a like number of orders.
Come and Join Us
The AWS Support Team is in continuous hiring mode and we have openings all over the world! Here are a couple of highlights:
- Senior Cloud Technical Account Manager (San Francisco)
- AWS Kumo Tools – Software Development Engineer (Seattle)
Visit the AWS Careers page to learn more and to search for open positions.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/securing_electi_1.html
Elections serve two purposes. The first, and obvious, purpose is to accurately choose the winner. But the second is equally important: to convince the loser. To the extent that an election system is not transparently and auditably accurate, it fails in that second purpose. Our election systems are failing, and we need to fix them.
Today, we conduct our elections on computers. Our registration lists are in computer databases. We vote on computerized voting machines. And our tabulation and reporting is done on computers. We do this for a lot of good reasons, but a side effect is that elections now have all the insecurities inherent in computers. The only way to reliably protect elections from both malice and accident is to use something that is not hackable or unreliable at scale; the best way to do that is to back up as much of the system as possible with paper.
Recently, there have been two graphic demonstrations of how bad our computerized voting system is. In 2007, the states of California and Ohio conducted audits of their electronic voting machines. Expert review teams found exploitable vulnerabilities in almost every component they examined. The researchers were able to undetectably alter vote tallies, erase audit logs, and load malware on to the systems. Some of their attacks could be implemented by a single individual with no greater access than a normal poll worker; others could be done remotely.
Last year, the Defcon hackers’ conference sponsored a Voting Village. Organizers collected 25 pieces of voting equipment, including voting machines and electronic poll books. By the end of the weekend, conference attendees had found ways to compromise every piece of test equipment: to load malicious software, compromise vote tallies and audit logs, or cause equipment to fail.
It’s important to understand that these were not well-funded nation-state attackers. These were not even academics who had been studying the problem for weeks. These were bored hackers, with no experience with voting machines, playing around between parties one weekend.
It shouldn’t be any surprise that voting equipment, including voting machines, voter registration databases, and vote tabulation systems, are that hackable. They’re computers — often ancient computers running operating systems no longer supported by the manufacturers — and they don’t have any magical security technology that the rest of the industry isn’t privy to. If anything, they’re less secure than the computers we generally use, because their manufacturers hide any flaws behind the proprietary nature of their equipment.
We’re not just worried about altering the vote. Sometimes causing widespread failures, or even just sowing mistrust in the system, is enough. And an election whose results are not trusted or believed is a failed election.
Voting systems have another requirement that makes security even harder to achieve: the requirement for a secret ballot. Because we have to securely separate the election-roll system that determines who can vote from the system that collects and tabulates the votes, we can’t use the security systems available to banking and other high-value applications.
We can securely bank online, but can’t securely vote online. If we could do away with anonymity — if everyone could check that their vote was counted correctly — then it would be easy to secure the vote. But that would lead to other problems. Before the US had the secret ballot, voter coercion and vote-buying were widespread.
We can’t, so we need to accept that our voting systems are insecure. We need an election system that is resilient to the threats. And for many parts of the system, that means paper.
Let’s start with the voter rolls. We know they’ve already been targeted. In 2016, someone changed the party affiliation of hundreds of voters before the Republican primary. That’s just one possibility. A well-executed attack that deletes, for example, one in five voters at random — or changes their addresses — would cause chaos on election day.
Yes, we need to shore up the security of these systems. We need better computer, network, and database security for the various state voter organizations. We also need to better secure the voter registration websites, with better design and better internet security. We need better security for the companies that build and sell all this equipment.
Multiple, unchangeable backups are essential. A record of every addition, deletion, and change needs to be stored on a separate system, on write-only media like a DVD. Copies of that DVD, or — even better — a paper printout of the voter rolls, should be available at every polling place on election day. We need to be ready for anything.
Next, the voting machines themselves. Security researchers agree that the gold standard is a voter-verified paper ballot. The easiest (and cheapest) way to achieve this is through optical-scan voting. Voters mark paper ballots by hand; they are fed into a machine and counted automatically. That paper ballot is saved, and serves as a final true record in a recount in case of problems. Touch-screen machines that print a paper ballot to drop in a ballot box can also work for voters with disabilities, as long as the ballot can be easily read and verified by the voter.
Finally, the tabulation and reporting systems. Here again we need more security in the process, but we must always use those paper ballots as checks on the computers. A manual, post-election, risk-limiting audit varies the number of ballots examined according to the margin of victory. Conducting this audit after every election, before the results are certified, gives us confidence that the election outcome is correct, even if the voting machines and tabulation computers have been tampered with. Additionally, we need better coordination and communications when incidents occur.
It’s vital to agree on these procedures and policies before an election. Before the fact, when anyone can win and no one knows whose votes might be changed, it’s easy to agree on strong security. But after the vote, someone is the presumptive winner — and then everything changes. Half of the country wants the result to stand, and half wants it reversed. At that point, it’s too late to agree on anything.
The politicians running in the election shouldn’t have to argue their challenges in court. Getting elections right is in the interest of all citizens. Many countries have independent election commissions that are charged with conducting elections and ensuring their security. We don’t do that in the US.
Instead, we have representatives from each of our two parties in the room, keeping an eye on each other. That provided acceptable security against 20th-century threats, but is totally inadequate to secure our elections in the 21st century. And the belief that the diversity of voting systems in the US provides a measure of security is a dangerous myth, because few districts can be decisive and there are so few voting-machine vendors.
We can do better. In 2017, the Department of Homeland Security declared elections to be critical infrastructure, allowing the department to focus on securing them. On 23 March, Congress allocated $380m to states to upgrade election security.
These are good starts, but don’t go nearly far enough. The constitution delegates elections to the states but allows Congress to “make or alter such Regulations”. In 1845, Congress set a nationwide election day. Today, we need it to set uniform and strict election standards.
This essay originally appeared in the Guardian.
Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/confused-about-the-hybrid-cloud-youre-not-alone/
Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.
Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.
Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper, multi, ultra, and super and other prefixes onto the beginnings of words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.
Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.
So, What is the Hybrid Cloud?
To get beyond the hype, let’s start with Forrester Research‘s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”
To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.
To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.
In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!
Private Cloud vs. Public Cloud
The cloud is really just a collection of purpose built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site, while a private cloud can be on-site or off-site — or on-prem or off-prem.
As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.
The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.
Hybrid Cloud Benefits
If we accept that the hybrid cloud combines the best elements of private and public clouds, then the benefits of hybrid cloud solutions are clear, and we can identify the primary two benefits that result from the blending of private and public clouds.
Benefit 1: Flexibility and Scalability
Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure and adding capacity requires advance planning.
The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.
For a data storage user, the on-premises private cloud storage provides, among other benefits, the highest speed access. For data that is not frequently accessed, or needed with the absolute lowest levels of latency, it makes sense for the organization to move it to a location that is secure, but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or with the general public.
Benefit 2: Cost Savings
The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.
Comparing Private vs Hybrid Cloud Storage Costs
To get an idea of the difference in storage costs between a purely on-premises solutions and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table is the same format, all we’ve done is change how the data is distributed: private (on-premises) cloud or public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.
Scenario 1 100% of data on-premises storage
|Data stored On-Premises: 100%||100 TB||1,000 TB||2,000 TB|
|On-premises cost range||Monthly Cost|
|Low — $12/TB/Month||$1,200||$12,000||$24,000|
|High — $20/TB/Month||$2,000||$20,000||$40,000|
Scenario 2 20% of data on-premises with 80% public cloud storage (B2)
|Data stored On-Premises: 20%||20 TB||200 TB||400 TB|
|Data stored in Cloud: 80%||80 TB||800 TB||1,600 TB|
|On-premises cost range||Monthly Cost|
|Low — $12/TB/Month||$240||$2,400||$4,800|
|High — $20/TB/Month||$400||$4,000||$8,000|
|Public cloud cost range||Monthly Cost|
|Low — $5/TB/Month (B2)||$400||$4,000||$8,000|
|High — $20/TB/Month||$1,600||$16,000||$32,000|
|On-premises + public cloud cost range||Monthly Cost|
As can be seen in the numbers above, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises. For other cost scenarios, see the B2 Cost Calculator.
When Hybrid Might Not Always Be the Right Fit
There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.
An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency does play a factor in data storage for some users, it is less of a factor for uploading and downloading data than it is for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low-latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.
It is essential to have a good understanding of workloads and their essential characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.
The Hybrid Cloud Can Be a Win-Win Solution
From the high altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.
Should You Go Hybrid?
We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?
According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.
Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.
If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.
As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.
Tell Us What You’re Doing with the Hybrid Cloud
Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.
 Private cloud can be on-premises or a dedicated off-premises facility.
 Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.
As part of my current project (secure audit trail) I decided to make a survey about the use of audit trail “in the wild”.
I haven’t written in details about this project of mine (unlike with some other projects). Mostly because it’s commercial and I don’t want to use my blog as a direct promotion channel (though I am doing that at the moment, ironically). But the aim of this post is to shed some light on how audit trail is used.
The survey can be found here. The questions are basically: does your current project have audit trail functionality, and if yes, is it protected from tampering. If not – do you think you should have such functionality.
The results are interesting (although with only around 50 respondents)
So more than half of the systems (on which respondents are working) don’t have audit trail. While audit trail is recommended by information security and related standards, it may not find place in the “busy schedule” of a software project, even though it’s fairly easy to provide a trivial implementation (e.g. I’ve written how to quickly setup one with Hibernate and Spring)
A trivial implementation might do in many cases but if the audit log is critical (e.g. access to sensitive data, performing financial operations etc.), then relying on a trivial implementation might not be enough. In other words – if the sysadmin can access the database and delete or modify the audit trail, then it doesn’t serve much purpose. Hence the next question – how is the audit trail protected from tampering:
And apparently, from the less than 50% of projects with audit trail, around 50% don’t have technical guarantees that the audit trail can’t be tampered with. My guess is it’s more, because people have different understanding of what technical measures are sufficient. E.g. someone may think that digitally signing your log files (or log records) is sufficient, but in fact it isn’t, as whole files (or records) can be deleted (or fully replaced) without a way to detect that. Timestamping can help (and a good audit trail solution should have that), but it doesn’t guarantee the order of events or prevent a malicious actor from deleting or inserting fake ones. And if timestamping is done on a log file level, then any not-yet-timestamped log file is vulnerable to manipulation.
I’ve written about event logs before and their two flavours – event sourcing and audit trail. An event log can effectively be considered audit trail, but you’d need additional security to avoid the problems mentioned above.
So, let’s see what would various levels of security and usefulness of audit logs look like. There are many papers on the topic (e.g. this and this), and they often go into the intricate details of how logging should be implemented. I’ll try to give an overview of the approaches:
- Regular logs – rely on regular INFO log statements in the production logs to look for hints of what has happened. This may be okay, but is harder to look for evidence (as there is non-auditable data in those log files as well), and it’s not very secure – usually logs are collected (e.g. with graylog) and whoever has access to the log collector’s database (or search engine in the case of Graylog), can manipulate the data and not be caught
- Designated audit trail – whether it’s stored in the database or in logs files. It has the proper business-event level granularity, but again doesn’t prevent or detect tampering. With lower risk systems that may is perfectly okay.
- Timestamped logs – whether it’s log files or (harder to implement) database records. Timestamping is good, but if it’s not an external service, a malicious actor can get access to the local timestamping service and issue fake timestamps to either re-timestamp tampered files. Even if the timestamping is not compromised, whole entries can be deleted. The fact that they are missing can sometimes be deduced based on other factors (e.g. hour of rotation), but regularly verifying that is extra effort and may not always be feasible.
- Hash chaining – each entry (or sequence of log files) could be chained (just as blockchain transactions) – the next one having the hash of the previous one. This is a good solution (whether it’s local, external or 3rd party), but it has the risk of someone modifying or deleting a record, getting your entire chain and re-hashing it. All the checks will pass, but the data will not be correct
- Hash chaining with anchoring – the head of the chain (the hash of the last entry/block) could be “anchored” to an external service that is outside the capabilities of a malicious actor. Ideally, a public blockchain, alternatively – paper, a public service (twitter), email, etc. That way a malicious actor can’t just rehash the whole chain, because any check against the external service would fail.
- WORM storage (write once, ready many). You could send your audit logs almost directly to WORM storage, where it’s impossible to replace data. However, that is not ideal, as WORM storage can be slow and expensive. For example AWS Glacier has rather big retrieval times and searching through recent data makes it impractical. It’s actually cheaper than S3, for example, and you can have expiration policies. But having to support your own WORM storage is expensive. It is a good idea to eventually send the logs to WORM storage, but “fresh” audit trail should probably not be “archived” so that it’s searchable and some actionable insight can be gained from it.
- All-in-one – applying all of the above “just in case” may be unnecessary for every project out there, but that’s what I decided to do at LogSentinel. Business-event granularity with timestamping, hash chaining, anchoring, and eventually putting to WORM storage – I think that provides both security guarantees and flexibility.
I hope the overview is useful and the results from the survey shed some light on how this aspect of information security is underestimated.