Tag Archives: automation

Eight(ish) Raspberry Pi projects for the summer

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/raspberry-pi-summer-projects/

The sun is actually shining here in Cambridge, and with it, summer-themed Raspberry Pi projects are sprouting like mushrooms across our UK-based community (even though mushrooms don’t like hot weather…). So we thought we’d gather some of our favourite Pi-powered projects perfect for the sun-drenched outdoors.

Air quality monitors and solar radiation

With the sun out in all its glory, we’re spending far more time outside than is usual for UK summer. To protect yourself and your adventurous loved ones, you might want to build a Raspberry Pi device to monitor solar radiation.

Raspberry Pi summer project

“Solar radiation is the radiation, or energy, we get from the sun.” explains project designer Uladzislau Bayouski. “Measurements for solar radiation are higher on clear, sunny day and usually low on cloudy days. When the sun is down, or there are heavy clouds blocking the sun, solar radiation is measured at zero.”

To measure more health-related environmental conditions, you could build this air quality monitor and keep an eye on local pollution.

Particulater air quality Oliver Crask Raspberry Pi summer project

Maker Oliver Crask describes the project:

Data is collected by the particulates sensor and is combined with readings of temperature, humidity, and air pressure. This data is then transferred to the cloud, where it is visualised on a dashboard.

If you’ve been building your own hackable weather station using our free guide, these are also great add-ons to integrate into that project.

Build Your Own weather station kit assembled Raspberry Pi summer project

Automatic pet and plant feeders

While we’re spending our days out in the sun, we need to ensure that our pets and plants are still getting all the attention they need.

This automatic chicken feeder by Instructables user Bertil Vandekerkhove uses a Raspberry Pi to remotely control the release of chicken feed. No more rushing to get home to feed your feathered friends!

Raspberry Pi summer project

And while we’re automating our homes, let us not forget the plants! iPlanty is an automated plant-watering system that will ensure your favourite plant babies get all the moisture they need while you’re away from your home or office.

Planty Project

An automated Plant watering solution that waters my plant every day at 8:30

Electromagnetic bike shed lock

If, like me, you live in constant fear that your beloved bike may be stolen, this electromagnetic bike shed lock is the solution you need.

Raspberry Pi summer project

The lock system allows for only one user per lock at any one time, meaning that your bike needs to be removed before anyone else can use their RFID card to access the shed.

Time-lapse cameras

With so much sunlight available, now is the perfect time to build a time-lapse camera for your garden or local beauty spot. Alex D’s Zero W time-lapse HAT allows for some glorious cinematic sliding that’s really impressed us.

Slider Test Sunset

Slider settings: -960 mm drive distance -400 steps -28 seconds interval Camera settings (Canon EOS 550D): – Magic Lantern auto ettr – max ISO 1600 – max Exposure 10 seconds

If you don’t think you can match Alex’s PCB milling skills, you can combine our free Raspberry Pi timelapse resource and Adafruit’s motorised camera slider for a similar project!

Infrared laser tag

Raspberry Pi summer project

While it’s sunny and warm, why not make this Raspberry Pi Zero W laser tag for the kids…

…and then lock them outside, and enjoy a Pimms and a sit-down in peace. We’re here for you, suffering summer holiday parents. We understand.

Self-weighing smart suitcase

“We’re all going on a summer holiday”, and pj_dc’s smart suitcase will not only help you track of your case’s location, it’ll also weigh your baggage.

Raspberry Pi summer project

Four 50kg load cells built into the base of the case allow for weight measurement of its contents, while a GPS breakout board and antenna let you track where it is.

Our free resources

While they’re not all summer-themed, our free Raspberry Pi, Code Club, and CoderDojo resources will keep you and your family occupied over the summer months whenever you’ve had a little too much of the great outdoors. From simple Scratch projects through to Python and digital making builds, we’ve got something for makers of all levels and tastes!

Getting started with Raspberry Pi summmer projects

If you’re new to Raspberry Pi, begin with our Getting started guide. And if you’re looking for even more projects to try, our online community shares a sea of tutorials on Twitter every week.

The post Eight(ish) Raspberry Pi projects for the summer appeared first on Raspberry Pi.

timeShift(GrafanaBuzz, 1w) Issue 47

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/06/01/timeshiftgrafanabuzz-1w-issue-47/

Welcome to TimeShift We cover a lot of ground this week with posts on general monitoring principles, home automation, how CERN uses open source projects in their particle acceleration work, and more. Have an article you’d like highlighted here? Get in touch.
We’re excited to be a sponsor of Monitorama PDX June 4-6. If you’re going, please be sure and say hello! Latest Release: Grafana 5.1.3 This latest point release fixes a scrolling issue that was reported in Firefox.

MagPi 70: Home automation with Raspberry Pi

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/magpi-70-home-automation/

Hey folks, Rob here! It’s the last Thursday of the month, and that means it’s time for a brand-new The MagPi. Issue 70 is all about home automation using your favourite microcomputer, the Raspberry Pi.

Cover of The MagPi 70 — Raspberry Pi home automation and tech upcycling

Home automation in this month’s The MagPi!

Raspberry Pi home automation

We think home automation is an excellent use of the Raspberry Pi, hiding it around your house and letting it power your lights and doorbells and…fish tanks? We show you how to do all of that, and give you some excellent tips on how to add even more automation to your home in our ten-page cover feature.

Upcycle your life

Our other big feature this issue covers upcycling, the hot trend of taking old electronics and making them better than new with some custom code and a tactically placed Raspberry Pi. For this feature, we had a chat with Martin Mander, upcycler extraordinaire, to find out his top tips for hacking your old hardware.

Article on upcycling in The MagPi 70 — Raspberry Pi home automation and tech upcycling

Upcycling is a lot of fun

But wait, there’s more!

If for some reason you want even more content, you’re in luck! We have some fun tutorials for you to try, like creating a theremin and turning a Babbage into an IoT nanny cam. We also continue our quest to make a video game in C++. Our project showcase is headlined by the Teslonda on page 28, a Honda/Tesla car hybrid that is just wonderful.

Diddyborg V2 review in The MagPi 70 — Raspberry Pi home automation and tech upcycling

We review PiBorg’s latest robot

All this comes with our definitive reviews and the community section where we celebrate you, our amazing community! You’re all good beans

Teslonda article in The MagPi 70 — Raspberry Pi home automation and tech upcycling

An amazing, and practical, Raspberry Pi project

Get The MagPi 70

Issue 70 is available today from WHSmith, Tesco, Sainsbury’s, and Asda. If you live in the US, head over to your local Barnes & Noble or Micro Center in the next few days for a print copy. You can also get the new issue online from our store, or digitally via our Android and iOS apps. And don’t forget, there’s always the free PDF as well.

New subscription offer!

Want to support the Raspberry Pi Foundation and the magazine? We’ve launched a new way to subscribe to the print version of The MagPi: you can now take out a monthly £4 subscription to the magazine, effectively creating a rolling pre-order system that saves you money on each issue.

The MagPi subscription offer — Raspberry Pi home automation and tech upcycling

You can also take out a twelve-month print subscription and get a Pi Zero W plus case and adapter cables absolutely free! This offer does not currently have an end date.

That’s it for today! See you next month.

Animated GIF: a door slides open and Captain Picard emerges hesitantly

The post MagPi 70: Home automation with Raspberry Pi appeared first on Raspberry Pi.

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Output tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs
Output from CloudFormation

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes do finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
$ 1.6.0-12

As the time of publication, Artillery is on version 1.6.0-12.

One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requesters that exceed 2000 requests under 5 minutes. Try this out.

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what if you max out the 2000 requests in under 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

 

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

Replacing macOS Server with Synology NAS

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/replacing-macos-server-with-synology-nas/

Synology NAS boxes backed up to the cloud

Businesses and organizations that rely on macOS server for essential office and data services are facing some decisions about the future of their IT services.

Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:

macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.

The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.

Apple assures users who have already configured any of the listed services that they will be able to use them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements to the features it is removing. These alternative services are all FOSS (Free and Open-Source Software).

As difficult as this could be for organizations that use macOS server, this is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May, 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster, but the entire category of SMB office and business servers, which has been undergoing a gradual change in recent years.

Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.

What Comes After macOS Server?

Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.

Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.

The Times They Are A-Changin’

For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.

In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (Saas, PaaS, IaaS) options.

IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.

Synology NAS as a macOS Server Alternative

One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.

Synology NAS box

Synology NAS

Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux as it comes from BSD Unix.

Synology DiskStation Manager Package Center screenshot

Synology DiskStation Manager Package Center

According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.

Business continuity and backup are simplified by the flexible storage capacity of the NAS. Synology has built-in backup to Backblaze B2 Cloud Storage with Synology’s Cloud Sync, as well as a choice of a number of other B2-compatible applications, such as Cloudberry, Comet, and Arq.

Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).

Are You Sticking with macOS Server or Moving to Another Platform?

If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.

The post Replacing macOS Server with Synology NAS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

Serverless Architectures with AWS Lambda: Overview and Best Practices

Post Syndicated from Andrew Baird original https://aws.amazon.com/blogs/architecture/serverless-architectures-with-aws-lambda-overview-and-best-practices/

For some organizations, the idea of “going serverless” can be daunting. But with an understanding of best practices – and the right tools — many serverless applications can be fully functional with only a few lines of code and little else.

Examples of fully-serverless-application use cases include:

  • Web or mobile backends – Create fully-serverless, mobile applications or websites by creating user-facing content in a native mobile application or static web content in an S3 bucket. Then have your front-end content integrate with Amazon API Gateway as a backend service API. Lambda functions will then execute the business logic you’ve written for each of the API Gateway methods in your backend API.
  • Chatbots and virtual assistants – Build new serverless ways to interact with your customers, like customer support assistants and bots ready to engage customers on your company-run social media pages. The Amazon Alexa Skills Kit (ASK) and Amazon Lex have the ability to apply natural-language understanding to user-voice and freeform-text input so that a Lambda function you write can intelligently respond and engage with them.
  • Internet of Things (IoT) backends – AWS IoT has direct-integration for device messages to be routed to and processed by Lambda functions. That means you can implement serverless backends for highly secure, scalable IoT applications for uses like connected consumer appliances and intelligent manufacturing facilities.

Using AWS Lambda as the logic layer of a serverless application can enable faster development speed and greater experimentation – and innovation — than in a traditional, server-based environment.

We recently published the “Serverless Architectures with AWS Lambda: Overview and Best Practices” whitepaper to provide the guidance and best practices you need to write better Lambda functions and build better serverless architectures.

Once you’ve finished reading the whitepaper, below are a couple additional resources I recommend as your next step:

  1. If you would like to better understand some of the architecture pattern possibilities for serverless applications: Thirty Serverless Architectures in 30 Minutes (re:Invent 2017 video)
  2. If you’re ready to get hands-on and build a sample serverless application: AWS Serverless Workshops (GitHub Repository)
  3. If you’ve already built a serverless application and you’d like to ensure your application has been Well Architected: The Serverless Application Lens: AWS Well Architected Framework (Whitepaper)

About the Author

 

Andrew Baird is a Sr. Solutions Architect for AWS. Prior to becoming a Solutions Architect, Andrew was a developer, including time as an SDE with Amazon.com. He has worked on large-scale distributed systems, public-facing APIs, and operations automation.

The answers to your questions for Eben Upton

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/eben-q-a-1/

Before Easter, we asked you to tell us your questions for a live Q & A with Raspberry Pi Trading CEO and Raspberry Pi creator Eben Upton. The variety of questions and comments you sent was wonderful, and while we couldn’t get to them all, we picked a handful of the most common to grill him on.

You can watch the video below — though due to this being the first pancake of our live Q&A videos, the sound is a bit iffy — or read Eben’s answers to the first five questions today. We’ll follow up with the rest in the next few weeks!

Live Q&A with Eben Upton, creator of the Raspberry Pi

Get your questions to us now using #AskRaspberryPi on Twitter

Any plans for 64-bit Raspbian?

Raspbian is effectively 32-bit Debian built for the ARMv6 instruction-set architecture supported by the ARM11 processor in the first-generation Raspberry Pi. So maybe the question should be: “Would we release a version of our operating environment that was built on top of 64-bit ARM Debian?”

And the answer is: “Not yet.”

When we released the Raspberry Pi 3 Model B+, we released an operating system image on the same day; the wonderful thing about that image is that it runs on every Raspberry Pi ever made. It even runs on the alpha boards from way back in 2011.

That deep backwards compatibility is really important for us, in large part because we don’t want to orphan our customers. If someone spent $35 on an older-model Raspberry Pi five or six years ago, they still spent $35, so it would be wrong for us to throw them under the bus.

So, if we were going to do a 64-bit version, we’d want to keep doing the 32-bit version, and then that would mean our efforts would be split across the two versions; and remember, we’re still a very small engineering team. Never say never, but it would be a big step for us.

For people wanting a 64-bit operating system, there are plenty of good third-party images out there, including SUSE Linux Enterprise Server.

Given that the 3B+ includes 5GHz wireless and Power over Ethernet (PoE) support, why would manufacturers continue to use the Compute Module?

It’s a form-factor thing.

Very large numbers of people are using the bigger product in an industrial context, and it’s well engineered for that: it has module certification, wireless on board, and now PoE support. But there are use cases that can’t accommodate this form factor. For example, NEC displays: we’ve had this great relationship with NEC for a couple of years now where a lot of their displays have a socket in the back that you can put a Compute Module into. That wouldn’t work with the 3B+ form factor.

Back of an NEC display with a Raspberry Pi Compute Module slotted in.

An NEC display with a Raspberry Pi Compute Module

What are some industrial uses/products Raspberry is used with?

The NEC displays are a good example of the broader trend of using Raspberry Pi in digital signage.

A Raspberry Pi running the wait time signage at The Wizarding World of Harry Potter, Universal Studios.
Image c/o thelonelyredditor1

If you see a monitor at a station, or an airport, or a recording studio, and you look behind it, it’s amazing how often you’ll find a Raspberry Pi sitting there. The original Raspberry Pi was particularly strong for multimedia use cases, so we saw uptake in signage very early on.

An array of many Raspberry Pis

Los Alamos Raspberry Pi supercomputer

Another great example is the Los Alamos National Laboratory building supercomputers out of Raspberry Pis. Many high-end supercomputers now are built using white-box hardware — just regular PCs connected together using some networking fabric — and a collection of Raspberry Pi units can serve as a scale model of that. The Raspberry Pi has less processing power, less memory, and less networking bandwidth than the PC, but it has a balanced amount of each. So if you don’t want to let your apprentice supercomputer engineers loose on your expensive supercomputer, a cluster of Raspberry Pis is a good alternative.

Why is there no power button on the Raspberry Pi?

“Once you start, where do you stop?” is a question we ask ourselves a lot.

There are a whole bunch of useful things that we haven’t included in the Raspberry Pi by default. We don’t have a power button, we don’t have a real-time clock, and we don’t have an analogue-to-digital converter — those are probably the three most common requests. And the issue with them is that they each cost a bit of money, they’re each only useful to a minority of users, and even that minority often can’t agree on exactly what they want. Some people would like a power button that is literally a physical analogue switch between the 5V input and the rest of the board, while others would like something a bit more like a PC power button, which is partway between a physical switch and a ‘shutdown’ button. There’s no consensus about what sort of power button we should add.

So the answer is: accessories. By leaving a feature off the board, we’re not taxing the majority of people who don’t want the feature. And of course, we create an opportunity for other companies in the ecosystem to create and sell accessories to those people who do want them.

Adafruit Push-button Power Switch Breakout Raspberry Pi

The Adafruit Push-button Power Switch Breakout is one of many accessories that fill in the gaps for makers.

We have this neat way of figuring out what features to include by default: we divide through the fraction of people who want it. If you have a 20 cent component that’s going to be used by a fifth of people, we treat that as if it’s a $1 component. And it has to fight its way against the $1 components that will be used by almost everybody.

Do you think that Raspberry Pi is the future of the Internet of Things?

Absolutely, Raspberry Pi is the future of the Internet of Things!

In practice, most of the viable early IoT use cases are in the commercial and industrial spaces rather than the consumer space. Maybe in ten years’ time, IoT will be about putting 10-cent chips into light switches, but right now there’s so much money to be saved by putting automation into factories that you don’t need 10-cent components to address the market. Last year, roughly 2 million $35 Raspberry Pi units went into commercial and industrial applications, and many of those are what you’d call IoT applications.

So I think we’re the future of a particular slice of IoT. And we have ten years to get our price point down to 10 cents 🙂

The post The answers to your questions for Eben Upton appeared first on Raspberry Pi.

Community profile: Dave Akerman

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/community-profile-dave-akerman/

This column is from The MagPi issue 61. You can download a PDF of the full issue for free, or subscribe to receive the print edition through your letterbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve our charitable goals.

The pinned tweet on Dave Akerman’s Twitter account shows a table displaying the various components needed for a high-altitude balloon (HAB) flight. Batteries, leads, a camera and Raspberry Pi, plus an unusually themed payload. The caption reads ‘The Queen, The Duke of York, and my TARDIS”, and sums up Dave’s maker career in a heartbeat.

David Akerman on Twitter

The Queen, The Duke of York, and my TARDIS 🙂 #UKHAS #RaspberryPi

Though writing software for industrial automation pays the bills, the majority of Dave’s time is spent in the world of high-altitude ballooning and the ever-growing community that encompasses it. And, while he makes some money sending business-themed balloons to near space for the likes of Aardman Animations, Confused.com, and the BBC, Dave is best known in the Raspberry Pi community for his use of the small computer in every payload, and his work as a tutor alongside the Foundation’s staff at Skycademy events.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave continues to help others while breaking records and having a good time exploring the atmosphere.

Dave has dedicated many hours and many, many more miles to assist with the Foundation’s Skycademy programme, helping to explore high-altitude ballooning with educators from across the UK. Using a Raspberry Pi and various other pieces of lightweight tech, Dave and Foundation staff member James Robinson explored the incorporation of high-altitude ballooning into education. Through Skycademy, educators were able to learn new skills and take them to the classroom, setting off their own balloons with their students, and recording the results on Raspberry Pis.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave’s most recent flight broke a new record. On 13 August 2017, his HAB payload was able to send back the highest images taken by any amateur flight.

But education isn’t the only reason for Dave’s involvement in the HAB community. As with anyone passionate about a specific hobby, Dave strives to break records. The most recent record-breaking flight took place on 13 August 2017, when Dave’s Raspberry Pi Zero HAB sent home the highest images taken by any amateur high-altitude balloon launch: at 43014 metres. No other HAB balloon has provided images from such an altitude, and the lightweight nature of the Pi Zero definitely helped, as Dave went on to mention on Twitter a few days later.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave is recognised as being the first person to incorporate a Raspberry Pi into a HAB payload, and continues to break records with the help of the little green board. More recently, he’s been able to lighten the load by using the Raspberry Pi Zero.

When the first Pi made its way to near space, Dave tore the computer apart in order to meet the weight restriction. The Pi in the Sky board was created to add the extra features needed for the flight. Since then, the HAT has experienced a few changes.

Dave Akerman The MagPi Raspberry Pi Community Profile

The Pi in the Sky board, created specifically for HAB flights.

Dave first fell in love with high-altitude ballooning after coming across the hobby in a video shared on a photographic forum. With a lifelong interest in space thanks to watching the Moon landings as a boy, plus a talent for electronics and photography, it seems a natural progression for him. Throw in his coding skills from learning to program on a Teletype and it’s no wonder he was ready and eager to take to the skies, so to speak, and capture the curvature of the Earth. What was so great about using the Raspberry Pi was the instant gratification he got from receiving images in real time as they were taken during the flight. While other devices could control a camera and store captured images for later retrieval, thanks to the Pi Dave was able to transmit the files back down to Earth and check the progress of his balloon while attempting to break records with a flight.

Dave Akerman The MagPi Raspberry Pi Community Profile Morph

One of the many commercial flights Dave has organised featured the classic children’s TV character Morph, a creation of the Aardman Animations studio known for Wallace and Gromit. Morph took to the sky twice in his mission to reach near space, and finally succeeded in 2016.

High-altitude ballooning isn’t the only part of Dave’s life that incorporates a Raspberry Pi. Having “lost count” of how many Pis he has running tasks, Dave has also created radio receivers for APRS (ham radio data), ADS-B (aircraft tracking), and OGN (gliders), along with a time-lapse camera in his garden, and he has a few more Pi for tinkering purposes.

The post Community profile: Dave Akerman appeared first on Raspberry Pi.

Join us at Raspberry Fields 2018!

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/raspberry-fields-2018/

This summer, the Raspberry Pi Foundation is bringing you an all-new community event taking place in Cambridge, UK!

Raspberry Fields 2018 Raspberry Pi festival

Raspberry Fields

On the weekend of Saturday 30 June and Sunday 1 July 2018, the Pi Towers team, with lots of help from our community of young people, educators, hobbyists, and tech enthusiasts, will be running Raspberry Fields, our brand-new annual festival of digital making!

Raspberry Fields 2018 Raspberry Pi festival

It will be a chance for people of all ages and skill levels to have a go at getting creative with tech, and it will be a celebration of all that our digital makers have already learnt and achieved, whether through taking part in Code Clubs, CoderDojos, or Raspberry Jams, or through trying our resources at home.

Dive into digital making

At Raspberry Fields, you will have the chance to inspire your inner inventor! Learn about amazing projects others in the community are working on, such as cool robots and wearable technology; have a go at a variety of hands-on activities, from home automation projects to remote-controlled vehicles and more; see fascinating science- and technology-related talks and musical performances. After your visit, you’ll be excited to go home and get making!

Raspberry Fields 2018 Raspberry Pi festivalIf you’re wondering about bringing along young children or less technologically minded family members or friends, there’ll be plenty for them to enjoy — with lots of festival-themed activities such as face painting, fun performances, free giveaways, and delicious food, Raspberry Fields will have something for everyone!

Get your tickets

This two-day ticketed event will be taking place at Cambridge Junction, the city’s leading arts centre. Tickets are £5 if you are aged 16 or older, and free for everyone under 16. Get your tickets by clicking the button on the Raspberry Fields web page!

Where: Cambridge Junction, Clifton Way, Cambridge, CB1 7GX, UK
When: Saturday 30 June 2018, 10:30 – 18:00 and Sunday 1 July 2018, 10:00 – 17:30

Get involved

We are currently looking for people who’d like to contribute activities, talks, or performances with digital themes to the festival. This could be something like live music, dance, or other show acts; talks; or drop-in Raspberry Fields 2018 Raspberry Pi festivalmaking activities. In addition, we’re looking for artists who’d like to showcase interactive digital installations, for proud makers who are keen to exhibit their projects, and for vendors who’d like to join in. We particularly encourage young people to showcase projects they’ve created or deliver talks on their digital making journey!Raspberry Fields 2018 Raspberry Pi festival

Your contribution to Raspberry Fields should focus on digital making and be fun and engaging for an audience of various ages. However, it doesn’t need to be specific to Raspberry Pi. You might be keen to demonstrate a project you’ve built, do a short Q&A session on what you’ve learnt, or present something more in-depth in the auditorium; maybe you’re one of our approved resellers wanting to showcase in our market area. We’re also looking for digital makers to run drop-in activity sessions, as well as for people who’d like to be marshals with smiling faces who will ensure that everyone has a wonderful time!

If you’d like to take part in Raspberry Fields, let us know via this form, and we’ll be in touch with you soon.

The post Join us at Raspberry Fields 2018! appeared first on Raspberry Pi.

Our Newest AWS Community Heroes (Spring 2018 Edition)

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/our-newest-aws-community-heroes-spring-2018-edition/

The AWS Community Heroes program helps shine a spotlight on some of the innovative work being done by rockstar AWS developers around the globe. Marrying cloud expertise with a passion for community building and education, these Heroes share their time and knowledge across social media and in-person events. Heroes also actively help drive content at Meetups, workshops, and conferences.

This March, we have five Heroes that we’re happy to welcome to our network of cloud innovators:

Peter Sbarski

Peter Sbarski is VP of Engineering at A Cloud Guru and the organizer of Serverlessconf, the world’s first conference dedicated entirely to serverless architectures and technologies. His work at A Cloud Guru allows him to work with, talk and write about serverless architectures, cloud computing, and AWS. He has written a book called Serverless Architectures on AWS and is currently collaborating on another book called Serverless Design Patterns with Tim Wagner and Yochay Kiriaty.

Peter is always happy to talk about cloud computing and AWS, and can be found at conferences and meetups throughout the year. He helps to organize Serverless Meetups in Melbourne and Sydney in Australia, and is always keen to share his experience working on interesting and innovative cloud projects.

Peter’s passions include serverless technologies, event-driven programming, back end architecture, microservices, and orchestration of systems. Peter holds a PhD in Computer Science from Monash University, Australia and can be followed on Twitter, LinkedIn, Medium, and GitHub.

 

 

 

Michael Wittig

Michael Wittig is co-founder of widdix, a consulting company focused on cloud architecture, DevOps, and software development on AWS. widdix maintains several AWS related open source projects, most notably a collection of production-ready CloudFormation templates. In 2016, widdix released marbot: a Slack bot supporting your DevOps team to detect and solve incidents on AWS.

In close collaboration with his brother Andreas Wittig, the Wittig brothers are actively creating AWS related content. Their book Amazon Web Services in Action (Manning) introduces AWS with a strong focus on automation. Andreas and Michael run the blog cloudonaut.io where they share their knowledge about AWS with the community. The Wittig brothers also published a bunch of video courses with O’Reilly, Manning, Pluralsight, and A Cloud Guru. You can also find them speaking at conferences and user groups in Europe. Both brothers are co-organizing the AWS user group in Stuttgart.

 

 

 

 

Fernando Hönig

Fernando is an experienced Infrastructure Solutions Leader, holding 5 AWS Certifications, with extensive IT Architecture and Management experience in a variety of market sectors. Working as a Cloud Architect Consultant in United Kingdom since 2014, Fernando built an online community for Hispanic speakers worldwide.

Fernando founded a LinkedIn Group, a Slack Community and a YouTube channel all of them named “AWS en Español”, and started to run a monthly webinar via YouTube streaming where different leaders discuss aspects and challenges around AWS Cloud.

During the last 18 months he’s been helping to run and coach AWS User Group leaders across LATAM and Spain, and 10 new User Groups were founded during this time.

Feel free to follow Fernando on Twitter, connect with him on LinkedIn, or join the ever-growing Hispanic Community via Slack, LinkedIn or YouTube.

 

 

 

Anders Bjørnestad

Anders is a consultant and cloud evangelist at Webstep AS in Norway. He finished his degree in Computer Science at the Norwegian Institute of Technology at about the same time the Internet emerged as a public service. Since then he has been an IT consultant and a passionate advocate of knowledge-sharing.

He architected and implemented his first customer solution on AWS back in 2010, and is essential in building Webstep’s core cloud team. Anders applies his broad expert knowledge across all layers of the organizational stack. He engages with developers on technology and architectures and with top management where he advises about cloud strategies and new business models.

Anders enjoys helping people increase their understanding of AWS and cloud in general, and holds several AWS certifications. He co-founded and co-organizes the AWS User Groups in the largest cities in Norway (Oslo, Bergen, Trondheim and Stavanger), and also uses any opportunity to engage in events related to AWS and cloud wherever he is.

You can follow him on Twitter or connect with him on LinkedIn.

To learn more about the AWS Community Heroes Program and how to get involved with your local AWS community, click here.

 

 

 

 

 

 

 

 

Best Practices for Running Apache Cassandra on Amazon EC2

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/

Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.

Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.

Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.

AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.

Cassandra resource overview

Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.

Resource Cassandra AWS
Cluster

A single Cassandra deployment.

 

This typically consists of multiple physical locations, keyspaces, and physical servers.

A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
Datacenter A group of nodes configured as a single replication group.

A logical deployment construct in AWS.

 

A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack

A collection of servers.

 

A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.

A single Availability Zone.
Server/node A physical virtual machine running Cassandra software. An EC2 instance.
Token Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes. Each node being responsible for one or more ranges of the data. Each node gets assigned with a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data. Managed within Cassandra.
Virtual node (vnode) Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter. Managed within Cassandra.
Replication factor The total number of replicas across the cluster. Managed within Cassandra.

Deployment considerations

One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.

We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.

You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.

Deployment patterns

In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.

  • Single AWS Region, 3 Availability Zones
  • Active-active, multi-Region
  • Active-standby, multi-Region

Single region, 3 Availability Zones

In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.

To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).

This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.

Pros Cons

●     Highly available, can sustain failure of one Availability Zone.

●     Simple deployment

●     Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.

 

Active-active, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.

Pros Cons

●     No data loss during failover.

●     Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.

●     Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.

●     High operational overhead

●     The second Region effectively doubles the cost

 

Active-standby, multi-region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).

Pros Cons

●     No data loss during failover.

●     Highly available, can sustain failure or partitioning of one whole Region.

●     High operational overhead.

●     High latency for writes for eventual consistency.

●     The second Region effectively doubles the cost.

Storage options

In on-premises deployments, Cassandra deployments use local disks to store data. There are two storage options for EC2 instances:

Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.

The choice of instance type is generally driven by the type of storage:

  • If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
  • If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
  • Burstable instance types (T2) don’t offer good performance for Cassandra deployments.

Instance store

Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPs) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS in I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.

In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.

As an example, for a cluster with 100 nodes, the loss of 1 node is 3.33% loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is 33% less capacity (with a replication factor of 3).

  Ephemeral storage Amazon EBS Comments

IOPS

(translates to higher query performance)

Up to 3.3M on I3

80K/instance

10K/gp2/volume

32K/io1/volume

This results in a higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first. Then, scale vertically to mitigate specific issues.

 

Note: 3.3M IOPS is observed with 100% random read with a 4-KB block size on Amazon Linux.

AWS instance types I3 Compute optimized, C5 Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.
Backup/ recovery Custom Basic building blocks are available from AWS.

Amazon EBS offers distinct advantage here. It is small engineering effort to establish a backup/restore strategy.

a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance.

b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from last snapshot.

Amazon EBS

EBS volumes offer higher resiliency, and IOPs can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.

The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.

Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.

Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.

EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and coping all the data to it.

Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.

EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.

Networking

If your cluster is expected to receive high read/write traffic, select an instance type that offers 10–Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10–Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.

Cassandra generates a universal unique identifier (UUID) for each node based on IP address for the instance. This UUID is used for distributing vnodes on the ring.

In the case of an AWS deployment, IP addresses are assigned automatically to the instance when an EC2 instance is created. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.

To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.

If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.

High availability and resiliency

Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.

Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.

Maintenance

In this section, we look at ways to ensure that your Cassandra cluster is healthy:

  • Scaling
  • Upgrades
  • Backup and restore

Scaling

Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.

Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.

Upgrades

All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.

In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.

Backup and restore

Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.

For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.

Security

We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.

Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.

Encryption in transit

Cassandra uses Transport Layer Security (TLS) for client and internode communications.

Authentication

The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.

Authorization

The authorizer that’s plugged in by default is org.apache.cassandra.auth.Allow AllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.

Conclusion

In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Analyze Your Data on Amazon DynamoDB with Apache Spark and Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight.


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

 

Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.

 

 

 

Wanted: Senior Systems Administrator

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-senior-systems-administrator/

Wanted: Senior Systems Administrator

We’re looking for someone who enjoys solving difficult problems, running down elusive tech gremlins, and improving our environment one server at a time. If you enjoy being stretched, learning new skills, and want to look forward to seeing your co-workers every day, then we want you!

Backblaze is a small (in headcount) cloud storage (and backup!) company with a big mission, bringing feature-rich and accessible services to the masses, even if they don’t have unlimited VC funding (because we don’t either)! We believe in a fun and positive work environment where people can learn and grow, and where a sense of community is not just a buzzword from a company handbook (though you might probably find it in there).

What You’ll Be Doing

  • Mastering your craft, becoming a subject matter expert, and acting as an escalation point for areas of expertise (this means responding to pages in your areas of ownership as well)
  • Leading projects across a range of IT operations disciplines
  • Developing a thorough understanding of the environment and the skills necessary to troubleshoot all systems and services
  • Collaborating closely with other teams (Engineering, Infrastructure, etc.) to build out new systems and improve existing ones
  • Participating in on-call rotation when necessary
  • Petting the office dogs when appropriate

What You Should Have

  • 5+ years of work as a Systems Administrator (or equivalent college degree)
  • Expert knowledge of Linux systems administration (Debian preferred)
  • Ability to work under pressure in a fast-paced startup environment
  • A passion for build and improving all manner of systems and services
  • Excellent problem solving, investigative, and troubleshooting skills
  • Strong interpersonal communication skills
  • Local enough to commute to San Mateo office

Highly Desirable Skills

  • Experience working at a technology/software startup
  • Configuration management and automation software (Ansible preferred)
  • Familiarity with server and storage system hardware and configurations
  • Understanding of Java servlet containers (Tomcat preferred)
  • Skill in administration of different software suites and cloud-based integrations (G Suite, PagerDuty, etc.)
  • Comprehension of standard web services and packages (WordPress, Apache, etc.)

Some Backblaze Perks

  • Generous healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchens
  • Weekly catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus (human children only)
  • Get to bring your (well behaved) pets into the office
  • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy

If this sounds like you — follow these steps:

  1. Send an email to jobscontact@backblaze.com with the position in the subject line.
  2. Include your resume.
  3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

The post Wanted: Senior Systems Administrator appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Getting product security engineering right

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2018/02/getting-product-security-engineering.html

Product security is an interesting animal: it is a uniquely cross-disciplinary endeavor that spans policy, consulting,
process automation, in-depth software engineering, and cutting-edge vulnerability research. And in contrast to many
other specializations in our field of expertise – say, incident response or network security – we have virtually no
time-tested and coherent frameworks for setting it up within a company of any size.

In my previous post, I shared some thoughts
on nurturing technical organizations and cultivating the right kind of leadership within. Today, I figured it would
be fitting to follow up with several notes on what I learned about structuring product security work – and about actually
making the effort count.

The “comfort zone” trap

For security engineers, knowing your limits is a sought-after quality: there is nothing more dangerous than a security
expert who goes off script and starts dispensing authoritatively-sounding but bogus advice on a topic they know very
little about. But that same quality can be destructive when it prevents us from growing beyond our most familiar role: that of
a critic who pokes holes in other people’s designs.

The role of a resident security critic lends itself all too easily to a sense of supremacy: the mistaken
belief that our cognitive skills exceed the capabilities of the engineers and product managers who come to us for help
– and that the cool bugs we file are the ultimate proof of our special gift. We start taking pride in the mere act
of breaking somebody else’s software – and then write scathing but ineffectual critiques addressed to executives,
demanding that they either put a stop to a project or sign off on a risk. And hey, in the latter case, they better
brace for our triumphant “I told you so” at some later date.

Of course, escalations of this type have their place, but they need to be a very rare sight; when practiced routinely, they are a telltale
sign of a dysfunctional team. We might be failing to think up viable alternatives that are in tune with business or engineering needs; we might
be very unpersuasive, failing to communicate with other rational people in a language they understand; or it might be that our tolerance for risk
is badly out of whack with the rest of the company. Whatever the cause, I’ve seen high-level escalations where the security team
spoke of valiant efforts to resist inexplicably awful design decisions or data sharing setups; and where product leads in turn talked about
pressing business needs randomly blocked by obstinate security folks. Sometimes, simply having them compare their notes would be enough to arrive
at a technical solution – such as sharing a less sensitive subset of the data at hand.

To be effective, any product security program must be rooted in a partnership with the rest of the company, focused on helping them get stuff done
while eliminating or reducing security risks. To combat the toxic us-versus-them mentality, I found it helpful to have some team members with
software engineering backgrounds, even if it’s the ownership of a small open-source project or so. This can broaden our horizons, helping us see
that we all make the same mistakes – and that not every solution that sounds good on paper is usable once we code it up.

Getting off the treadmill

All security programs involve a good chunk of operational work. For product security, this can be a combination of product launch reviews, design consulting requests, incoming bug reports, or compliance-driven assessments of some sort. And curiously, such reactive work also has the property of gradually expanding to consume all the available resources on a team: next year is bound to bring even more review requests, even more regulatory hurdles, and even more incoming bugs to triage and fix.

Being more tractable, such routine tasks are also more readily enshrined in SDLs, SLAs, and all kinds of other official documents that are often mistaken for a mission statement that justifies the existence of our teams. Soon, instead of explaining to a developer why they should fix a particular problem right away, we end up pointing them to page 17 in our severity classification guideline, which defines that “severity 2” vulnerabilities need to be resolved within a month. Meanwhile, another policy may be telling them that they need to run a fuzzer or a web application scanner for a particular number of CPU-hours – no matter whether it makes sense or whether the job is set up right.

To run a product security program that scales sublinearly, stays abreast of future threats, and doesn’t erect bureaucratic speed bumps just for the sake of it, we need to recognize this inherent tendency for operational work to take over – and we need to reign it in. No matter what the last year’s policy says, we usually don’t need to be doing security reviews with a particular cadence or to a particular depth; if we need to scale them back 10% to staff a two-quarter project that fixes an important API and squashes an entire class of bugs, it’s a short-term risk we should feel empowered to take.

As noted in my earlier post, I find contingency planning to be a valuable tool in this regard: why not ask ourselves how the team would cope if the workload went up another 30%, but bad financial results precluded any team growth? It’s actually fun to think about such hypotheticals ahead of the time – and hey, if the ideas sound good, why not try them out today?

Living for a cause

It can be difficult to understand if our security efforts are structured and prioritized right; when faced with such uncertainty, it is natural to stick to the safe fundamentals – investing most of our resources into the very same things that everybody else in our industry appears to be focusing on today.

I think it’s important to combat this mindset – and if so, we might as well tackle it head on. Rather than focusing on tactical objectives and policy documents, try to write down a concise mission statement explaining why you are a team in the first place, what specific business outcomes you are aiming for, how do you prioritize it, and how you want it all to change in a year or two. It should be a fluid narrative that reads right and that everybody on your team can take pride in; my favorite way of starting the conversation is telling folks that we could always have a new VP tomorrow – and that the VP’s first order of business could be asking, “why do you have so many people here and how do I know they are doing the right thing?”. It’s a playful but realistic framing device that motivates people to get it done.

In general, a comprehensive product security program should probably start with the assumption that no matter how many resources we have at our disposal, we will never be able to stay in the loop on everything that’s happening across the company – and even if we did, we’re not going to be able to catch every single bug. It follows that one of our top priorities for the team should be making sure that bugs don’t happen very often; a scalable way of getting there is equipping engineers with intuitive and usable tools that make it easy to perform common tasks without having to worry about security at all. Examples include standardized, managed containers for production jobs; safe-by-default APIs, such as strict contextual autoescaping for XSS or type safety for SQL; security-conscious style guidelines; or plug-and-play libraries that take care of common crypto or ACL enforcement tasks.

Of course, not all problems can be addressed on framework level, and not every engineer will always reach for the right tools. Because of this, the next principle that I found to be worth focusing on is containment and mitigation: making sure that bugs are difficult to exploit when they happen, or that the damage is kept in check. The solutions in this space can range from low-level enhancements (say, hardened allocators or seccomp-bpf sandboxes) to client-facing features such as browser origin isolation or Content Security Policy.

The usual consulting, review, and outreach tasks are an important facet of a product security program, but probably shouldn’t be the sole focus of your team. It’s also best to avoid undue emphasis on vulnerability showmanship: while valuable in some contexts, it creates a hypercompetitive environment that may be hostile to less experienced team members – not to mention, squashing individual bugs offers very limited value if the same issue is likely to be reintroduced into the codebase the next day. I like to think of security reviews as a teaching opportunity instead: it’s a way to raise awareness, form partnerships with engineers, and help them develop lasting habits that reduce the incidence of bugs. Metrics to understand the impact of your work are important, too; if your engagements are seen mostly as a yet another layer of red tape, product teams will stop reaching out to you for advice.

The other tenet of a healthy product security effort requires us to recognize at a scale and given enough time, every defense mechanism is bound to fail – and so, we need ways to prevent bugs from turning into incidents. The efforts in this space may range from developing product-specific signals for the incident response and monitoring teams; to offering meaningful vulnerability reward programs and nourishing a healthy and respectful relationship with the research community; to organizing regular offensive exercises in hopes of spotting bugs before anybody else does.

Oh, one final note: an important feature of a healthy security program is the existence of multiple feedback loops that help you spot problems without the need to micromanage the organization and without being deathly afraid of taking chances. For example, the data coming from bug bounty programs, if analyzed correctly, offers a wonderful way to alert you to systemic problems in your codebase – and later on, to measure the impact of any remediation and hardening work.