Tag Archives: hosting

Host Your Apps with AWS Amplify Console from the AWS Amplify CLI

Post Syndicated from Brandon West original https://aws.amazon.com/blogs/aws/host-your-apps-with-aws-amplify-console-from-the-aws-amplify-cli/

Have you tried out AWS Amplify and AWS Amplify Console yet? In my opinion, they provide one of the fastest ways to get a new web application from idea to prototype on AWS. So what are they? AWS Amplify is an opinionated framework for building modern applications, with a toolchain for easily adding services like authentication (via Amazon Cognito) or storage (via Amazon Simple Storage Service (S3)) or GraphQL APIs, all via a command-line interface. AWS Amplify Console makes continuous deployment and hosting for your modern web apps easy. It supports hosting the frontend and backend assets for single page app (SPA) frameworks including React, Angular, Vue.js, Ionic, and Ember. It also supports static site generators like Gatsby, Eleventy, Hugo, VuePress, and Jekyll.

With today’s launch, hosting options available from the AWS Amplify CLI now include Amplify Console in addition to S3 and Amazon CloudFront. By using Amplify Console, you can take advantage of features like continuous deployment, instant cache invalidation, custom redirects, and simple configuration of custom domains.

Initializing an Amplify App

Let’s take a look at a quick example. We’ll be deploying a static site demo of Amazon Transcribe. I’ve already got the AWS Command Line Interface (CLI) installed, as well as the AWS Amplify CLI. I’ve forked and then cloned the sample code to my local machine. In the following gif, you can see the initialization process for an AWS Amplify app. (I sped things up a little for the gif. It might take a few seconds for your app to be created.)

Terminal session showing the "amplify init" workflow

Now that I’ve got my app initialized, I can add additional services. Let’s add some hosting via AWS Amplify Console. After choosing Amplify Console for hosting, I can pick manual deployment or continuous deployment using a git-based workflow.

Continuous Deployment

First, I’m going to set up continuous deployment so that changes to our git repo will trigger a build and deploy.

A screenshot of a terminal session adding Amplify Console to an Amplify project

The workflow for configuring continuous deployment requires a quick browser session. First, I select our git provider. The forked repo is on GitHub, so I need to authorize Amplify Console to use my GitHub account.

Screenshot of git provider selection

Once a provider is authorized, I choose the repo and branch to watch for changes.

Screenshot of repo and branch selection

AWS Amplify Console auto-detected the correct build settings, based on the contents of package.json.

Screenshot of build settings

Once I’ve confirmed the settings, the initial build and deploy will start. Then any changes to the selected git branch will result in additional builds and deploys. Now I need to finish the workflow in the CLI, and I the need the ARN of the new Amplify Console app for that. In the browser, under App Settings and then General, I copy the ARN and then paste it into my terminal, and check the status.

A screenshot of a terminal window where the app ARN is being set

A quick check of the URL in my browser confirms that the app has been successfully deployed.

A screenshot of the sample app we deployed in this post

Manual Deploys

Manual deploys with Amplify Console also provide a bunch of useful features. The CLI can now manage front-end environments, making it easy to add a test or dev environment. It’s also easy to add URL redirects and rewrites, or add a username/password via HTTP Basic Auth.

Configuring manual deploys is straightforward. Just set your environment name. When it’s time to deploy, run amplify publishand the build scripts defined during the initialization of the project will be run. The generated artifact will then be uploaded automatically.

A screenshot of a terminal window where manual deploys are configured

With manual deployments, you can set up multiple frontend environments (e.g. dev and prod) directly from the CLI. To create a new dev environment, run amplify env add (name it dev) and amplify publish. This will create a second frontend environment in Amplify Console. To view all your frontend and backend environments, run amplify console from the CLI to open your Amplify Console app.

Ever since using AWS Amplify Console for the first time a few weeks ago, it has become my go-to way to deploy applications, especially static sites. I’m excited to see the simplicity of hosting with AWS Amplify Console extended to the Amplify CLI, and I hope you are too. Happy building!

— Brandon

Amazon SageMaker Updates – Tokyo Region, CloudFormation, Chainer, and GreenGrass ML

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/sagemaker-tokyo-summit-2018/

Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and Tensorflow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.

Amazon SageMaker Chainer Estimator


Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. A lot of developers enjoy the Chainer scheme since it allows them to write their networks with native python constructs and tools.

Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet users have to implement a train function with a particular signature. With Chainer your scripts can be a little bit more portable as you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in a if __name__ == '__main__': guard and invoke it locally or on sagemaker.


import argparse
import os

if __name__ =='__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.

Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters will get passed in to the script as CLI commands and the environment variables above will be autopopulated. When we call fit the input channels we pass will be populated in the SM_CHANNEL_* environment variables.


from sagemaker.chainer.estimator import Chainer
# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1
)

Now, instead of bringing your own docker container for training and hosting with Chainer, you can just maintain your script. You can see the full sagemaker-chainer-containers on github. One of my favorite features of the new container is built-in chainermn for easy multi-node distribution of your chainer training jobs.

There’s a lot more documentation and information available in both the README and the example notebooks.

AWS GreenGrass ML with Chainer

AWS GreenGrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson, TX2, and Raspberry Pi. So, now GreenGrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker then easily deploy it to any GreenGrass-enabled device using GreenGrass ML.

JAWS UG

I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.

Randall

Security and Human Behavior (SHB 2018)

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/security_and_hu_7.html

I’m at Carnegie Mellon University, at the eleventh Workshop on Security and Human Behavior.

SHB is a small invitational gathering of people studying various aspects of the human side of security, organized each year by Alessandro Acquisti, Ross Anderson, and myself. The 50 or so people in the room include psychologists, economists, computer security researchers, sociologists, political scientists, neuroscientists, designers, lawyers, philosophers, anthropologists, business school professors, and a smattering of others. It’s not just an interdisciplinary event; most of the people here are individually interdisciplinary.

The goal is to maximize discussion and interaction. We do that by putting everyone on panels, and limiting talks to 7-10 minutes. The rest of the time is left to open discussion. Four hour-and-a-half panels per day over two days equals eight panels; six people per panel means that 48 people get to speak. We also have lunches, dinners, and receptions — all designed so people from different disciplines talk to each other.

I invariably find this to be the most intellectually stimulating conference of my year. It influences my thinking in many different, and sometimes surprising, ways.

This year’s program is here. This page lists the participants and includes links to some of their work. As he does every year, Ross Anderson is liveblogging the talks. (Ross also maintains a good webpage of psychology and security resources.)

Here are my posts on the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth SHB workshops. Follow those links to find summaries, papers, and occasionally audio recordings of the various workshops.

Next year, I’ll be hosting the event at Harvard.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

CI/CD with Data: Enabling Data Portability in a Software Delivery Pipeline with AWS Developer Tools, Kubernetes, and Portworx

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/

This post is written by Eric Han – Vice President of Product Management Portworx and Asif Khan – Solutions Architect

Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.

For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses where operations are tracked) as easy as with stateless (such as with web front ends where they are often not).

Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.

In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.

 

Stateful Pipelines: Need for Portable Volumes

As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.

Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.

Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.

Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.

As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here

 

 

 

AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx

In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.

The continuous deployment pipeline runs as follows:

  • Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
  • Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
  • AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
  • AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
  • Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
  • AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB• Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB
    • The MongoDB instance mounts the snapshot.

At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.

 

Summary

This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.

This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.

The reference architecture and code are available on GitHub:

● Reference architecture: https://github.com/portworx/aws-kube-codesuite
● Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py

For more information about persistent storage for containers, visit the Portworx website. For more information about Code Pipeline, see the AWS CodePipeline User Guide.

AIY Projects 2: Google’s AIY Projects Kits get an upgrade

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/google-aiy-projects-2/

After the outstanding success of their AIY Projects Voice and Vision Kits, Google has announced the release of upgraded kits, complete with Raspberry Pi Zero WH, Camera Module, and preloaded SD card.

Google AIY Projects Vision Kit 2 Raspberry Pi

Google’s AIY Projects Kits

Google launched the AIY Projects Voice Kit last year, first as a cover gift with The MagPi magazine and later as a standalone product.

Makers needed to provide their own Raspberry Pi for the original kit. The new kits include everything you need, from Pi to SD card.

Within a DIY cardboard box, makers were able to assemble their own voice-activated AI assistant akin to the Amazon Alexa, Apple’s Siri, and Google’s own Google Home Assistant. The Voice Kit was an instant hit that spurred no end of maker videos and tutorials, including our own free tutorial for controlling a robot using voice commands.

Later in the year, the team followed up the success of the Voice Kit with the AIY Projects Vision Kit — the same cardboard box hosting a camera perfect for some pretty nifty image recognition projects.

For more on the AIY Voice Kit, here’s our release video hosted by the rather delightful Rob Zwetsloot.

AIY Projects adds natural human interaction to your Raspberry Pi

Check out the exclusive Google AIY Projects Kit that comes free with The MagPi 57! Grab yourself a copy in stores or online now: http://magpi.cc/2pI6IiQ This first AIY Projects kit taps into the Google Assistant SDK and Cloud Speech API using the AIY Projects Voice HAT (Hardware Accessory on Top) board, stereo microphone, and speaker (included free with the magazine).

AIY Projects 2

So what’s new with version 2 of the AIY Projects Voice Kit? The kit now includes the recently released Raspberry Pi Zero WH, our Zero W with added pre-soldered header pins for instant digital making accessibility. Purchasers of the kits will also get a micro SD card with preloaded OS to help them get started without having to set the card up themselves.

Google AIY Projects Vision Kit 2 Raspberry Pi

Everything you need to build your own Raspberry Pi-powered Google voice assistant

In the newly upgraded AIY Projects Vision Kit v1.2, makers are also treated to an official Raspberry Pi Camera Module v2, the latest model of our add-on camera.

Google AIY Projects Vision Kit 2 Raspberry Pi

“Everything you need to get started is right there in the box,” explains Billy Rutledge, Google’s Director of AIY Projects. “We knew from our research that even though makers are interested in AI, many felt that adding it to their projects was too difficult or required expensive hardware.”

Google AIY Projects Vision Kit 2 Raspberry Pi
Google AIY Projects Vision Kit 2 Raspberry Pi
Google AIY Projects Vision Kit 2 Raspberry Pi

Google is also hard at work producing AIY Projects companion apps for Android, iOS, and Chrome. The Android app is available now to coincide with the launch of the upgraded kits, with the other two due for release soon. The app supports wireless setup of the AIY Kit, though avid coders will still be able to hack theirs to better suit their projects.

Google has also updated the AIY Projects website with an AIY Models section highlighting a range of neural network projects for the kits.

Get your kit

The updated Voice and Vision Kits were announced last night, and in the US they are available now from Target. UK-based makers should be able to get their hands on them this summer — keep an eye on our social channels for updates and links.

The post AIY Projects 2: Google’s AIY Projects Kits get an upgrade appeared first on Raspberry Pi.

Cybersecurity Insurance

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/cybersecurity_i_1.html

Good article about how difficult it is to insure an organization against Internet attacks, and how expensive the insurance is.

Companies like retailers, banks, and healthcare providers began seeking out cyberinsurance in the early 2000s, when states first passed data breach notification laws. But even with 20 years’ worth of experience and claims data in cyberinsurance, underwriters still struggle with how to model and quantify a unique type of risk.

“Typically in insurance we use the past as prediction for the future, and in cyber that’s very difficult to do because no two incidents are alike,” said Lori Bailey, global head of cyberrisk for the Zurich Insurance Group. Twenty years ago, policies dealt primarily with data breaches and third-party liability coverage, like the costs associated with breach class-action lawsuits or settlements. But more recent policies tend to accommodate first-party liability coverage, including costs like online extortion payments, renting temporary facilities during an attack, and lost business due to systems failures, cloud or web hosting provider outages, or even IT configuration errors.

In my new book — out in September — I write:

There are challenges to creating these new insurance products. There are two basic models for insurance. There’s the fire model, where individual houses catch on fire at a fairly steady rate, and the insurance industry can calculate premiums based on that rate. And there’s the flood model, where an infrequent large-scale event affects large numbers of people — but again at a fairly steady rate. Internet+ insurance is complicated because it follows neither of those models but instead has aspects of both: individuals are hacked at a steady (albeit increasing) rate, while class breaks and massive data breaches affect lots of people at once. Also, the constantly changing technology landscape makes it difficult to gather and analyze the historical data necessary to calculate premiums.

BoingBoing article.

Cloud Empire: Meet the Rebel Alliance

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/cloud-empire-meet-the-rebel-alliance/

Cloud Empire: Meet the Rebel Alliance

Last week Backblaze made the exciting announcement that through partnerships with Packet and ServerCentral, cloud computing is available to Backblaze B2 Cloud Storage customers.

Those of you familiar with cloud computing will understand the significance of this news. We are now offering the least expensive cloud storage + cloud computing available anywhere. You no longer have to submit to the lock-in tactics and exorbitant prices charged by the other big players in the cloud services biz.

As Robin Harris wrote in ZDNet about last week’s computing partners announcement, Cloud Empire: Meet the Rebel Alliance.

We understand that some of our cloud backup and storage customers might be unfamiliar with cloud computing. Backblaze made its name in cloud backup and object storage, and that’s what our customers know us for. In response to customers requests, we’ve directly connected our B2 cloud object storage with cloud compute providers. This adds the ability to use and run programs on data once it’s in the B2 cloud, opening up a world of new uses for B2. Just some of the possibilities include media transcoding and rendering, web hosting, application development and testing, business analytics, disaster recovery, on-demand computing capacity (cloud bursting), AI, and mobile and IoT applications.

The world has been moving to a multi-cloud / hybrid cloud world, and customers are looking for more choices than those offered by the existing cloud players. Our B2 compute partnerships build on our mission to offer cloud storage that’s astonishingly easy and low-cost. They enable our customers to move into a more flexible and affordable cloud services ecosystem that provides a greater variety of choices and costs far less. We believe we are helping to fulfill the promise of the internet by allowing customers to choose the best-of-breed services from the best vendors.

If You’re Not Familiar with Cloud Computing, Here’s a Quick Overview

Cloud computing is another component of cloud services, like object storage, that replicates in the cloud a basic function of a computer system. Think of services that operate in a cloud as an infinitely scalable version of what happens on your desktop computer. In your desktop computer you have computing/processing (CPU), fast storage (like an SSD), data storage (like your disk drive), and memory (RAM). Their counterparts in the cloud are computing (CPU), block storage (fast storage), object storage (data storage), and processing memory (RAM).

Computer building blocks

CPU, RAM, fast internal storage, and a hard drive are the basic building blocks of a computer
They also are the basic building blocks of cloud computing

Some customers require only some of these services, such as cloud storage. B2 as a standalone service has proven to be an outstanding solution for those customers interested in backing up or archiving data. There are many customers that would like additional capabilities, such as performing operations on that data once it’s in the cloud. They need object storage combined with computing.

With the just announced compute partnerships, Backblaze is able to offer computing services to anyone using B2. A direct connection between Backblaze’s and our partners’ data centers means that our customers can process data stored in B2 with high speed, low latency, and zero data transfer costs.

Backblaze, Packet and Server Central cloud compute workflow diagram

Cloud service providers package up CPU, storage, and memory into services that you can rent on an hourly basis
You can scale up and down and add or remove services as you need them

How Does Computing + B2 Work?

Those wanting to use B2 with computing will need to sign up for accounts with Backblaze and either Packet or ServerCentral. Packet customers need only select “SJC1” as their region and then get started. The process is also simple for ServerCentral customers — they just need to register with a ServerCentral account rep.

The direct connection between B2 and our compute partners means customers will experience very low latency (less than 10ms) between services. Even better, all data transfers between B2 and the compute partner are free. When combined with Backblaze B2, customers can obtain cloud computing services for as little as 50% of the cost of Amazon’s Elastic Compute Cloud (EC2).

Opening Up the Cloud “Walled Garden”

Traditionally, cloud vendors charge fees for customers to move data outside the “walled garden” of that particular vendor. These fees reach upwards of $0.12 per gigabyte (GB) for data egress. This large fee for customers accessing their own data restricts users from using a multi-cloud approach and taking advantage of less expensive or better performing options. With free transfers between B2 and Packet or ServerCentral, customers now have a predictable, scalable solution for computing and data storage while avoiding vendor lock in. Dropbox made waves when they saved $75 million by migrating off of AWS. Adding computing to B2 helps anyone interested in moving some or all of their computing off of AWS and thereby cutting their AWS bill by 50% or more.

What are the Advantages of Cloud Storage + Computing?

Using computing and storage in the cloud provide a number of advantages over using in-house resources.

  1. You don’t have to purchase the actual hardware, software licenses, and provide space and IT resources for the systems.
  2. Cloud computing is available with just a few minutes notice and you only pay for whatever period of time you need. You avoid having additional hardware on your balance sheet.
  3. Resources are in the cloud and can provide online services to customers, mobile users, and partners located anywhere in the world.
  4. You can isolate the work on these systems from your normal production environment, making them ideal for testing and trying out new applications and development projects.
  5. Computing resources scale when you need them to, providing temporary or ongoing extra resources for expected or unexpected demand.
  6. They can provide redundant and failover services when and if your primary systems are unavailable for whatever reason.

Where Can I Learn More?

We encourage B2 customers to explore the options available at our partner sites, Packet and ServerCentral. They are happy to help customers understand what services are available and how to get started.

We are excited to see what you build! And please tell us in the comments what you are doing or have planned with B2 + computing.

P.S. May the force be with all of us!

The post Cloud Empire: Meet the Rebel Alliance appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Amazon SageMaker Now Supports Additional Instance Types, Local Mode, Open Sourced Containers, MXNet and Tensorflow Updates

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-sagemaker-roundup-sf/

Amazon SageMaker continues to iterate quickly and release new features on behalf of customers. Starting today, SageMaker adds support for many new instance types, local testing with the SDK, and Apache MXNet 1.1.0 and Tensorflow 1.6.0. Let’s take a quick look at each of these updates.

New Instance Types

Amazon SageMaker customers now have additional options for right-sizing their workloads for notebooks, training, and hosting. Notebook instances now support almost all T2, M4, P2, and P3 instance types with the exception of t2.micro, t2.small, and m4.large instances. Model training now supports nearly all M4, M5, C4, C5, P2, and P3 instances with the exception of m4.large, c4.large, and c5.large instances. Finally, model hosting now supports nearly all T2, M4, M5, C4, C5, P2, and P3 instances with the exception of m4.large instances. Many customers can take advantage of the newest P3, C5, and M5 instances to get the best price/performance for their workloads. Customers also take advantage of the burstable compute model on T2 instances for endpoints or notebooks that are used less frequently.

Open Sourced Containers, Local Mode, and TensorFlow 1.6.0 and MXNet 1.1.0

Today Amazon SageMaker has open sourced the MXNet and Tensorflow deep learning containers that power the MXNet and Tensorflow estimators in the SageMaker SDK. The ability to write Python scripts that conform to simple interface is still one of my favorite SageMaker features and now those containers can be additionally customized to include any additional libraries. You can download these containers locally to iterate and experiment which can accelerate your debugging cycle. When you’re ready go from local testing to production training and hosting you just change one line of code.

These containers launch with support for Tensorflow 1.6.0 and MXNet 1.1.0 as well. Tensorflow has a number of new 1.6.0 features including support for CUDA 9.0, cuDNN 7, and AVX instructions which allows for significant speedups in many training applications. MXNet 1.1.0 adds a number of new features including a Text API mxnet.text with support for text processing, indexing, glossaries, and more. Two of the really cool pre-trained embeddings included are GloVe and fastText.
<

Available Now
All of the features mentioned above are available today. As always please let us know on Twitter or in the comments below if you have any questions or if you’re building something interesting. Now, if you’ll excuse me I’m going to go experiment with some of those new MXNet APIs!

Randall

Pi 3B+: 48 hours later

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/3b-plus-aftermath/

Unless you’ve been AFK for the last two days, you’ll no doubt be aware of the release of the brand-spanking-new Raspberry Pi 3 Model B+. With faster connectivity, more computing power, Power over Ethernet (PoE) pins, and the same $35 price point, the new board has been a hit across all our social media accounts! So while we wind down from launch week, let’s all pull up a chair, make yet another cup of coffee, and look through some of our favourite reactions from the last 48 hours.

Twitter

Our Twitter mentions were refreshing at hyperspeed on Wednesday, as you all began to hear the news and spread the word about the newest member to the Raspberry Pi family.

Tanya Fish on Twitter

Happy Pi Day, people! New @Raspberry_Pi 3B+ is out.

News outlets, maker sites, and hobbyists published posts and articles about the new Pi’s spec upgrades and their plans for the device.

Hackster.io on Twitter

This sort of attention to detail work is exactly what I love about being involved with @Raspberry_Pi. We’re squeezing the last drops of performance out of the 40nm process node, and perfecting Pi 3 in the same way that the original B+ perfected Pi 1.” https://t.co/hEj7JZOGeZ

And I think we counted about 150 uses of this GIF on Twitter alone:

YouTube

Andy Warburton 👾 on Twitter

Is something going on with the @Raspberry_Pi today? You’d never guess from my YouTube subscriptions page… 😀

A few members of our community were lucky enough to get their hands on a 3B+ early, and sat eagerly by the YouTube publish button, waiting to release their impressions of our new board to the world. Others, with no new Pi in hand yet, posted reaction vids to the launch, discussing their plans for the upgraded Pi and comparing statistics against its predecessors.

New Raspberry Pi 3 B+ (2018) Review and Speed Tests

Happy Pi Day World! There is a new Raspberry Pi 3, the B+! In this video I will review the new Pi 3 B+ and do some speed tests. Let me know in the comments if you are getting one and what you are planning on making with it!

Long-standing community members such as The Raspberry Pi Guy, Alex “RasPi.TV” Eames, and Michael Horne joined Adafruit, element14, and RS Components (whose team produced the most epic 3B+ video we’ve seen so far), and makers Tinkernut and Estefannie Explains It All in sharing their thoughts, performance tests, and baked goods on the big day.

What’s new on the Raspberry Pi 3 B+

It’s Pi day! Sorry, wondrous Mathematical constant, this day is no longer about you. The Raspberry Pi foundation just released a new version of the Raspberry Pi called the Rapsberry Pi B+.

If you have a YouTube or Vimeo channel, or if you create videos for other social media channels, and have published your impressions of the new Raspberry Pi, be sure to share a link with us so we can see what you think!

Instagram

We shared a few photos and videos on Instagram, and over 30000 of you checked out our Instagram Story on the day.

Some glamour shots of the latest member of the #RaspberryPi family – the Raspberry Pi 3 Model B+ . Will you be getting one? What are your plans for our newest Pi?

5,609 Likes, 103 Comments – Raspberry Pi (@raspberrypifoundation) on Instagram: “Some glamour shots of the latest member of the #RaspberryPi family – the Raspberry Pi 3 Model B+ ….”

As hot off the press (out of the oven? out of the solder bath?) Pi 3B+ boards start to make their way to eager makers’ homes, they are all broadcasting their excitement, and we love seeing what they plan to get up to with it.

The new #raspberrypi 3B+ suits the industrial setting. Check out my website for #RPI3B Vs RPI3BPlus network speed test. #NotEnoughTECH #network #test #internet

8 Likes, 1 Comments – Mat (@notenoughtech) on Instagram: “The new #raspberrypi 3B+ suits the industrial setting. Check out my website for #RPI3B Vs RPI3BPlus…”

The new Raspberry Pi 3 Model B+ is here and will be used for our Python staging server for our APIs #raspberrypi #pythoncode #googleadwords #shopify #datalayer

16 Likes, 3 Comments – Rob Edlin (@niddocks) on Instagram: “The new Raspberry Pi 3 Model B+ is here and will be used for our Python staging server for our APIs…”

In the news

Eben made an appearance on ITV Anglia on Wednesday, talking live on Facebook about the new Raspberry Pi.

ITV Anglia

As the latest version of the Raspberry Pi computer is launched in Cambridge, Dr Eben Upton talks about the inspiration of Professor Stephen Hawking and his legacy to science. Add your questions in…

He was also fortunate enough to spend the morning with some Sixth Form students from the local area.

Sascha Williams on Twitter

On a day where science is making the headlines, lovely to see the scientists of the future in our office – getting tips from fab @Raspberry_Pi founder @EbenUpton #scientists #RaspberryPi #PiDay2018 @sirissac6thform

Principal Hardware Engineer Roger Thornton will also make a live appearance online this week: he is co-hosting Hack Chat later today. And of course, you can see more of Roger and Eben in the video where they discuss the new 3B+.

Introducing the Raspberry Pi 3 Model B+

Raspberry Pi 3 Model B+ is now on sale now for $35.

It’s been a supremely busy week here at Pi Towers and across the globe in the offices of our Approved Resellers, and seeing your wonderful comments and sharing in your excitement has made it all worth it. Please keep it up, and be sure to share the arrival of your 3B+ as well as the projects into which you’ll be integrating them.

If you’d like to order a Raspberry Pi 3 Model B+, you can do so via our product page. And if you have any questions at all regarding the 3B+, the conversation is still taking place in the comments of Wednesday’s launch post, so head on over.

The post Pi 3B+: 48 hours later appeared first on Raspberry Pi.

Raspberry Jam Big Birthday Weekend 2018 roundup

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/big-birthday-weekend-2018-roundup/

A couple of weekends ago, we celebrated our sixth birthday by coordinating more than 100 simultaneous Raspberry Jam events around the world. The Big Birthday Weekend was a huge success: our fantastic community organised Jams in 40 countries, covering six continents!

We sent the Jams special birthday kits to help them celebrate in style, and a video message featuring a thank you from Philip and Eben:

Raspberry Jam Big Birthday Weekend 2018

To celebrate the Raspberry Pi’s sixth birthday, we coordinated Raspberry Jams all over the world to take place over the Raspberry Jam Big Birthday Weekend, 3-4 March 2018. A massive thank you to everyone who ran an event and attended.

The Raspberry Jam photo booth

I put together code for a Pi-powered photo booth which overlaid the Big Birthday Weekend logo onto photos and (optionally) tweeted them. We included an arcade button in the Jam kits so they could build one — and it seemed to be quite popular. Some Jams put great effort into housing their photo booth:



Here are some of my favourite photo booth tweets:

RGVSA on Twitter

PiParty photo booth @RGVSA & @ @Nerdvana_io #Rjam

Denis Stretton on Twitter

The @SouthendRPIJams #PiParty photo booth

rpijamtokyo on Twitter

PiParty photo booth

Preston Raspberry Jam on Twitter

Preston Raspberry Jam Photobooth #RJam #PiParty

If you want to try out the photo booth software yourself, find the code on GitHub.

The great Raspberry Jam bake-off

Traditionally, in the UK, people have a cake on their birthday. And we had a few! We saw (and tasted) a great selection of Pi-themed cakes and other baked goods throughout the weekend:






Raspberry Jams everywhere

We always say that every Jam is different, but there’s a common and recognisable theme amongst them. It was great to see so many different venues around the world filling up with like-minded Pi enthusiasts, Raspberry Jam–branded banners, and Raspberry Pi balloons!

Europe

Sergio Martinez on Twitter

Thank you so much to all the attendees of the Ikana Jam in Krakow past Saturday! We shared fun experiences, some of them… also painful 😉 A big thank you to @Raspberry_Pi for these global celebrations! And a big thank you to @hubraum for their hospitality! #PiParty #rjam

NI Raspberry Jam on Twitter

We also had a super successful set of wearables workshops using @adafruit Circuit Playground Express boards and conductive thread at today’s @Raspberry_Pi Jam! Very popular! #PiParty

Suzystar on Twitter

My SenseHAT workshop, going well! @SouthendRPiJams #PiParty

Worksop College Raspberry Jam on Twitter

Learning how to scare the zombies in case of an apocalypse- it worked on our young learners #PiParty @worksopcollege @Raspberry_Pi https://t.co/pntEm57TJl

Africa

Rita on Twitter

Being one of the two places in Kenya where the #PiParty took place, it was an amazing time spending the day with this team and getting to learn and have fun. @TaitaTavetaUni and @Raspberry_Pi thank you for your support. @TTUTechlady @mictecttu ch

GABRIEL ONIFADE on Twitter

@TheMagP1

GABRIEL ONIFADE on Twitter

@GABONIAVERACITY #PiParty Lagos Raspberry Jam 2018 Special International Celebration – 6th Raspberry-Pi Big Birthday! Lagos Nigeria @Raspberry_Pi @ben_nuttall #RJam #RaspberryJam #raspberrypi #physicalcomputing #robotics #edtech #coding #programming #edTechAfrica #veracityhouse https://t.co/V7yLxaYGNx

North America

Heidi Baynes on Twitter

The Riverside Raspberry Jam @Vocademy is underway! #piparty

Brad Derstine on Twitter

The Philly & Pi #PiParty event with @Bresslergroup and @TechGirlzorg was awesome! The Scratch and Pi workshop was amazing! It was overall a great day of fun and tech!!! Thank you everyone who came out!

Houston Raspi on Twitter

Thanks everyone who came out to the @Raspberry_Pi Big Birthday Jam! Special thanks to @PBFerrell @estefanniegg @pcsforme @pandafulmanda @colnels @bquentin3 couldn’t’ve put on this amazing community event without you guys!

Merge Robotics 2706 on Twitter

We are back at @SciTechMuseum for the second day of @OttawaPiJam! Our robot Mergius loves playing catch with the kids! #pijam #piparty #omgrobots

South America

Javier Garzón on Twitter

Así terminamos el #Raspberry Jam Big Birthday Weekend #Bogota 2018 #PiParty de #RaspberryJamBogota 2018 @Raspberry_Pi Nos vemos el 7 de marzo en #ArduinoDayBogota 2018 y #RaspberryJamBogota 2018

Asia

Fablab UP Cebu on Twitter

Happy 6th birthday, @Raspberry_Pi! Greetings all the way from CEBU,PH! #PiParty #IoTCebu Thanks @CebuXGeeks X Ramos for these awesome pics. #Fablab #UPCebu

福野泰介 on Twitter

ラズパイ、6才のお誕生日会スタート in Tokyo PCNブースで、いろいろ展示とhttps://t.co/L6E7KgyNHFとIchigoJamつないだ、こどもIoTハッカソンmini体験やってます at 東京蒲田駅近 https://t.co/yHEuqXHvqe #piparty #pipartytokyo #rjam #opendataday

Ren Camp on Twitter

Happy birthday @Raspberry_Pi! #piparty #iotcebu @coolnumber9 https://t.co/2ESVjfRJ2d

Oceania

Glenunga Raspberry Pi Club on Twitter

PiParty photo booth

Personally, I managed to get to three Jams over the weekend: two run by the same people who put on the first two Jams to ever take place, and also one brand-new one! The Preston Raspberry Jam team, who usually run their event on a Monday evening, wanted to do something extra special for the birthday, so they came up with the idea of putting on a Raspberry Jam Sandwich — on the Friday and Monday around the weekend! This meant I was able to visit them on Friday, then attend the Manchester Raspberry Jam on Saturday, and finally drop by the new Jam at Worksop College on my way home on Sunday.

Ben Nuttall on Twitter

I’m at my first Raspberry Jam #PiParty event of the big birthday weekend! @PrestonRJam has been running for nearly 6 years and is a great place to start the celebrations!

Ben Nuttall on Twitter

Back at @McrRaspJam at @DigInnMMU for #PiParty

Ben Nuttall on Twitter

Great to see mine & @Frans_facts Balloon Pi-Tay popper project in action at @worksopjam #rjam #PiParty https://t.co/GswFm0UuPg

Various members of the Foundation team attended Jams around the UK and US, and James from the Code Club International team visited AmsterJam.

hackerfemo on Twitter

Thanks to everyone who came to our Jam and everyone who helped out. @phoenixtogether thanks for amazing cake & hosting. Ademir you’re so cool. It was awesome to meet Craig Morley from @Raspberry_Pi too. #PiParty

Stuart Fox on Twitter

Great #PiParty today at the @cotswoldjam with bloody delicious cake and lots of raspberry goodness. Great to see @ClareSutcliffe @martinohanlon playing on my new pi powered arcade build:-)

Clare Sutcliffe on Twitter

Happy 6th Birthday @Raspberry_Pi from everyone at the #PiParty at #cotswoldjam in Cheltenham!

Code Club on Twitter

It’s @Raspberry_Pi 6th birthday and we’re celebrating by taking part in @amsterjam__! Happy Birthday Raspberry Pi, we’re so happy to be a part of the family! #PiParty

For more Jammy birthday goodness, check out the PiParty hashtag on Twitter!

The Jam makers!

A lot of preparation went into each Jam, and we really appreciate all the hard work the Jam makers put in to making these events happen, on the Big Birthday Weekend and all year round. Thanks also to all the teams that sent us a group photo:

Lots of the Jams that took place were brand-new events, so we hope to see them continue throughout 2018 and beyond, growing the Raspberry Pi community around the world and giving more people, particularly youths, the opportunity to learn digital making skills.

Philip Colligan on Twitter

So many wonderful people in the @Raspberry_Pi community. Thanks to everyone at #PottonPiAndPints for a great afternoon and for everything you do to help young people learn digital making. #PiParty

Special thanks to ModMyPi for shipping the special Raspberry Jam kits all over the world!

Don’t forget to check out our Jam page to find an event near you! This is also where you can find free resources to help you get a new Jam started, and download free starter projects made especially for Jam activities. These projects are available in English, Français, Français Canadien, Nederlands, Deutsch, Italiano, and 日本語. If you’d like to help us translate more content into these and other languages, please get in touch!

PS Some of the UK Jams were postponed due to heavy snowfall, so you may find there’s a belated sixth-birthday Jam coming up where you live!

S Organ on Twitter

@TheMagP1 Ours was rescheduled until later in the Spring due to the snow but here is Babbage enjoying the snow!

The post Raspberry Jam Big Birthday Weekend 2018 roundup appeared first on Raspberry Pi.

2018-03-09 чл. 13

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3379

(Даже вестниците вече писаха за това, крайно време е да довърша тоя post)

Та, има “нова” директива на ЕК, за унифициране на законите за авторските права из европейския съюз.
(нова – на 10на години е, ама чак сега се опитва да влезе, и има някакви бегли шансове да се вкара по времето на нашето председателство)

В нея има количество глупости, но частта, с която имам занимание е чл. 13, който в общи линии казва “internet/hosting доставчиците са отговорни за съдържанието, което се качва при тях и трябва да го филтрират”. Не знам за вас, но за мен цялото нещо е твърде много deja vu, в последните 20на години все някой се опитва да пробута нещо подобно, в някакъв вид. Това без значение, че няма как да се направи работеща система, без значение, че всъщност е инфраструктура за цензура (но по-мощна от тази, която има в закона за хазарта) и че в крайна сметка няма да доведе до подобрение за когото и да е, освен може би хората, които продават “филтриращи” системи.

Един от хубавите доводи за това, че такава система няма как да стане идва от wikipedia – има две правителства, които са им поискали да филтрират съдържание, Иран и Китай. Не знам за колко са добре технологично иранците, но ми се струва, че китайците ако можеха, вече щяха да са направили работеща такава система и нямаше да занимават wikipedia. Нещата, които аз съм виждал, са неефективни и с достатъчно грешки (и false positives, и false negatives), че да са неизползваеми за среден размер сайт, понеже така и така някой трябва да преглежда проблемите.
(Вероятно не съм единствения, дето поради идиотщини на youtube постоянно има отворени няколко copyright dispute-а. В един момент човек решава да си ползва собствен streaming и да не го занимават с такива неща…)

Та, има една инициатива по темата това да не влезе, article13.bg, към които съм се включил. Подкрепя ги фондация “Отворени проекти”, initLab и всякакви други добри хора. Вижте, подкрепете, споменете на всички, дето ще ги засегне (всички хостинги, например…).

Auto Scaling is now available for Amazon SageMaker

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/auto-scaling-is-now-available-for-amazon-sagemaker/

Kumar Venkateswar, Product Manager on the AWS ML Platforms Team, shares details on the announcement of Auto Scaling with Amazon SageMaker.


With Amazon SageMaker, thousands of customers have been able to easily build, train and deploy their machine learning (ML) models. Today, we’re making it even easier to manage production ML models, with Auto Scaling for Amazon SageMaker. Instead of having to manually manage the number of instances to match the scale that you need for your inferences, you can now have SageMaker automatically scale the number of instances based on an AWS Auto Scaling Policy.

SageMaker has made managing the ML process easier for many customers. We’ve seen customers take advantage of managed Jupyter notebooks and managed distributed training. We’ve seen customers deploying their models to SageMaker hosting for inferences, as they integrate machine learning with their applications. SageMaker makes this easy –  you don’t have to think about patching the operating system (OS) or frameworks on your inference hosts, and you don’t have to configure inference hosts across Availability Zones. You just deploy your models to SageMaker, and it handles the rest.

Until now, you have needed to specify the number and type of instances per endpoint (or production variant) to provide the scale that you need for your inferences. If your inference volume changes, you can change the number and/or type of instances that back each endpoint to accommodate that change, without incurring any downtime. In addition to making it easy to change provisioning, customers have asked us how we can make managing capacity for SageMaker even easier.

With Auto Scaling for Amazon SageMaker, in the SageMaker console, the AWS Auto Scaling API, and the AWS SDK, this becomes much easier. Now, instead of having to closely monitor inference volume, and change the endpoint configuration in response, customers can configure a scaling policy to be used by AWS Auto Scaling. Auto Scaling adjusts the number of instances up or down in response to actual workloads, determined by using Amazon CloudWatch metrics and target values defined in the policy. In this way, customers can automatically adjust their inference capacity to maintain predictable performance at a low cost. You simply specify the target inference throughput per instance and provide upper and lower bounds for the number of instances for each production variant. SageMaker will then monitor throughput per instance using Amazon CloudWatch alarms, and then it will adjust provisioned capacity up or down as needed.

After you configure the endpoint with Auto Scaling, SageMaker will continue to monitor your deployed models to automatically adjust the instance count. SageMaker will keep throughput within desired levels, in response to changes in application traffic. This makes it easier to manage models in production, and it can help reduce the cost of deployed models, as you no longer have to provision sufficient capacity in order to manage your peak load. Instead, you configure the limits to accommodate your minimum expected traffic and the maximum peak, and Amazon SageMaker will work within those limits to minimize cost.

How do you get started? Open the SageMaker console. For existing endpoints, you first access the endpoint to modify the settings.


Then, scroll to the Endpoint runtime settings section, select the variant, and choose Configure auto scaling.


First, configure the minimum and maximum number of instances.

Next, choose the throughput per instance at which you want to add an additional instance, given previous load testing.

You can optionally set cool down periods for scaling in or out, to avoid oscillation during periods of wide fluctuation in workload. If not, SageMaker will assume default values.

And that’s it! You now have an endpoint that will automatically scale with increasing inferences.

You pay for the capacity used at regular SageMaker pay-as-you-go pricing, so you no longer have to pay for unused capacity during relative idle periods!

Auto Scaling in Amazon SageMaker is available today in the US East (N. Virginia & Ohio), EU (Ireland), and U.S. West (Oregon) AWS regions. To learn more, see the Amazon SageMaker Auto Scaling documentation.


Kumar Venkateswar is a Product Manager in the AWS ML Platforms team, which includes Amazon SageMaker, Amazon Machine Learning, and the AWS Deep Learning AMIs. When not working, Kumar plays the violin and Magic: The Gathering.

 

 

 

 


 

 

[$] The true costs of hosting in the cloud

Post Syndicated from jake original https://lwn.net/Articles/748106/rss

Should we host in the cloud or on our own servers? This question was
at the center of Dmytro Dyachuk’s talk, given
during KubeCon +
CloudNativeCon
last November. While many services
simply launch in the cloud without the organizations behind them
considering other options, large
content-hosting services have actually
moved back to their own data centers: Dropbox
migrated in 2016

and Instagram
in 2014
. Because such transitions can be expensive
and risky, understanding the economics of hosting is a critical part
of launching a new service. Actual hosting costs are often
misunderstood, or secret, so it is sometimes difficult to get the
numbers right. In this article, we’ll use Dyachuk’s talk to try to
answer the “million dollar question”: “buy or rent?”

The Challenges of Opening a Data Center — Part 1

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/choosing-data-center/

Backblaze storage pod in new data center

This is part one of a series. The second part will be posted later this week. Use the Join button above to receive notification of future posts in this series.

Though most of us have never set foot inside of a data center, as citizens of a data-driven world we nonetheless depend on the services that data centers provide almost as much as we depend on a reliable water supply, the electrical grid, and the highway system. Every time we send a tweet, post to Facebook, check our bank balance or credit score, watch a YouTube video, or back up a computer to the cloud we are interacting with a data center.

In this series, The Challenges of Opening a Data Center, we’ll talk in general terms about the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process. Many of the factors to consider will be similar for opening a private data center or seeking space in a public data center, but we’ll assume for the sake of this discussion that our needs are more modest than requiring a data center dedicated solely to our own use (i.e. we’re not Google, Facebook, or China Telecom).

Data center technology and management are changing rapidly, with new approaches to design and operation appearing every year. This means we won’t be able to cover everything happening in the world of data centers in our series, however, we hope our brief overview proves useful.

What is a Data Center?

A data center is the structure that houses a large group of networked computer servers typically used by businesses, governments, and organizations for the remote storage, processing, or distribution of large amounts of data.

While many organizations will have computing services in the same location as their offices that support their day-to-day operations, a data center is a structure dedicated to 24/7 large-scale data processing and handling.

Depending on how you define the term, there are anywhere from a half million data centers in the world to many millions. While it’s possible to say that an organization’s on-site servers and data storage can be called a data center, in this discussion we are using the term data center to refer to facilities that are expressly dedicated to housing computer systems and associated components, such as telecommunications and storage systems. The facility might be a private center, which is owned or leased by one tenant only, or a shared data center that offers what are called “colocation services,” and rents space, services, and equipment to multiple tenants in the center.

A large, modern data center operates around the clock, placing a priority on providing secure and uninterrrupted service, and generally includes redundant or backup power systems or supplies, redundant data communication connections, environmental controls, fire suppression systems, and numerous security devices. Such a center is an industrial-scale operation often using as much electricity as a small town.

Types of Data Centers

There are a number of ways to classify data centers according to how they will be used, whether they are owned or used by one or multiple organizations, whether and how they fit into a topology of other data centers; which technologies and management approaches they use for computing, storage, cooling, power, and operations; and increasingly visible these days: how green they are.

Data centers can be loosely classified into three types according to who owns them and who uses them.

Exclusive Data Centers are facilities wholly built, maintained, operated and managed by the business for the optimal operation of its IT equipment. Some of these centers are well-known companies such as Facebook, Google, or Microsoft, while others are less public-facing big telecoms, insurance companies, or other service providers.

Managed Hosting Providers are data centers managed by a third party on behalf of a business. The business does not own data center or space within it. Rather, the business rents IT equipment and infrastructure it needs instead of investing in the outright purchase of what it needs.

Colocation Data Centers are usually large facilities built to accommodate multiple businesses within the center. The business rents its own space within the data center and subsequently fills the space with its IT equipment, or possibly uses equipment provided by the data center operator.

Backblaze, for example, doesn’t own its own data centers but colocates in data centers owned by others. As Backblaze’s storage needs grow, Backblaze increases the space it uses within a given data center and/or expands to other data centers in the same or different geographic areas.

Availability is Key

When designing or selecting a data center, an organization needs to decide what level of availability is required for its services. The type of business or service it provides likely will dictate this. Any organization that provides real-time and/or critical data services will need the highest level of availability and redundancy, as well as the ability to rapidly failover (transfer operation to another center) when and if required. Some organizations require multiple data centers not just to handle the computer or storage capacity they use, but to provide alternate locations for operation if something should happen temporarily or permanently to one or more of their centers.

Organizations operating data centers that can’t afford any downtime at all will typically operate data centers that have a mirrored site that can take over if something happens to the first site, or they operate a second site in parallel to the first one. These data center topologies are called Active/Passive, and Active/Active, respectively. Should disaster or an outage occur, disaster mode would dictate immediately moving all of the primary data center’s processing to the second data center.

While some data center topologies are spread throughout a single country or continent, others extend around the world. Practically, data transmission speeds put a cap on centers that can be operated in parallel with the appearance of simultaneous operation. Linking two data centers located apart from each other — say no more than 60 miles to limit data latency issues — together with dark fiber (leased fiber optic cable) could enable both data centers to be operated as if they were in the same location, reducing staffing requirements yet providing immediate failover to the secondary data center if needed.

This redundancy of facilities and ensured availability is of paramount importance to those needing uninterrupted data center services.

Active/Passive Data Centers

Active/Active Data Centers

LEED Certification

Leadership in Energy and Environmental Design (LEED) is a rating system devised by the United States Green Building Council (USGBC) for the design, construction, and operation of green buildings. Facilities can achieve ratings of certified, silver, gold, or platinum based on criteria within six categories: sustainable sites, water efficiency, energy and atmosphere, materials and resources, indoor environmental quality, and innovation and design.

Green certification has become increasingly important in data center design and operation as data centers require great amounts of electricity and often cooling water to operate. Green technologies can reduce costs for data center operation, as well as make the arrival of data centers more amenable to environmentally-conscious communities.

The ACT, Inc. data center in Iowa City, Iowa was the first data center in the U.S. to receive LEED-Platinum certification, the highest level available.

ACT Data Center exterior

ACT Data Center exterior

ACT Data Center interior

ACT Data Center interior

Factors to Consider When Selecting a Data Center

There are numerous factors to consider when deciding to build or to occupy space in a data center. Aspects such as proximity to available power grids, telecommunications infrastructure, networking services, transportation lines, and emergency services can affect costs, risk, security and other factors that need to be taken into consideration.

The size of the data center will be dictated by the business requirements of the owner or tenant. A data center can occupy one room of a building, one or more floors, or an entire building. Most of the equipment is often in the form of servers mounted in 19 inch rack cabinets, which are usually placed in single rows forming corridors (so-called aisles) between them. This allows staff access to the front and rear of each cabinet. Servers differ greatly in size from 1U servers (i.e. one “U” or “RU” rack unit measuring 44.50 millimeters or 1.75 inches), to Backblaze’s Storage Pod design that fits a 4U chassis, to large freestanding storage silos that occupy many square feet of floor space.

Location

Location will be one of the biggest factors to consider when selecting a data center and encompasses many other factors that should be taken into account, such as geological risks, neighboring uses, and even local flight paths. Access to suitable available power at a suitable price point is often the most critical factor and the longest lead time item, followed by broadband service availability.

With more and more data centers available providing varied levels of service and cost, the choices increase each year. Data center brokers can be employed to find a data center, just as one might use a broker for home or other commercial real estate.

Websites listing available colocation space, such as upstack.io, or entire data centers for sale or lease, are widely used. A common practice is for a customer to publish its data center requirements, and the vendors compete to provide the most attractive bid in a reverse auction.

Business and Customer Proximity

The center’s closeness to a business or organization may or may not be a factor in the site selection. The organization might wish to be close enough to manage the center or supervise the on-site staff from a nearby business location. The location of customers might be a factor, especially if data transmission speeds and latency are important, or the business or customers have regulatory, political, tax, or other considerations that dictate areas suitable or not suitable for the storage and processing of data.

Climate

Local climate is a major factor in data center design because the climatic conditions dictate what cooling technologies should be deployed. In turn this impacts uptime and the costs associated with cooling, which can total as much as 50% or more of a center’s power costs. The topology and the cost of managing a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Nevertheless, data centers are located in both extremely cold regions and extremely hot ones, with innovative approaches used in both extremes to maintain desired temperatures within the center.

Geographic Stability and Extreme Weather Events

A major obvious factor in locating a data center is the stability of the actual site as regards weather, seismic activity, and the likelihood of weather events such as hurricanes, as well as fire or flooding.

Backblaze’s Sacramento data center describes its location as one of the most stable geographic locations in California, outside fault zones and floodplains.

Sacramento Data Center

Sometimes the location of the center comes first and the facility is hardened to withstand anticipated threats, such as Equinix’s NAP of the Americas data center in Miami, one of the largest single-building data centers on the planet (six stories and 750,000 square feet), which is built 32 feet above sea level and designed to withstand category 5 hurricane winds.

Equinix Data Center in Miami

Equinix “NAP of the Americas” Data Center in Miami

Most data centers don’t have the extreme protection or history of the Bahnhof data center, which is located inside the ultra-secure former nuclear bunker Pionen, in Stockholm, Sweden. It is buried 100 feet below ground inside the White Mountains and secured behind 15.7 in. thick metal doors. It prides itself on its self-described “Bond villain” ambiance.

Bahnhof Data Center under White Mountain in Stockholm

Usually, the data center owner or tenant will want to take into account the balance between cost and risk in the selection of a location. The Ideal quadrant below is obviously favored when making this compromise.

Cost vs Risk in selecting a data center

Cost = Construction/lease, power, bandwidth, cooling, labor, taxes
Risk = Environmental (seismic, weather, water, fire), political, economic

Risk mitigation also plays a strong role in pricing. The extent to which providers must implement special building techniques and operating technologies to protect the facility will affect price. When selecting a data center, organizations must make note of the data center’s certification level on the basis of regulatory requirements in the industry. These certifications can ensure that an organization is meeting necessary compliance requirements.

Power

Electrical power usually represents the largest cost in a data center. The cost a service provider pays for power will be affected by the source of the power, the regulatory environment, the facility size and the rate concessions, if any, offered by the utility. At higher level tiers, battery, generator, and redundant power grids are a required part of the picture.

Fault tolerance and power redundancy are absolutely necessary to maintain uninterrupted data center operation. Parallel redundancy is a safeguard to ensure that an uninterruptible power supply (UPS) system is in place to provide electrical power if necessary. The UPS system can be based on batteries, saved kinetic energy, or some type of generator using diesel or another fuel. The center will operate on the UPS system with another UPS system acting as a backup power generator. If a power outage occurs, the additional UPS system power generator is available.

Many data centers require the use of independent power grids, with service provided by different utility companies or services, to prevent against loss of electrical service no matter what the cause. Some data centers have intentionally located themselves near national borders so that they can obtain redundant power from not just separate grids, but from separate geopolitical sources.

Higher redundancy levels required by a company will of invariably lead to higher prices. If one requires high availability backed by a service-level agreement (SLA), one can expect to pay more than another company with less demanding redundancy requirements.

Stay Tuned for Part 2 of The Challenges of Opening a Data Center

That’s it for part 1 of this post. In subsequent posts, we’ll take a look at some other factors to consider when moving into a data center such as network bandwidth, cooling, and security. We’ll take a look at what is involved in moving into a new data center (including stories from Backblaze’s experiences). We’ll also investigate what it takes to keep a data center running, and some of the new technologies and trends affecting data center design and use. You can discover all posts on our blog tagged with “Data Center” by following the link https://www.backblaze.com/blog/tag/data-center/.

The second part of this series on The Challenges of Opening a Data Center will be posted later this week. Use the Join button above to receive notification of future posts in this series.

The post The Challenges of Opening a Data Center — Part 1 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.