Tag Archives: paramount

ISP Telenor Will Block The Pirate Bay in Sweden Without a Shot Fired

Post Syndicated from Andy original https://torrentfreak.com/isp-telenor-will-block-the-pirate-bay-in-sweden-without-a-shot-fired-180520/

Back in 2014, Universal Music, Sony Music, Warner Music, Nordisk Film and the Swedish Film Industry filed a lawsuit against Bredbandsbolaget, one of Sweden’s largest ISPs.

The copyright holders asked the Stockholm District Court to order the ISP to block The Pirate Bay and streaming site Swefilmer, claiming that the provider knowingly facilitated access to the pirate platforms and assisted their pirating users.

Soon after, the ISP fought back, refusing to block the sites in a determined response to the Court.

“Bredbandsbolaget’s role is to provide its subscribers with access to the Internet, thereby contributing to the free flow of information and the ability for people to reach each other and communicate,” the company said in a statement.

“Bredbandsbolaget does not block content or services based on individual organizations’ requests. There is no legal obligation for operators to block either The Pirate Bay or Swefilmer.”

In February 2015 the parties met in court, with Bredbandsbolaget arguing in favor of the “important principle” that ISPs should not be held responsible for content exchanged over the Internet, in the same way the postal service isn’t responsible for the contents of an envelope.

But with TV companies SVT, TV4 Group, MTG TV, SBS Discovery and C More teaming up with the IFPI alongside Paramount, Disney, Warner and Sony in the case, Bredbandsbolaget would need to pull out all the stops to obtain victory. The company worked hard and initially the news was good.

In November 2015, the Stockholm District Court decided that the copyright holders could not force Bredbandsbolaget to block the pirate sites, ruling that the ISP’s operations did not amount to participation in the copyright infringement offenses carried out by some of its ‘pirate’ subscribers.

However, the case subsequently went to appeal, with the brand new Patent and Market Court of Appeal hearing arguments. In February 2017 it handed down its decision, which overruled the earlier ruling of the District Court and ordered Bredbandsbolaget to implement “technical measures” to prevent its customers accessing the ‘pirate’ sites through a number of domain names and URLs.

With nowhere left to go, Bredbandsbolaget and owner Telenor were left hanging onto their original statement which vehemently opposed site-blocking.

“It is a dangerous path to go down, which forces Internet providers to monitor and evaluate content on the Internet and block websites with illegal content in order to avoid becoming accomplices,” they said.

In March 2017, Bredbandsbolaget blocked The Pirate Bay but said it would not give up the fight.

“We are now forced to contest any future blocking demands. It is the only way for us and other Internet operators to ensure that private players should not have the last word regarding the content that should be accessible on the Internet,” Bredbandsbolaget said.

While it’s not clear whether any additional blocking demands have been filed with the ISP, this week an announcement by Bredbandsbolaget parent company Telenor revealed an unexpected knock-on effect. Seemingly without a single shot being fired, The Pirate Bay will now be blocked by Telenor too.

The background lies in Telenor’s acquisition of Bredbandsbolaget back in 2005. Until this week the companies operated under separate brands but will now merge into one entity.

“Telenor Sweden and Bredbandsbolaget today take the final step on their joint trip and become the same company with the same name. As a result, Telenor becomes a comprehensive provider of broadband, TV and mobile communications,” the company said in a statement this week.

“Telenor Sweden and Bredbandsbolaget have shared both logo and organization for the last 13 years. Today, we take the last step in the relationship and consolidate the companies under the same name.”

Up until this final merger, 600,000 Bredbandsbolaget broadband customers were denied access to The Pirate Bay. Now it appears that Telenor’s 700,000 fiber and broadband customers will be affected too. The new single-brand company says it has decided to block the notorious torrent site across its entire network.

“We have not discontinued Bredbandsbolaget, but we have merged Telenor and Bredbandsbolaget and become one,” the company said.

“When we share the same network, The Pirate Bay is blocked by both Telenor and Bredbandsbolaget and there is nothing we plan to change in the future.”

TorrentFreak contacted the PR departments of both Telenor and Bredbandsbolaget requesting information on why a court order aimed at only the latter’s customers would now affect those of the former too, more than doubling the blockade’s reach. Neither company responded which leaves only speculation as to its motives.

On the one hand, the decision to voluntarily implement an expanded blockade could perhaps be viewed as a little unusual given how much time, effort and money has been invested in fighting web-blockades in Sweden.

On the other, the merger of the companies may present legal difficulties as far as the court order goes and it could certainly cause friction among the customer base of Telenor if some customers could access TPB, and others could not.

In any event, the legal basis for web-blocking on copyright infringement grounds was firmly established last year at the EU level, which means that Telenor would lose any future legal battle, should it decide to dig in its heels. On that basis alone, the decision to block all customers probably makes perfect commercial sense.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Aussie Federal Court Orders ISPs to Block Pirate IPTV Service

Post Syndicated from Andy original https://torrentfreak.com/aussie-federal-court-orders-isps-to-block-pirate-iptv-service-180427/

After successfully applying for ISP blocks against dozens of traditional torrent and streaming portals, Village Roadshow and a coalition of movie studios switched tack last year.

With the threat of pirate subscription IPTV services looming large, Roadshow, Disney, Universal, Warner Bros, Twentieth Century Fox, and Paramount targeted HDSubs+ (also known as PressPlayPlus), a fairly well-known service that provides hundreds of otherwise premium live channels, movies, and sports for a relatively small monthly fee.

The injunction, which was filed last October, targets Australia’s largest ISPs including Telstra, Optus, TPG, and Vocus, plus subsidiaries.

Unlike blocking injunctions targeting regular sites, the studios sought to have several elements of HD Subs+ infrastructure rendered inaccessible, so that its sales platform, EPG (electronic program guide), software (such as an Android and set-top box app), updates, and sundry other services would fail to operate in Australia.

After a six month wait, the Federal Court granted the application earlier today, compelling Australia’s ISPs to block “16 online locations” associated with the HD Subs+ service, rendering its TV services inaccessible Down Under.

“Each respondent must, within 15 business days of service of these orders, take reasonable steps to disable access to the target online locations,” said Justice Nicholas, as quoted by ZDNet.

A small selection of channels in the HDSubs+ package

The ISPs were given flexibility in how to implement the ban, with the Judge noting that DNS blocking, IP address blocking or rerouting, URL blocking, or “any alternative technical means for disabling access”, would be acceptable.

The rightsholders are required to pay a fee of AU$50 for each domain they want to block, but Village Roadshow says it doesn’t mind doing so, since blocking is in the “public interest”. Continuing a pattern established last year, none of the ISPs showed up to the judgment.

A similar IPTV blocking application was filed by Hong Kong-based broadcaster Television Broadcasts Limited (TVB) last year.

TVB wants ISPs including Telstra, Optus, Vocus, and TPG plus their subsidiaries to block access to seven Android-based services named as A1, BlueTV, EVPAD, FunTV, MoonBox, Unblock, and hTV5.

The application was previously heard alongside the HD Subs+ case but will now be handled separately following complications. In April it was revealed that TVB not only wants to block Internet locations related to the technical operation of the service, but also hosting sites that fulfill a role similar to that of Google Play or Apple’s App Store.

TVB wants to have these app marketplaces blocked by Australian ISPs, which would render not only the illicit apps inaccessible to the public, but all of the non-infringing ones too.

Justice Nicholas will now have to decide whether the “primary purpose” of these marketplaces is to infringe or facilitate the infringement of TVB’s copyrights. However, there is also a question of whether China-focused live programming has copyright status in Australia. An additional hearing is scheduled for May 2 for these matters to be addressed.

Also on Friday, Foxtel filed yet another blocking application targeting “15 online locations” involving 27 domain names connected to traditional BitTorrent and streaming services.

According to ComputerWorld the injunction targets the same set of ISPs but this time around, Foxtel is trying to save on costs.

The company doesn’t want to have expert witnesses present in court, doesn’t want to stage live demos of websites, and would like to rely on videos and screenshots instead. Foxtel also says that if the ISPs agree, it won’t serve its evidence on them as it has done previously.

The company asked Justice Nicholas to deal with the injunction application “on paper” but he declined, setting a hearing for June 18 but accepting screenshots and videos as evidence.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

The Challenges of Opening a Data Center — Part 1

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/choosing-data-center/

Backblaze storage pod in new data center

This is part one of a series. The second part will be posted later this week. Use the Join button above to receive notification of future posts in this series.

Though most of us have never set foot inside of a data center, as citizens of a data-driven world we nonetheless depend on the services that data centers provide almost as much as we depend on a reliable water supply, the electrical grid, and the highway system. Every time we send a tweet, post to Facebook, check our bank balance or credit score, watch a YouTube video, or back up a computer to the cloud we are interacting with a data center.

In this series, The Challenges of Opening a Data Center, we’ll talk in general terms about the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process. Many of the factors to consider will be similar for opening a private data center or seeking space in a public data center, but we’ll assume for the sake of this discussion that our needs are more modest than requiring a data center dedicated solely to our own use (i.e. we’re not Google, Facebook, or China Telecom).

Data center technology and management are changing rapidly, with new approaches to design and operation appearing every year. This means we won’t be able to cover everything happening in the world of data centers in our series; however, we hope our brief overview proves useful.

What is a Data Center?

A data center is the structure that houses a large group of networked computer servers typically used by businesses, governments, and organizations for the remote storage, processing, or distribution of large amounts of data.

While many organizations will have computing services in the same location as their offices that support their day-to-day operations, a data center is a structure dedicated to 24/7 large-scale data processing and handling.

Depending on how you define the term, there are anywhere from a half million data centers in the world to many millions. While it’s possible to say that an organization’s on-site servers and data storage can be called a data center, in this discussion we are using the term data center to refer to facilities that are expressly dedicated to housing computer systems and associated components, such as telecommunications and storage systems. The facility might be a private center, which is owned or leased by one tenant only, or a shared data center that offers what are called “colocation services,” and rents space, services, and equipment to multiple tenants in the center.

A large, modern data center operates around the clock, placing a priority on providing secure and uninterrupted service, and generally includes redundant or backup power systems or supplies, redundant data communication connections, environmental controls, fire suppression systems, and numerous security devices. Such a center is an industrial-scale operation often using as much electricity as a small town.

Types of Data Centers

There are a number of ways to classify data centers: how they will be used; whether they are owned or used by one or multiple organizations; whether and how they fit into a topology of other data centers; which technologies and management approaches they use for computing, storage, cooling, power, and operations; and, increasingly visible these days, how green they are.

Data centers can be loosely classified into three types according to who owns them and who uses them.

Exclusive Data Centers are facilities wholly built, maintained, operated, and managed by the business for the optimal operation of its IT equipment. Some of these centers belong to well-known companies such as Facebook, Google, or Microsoft, while others belong to less public-facing big telecoms, insurance companies, or other service providers.

Managed Hosting Providers are data centers managed by a third party on behalf of a business. The business does not own the data center or space within it. Rather, the business rents the IT equipment and infrastructure it needs instead of purchasing it outright.

Colocation Data Centers are usually large facilities built to accommodate multiple businesses within the center. The business rents its own space within the data center and subsequently fills the space with its IT equipment, or possibly uses equipment provided by the data center operator.

Backblaze, for example, doesn’t own its own data centers but colocates in data centers owned by others. As Backblaze’s storage needs grow, Backblaze increases the space it uses within a given data center and/or expands to other data centers in the same or different geographic areas.

Availability is Key

When designing or selecting a data center, an organization needs to decide what level of availability is required for its services. The type of business or service it provides likely will dictate this. Any organization that provides real-time and/or critical data services will need the highest level of availability and redundancy, as well as the ability to rapidly failover (transfer operation to another center) when and if required. Some organizations require multiple data centers not just to handle the computer or storage capacity they use, but to provide alternate locations for operation if something should happen temporarily or permanently to one or more of their centers.

Organizations that can’t afford any downtime at all will typically operate data centers with a mirrored site that can take over if something happens to the first site, or they operate a second site in parallel to the first one. These data center topologies are called Active/Passive and Active/Active, respectively. Should disaster or an outage occur, disaster mode would dictate immediately moving all of the primary data center’s processing to the second data center.

While some data center topologies are spread throughout a single country or continent, others extend around the world. In practice, data transmission speeds put a cap on how far apart centers can be while still operating in parallel with the appearance of simultaneous operation. Linking two data centers located relatively close to each other (say no more than 60 miles apart, to limit latency) with dark fiber (leased fiber optic cable) could enable both data centers to be operated as if they were in the same location, reducing staffing requirements yet providing immediate failover to the secondary data center if needed.
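
As a rough sanity check on that 60-mile figure (my own back-of-the-envelope arithmetic, not a number from the post), the propagation delay over fiber at that distance is well under a millisecond; the snippet below assumes light travels through fiber at roughly two-thirds of its speed in a vacuum.

# Approximate one-way and round-trip propagation delay over 60 miles of fiber.
miles = 60
distance_m = miles * 1609.34                # meters
speed_in_fiber = 2.0e8                      # meters per second, roughly 2/3 of c
one_way_ms = distance_m / speed_in_fiber * 1000
print(f"{one_way_ms:.2f} ms one way, {2 * one_way_ms:.2f} ms round trip")
# Prints roughly 0.48 ms one way and 0.97 ms round trip, before any equipment delay.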

This redundancy of facilities and ensured availability is of paramount importance to those needing uninterrupted data center services.

Active/Passive Data Centers

Active/Active Data Centers

LEED Certification

Leadership in Energy and Environmental Design (LEED) is a rating system devised by the United States Green Building Council (USGBC) for the design, construction, and operation of green buildings. Facilities can achieve ratings of certified, silver, gold, or platinum based on criteria within six categories: sustainable sites, water efficiency, energy and atmosphere, materials and resources, indoor environmental quality, and innovation and design.

Green certification has become increasingly important in data center design and operation as data centers require great amounts of electricity and often cooling water to operate. Green technologies can reduce costs for data center operation, as well as make the arrival of data centers more amenable to environmentally-conscious communities.

The ACT, Inc. data center in Iowa City, Iowa was the first data center in the U.S. to receive LEED-Platinum certification, the highest level available.

ACT Data Center exterior

ACT Data Center interior

Factors to Consider When Selecting a Data Center

There are numerous factors to consider when deciding to build or to occupy space in a data center. Aspects such as proximity to available power grids, telecommunications infrastructure, networking services, transportation lines, and emergency services can affect costs, risk, security and other factors that need to be taken into consideration.

The size of the data center will be dictated by the business requirements of the owner or tenant. A data center can occupy one room of a building, one or more floors, or an entire building. Most of the equipment is often in the form of servers mounted in 19 inch rack cabinets, which are usually placed in single rows forming corridors (so-called aisles) between them. This allows staff access to the front and rear of each cabinet. Servers differ greatly in size, from 1U servers (i.e. one “U” or “RU” rack unit measuring 44.45 millimeters or 1.75 inches), to Backblaze’s Storage Pod design that fits a 4U chassis, to large freestanding storage silos that occupy many square feet of floor space.

Location

Location will be one of the biggest factors to consider when selecting a data center and encompasses many other factors that should be taken into account, such as geological risks, neighboring uses, and even local flight paths. Access to suitable available power at a suitable price point is often the most critical factor and the longest lead time item, followed by broadband service availability.

With more and more data centers available providing varied levels of service and cost, the choices increase each year. Data center brokers can be employed to find a data center, just as one might use a broker for home or other commercial real estate.

Websites listing available colocation space, such as upstack.io, or entire data centers for sale or lease, are widely used. A common practice is for a customer to publish its data center requirements, and the vendors compete to provide the most attractive bid in a reverse auction.

Business and Customer Proximity

The center’s closeness to a business or organization may or may not be a factor in the site selection. The organization might wish to be close enough to manage the center or supervise the on-site staff from a nearby business location. The location of customers might be a factor, especially if data transmission speeds and latency are important, or the business or customers have regulatory, political, tax, or other considerations that dictate areas suitable or not suitable for the storage and processing of data.

Climate

Local climate is a major factor in data center design because the climatic conditions dictate what cooling technologies should be deployed. In turn this impacts uptime and the costs associated with cooling, which can total as much as 50% or more of a center’s power costs. The topology and the cost of managing a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Nevertheless, data centers are located in both extremely cold regions and extremely hot ones, with innovative approaches used in both extremes to maintain desired temperatures within the center.

Geographic Stability and Extreme Weather Events

A major obvious factor in locating a data center is the stability of the actual site as regards weather, seismic activity, and the likelihood of weather events such as hurricanes, as well as fire or flooding.

Backblaze’s Sacramento data center describes its location as one of the most stable geographic locations in California, outside fault zones and floodplains.

Sacramento Data Center

Sometimes the location of the center comes first and the facility is hardened to withstand anticipated threats, such as Equinix’s NAP of the Americas data center in Miami, one of the largest single-building data centers on the planet (six stories and 750,000 square feet), which is built 32 feet above sea level and designed to withstand category 5 hurricane winds.

Equinix Data Center in Miami

Equinix “NAP of the Americas” Data Center in Miami

Most data centers don’t have the extreme protection or history of the Bahnhof data center, which is located inside the ultra-secure former nuclear bunker Pionen, in Stockholm, Sweden. It is buried 100 feet below ground inside the White Mountains and secured behind 15.7 in. thick metal doors. It prides itself on its self-described “Bond villain” ambiance.

Bahnhof Data Center under White Mountain in Stockholm

Usually, the data center owner or tenant will want to take into account the balance between cost and risk in the selection of a location. The Ideal quadrant below is obviously favored when making this compromise.

Cost vs Risk in selecting a data center

Cost = Construction/lease, power, bandwidth, cooling, labor, taxes
Risk = Environmental (seismic, weather, water, fire), political, economic

Risk mitigation also plays a strong role in pricing. The extent to which providers must implement special building techniques and operating technologies to protect the facility will affect price. When selecting a data center, organizations must make note of the data center’s certification level on the basis of regulatory requirements in the industry. These certifications can ensure that an organization is meeting necessary compliance requirements.

Power

Electrical power usually represents the largest cost in a data center. The cost a service provider pays for power will be affected by the source of the power, the regulatory environment, the facility size, and the rate concessions, if any, offered by the utility. At higher tier levels, batteries, generators, and redundant power grids are a required part of the picture.

Fault tolerance and power redundancy are absolutely necessary to maintain uninterrupted data center operation. Parallel redundancy is a safeguard to ensure that an uninterruptible power supply (UPS) system is in place to provide electrical power if necessary. The UPS system can be based on batteries, stored kinetic energy, or some type of generator using diesel or another fuel. The center operates on one UPS system, with another UPS system and a backup generator standing by; if a power outage occurs, that additional UPS system and generator take over.

Many data centers require the use of independent power grids, with service provided by different utility companies or services, to protect against loss of electrical service no matter what the cause. Some data centers have intentionally located themselves near national borders so that they can obtain redundant power not just from separate grids, but from separate geopolitical sources.

Higher redundancy levels required by a company will invariably lead to higher prices. A company that requires high availability backed by a service-level agreement (SLA) can expect to pay more than one with less demanding redundancy requirements.

Stay Tuned for Part 2 of The Challenges of Opening a Data Center

That’s it for part 1 of this post. In subsequent posts, we’ll take a look at some other factors to consider when moving into a data center such as network bandwidth, cooling, and security. We’ll take a look at what is involved in moving into a new data center (including stories from Backblaze’s experiences). We’ll also investigate what it takes to keep a data center running, and some of the new technologies and trends affecting data center design and use. You can discover all posts on our blog tagged with “Data Center” by following the link https://www.backblaze.com/blog/tag/data-center/.

The second part of this series on The Challenges of Opening a Data Center will be posted later this week. Use the Join button above to receive notification of future posts in this series.

The post The Challenges of Opening a Data Center — Part 1 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Automating Security Group Updates with AWS Lambda

Post Syndicated from Ian Scofield original https://aws.amazon.com/blogs/compute/automating-security-group-updates-with-aws-lambda/

Customers often use public endpoints to perform cross-region replication or other application-layer communication to remote regions. But a common problem is how to protect these endpoints. It can be tempting to open up the security groups to the world due to the complexity of keeping security groups in sync across regions with a dynamically changing infrastructure.

Consider a situation where you are running large clusters of instances in different regions that all require internode connectivity. One approach would be to use a VPN tunnel between regions to provide a secure tunnel over which to send your traffic. A good example of this is the Transit VPC Solution, which is a published AWS solution to help customers quickly get up and running. However, this adds cost and complexity to your solution due to the additional infrastructure required.

Another approach, which I’ll explore in this post, is to restrict access to the nodes by whitelisting the public IP addresses of your hosts in the opposite region. Today, I’ll outline a solution that allows for cross-region security group updates, can handle remote region failures, and supports external actions such as manually terminating instances or adding instances to an existing Auto Scaling group.

Solution overview

The overview of this solution is diagrammed below. Although this post covers limiting access to your instances, you should still implement encryption to protect your data in transit.

If your entire infrastructure is running in a single region, you can reference a security group as the source, allowing your IP addresses to change without any updates required. However, if you’re going across the public internet between regions to perform things like application-level traffic or cross-region replication, this is no longer an option. Security groups are regional. When you go across regions it can be tempting to drop security to enable this communication.
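
To make that distinction concrete, here is a minimal boto3 sketch of both styles of ingress rule. It is illustrative only: the security group IDs, the port, and the remote IP address are placeholders rather than values from this post.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Within a single region, reference another security group as the source so the
# rule keeps working even as instance IP addresses change.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",          # placeholder cluster security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 7000,                    # placeholder application port
        "ToPort": 7000,
        "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
    }],
)

# Across regions a group reference isn't available, so the fallback is to
# whitelist the remote nodes' public IP addresses as /32 CIDR blocks.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 7000,
        "ToPort": 7000,
        "IpRanges": [{"CidrIp": "203.0.113.10/32",
                      "Description": "remote cluster node (placeholder)"}],
    }],
)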

Although using an Elastic IP address can provide you with a static IP address that you can define as a source for your security groups, this may not always be feasible, especially when automatic scaling is desired.

In this example scenario, you have a distributed database that requires full internode communication for replication. If you place a cluster in us-east-1 and us-west-2, you must provide a secure method of communication between the two. Because the database uses cloud best practices, you can add or remove nodes as the load varies.

To start the process of updating your security groups, you must know when an instance has come online to trigger your workflow. Auto Scaling groups have the concept of lifecycle hooks that enable you to perform custom actions as the group launches or terminates instances.

When Auto Scaling begins to launch or terminate an instance, it puts the instance into a wait state (Pending:Wait or Terminating:Wait). The instance remains in this state while you perform your various actions, until either you tell Auto Scaling to Continue or Abandon, or the timeout period ends. A lifecycle hook can trigger a CloudWatch event, publish to an Amazon SNS topic, or send to an Amazon SQS queue. For this example, you use CloudWatch Events to trigger an AWS Lambda function that updates an Amazon DynamoDB table.
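
The SAM template described later in this post creates the lifecycle hooks for you; purely as an illustration of what such a hook looks like, here is a boto3 sketch. The group name, timeout, and default result are assumptions, not values taken from the post.

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Pause newly launched instances in Pending:Wait so the Lambda function has time
# to update the security groups before the instance goes into service.
autoscaling.put_lifecycle_hook(
    LifecycleHookName="hook-launching",
    AutoScalingGroupName="cluster-asg-us-east-1",    # placeholder group name
    LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
    HeartbeatTimeout=300,        # seconds before the hook times out (assumed)
    DefaultResult="ABANDON",     # assumed behavior if nothing responds in time
)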

Component breakdown

Here’s a quick breakdown of the components involved in this solution:

• Lambda function
• CloudWatch event
• DynamoDB table

Lambda function

The Lambda function automatically updates your security groups in the following way (a condensed sketch follows the list):

1. Determines whether the change was triggered by your Auto Scaling group lifecycle hook or was manually invoked for the “true up” functionality, which I discuss later in this post.
2. Describes the instances in the Auto Scaling group and obtains the public IP address of each instance.
3. Updates both the local and remote DynamoDB tables.
4. Compares the list of public IP addresses for both the local and remote clusters with what’s already in the local region security group, and updates that security group as needed.
5. Compares the list of public IP addresses for both the local and remote clusters with what’s already in the remote region security group, and updates that security group as needed.
6. Signals CONTINUE back to the lifecycle hook.
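
The post doesn’t reproduce the function’s source here, but the steps above map fairly directly onto boto3 calls. The following is a condensed, hypothetical sketch only: the security group IDs, port, and region pairing are placeholders, error handling is omitted, and the DynamoDB bookkeeping from steps 2 and 3 is reduced to a comment (a separate sketch of that piece appears in the DynamoDB section below).

import os
import boto3

PORT = 7000                              # placeholder application port
LOCAL_SG = "sg-local-placeholder"        # placeholder security group IDs
REMOTE_SG = "sg-remote-placeholder"
REMOTE_REGION = "us-west-2"              # assumed pairing for a us-east-1 deployment

def handler(event, context):
    asg_name = event["detail"]["AutoScalingGroupName"]
    true_up = event.get("trueup", False)            # step 1: hook-driven or manual "true up"
    local_region = os.environ["AWS_REGION"]         # set automatically in Lambda

    autoscaling = boto3.client("autoscaling", region_name=local_region)
    ec2 = boto3.client("ec2", region_name=local_region)

    # Step 2: collect the public IP addresses of the local group's instances.
    groups = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    instance_ids = [i["InstanceId"]
                    for g in groups["AutoScalingGroups"] for i in g["Instances"]]
    local_ips = set()
    if instance_ids:
        reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
        local_ips = {i["PublicIpAddress"] for r in reservations
                     for i in r["Instances"] if "PublicIpAddress" in i}

    # Step 3: write local_ips to the DynamoDB tables in both regions and read the
    # remote group's last known IPs back (omitted here; sketched later).
    remote_ips = set()
    desired = local_ips | remote_ips

    # Steps 4 and 5: reconcile both security groups against the desired set.
    sync_security_group(local_region, LOCAL_SG, desired)
    sync_security_group(REMOTE_REGION, REMOTE_SG, desired)

    # Step 6: let Auto Scaling finish launching or terminating the instance.
    if not true_up:
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=event["detail"]["LifecycleHookName"],
            AutoScalingGroupName=asg_name,
            LifecycleActionToken=event["detail"]["LifecycleActionToken"],
            LifecycleActionResult="CONTINUE",
        )

def sync_security_group(region, sg_id, desired_ips):
    """Add missing /32 rules and revoke stale ones so the group matches desired_ips."""
    ec2 = boto3.client("ec2", region_name=region)
    group = ec2.describe_security_groups(GroupIds=[sg_id])["SecurityGroups"][0]
    # For brevity this looks at every ingress rule, not just the application port.
    current = {r["CidrIp"] for p in group["IpPermissions"] for r in p.get("IpRanges", [])}
    desired = {ip + "/32" for ip in desired_ips}

    def permission(cidr):
        return [{"IpProtocol": "tcp", "FromPort": PORT, "ToPort": PORT,
                 "IpRanges": [{"CidrIp": cidr}]}]

    for cidr in desired - current:
        ec2.authorize_security_group_ingress(GroupId=sg_id, IpPermissions=permission(cidr))
    for cidr in current - desired:
        ec2.revoke_security_group_ingress(GroupId=sg_id, IpPermissions=permission(cidr))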

CloudWatch event

The CloudWatch event triggers when an instance passes through either the launching or terminating states. When the Lambda function gets invoked, it receives an event that looks like the following:

{
	"account": "123456789012",
	"region": "us-east-1",
	"detail": {
		"LifecycleHookName": "hook-launching",
		"AutoScalingGroupName": "",
		"LifecycleActionToken": "33965228-086a-4aeb-8c26-f82ed3bef495",
		"LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
		"EC2InstanceId": "i-017425ec54f22f994"
	},
	"detail-type": "EC2 Instance-launch Lifecycle Action",
	"source": "aws.autoscaling",
	"version": "0",
	"time": "2017-05-03T02:20:59Z",
	"id": "cb930cf8-ce8b-4b6c-8011-af17966eb7e2",
	"resources": [
		"arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:d3fe9d96-34d0-4c62-b9bb-293a41ba3765:autoScalingGroupName/"
	]
}

DynamoDB table

You use DynamoDB to store lists of remote IP addresses in a local table that is updated by the opposite region as a failsafe source of truth. Although you can describe your Auto Scaling group for the local region, you must maintain a list of IP addresses for the remote region.

To minimize the number of describe calls and prevent an issue in the remote region from blocking your local scaling actions, we keep a list of the remote IP addresses in a local DynamoDB table. Each Lambda function in each region is responsible for updating the public IP addresses of its Auto Scaling group for both the local and remote tables.

As with all the infrastructure in this solution, there is a DynamoDB table in each region, and the two tables mirror each other. For example, the following screenshot shows a sample DynamoDB table. The Lambda function in us-east-1 would update the us-east-1 entry in the tables in both regions.

By updating a DynamoDB table in both regions, it allows the local region to gracefully handle issues with the remote region, which would otherwise prevent your ability to scale locally. If the remote region becomes inaccessible, you have a copy of the latest configuration from the table that you can use to continue to sync with your security groups. When the remote region comes back online, it pushes its updated public IP addresses to the DynamoDB table. The security group is updated to reflect the current status by the remote Lambda function.
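
The post doesn’t show the table schema, so the sketch below assumes a table keyed on the Auto Scaling group name; the table names, key name, and attribute layout are all placeholders. It covers the two operations the Lambda function needs: writing the local group’s public IPs to the tables in both regions, and reading the remote group’s last known IPs from the local table when the remote region is unreachable.

import time
import boto3

LOCAL_TABLE = "sg-sync-us-east-1"        # placeholder table names
REMOTE_TABLE = "sg-sync-us-west-2"

def record_public_ips(local_region, remote_region, asg_name, public_ips):
    """Write the local Auto Scaling group's public IPs to both regional tables."""
    item = {
        "asg": {"S": asg_name},                      # assumed partition key name
        "region": {"S": local_region},
        "updated": {"N": str(int(time.time()))},
    }
    if public_ips:                                   # DynamoDB string sets cannot be empty
        item["ips"] = {"SS": sorted(public_ips)}
    for region, table in [(local_region, LOCAL_TABLE), (remote_region, REMOTE_TABLE)]:
        dynamodb = boto3.client("dynamodb", region_name=region)
        dynamodb.put_item(TableName=table, Item=item)

def last_known_remote_ips(local_region, remote_asg_name):
    """Read the remote group's last known IPs from the local table, so an outage
    in the remote region doesn't block local scaling actions."""
    dynamodb = boto3.client("dynamodb", region_name=local_region)
    response = dynamodb.get_item(TableName=LOCAL_TABLE,
                                 Key={"asg": {"S": remote_asg_name}})
    return set(response.get("Item", {}).get("ips", {}).get("SS", []))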

 

Walkthrough

Note: All of the following steps are performed in both regions. The Launch Stack buttons will default to the us-east-1 region.

Here’s a quick overview of the steps involved in this process:

1. An instance is launched or terminated, which triggers an Auto Scaling group lifecycle hook, triggering the Lambda function via CloudWatch Events.
2. The Lambda function retrieves the list of public IP addresses for all instances in the local region Auto Scaling group.
3. The Lambda function updates the local and remote region DynamoDB tables with the public IP addresses just received for the local Auto Scaling group.
4. The Lambda function updates the local region security group with the public IP addresses, removing and adding to ensure that it mirrors what is present for the local and remote Auto Scaling groups.
5. The Lambda function updates the remote region security group with the public IP addresses, removing and adding to ensure that it mirrors what is present for the local and remote Auto Scaling groups.

Prerequisites

To deploy this solution, you need to have Auto Scaling groups, launch configurations, and a base security group in both regions. To expedite this process, this CloudFormation template can be launched in both regions.

Step 1: Launch the AWS SAM template in the first region

To make the deployment process easy, I’ve created an AWS Serverless Application Model (AWS SAM) template, which is a new specification that makes it easier to manage and deploy serverless applications on AWS. This template creates the following resources:

• A Lambda function, to perform the various security group actions
• A DynamoDB table, to track the state of the local and remote Auto Scaling groups
• Auto Scaling group lifecycle hooks for instance launching and terminating
• A CloudWatch event, to track the EC2 Instance-launch Lifecycle Action and EC2 Instance-terminate Lifecycle Action events
• A pointer from the CloudWatch event to the Lambda function, and the necessary permissions

Download the template from here or click to launch.

Upon launching the template, you’ll be presented with a list of parameters which includes the remote/local names for your Auto Scaling Groups, AWS region, Security Group IDs, DynamoDB table names, as well as where the code for the Lambda function is located. Because this is the first region you’re launching the stack in, fill out all the parameters except for the RemoteTable parameter as it hasn’t been created yet (you fill this in later).

Step 2: Test the local region

After the stack has finished launching, you can test the local region. Open the EC2 console and find the Auto Scaling group that was created when launching the prerequisite stack. Change the desired number of instances from 0 to 1.

For both regions, check your security group to verify that the public IP address of the instance created is now in the security group.

Local region:

Remote region:

Now, change the desired number of instances for your group back to 0 and verify that the rules are properly removed.

Local region:

Remote region:

Step 3: Launch in the remote region

When you deploy a Lambda function using CloudFormation, the Lambda zip file needs to reside in the same region you are launching the template. Once you choose your remote region, create an Amazon S3 bucket and upload the Lambda zip file there. Next, go to the remote region and launch the same SAM template as before, but make sure you update the CodeBucket and CodeKey parameters. Also, because this is the second launch, you now have all the values and can fill out all the parameters, specifically the RemoteTable value.

 

Step 4: Update the local region Lambda environment variable

When you originally launched the template in the local region, you didn’t have the name of the DynamoDB table for the remote region, because you hadn’t created it yet. Now that you have launched the remote template, you can perform a CloudFormation stack update on the initial SAM template. This populates the remote DynamoDB table name into the initial Lambda function’s environment variables.

In the CloudFormation console in the initial region, select the stack. Under Actions, choose Update Stack, and select the SAM template used for both regions. Under Parameters, populate the remote DynamoDB table name, as shown below. Choose Next and let the stack update complete. This updates your Lambda function and completes the setup process.

 

Step 5: Final testing

You now have everything fully configured and in place to trigger security group changes based on instances being added or removed to your Auto Scaling groups in both regions. Test this by changing the desired capacity of your group in both regions.

True up functionality

If an instance is manually added to or removed from the Auto Scaling group, the lifecycle hooks don’t get triggered. To account for this, the Lambda function supports a “true up” functionality in which the function can be manually invoked. If you paste in the following JSON text for your test event, it kicks off the entire workflow. For added peace of mind, you can also have this function fire via a CloudWatch event with a CRON expression for nearly continuous checking. A scripted version of both options appears after the JSON below.

{
	"detail": {
		"AutoScalingGroupName": "<your ASG name>"
	},
	"trueup":true
}
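
If you prefer scripting to pasting the event into the Lambda console’s test dialog, both the one-off invocation and the scheduled safety net can be set up with a few boto3 calls. This is a sketch under assumed names: the function name and rule name are placeholders, and the account ID is the dummy value from the event example earlier.

import json
import boto3

REGION = "us-east-1"
FUNCTION_NAME = "sg-sync-function"         # placeholder Lambda function name
TRUE_UP_EVENT = {"detail": {"AutoScalingGroupName": "<your ASG name>"}, "trueup": True}

# One-off "true up" invocation, equivalent to the test event above.
lambda_client = boto3.client("lambda", region_name=REGION)
lambda_client.invoke(
    FunctionName=FUNCTION_NAME,
    InvocationType="Event",                # asynchronous; "RequestResponse" waits for the result
    Payload=json.dumps(TRUE_UP_EVENT).encode(),
)

# Optional safety net: run the true up on a schedule via CloudWatch Events.
events = boto3.client("events", region_name=REGION)
events.put_rule(Name="sg-sync-true-up", ScheduleExpression="rate(15 minutes)")
events.put_targets(
    Rule="sg-sync-true-up",
    Targets=[{
        "Id": "sg-sync-lambda",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:" + FUNCTION_NAME,
        "Input": json.dumps(TRUE_UP_EVENT),
    }],
)
# Note: the function's resource policy must also allow events.amazonaws.com to
# invoke it (lambda add_permission), which the SAM template normally handles.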

Extra credit

Now that all the resources are created in both regions, go back and break down the policy to incorporate resource-level permissions for specific security groups, Auto Scaling groups, and the DynamoDB tables.

Although this post is centered around using public IP addresses for your instances, you could instead use a VPN between regions. In this case, you would still be able to use this solution to scope down the security groups to the cluster instances. However, the code would need to be modified to support private IP addresses.

 

Conclusion

At this point, you now have a mechanism in place that captures when a new instance is added to or removed from your cluster and updates the security groups in both regions. This ensures that you are locking down your infrastructure securely by allowing access only to other cluster members.

Keep in mind that this architecture (lifecycle hooks, CloudWatch event, Lambda function, and DynamoDB table) requires the infrastructure to be deployed in both regions to have synchronization going both ways.

Because this Lambda function is modifying security group rules, it’s important to have an audit log of what has been modified and who is modifying them. The out-of-the-box function provides logs in CloudWatch for what IP addresses are being added and removed for which ports. As these are all API calls being made, they are logged in CloudTrail and can be traced back to the IAM role that you created for your lifecycle hooks. This can provide historical data that can be used for troubleshooting or auditing purposes.

Security is paramount at AWS. We want to ensure that customers are protecting access to their resources. This solution helps you keep your security groups in both regions automatically in sync with your Auto Scaling group resources. Let us know if you have any questions or other solutions you’ve come up with!

Nazis, are bad

Post Syndicated from Eevee original https://eev.ee/blog/2017/08/13/nazis-are-bad/

Anonymous asks:

Could you talk about something related to the management/moderation and growth of online communities? IOW your thoughts on online community management, if any.

I think you’ve tweeted about this stuff in the past so I suspect you have thoughts on this, but if not, again, feel free to just blog about … anything 🙂

Oh, I think I have some stuff to say about community management, in light of recent events. None of it hasn’t already been said elsewhere, but I have to get this out.

Hopefully the content warning is implicit in the title.


I am frustrated.

I’ve gone on before about a particularly bothersome phenomenon that hurts a lot of small online communities: often, people are willing to tolerate the misery of others in a community, but then get up in arms when someone pushes back. Someone makes a lot of off-hand, off-color comments about women? Uses a lot of dog-whistle terms? Eh, they’re not bothering anyone, or at least not bothering me. Someone else gets tired of it and tells them to knock it off? Whoa there! Now we have the appearance of conflict, which is unacceptable, and people will turn on the person who’s pissed off — even though they’ve been at the butt end of an invisible conflict for who knows how long. The appearance of peace is paramount, even if it means a large chunk of the population is quietly miserable.

Okay, so now, imagine that on a vastly larger scale, and also those annoying people who know how to skirt the rules are Nazis.


The label “Nazi” gets thrown around a lot lately, probably far too easily. But when I see a group of people doing the Hitler salute, waving large Nazi flags, wearing Nazi armbands styled after the SS, well… if the shoe fits, right? I suppose they might have flown across the country to join a torch-bearing mob ironically, but if so, the joke is going way over my head. (Was the murder ironic, too?) Maybe they’re not Nazis in the sense that the original party doesn’t exist any more, but for ease of writing, let’s refer to “someone who espouses Nazi ideology and deliberately bears a number of Nazi symbols” as, well, “a Nazi”.

This isn’t a new thing, either; I’ve stumbled upon any number of Twitter accounts that are decorated in Nazi regalia. I suppose the trouble arises when perfectly innocent members of the alt-right get unfairly labelled as Nazis.

But hang on; this march was called “Unite the Right” and was intended to bring together various far right sub-groups. So what does their choice of aesthetic say about those sub-groups? I haven’t heard, say, alt-right coiner Richard Spencer denounce the use of Nazi symbology — extra notable since he was fucking there and apparently didn’t care to discourage it.


And so begins the rule-skirting. “Nazi” is definitely overused, but even using it to describe white supremacists who make not-so-subtle nods to Hitler is likely to earn you some sarcastic derailment. A Nazi? Oh, so is everyone you don’t like and who wants to establish a white ethno state a Nazi?

Calling someone a Nazi — or even a white supremacist — is an attack, you see. Merely expressing the desire that people of color not exist is perfectly peaceful, but identifying the sentiment for what it is causes visible discord, which is unacceptable.

These clowns even know this sort of thing and strategize around it. Or, try, at least. Maybe it wasn’t that successful this weekend — though flicking through Charlottesville headlines now, they seem to be relatively tame in how they refer to the ralliers.

I’m reminded of a group of furries — the alt-furries — who have been espousing white supremacy and wearing red armbands with a white circle containing a black… pawprint. Ah, yes, that’s completely different.


So, what to do about this?

“Ignore them” is a popular option, often espoused to bullied children by parents who have never been bullied, shortly before they resume complaining about passive-aggressive office politics. The trouble with ignoring them is that, just like in smaller communities, they have a tendency to fester. They take over large chunks of influential Internet surface area like 4chan and Reddit; they help get an inept buffoon elected; and then they start to have torch-bearing rallies and run people over with cars.

4chan illustrates a kind of corollary here. Anyone who’s steeped in Internet Culture™ is surely familiar with 4chan; I was never a regular visitor, but it had enough influence that I was still aware of it and some of its culture. It was always thick with irony, which grew into a sort of ironic detachment — perhaps one of the major sources of the recurring online trope that having feelings is bad — which proceeded into ironic racism.

And now the ironic racism is indistinguishable from actual racism, as tends to be the case. Do they “actually” “mean it”, or are they just trying to get a rise out of people? What the hell is unironic racism if not trying to get a rise out of people? What difference is there to onlookers, especially as they move to become increasingly involved with politics?

“It’s just a joke” and “it was just a thoughtless comment” are exceptionally common defenses made by people desperate to preserve the illusion of harmony, but the strain of overt white supremacy currently running rampant through the US was built on those excuses.


The other favored option is to debate them, to defeat their ideas with better ideas.

Well, hang on. What are their ideas, again? I hear they were chanting stuff like “go back to Africa” and “fuck you, faggots”. Given that this was an overtly political rally (and again, the Nazi fucking regalia), I don’t think it’s a far cry to describe their ideas as “let’s get rid of black people and queer folks”.

This is an underlying proposition: that white supremacy is inherently violent. After all, if the alt-right seized total political power, what would they do with it? If I asked the same question of Democrats or Republicans, I’d imagine answers like “universal health care” or “screw over poor people”. But people whose primary goal is to have a country full of only white folks? What are they going to do, politely ask everyone else to leave? They’re invoking the memory of people who committed genocide and also tried to take over the fucking world. They are outright saying, these are the people we look up to, this is who we think had a great idea.

How, precisely, does one defeat these ideas with rational debate?

Because the underlying core philosophy beneath all this is: “it would be good for me if everything were about me”. And that’s true! (Well, it probably wouldn’t work out how they imagine in practice, but it’s true enough.) Consider that slavery is probably fantastic if you’re the one with the slaves; the issue is that it’s reprehensible, not that the very notion contains some kind of 101-level logical fallacy. That’s probably why we had a fucking war over it instead of hashing it out over brunch.

…except we did hash it out over brunch once, and the result was that slavery was still allowed but slaves only counted as 60% of a person for the sake of counting how much political power states got. So that’s how rational debate worked out. I’m sure the slaves were thrilled with that progress.


That really only leaves pushing back, which raises the question of how to push back.

And, I don’t know. Pushing back is much harder in spaces you don’t control, spaces you’re already struggling to justify your own presence in. For most people, that’s most spaces. It’s made all the harder by that tendency to preserve illusory peace; even the tamest request that someone knock off some odious behavior can be met by pushback, even by third parties.

At the same time, I’m aware that white supremacists prey on disillusioned young white dudes who feel like they don’t fit in, who were promised the world and inherited kind of a mess. Does criticism drive them further away? The alt-right also opposes “political correctness”, i.e. “not being a fucking asshole”.

God knows we all suck at this kind of behavior correction, even within our own in-groups. Fandoms have become almost ridiculously vicious as platforms like Twitter and Tumblr amplify individual anger to deafening levels. It probably doesn’t help that we’re all just exhausted, that every new fuck-up feels like it bears the same weight as the last hundred combined.

This is the part where I admit I don’t know anything about people and don’t have any easy answers. Surprise!


The other alternative is, well, punching Nazis.

That meme kind of haunts me. It raises really fucking complicated questions about when violence is acceptable, in a culture that’s completely incapable of answering them.

America’s relationship to violence is so bizarre and two-faced as to be almost incomprehensible. We worship it. We have the biggest military in the world by an almost comical margin. It’s fairly mainstream to own deadly weapons for the express stated purpose of armed revolution against the government, should that become necessary, where “necessary” is left ominously undefined. Our movies are about explosions and beating up bad guys; our video games are about explosions and shooting bad guys. We fantasize about solving foreign policy problems by nuking someone — hell, our talking heads are currently in polite discussion about whether we should nuke North Korea and annihilate up to twenty-five million people, as punishment for daring to have the bomb that only we’re allowed to have.

But… violence is bad.

That’s about as far as the other side of the coin gets. It’s bad. We condemn it in the strongest possible terms. Also, guess who we bombed today?

I observe that the one time Nazis were a serious threat, America was happy to let them try to take over the world until their allies finally showed up on our back porch.

Maybe I don’t understand what “violence” means. In a quest to find out why people are talking about “leftist violence” lately, I found a National Review article from May that twice suggests blocking traffic is a form of violence. Anarchists have smashed some windows and set a couple fires at protests this year — and, hey, please knock that crap off? — which is called violence against, I guess, Starbucks. Black Lives Matter could be throwing a birthday party and Twitter would still be abuzz with people calling them thugs.

Meanwhile, there’s a trend of murderers with increasingly overt links to the alt-right, and everyone is still handling them with kid gloves. First it was murders by people repeating their talking points; now it’s the culmination of a torches-and-pitchforks mob. (Ah, sorry, not pitchforks; assault rifles.) And we still get this incredibly bizarre both-sides-ism, a White House that refers to the people who didn’t murder anyone as “just as violent if not more so”.


Should you punch Nazis? I don’t know. All I know is that I’m extremely dissatisfied with discourse that’s extremely alarmed by hypothetical punches — far more mundane than what you’d see after a sporting event — but treats a push for ethnic cleansing as a mere difference of opinion.

The equivalent to a punch in an online space is probably banning, which is almost laughable in comparison. It doesn’t cause physical harm, but it is a use of concrete force. Doesn’t pose quite the same moral quandary, though.

Somewhere in the middle is the currently popular pastime of doxxing (doxxxxxxing) people spotted at the rally in an attempt to get them fired or whatever. Frankly, that skeeves me out, though apparently not enough that I’m directly chastising anyone for it.


We aren’t really equipped, as a society, to deal with memetic threats. We aren’t even equipped to determine what they are. We had a fucking world war over this, and now people are outright saying “hey I’m like those people we went and killed a lot in that world war” and we give them interviews and compliment their fashion sense.

A looming question is always, what if they then do it to you? What if people try to get you fired, to punch you for your beliefs?

I think about that a lot, and then I remember that it’s perfectly legal to fire someone for being gay in half the country. (Courts are currently wrangling whether Title VII forbids this, but with the current administration, I’m not optimistic.) I know people who’ve been fired for coming out as trans. I doubt I’d have to look very far to find someone who’s been punched for either reason.

And these aren’t even beliefs; they’re just properties of a person. You can stop being a white supremacist, one of those people yelling “fuck you, faggots”.

So I have to recuse myself from this asinine question, because I can’t fairly judge the risk of retaliation when it already happens to people I care about.

Meanwhile, if a white supremacist does get punched, I absolutely still want my tax dollars to pay for their universal healthcare.


The same wrinkle comes up with free speech, which is paramount.

The ACLU reminds us that the First Amendment “protects vile, hateful, and ignorant speech”. I think they’ve forgotten that that’s a side effect, not the goal. No one sat down and suggested that protecting vile speech was some kind of noble cause, yet that’s how we seem to be treating it.

The point was to avoid a situation where the government is arbitrarily deciding what qualifies as vile, hateful, and ignorant, and was using that power to eliminate ideas distasteful to politicians. You know, like, hypothetically, if they interrogated and jailed a bunch of people for supporting the wrong economic system. Or convicted someone under the Espionage Act for opposing the draft. (Hey, that’s where the “shouting fire in a crowded theater” line comes from.)

But these are ideas that are already in the government. Bannon, a man who was chair of a news organization he himself called “the platform for the alt-right”, has the President’s ear! How much more mainstream can you get?

So again I’m having a little trouble balancing “we need to defend the free speech of white supremacists or risk losing it for everyone” against “we fairly recently were ferreting out communists and the lingering public perception is that communists are scary, not that the government is”.


This isn’t to say that freedom of speech is bad, only that the way we talk about it has become fanatical to the point of absurdity. We love it so much that we turn around and try to apply it to corporations, to platforms, to communities, to interpersonal relationships.

Look at 4chan. It’s completely public and anonymous; you only get banned for putting the functioning of the site itself in jeopardy. Nothing is stopping a larger group of people from joining its politics board and tilting sentiment the other way — except that the current population is so odious that no one wants to be around them. Everyone else has evaporated away, as tends to happen.

Free speech is great for a government, to prevent quashing politics that threaten the status quo (except it’s a joke and they’ll do it anyway). People can’t very readily just bail when the government doesn’t like them, anyway. It’s also nice to keep in mind to some degree for ubiquitous platforms. But the smaller you go, the easier it is for people to evaporate away, and the faster pure free speech will turn the place to crap. You’ll be left only with people who care about nothing.


At the very least, it seems clear that the goal of white supremacists is some form of destabilization, of disruption to the fabric of a community for purely selfish purposes. And those are the kinds of people you want to get rid of as quickly as possible.

Usually this is hard, because they act just nicely enough to create some plausible deniability. But damn, if someone is outright telling you they love Hitler, maybe skip the principled hand-wringing and eject them.

Don’t Get Trapped in iCloud

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/dont-get-trapped-icloud/

Don't Get Trapped in iCloud

Let me preface this with a bit of history: I’ve been using Macs for more than 30 years. I’ve seen an enormous amount of changes at Apple, and I’ve been using their online services since the AppleLink days (it was a pre-Internet dial-up service for Apple dealers and service people).

Over the past few years Apple’s made a lot of changes to iCloud. They’ve added some great additions to make it a world-class cloud service. But there are drawbacks. In the course of selling, supporting and writing about these devices, I consistently see people make the same mistakes. So with that background let’s get to my central point: I think it’s a big mistake to trust Apple alone with your data. Let me tell you why.

Apple aggressively promotes iCloud to its customers as a way to securely store information, photos and other vital data, leading to a false sense of security that all of your data is safe from harm. It isn’t. Let’s talk about some of the biggest mistakes you can make with iCloud.

iCloud Sync Does Not = Backing Up

Even if the picture of your puppy’s first bath time is on your iPhone and your iPad, it isn’t backed up. One of the biggest mistakes you can make is to assume that since your photos, contacts, and calendar sync between devices, they’re backed up. There’s a big difference between syncing and backing up.

Repeat after me:
Syncing Is Not Backing Up
Syncing Is Not Backing Up
Syncing Is Not Backing Up

iCloud helps you sync content between devices. Add an event to the calendar app on your phone and iCloud pushes that change to the calendar on your Mac too. Take a photo with the iPhone and find it in your Mac’s Photos library without having to connect the phone to the computer. That’s convenient. I use that functionality all the time.

Syncing can be confusing, though. iCloud Photo Library is what Apple calls iCloud’s ability to sync photos between Apple devices seamlessly. But it’s a two-way street. If you delete a photo from your Mac, it gets removed from your iPhone too, because it’s all in iCloud; there is no backup copy anywhere else.

Recently my wife decided that she didn’t want to have the same photos on her Mac and iPhone. Extricating herself from that means shutting off iCloud Photo Library and manually syncing the iPhone and Mac. That adds extra steps to back everything up! Now the phone has to be connected to the Mac, and my wife has to remember to do it. Bottom line: Syncs between the computer and phone happen less frequently when they are manual, which means there’s more opportunity for pictures to get lost. But with Apple’s syncing enabled, my wife runs the risk of deleting photos that are important not just on one device but everywhere.

Relying on any of these features without having a solid backup strategy means you’re leaving it to Apple and iCloud to keep your pictures and other info safe. If the complex and intricate ecosystem that keeps that stuff working goes awry – and as Murphy’s Law demands, stuff always goes wrong – you can find yourself without pictures, music, and important files.

Better to be safe than sorry. Backing up your data is the way to make sure your memories are safe. Most of the people I’ve helped over the years haven’t realized that iCloud is not backing them up. Some of them have found out the hard way.

iCloud Doesn’t Back Up Your Computer

Apple does have something called “iCloud Backup.” iCloud Backup backs up critical info on the iPhone and iPad to iCloud. But it’s only for mobile devices. The “stuff” on your computer is not backed up by iCloud Backup.

Making matters worse, it’s a “space permitting” solution. Apple gives you a scant 5 GB of free space with an iCloud account. To put that in context, the smallest iPhone 7 ships with 32 GB of space. So right off the bat, you have to pay extra to back up a new device. Many of us who use the free account don’t want to pay for more, so we get messages telling us that our devices can’t be backed up.

More importantly, iCloud doesn’t back up your Mac. So while data may be synced between devices in iCloud, most of the content on your Mac isn’t getting backed up directly.

Be Wary of “Store In iCloud” and “Optimize Storage”

macOS 10.12 “Sierra” introduced new iCloud remote storage features, including “Store in iCloud” and “Optimize Storage.” Both of these features move information from your Mac to the cloud. The Mac keeps frequently accessed files locally, but files you don’t use regularly get moved to iCloud and purged from the hard drive.

Macs, with their fast but relatively small internal drives, can run chronically short of local storage space. These new storage optimization features can offset that problem by moving what you’re not using to iCloud. That only works as long as you stay connected to iCloud, though: if iCloud isn’t available, neither are your files.

Your data is yours. It should always be in your possession. Ideally, you’d have a local backup of your data (Time Machine, an extra hard drive, etc.) AND an offsite copy… not OR. We call that the 3-2-1 backup strategy. That way you’re not dependent on Apple and a stable Internet connection to get your files when you want them.

iCloud Drive Isn’t a Backup Either

iCloud Drive is another iCloud feature that can lull you into a false sense of security. It’s a Dropbox-style sync repository – files put in iCloud Drive appear on the Mac, iPhone, and iPad. However, any files you don’t choose to add to iCloud Drive are only available locally and are not backed up.

iCloud Drive has limits, too. You can’t upload a file larger than 15 GB. And you can only store as much as you’ve paid for – hit your limit, and you’ll have to pay more. But only up to 2 TB, which will cost you $19.99/month.

Trust But Verify (and Back Up Yourself)

I’ve used iCloud from the start and I continue to do so. iCloud is an excellent sync service. It makes the Apple ecosystem of hardware and software easier to use. But it isn’t infallible. I’ve had problems with calendar syncing, contacts disappearing, and my music getting messed up by iTunes In the Cloud.

That was a really painful lesson for me. I synced thousands of tracks of music I’d had for many years, ripped from the original CDs I owned and had long since put in storage. iTunes In the Cloud synced my music library so I could share it with all my Apple devices. To save space and bandwidth, the service doesn’t upload your library when it can replace tracks with what it thinks are matches in iTunes’ own library. I didn’t want Apple’s versions – I wanted mine, because I’d customized them with album art and spent a lot of time crafting them. Apple’s versions sometimes looked and sounded different from mine.

If I hadn’t kept a backup copy locally, I’d be stuck with Apple’s versions. That wasn’t what I wanted. My data is mine.

The prospect of downloading thousands of files, and all the time that would take, is daunting. That’s why we created the Restore Return Refund program – you can get your backed-up files delivered by FedEx on a USB thumb drive or hard disk drive. You can’t do that with iCloud.

It’s experiences like that which explain why I think it’s so important to understand iCloud’s inherent shortcomings as a backup service. Having your data sync across your devices is a great feature and one I use all the time. However, as a sole backup solution, it’s a recipe for disaster.

Like all sync services, if you accidentally delete a file on one device, it’s gone on all of your devices as soon as the next sync happens. Unfortunately, “user error” is an all-too-common problem, and when it comes to your data, it’s not a risk you want to ignore.

Which brings us to the last point I want to make. It’s easy to get complacent with one company’s ecosystem, but circumstances change. What happens when you get rid of that Mac or that iPhone and get something that doesn’t integrate as easily with the Apple world? Extricating yourself from any company’s ecosystem can, quite frankly, be an intimidating experience, with lots of opportunities to overlook or lose important files. You can avoid such data insecurity by having your info backed up.

With a family that uses lots of Apple products, I pay for Apple’s iCloud and other Apple services. With a Mac and iPhone, iCloud’s ability to sync content means that my workflow is seamless from mobile to desktop and back. I spend less time fiddling with my devices and more time getting work done. The data on iCloud makes up my digital life. Like anything valuable, it’s common sense to keep my info close and well protected. That’s why I keep a local backup, with offsite backup through Backblaze, of course.

The safety, security, and integrity of your data are paramount. Do whatever you can to make sure it’s safe. Back up your files locally and offsite away from iCloud. Backblaze is here to help. If you need more advice for backing up your Mac, check out our complete Mac Backup Guide for details.

The post Don’t Get Trapped in iCloud appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

In Case You Missed These: AWS Security Blog Posts from January, February, and March

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/in-case-you-missed-these-aws-security-blog-posts-from-january-february-and-march/

In case you missed any AWS Security Blog posts published so far in 2017, they are summarized and linked to below. The posts are shown in reverse chronological order (most recent first), and the subject matter ranges from protecting dynamic web applications against DDoS attacks to monitoring AWS account configuration changes and API calls to Amazon EC2 security groups.

March

March 22: How to Help Protect Dynamic Web Applications Against DDoS Attacks by Using Amazon CloudFront and Amazon Route 53
Using a content delivery network (CDN) such as Amazon CloudFront to cache and serve static text and images or downloadable objects such as media files and documents is a common strategy to improve webpage load times, reduce network bandwidth costs, lessen the load on web servers, and mitigate distributed denial of service (DDoS) attacks. AWS WAF is a web application firewall that can be deployed on CloudFront to help protect your application against DDoS attacks by giving you control over which traffic to allow or block by defining security rules. When users access your application, the Domain Name System (DNS) translates human-readable domain names (for example, www.example.com) to machine-readable IP addresses (for example, 192.0.2.44). A DNS service, such as Amazon Route 53, can effectively connect users’ requests to a CloudFront distribution that proxies requests for dynamic content to the infrastructure hosting your application’s endpoints. In this blog post, I show you how to deploy CloudFront with AWS WAF and Route 53 to help protect dynamic web applications (with dynamic content such as a response to user input) against DDoS attacks. The steps shown in this post are key to implementing the overall approach described in AWS Best Practices for DDoS Resiliency and enable the built-in, managed DDoS protection service, AWS Shield.
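As a rough illustration of the DNS piece of this setup (not the post’s full walkthrough), the sketch below uses boto3 to point a domain at an existing CloudFront distribution with a Route 53 alias record. The hosted zone ID, record name, and distribution domain are hypothetical placeholders; Z2FDTNDATAQYW2 is the fixed zone ID Route 53 documents for CloudFront aliases.

```python
import boto3

route53 = boto3.client("route53")

# Point www.example.com at an existing CloudFront distribution (placeholder values).
route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",  # hypothetical hosted zone for example.com
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z2FDTNDATAQYW2",            # CloudFront alias zone ID
                    "DNSName": "d111111abcdef8.cloudfront.net",  # hypothetical distribution
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)
```

The CloudFront distribution and the AWS WAF rules themselves would be created separately, as described in the linked post.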

March 21: New AWS Encryption SDK for Python Simplifies Multiple Master Key Encryption
The AWS Cryptography team is happy to announce a Python implementation of the AWS Encryption SDK. This new SDK helps manage data keys for you, and it simplifies the process of encrypting data under multiple master keys. As a result, this new SDK allows you to focus on the code that drives your business forward. It also provides a framework you can easily extend to ensure that you have a cryptographic library that is configured to match and enforce your standards. The SDK also includes ready-to-use examples. If you are a Java developer, you can refer to this blog post to see specific Java examples for the SDK. In this blog post, I show you how you can use the AWS Encryption SDK to simplify the process of encrypting data and how to protect your encryption keys in ways that help improve application availability by not tying you to a single region or key management solution.
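A minimal sketch of the pattern the post describes, assuming the 1.x module-level encrypt/decrypt helpers that shipped with the original Python release (later versions require an explicit client); the KMS key ARN is a placeholder.

```python
import aws_encryption_sdk

# A master key provider backed by one (or more) KMS CMKs; the ARN is a placeholder.
key_provider = aws_encryption_sdk.KMSMasterKeyProvider(key_ids=[
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
])

# Encrypt under the master key(s); a data key is generated and wrapped for you.
ciphertext, encrypt_header = aws_encryption_sdk.encrypt(
    source=b"my secret data",
    key_provider=key_provider,
)

# Decrypt using the same provider; the SDK unwraps the data key via KMS.
plaintext, decrypt_header = aws_encryption_sdk.decrypt(
    source=ciphertext,
    key_provider=key_provider,
)
assert plaintext == b"my secret data"
```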

March 21: Updated CJIS Workbook Now Available by Request
The need for guidance when implementing Criminal Justice Information Services (CJIS)–compliant solutions has become of paramount importance as more law enforcement customers and technology partners move to store and process criminal justice data in the cloud. AWS services allow these customers to easily and securely architect a CJIS-compliant solution when handling criminal justice data, creating a durable, cost-effective, and secure IT infrastructure that better supports local, state, and federal law enforcement in carrying out their public safety missions. AWS has created several documents (collectively referred to as the CJIS Workbook) to assist you in aligning with the FBI’s CJIS Security Policy. You can use the workbook as a framework for developing CJIS-compliant architecture in the AWS Cloud. The workbook helps you define and test the controls you operate, and document the dependence on the controls that AWS operates (compute, storage, database, networking, regions, Availability Zones, and edge locations).

March 9: New Cloud Directory API Makes It Easier to Query Data Along Multiple Dimensions
Today, we made available a new Cloud Directory API, ListObjectParentPaths, that enables you to retrieve all available parent paths for any directory object across multiple hierarchies. Use this API when you want to fetch all parent objects for a specific child object. The order of the paths and objects returned is consistent across iterative calls to the API, unless objects are moved or deleted. In case an object has multiple parents, the API allows you to control the number of paths returned by using a paginated call pattern. In this blog post, I use an example directory to demonstrate how this new API enables you to retrieve data across multiple dimensions to implement powerful applications quickly.
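For orientation, here is a hedged boto3 sketch of calling the new API; the directory ARN and object selector are hypothetical, and the field names follow the documented ListObjectParentPaths response shape.

```python
import boto3

clouddirectory = boto3.client("clouddirectory")

response = clouddirectory.list_object_parent_paths(
    DirectoryArn="arn:aws:clouddirectory:us-east-1:111122223333:directory/EXAMPLE",
    ObjectReference={"Selector": "/reporting/engineering/jdoe"},  # hypothetical object path
    MaxResults=10,
)

# Each entry lists one parent path plus the object identifiers along that path.
for entry in response["PathToObjectIdentifiersList"]:
    print(entry["Path"], entry["ObjectIdentifiers"])
```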

March 8: How to Access the AWS Management Console Using AWS Microsoft AD and Your On-Premises Credentials
AWS Directory Service for Microsoft Active Directory, also known as AWS Microsoft AD, is a managed Microsoft Active Directory (AD) hosted in the AWS Cloud. Now, AWS Microsoft AD makes it easy for you to give your users permission to manage AWS resources by using on-premises AD administrative tools. With AWS Microsoft AD, you can grant your on-premises users permissions to resources such as the AWS Management Console instead of adding AWS Identity and Access Management (IAM) user accounts or configuring AD Federation Services (AD FS) with Security Assertion Markup Language (SAML). In this blog post, I show how to use AWS Microsoft AD to enable your on-premises AD users to sign in to the AWS Management Console with their on-premises AD user credentials to access and manage AWS resources through IAM roles.

March 7: How to Protect Your Web Application Against DDoS Attacks by Using Amazon Route 53 and an External Content Delivery Network
Distributed Denial of Service (DDoS) attacks are attempts by a malicious actor to flood a network, system, or application with more traffic, connections, or requests than it is able to handle. To protect your web application against DDoS attacks, you can use AWS Shield, a DDoS protection service that AWS provides automatically to all AWS customers at no additional charge. You can use AWS Shield in conjunction with DDoS-resilient web services such as Amazon CloudFront and Amazon Route 53 to improve your ability to defend against DDoS attacks. Learn more about architecting for DDoS resiliency by reading the AWS Best Practices for DDoS Resiliency whitepaper. You also have the option of using Route 53 with an externally hosted content delivery network (CDN). In this blog post, I show how you can help protect the zone apex (also known as the root domain) of your web application by using Route 53 to perform a secure redirect to prevent discovery of your application origin.

February

February 27: Now Generally Available – AWS Organizations: Policy-Based Management for Multiple AWS Accounts
Today, AWS Organizations moves from Preview to General Availability. You can use Organizations to centrally manage multiple AWS accounts, with the ability to create a hierarchy of organizational units (OUs). You can assign each account to an OU, define policies, and then apply those policies to an entire hierarchy, specific OUs, or specific accounts. You can invite existing AWS accounts to join your organization, and you can also create new accounts. All of these functions are available from the AWS Management Console, the AWS Command Line Interface (CLI), and through the AWS Organizations API. To read the full AWS Blog post about today’s launch, see AWS Organizations – Policy-Based Management for Multiple AWS Accounts.
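A small boto3 sketch of that management flow (the OU name and account email are placeholders, and the calling account becomes the organization’s master account):

```python
import boto3

org = boto3.client("organizations")

# Create the organization with all features (consolidated billing plus policies).
org.create_organization(FeatureSet="ALL")

# Create an OU under the root and a new member account (placeholder name/email).
root_id = org.list_roots()["Roots"][0]["Id"]
org.create_organizational_unit(ParentId=root_id, Name="Workloads")
org.create_account(Email="dev-team@example.com", AccountName="Dev")
```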

February 23: s2n Is Now Handling 100 Percent of SSL Traffic for Amazon S3
Today, we’ve achieved another important milestone for securing customer data: we have replaced OpenSSL with s2n for all internal and external SSL traffic in Amazon Simple Storage Service (Amazon S3) commercial regions. This was implemented with minimal impact to customers, and multiple means of error checking were used to ensure a smooth transition, including client integration tests, catching potential interoperability conflicts, and identifying memory leaks through fuzz testing.

February 22: Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials. IAM roles for EC2 make it easier for your applications to make API requests securely from an instance because they do not require you to manage AWS security credentials that the applications use. Recently, we enabled you to use temporary security credentials for your applications by attaching an IAM role to an existing EC2 instance by using the AWS CLI and SDK. To learn more, see New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI. Starting today, you can attach an IAM role to an existing EC2 instance from the EC2 console. You can also use the EC2 console to replace an IAM role attached to an existing instance. In this blog post, I will show how to attach an IAM role to an existing EC2 instance from the EC2 console.

February 22: How to Audit Your AWS Resources for Security Compliance by Using Custom AWS Config Rules
AWS Config Rules enables you to implement security policies as code for your organization and evaluate configuration changes to AWS resources against these policies. You can use Config rules to audit your use of AWS resources for compliance with external compliance frameworks such as CIS AWS Foundations Benchmark and with your internal security policies related to the US Health Insurance Portability and Accountability Act (HIPAA), the Federal Risk and Authorization Management Program (FedRAMP), and other regimes. AWS provides some predefined, managed Config rules. You also can create custom Config rules based on criteria you define within an AWS Lambda function. In this post, I show how to create a custom rule that audits AWS resources for security compliance by enabling VPC Flow Logs for an Amazon Virtual Private Cloud (VPC). The custom rule meets requirement 4.3 of the CIS AWS Foundations Benchmark: “Ensure VPC flow logging is enabled in all VPCs.”
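As a sketch of the general custom-rule pattern (not the exact code from the post), the Lambda handler below evaluates a VPC configuration change and reports COMPLIANT only if at least one flow log exists for that VPC; the compliance logic is a simplifying assumption.

```python
import json
import boto3

config = boto3.client("config")
ec2 = boto3.client("ec2")

def evaluate_vpc(vpc_id):
    # Assumption: a VPC is compliant if any flow log is configured for it.
    flow_logs = ec2.describe_flow_logs(
        Filters=[{"Name": "resource-id", "Values": [vpc_id]}]
    )["FlowLogs"]
    return "COMPLIANT" if flow_logs else "NON_COMPLIANT"

def lambda_handler(event, context):
    # Custom Config rules receive the configuration item as a JSON string.
    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]

    config.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": evaluate_vpc(item["resourceId"]),
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
```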

February 13: AWS Announces CISPE Membership and Compliance with First-Ever Code of Conduct for Data Protection in the Cloud
I have two exciting announcements today, both showing AWS’s continued commitment to ensuring that customers can comply with EU Data Protection requirements when using our services.

February 13: How to Enable Multi-Factor Authentication for AWS Services by Using AWS Microsoft AD and On-Premises Credentials
You can now enable multi-factor authentication (MFA) for users of AWS services such as Amazon WorkSpaces and Amazon QuickSight and their on-premises credentials by using your AWS Directory Service for Microsoft Active Directory (Enterprise Edition) directory, also known as AWS Microsoft AD. MFA adds an extra layer of protection to a user name and password (the first “factor”) by requiring users to enter an authentication code (the second factor), which has been provided by your virtual or hardware MFA solution. These factors together provide additional security by preventing access to AWS services, unless users supply a valid MFA code.

February 13: How to Create an Organizational Chart with Separate Hierarchies by Using Amazon Cloud Directory
Amazon Cloud Directory enables you to create directories for a variety of use cases, such as organizational charts, course catalogs, and device registries. Cloud Directory offers you the flexibility to create directories with hierarchies that span multiple dimensions. For example, you can create an organizational chart that you can navigate through separate hierarchies for reporting structure, location, and cost center. In this blog post, I show how to use Cloud Directory APIs to create an organizational chart with two separate hierarchies in a single directory. I also show how to navigate the hierarchies and retrieve data. I use the Java SDK for all the sample code in this post, but you can use other language SDKs or the AWS CLI.

February 10: How to Easily Log On to AWS Services by Using Your On-Premises Active Directory
AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as Microsoft AD, now enables your users to log on with just their on-premises Active Directory (AD) user name—no domain name is required. This new domainless logon feature makes it easier to set up connections to your on-premises AD for use with applications such as Amazon WorkSpaces and Amazon QuickSight, and it keeps the user logon experience free from network naming. This new interforest trusts capability is now available when using Microsoft AD with Amazon WorkSpaces and Amazon QuickSight Enterprise Edition. In this blog post, I explain how Microsoft AD domainless logon works with AD interforest trusts, and I show an example of setting up Amazon WorkSpaces to use this capability.

February 9: New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials that AWS creates, distributes, and rotates automatically. Using temporary credentials is an IAM best practice because you do not need to maintain long-term keys on your instance. Using IAM roles for EC2 also eliminates the need to use long-term AWS access keys that you have to manage manually or programmatically. Starting today, you can enable your applications to use temporary security credentials provided by AWS by attaching an IAM role to an existing EC2 instance. You can also replace the IAM role attached to an existing EC2 instance. In this blog post, I show how you can attach an IAM role to an existing EC2 instance by using the AWS CLI.
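The same operation is also available from the SDKs; here is a boto3 sketch (the instance ID and profile name are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Attach an existing instance profile to a running instance (placeholder values).
ec2.associate_iam_instance_profile(
    IamInstanceProfile={"Name": "MyAppInstanceProfile"},
    InstanceId="i-0123456789abcdef0",
)

# To swap roles later, replace the existing association instead:
# ec2.replace_iam_instance_profile_association(
#     IamInstanceProfile={"Name": "AnotherProfile"},
#     AssociationId="iip-assoc-EXAMPLE",
# )
```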

February 8: How to Remediate Amazon Inspector Security Findings Automatically
The Amazon Inspector security assessment service can evaluate the operating environments and applications you have deployed on AWS for common and emerging security vulnerabilities automatically. As an AWS-built service, Amazon Inspector is designed to exchange data and interact with other core AWS services not only to identify potential security findings but also to automate addressing those findings. Previous related blog posts showed how you can deliver Amazon Inspector security findings automatically to third-party ticketing systems and automate the installation of the Amazon Inspector agent on new Amazon EC2 instances. In this post, I show how you can automatically remediate findings generated by Amazon Inspector. To get started, you must first run an assessment and publish any security findings to an Amazon Simple Notification Service (SNS) topic. Then, you create an AWS Lambda function that is triggered by those notifications. Finally, the Lambda function examines the findings and then implements the appropriate remediation based on the type of issue.
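A hedged sketch of the overall shape (an SNS-triggered Lambda that looks up the finding and pushes a fix through Systems Manager); the assumption that the SNS message body is the finding ARN, and the yum command used as the remediation, are illustrative simplifications rather than the post’s exact solution.

```python
import boto3

inspector = boto3.client("inspector")
ssm = boto3.client("ssm")

def lambda_handler(event, context):
    # Simplifying assumption: the SNS message carries the finding ARN directly.
    finding_arn = event["Records"][0]["Sns"]["Message"]

    finding = inspector.describe_findings(findingArns=[finding_arn])["findings"][0]
    instance_id = finding["assetAttributes"]["agentId"]  # the affected EC2 instance

    # Illustrative remediation: install pending security updates on that instance.
    ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["sudo yum update -y --security"]},
    )
```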

February 6: How to Simplify Security Assessment Setup Using Amazon EC2 Systems Manager and Amazon Inspector
In a July 2016 AWS Blog post, I discussed how to integrate Amazon Inspector with third-party ticketing systems by using Amazon Simple Notification Service (SNS) and AWS Lambda. This AWS Security Blog post continues in the same vein, describing how to use Amazon Inspector to automate various aspects of security management. In this post, I show you how to install the Amazon Inspector agent automatically through the Amazon EC2 Systems Manager when a new Amazon EC2 instance is launched. In a subsequent post, I will show you how to update EC2 instances automatically that run Linux when Amazon Inspector discovers a missing security patch.

January

January 30: How to Protect Data at Rest with Amazon EC2 Instance Store Encryption
Encrypting data at rest is vital for regulatory compliance to ensure that sensitive data saved on disks is not readable by any user or application without a valid key. Some compliance regulations such as PCI DSS and HIPAA require that data at rest be encrypted throughout the data lifecycle. To this end, AWS provides data-at-rest options and key management to support the encryption process. For example, you can encrypt Amazon EBS volumes and configure Amazon S3 buckets for server-side encryption (SSE) using AES-256 encryption. Additionally, Amazon RDS supports Transparent Data Encryption (TDE). Instance storage provides temporary block-level storage for Amazon EC2 instances. This storage is located on disks attached physically to a host computer. Instance storage is ideal for temporary storage of information that frequently changes, such as buffers, caches, and scratch data. By default, files stored on these disks are not encrypted. In this blog post, I show a method for encrypting data on Linux EC2 instance stores by using Linux built-in libraries. This method encrypts files transparently, which protects confidential data. As a result, applications that process the data are unaware of the disk-level encryption.

January 27: How to Detect and Automatically Remediate Unintended Permissions in Amazon S3 Object ACLs with CloudWatch Events
Amazon S3 Access Control Lists (ACLs) enable you to specify permissions that grant access to S3 buckets and objects. When S3 receives a request for an object, it verifies whether the requester has the necessary access permissions in the associated ACL. For example, you could set up an ACL for an object so that only the users in your account can access it, or you could make an object public so that it can be accessed by anyone. If the number of objects and users in your AWS account is large, ensuring that you have attached correctly configured ACLs to your objects can be a challenge. For example, what if a user were to call the PutObjectAcl API call on an object that is supposed to be private and make it public? Or, what if a user were to call the PutObject with the optional Acl parameter set to public-read, therefore uploading a confidential file as publicly readable? In this blog post, I show a solution that uses Amazon CloudWatch Events to detect PutObject and PutObjectAcl API calls in near-real time and helps ensure that the objects remain private by making automatic PutObjectAcl calls, when necessary.
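A condensed sketch of such a remediation Lambda (not the post’s full solution): it reads the bucket and key from the CloudTrail event delivered by CloudWatch Events, checks the object’s ACL for public grants, and resets it to private.

```python
import boto3

s3 = boto3.client("s3")

PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def lambda_handler(event, context):
    # CloudTrail API-call events expose the request parameters under "detail".
    params = event["detail"]["requestParameters"]
    bucket, key = params["bucketName"], params["key"]

    acl = s3.get_object_acl(Bucket=bucket, Key=key)
    is_public = any(
        grant.get("Grantee", {}).get("URI") in PUBLIC_GROUPS
        for grant in acl["Grants"]
    )
    if is_public:
        s3.put_object_acl(Bucket=bucket, Key=key, ACL="private")
```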

January 26: Now Available: Amazon Cloud Directory—A Cloud-Native Directory for Hierarchical Data
Today we are launching Amazon Cloud Directory. This service is purpose-built for storing large amounts of strongly typed hierarchical data. With the ability to scale to hundreds of millions of objects while remaining cost-effective, Cloud Directory is a great fit for all sorts of cloud and mobile applications.

January 24: New SOC 2 Report Available: Confidentiality
As with everything at Amazon, the success of our security and compliance program is primarily measured by one thing: our customers’ success. Our customers drive our portfolio of compliance reports, attestations, and certifications that support their efforts in running a secure and compliant cloud environment. As a result of our engagement with key customers across the globe, we are happy to announce the publication of our new SOC 2 Confidentiality report. This report is available now through AWS Artifact in the AWS Management Console.

January 18: Compliance in the Cloud for New Financial Services Cybersecurity Regulations
Financial regulatory agencies are focused more than ever on ensuring responsible innovation. Consequently, if you want to achieve compliance with financial services regulations, you must be increasingly agile and employ dynamic security capabilities. AWS enables you to achieve this by providing you with the tools you need to scale your security and compliance capabilities on AWS. The following breakdown of the most recent cybersecurity regulations, NY DFS Rule 23 NYCRR 500, demonstrates how AWS continues to focus on your regulatory needs in the financial services sector.

January 9: New Amazon GameDev Blog Post: Protect Multiplayer Game Servers from DDoS Attacks by Using Amazon GameLift
In online gaming, distributed denial of service (DDoS) attacks target a game’s network layer, flooding servers with requests until performance degrades considerably. These attacks can limit a game’s availability to players and limit the player experience for those who can connect. Today’s new Amazon GameDev Blog post uses a typical game server architecture to highlight DDoS attack vulnerabilities and discusses how to stay protected by using built-in AWS Cloud security, AWS security best practices, and the security features of Amazon GameLift. Read the post to learn more.

January 6: The Top 10 Most Downloaded AWS Security and Compliance Documents in 2016
The following list includes the 10 most downloaded AWS security and compliance documents in 2016. Using this list, you can learn about what other people found most interesting about security and compliance last year.

January 6: FedRAMP Compliance Update: AWS GovCloud (US) Region Receives a JAB-Issued FedRAMP High Baseline P-ATO for Three New Services
Three new services in the AWS GovCloud (US) region have received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) under the Federal Risk and Authorization Management Program (FedRAMP). JAB issued the authorization at the High baseline, which gives US government agencies and their service providers the ability to use these services to process the government’s most sensitive unclassified data, including Personally Identifiable Information (PII), Protected Health Information (PHI), Controlled Unclassified Information (CUI), criminal justice information (CJI), and financial data.

January 4: The Top 20 Most Viewed AWS IAM Documentation Pages in 2016
The following 20 pages were the most viewed AWS Identity and Access Management (IAM) documentation pages in 2016. I have included a brief description with each link to give you a clearer idea of what each page covers. Use this list to see what other people have been viewing and perhaps to pique your own interest about a topic you’ve been meaning to research.

January 3: The Most Viewed AWS Security Blog Posts in 2016
The following 10 posts were the most viewed AWS Security Blog posts that we published during 2016. You can use this list as a guide to catch up on your blog reading or even read a post again that you found particularly useful.

January 3: How to Monitor AWS Account Configuration Changes and API Calls to Amazon EC2 Security Groups
You can use AWS security controls to detect and mitigate risks to your AWS resources. The purpose of each security control is defined by its control objective. For example, the control objective of an Amazon VPC security group is to permit only designated traffic to enter or leave a network interface. Let’s say you have an Internet-facing e-commerce website, and your security administrator has determined that only HTTP (TCP port 80) and HTTPS (TCP 443) traffic should be allowed access to the public subnet. As a result, your administrator configures a security group to meet this control objective. What if, though, someone were to inadvertently change this security group’s rules and enable FTP or other protocols to access the public subnet from any location on the Internet? That expanded access could weaken the security posture of your assets. Consequently, your administrator might need to monitor the integrity of your company’s security controls so that the controls maintain their desired effectiveness. In this blog post, I explore two methods for detecting unintended changes to VPC security groups. The two methods address not only control objectives but also control failures.
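One way to act on such a detection is sketched below, under the assumptions that only ports 80 and 443 should be open and that the triggering CloudTrail event exposes the security group ID in its request parameters.

```python
import boto3

ec2 = boto3.client("ec2")
ALLOWED_PORTS = {80, 443}  # assumption: only HTTP and HTTPS belong in this group

def lambda_handler(event, context):
    group_id = event["detail"]["requestParameters"]["groupId"]
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]

    for permission in group["IpPermissions"]:
        if permission.get("FromPort") not in ALLOWED_PORTS:
            # Remove any ingress rule that opens a port outside the allowed set.
            ec2.revoke_security_group_ingress(
                GroupId=group_id, IpPermissions=[permission]
            )
```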

If you have questions about or issues with implementing the solutions in any of these posts, please start a new thread on the forum identified near the end of each post.

– Craig

Updated CJIS Workbook Now Available by Request

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/updated-cjis-workbook-now-available-by-request/

The need for guidance when implementing Criminal Justice Information Services (CJIS)–compliant solutions has become of paramount importance as more law enforcement customers and technology partners move to store and process criminal justice data in the cloud. AWS services allow these customers to easily and securely architect a CJIS-compliant solution when handling criminal justice data, creating a durable, cost-effective, and secure IT infrastructure that better supports local, state, and federal law enforcement in carrying out their public safety missions.

AWS has created several documents (collectively referred to as the CJIS Workbook) to assist you in aligning with the FBI’s CJIS Security Policy. You can use the workbook as a framework for developing CJIS-compliant architecture in the AWS Cloud. The workbook helps you define and test the controls you operate, and document the dependence on the controls that AWS operates (compute, storage, database, networking, regions, Availability Zones, and edge locations).

Our most recent updates to the CJIS Workbook include:

AWS’s commitment to facilitating CJIS processes with customers is exemplified by the recent CJIS Agreements put in place with the states of California, Colorado, Louisiana, Minnesota, Oregon, Utah and Washington (to name but a few). As we continue to sign CJIS agreements across the country, law enforcement agencies are able to implement innovations to improve communities’ and officers’ safety, including body cameras, real-time gunshot notifications, and data analytics. With the release of our updated CJIS Workbook, AWS remains dedicated to enabling cloud usage for the law enforcement market.

Please reach out to AWS Compliance if you have additional questions about CJIS or any other set of compliance standards.

– Chris Gile, AWS Risk and Compliance

Defense against Doxing

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/03/defense_against.html

A decade ago, I wrote about the death of ephemeral conversation. As computers were becoming ubiquitous, some unintended changes happened, too. Before computers, what we said disappeared once we’d said it. Neither face-to-face conversations nor telephone conversations were routinely recorded. A permanent communication was something different and special; we called it correspondence.

The Internet changed this. We now chat by text message and e-mail, on Facebook and on Instagram. These conversations — with friends, lovers, colleagues, fellow employees — all leave electronic trails. And while we know this intellectually, we haven’t truly internalized it. We still think of conversation as ephemeral, forgetting that we’re being recorded and what we say has the permanence of correspondence.

That our data is used by large companies for psychological manipulation – we call this advertising – is well known. So is its use by governments for law enforcement and, depending on the country, social control. What made the news over the past year were demonstrations of how vulnerable all of this data is to hackers and the effects of having it hacked, copied, and then published online. We call this doxing.

Doxing isn’t new, but it has become more common. It’s been perpetrated against corporations, law firms, individuals, the NSA and — just this week — the CIA. It’s largely harassment and not whistleblowing, and it’s not going to change anytime soon. The data in your computer and in the cloud are, and will continue to be, vulnerable to hacking and publishing online. Depending on your prominence and the details of this data, you may need some new strategies to secure your private life.

There are two basic ways hackers can get at your e-mail and private documents. One way is to guess your password. That’s how hackers got their hands on personal photos of celebrities from iCloud in 2014.

How to protect yourself from this attack is pretty obvious. First, don’t choose a guessable password. This is more than not using “password1” or “qwerty”; most easily memorizable passwords are guessable. My advice is to generate passwords you have to remember by using either the XKCD scheme or the Schneier scheme, and to use large random passwords stored in a password manager for everything else.
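For the “large random passwords” half of that advice, here is a minimal Python sketch using the standard library’s secrets module; the wordlist file in the passphrase variant is a hypothetical stand-in for whatever diceware-style list you use.

```python
import secrets
import string

def random_password(length=20):
    """A large random password, meant to live in a password manager."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def xkcd_passphrase(words, count=5):
    """An XKCD-style passphrase drawn from a wordlist you supply."""
    return " ".join(secrets.choice(words) for _ in range(count))

print(random_password())
# print(xkcd_passphrase(open("wordlist.txt").read().split()))  # hypothetical wordlist file
```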

Second, turn on two-factor authentication where you can, like Google’s 2-Step Verification. This adds another step besides just entering a password, such as having to type in a one-time code that’s sent to your mobile phone. And third, don’t reuse the same password on any sites you actually care about.

You’re not done, though. Hackers have accessed accounts by exploiting the “secret question” feature and resetting the password. That was how Sarah Palin’s e-mail account was hacked in 2008. The problem with secret questions is that they’re not very secret and not very random. My advice is to refuse to use those features. Type randomness into your keyboard, or choose a really random answer and store it in your password manager.

Finally, you also have to stay alert to phishing attacks, where a hacker sends you an enticing e-mail with a link that sends you to a web page that looks almost like the expected page, but which actually isn’t. This sort of thing can bypass two-factor authentication, and is almost certainly what tricked John Podesta and Colin Powell.

The other way hackers can get at your personal stuff is by breaking in to the computers the information is stored on. This is how the Russians got into the Democratic National Committee’s network and how a lone hacker got into the Panamanian law firm Mossack Fonseca. Sometimes individuals are targeted, as when China hacked Google in 2010 to access the e-mail accounts of human rights activists. Sometimes the whole network is the target, and individuals are inadvertent victims, as when thousands of Sony employees had their e-mails published by North Korea in 2014.

Protecting yourself is difficult, because it often doesn’t matter what you do. If your e-mail is stored with a service provider in the cloud, what matters is the security of that network and that provider. Most users have no control over that part of the system. The only way to truly protect yourself is to not keep your data in the cloud where someone could get to it. This is hard. We like the fact that all of our e-mail is stored on a server somewhere and that we can instantly search it. But that convenience comes with risk. Consider deleting old e-mail, or at least downloading it and storing it offline on a portable hard drive. In fact, storing data offline is one of the best things you can do to protect it from being hacked and exposed. If it’s on your computer, what matters is the security of your operating system and network, not the security of your service provider.

Consider this for files on your own computer. The more things you can move offline, the safer you’ll be.

E-mail, no matter how you store it, is vulnerable. If you’re worried about your conversations becoming public, think about an encrypted chat program instead, such as Signal, WhatsApp or Off-the-Record Messaging. Consider using communications systems that don’t save everything by default.

None of this is perfect, of course. Portable hard drives are vulnerable when you connect them to your computer. There are ways to jump air gaps and access data on computers not connected to the Internet. Communications and data files you delete might still exist in backup systems somewhere — either yours or those of the various cloud providers you’re using. And always remember that there’s always another copy of any of your conversations stored with the person you’re conversing with. Even with these caveats, though, these measures will make a big difference.

When secrecy is truly paramount, go back to communications systems that are still ephemeral. Pick up the telephone and talk. Meet face to face. We don’t yet live in a world where everything is recorded and everything is saved, although that era is coming. Enjoy the last vestiges of ephemeral conversation while you still can.

This essay originally appeared in the Washington Post.

The Economics of Hybrid Cloud Storage

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hybrid-cloud-storage-economics/

“Hybrid Cloud” has jumped into the IT vernacular over the last few years. Hybrid Cloud solutions intelligently divide processing and data storage between on-premise and off-premise resources to maximize efficiency. Businesses seamlessly extend their data processing and data storage capabilities to the cloud, enabling them to manage unusual or fluctuating demands for services. More recently, businesses have been using cloud computing and storage resources for their day-to-day operations instead of building out their own infrastructure.

Companies in the media and entertainment industry are natural candidates for the hybrid cloud because, on any given day, they ingest and process large amounts of data in the form of video and audio files. Effectively processing and storing this data is paramount in managing the cost of a project and keeping it on schedule. Below we’ll examine the data storage aspects of the hybrid cloud when considering such a solution for a media and entertainment organization.

The Classic Storage Environment

In the media and entertainment industry, much of the video and audio collected is either never used or reviewed once and then archived. A rough estimate is that 10% of all the audio and video collected is used in the various drafts produced. That means that 90% of the data is archived: stored on local storage systems or perhaps saved off to tape. This archived data cannot be deleted until the project owner agrees, which can take months and sometimes years.

Using local storage to keep this archived data means you have to “overbuy” your on-premise storage to accommodate the maximum amount of data you might ever need to hold. While this allows the data to be easily accessed and restored, you have to purchase or lease substantially more storage than you really need.

As a consequence, many organizations decided to use tape storage for their archived data to reduce the need for on-premise data storage. They soon discovered the hidden costs of tape systems: ongoing tape maintenance, supply costs, and continuing personnel expenses. In addition, to recover an archived video or audio file from tape was often slow, cumbersome, and fraught with error.

Hybrid Cloud Storage

Cantemo’s Media Asset Management Portal can identify and automatically route video and audio data to a storage destination – on-premise, cloud, tape, etc. – as needed. Let’s consider a model where 20% of the data ingested is needed for the duration of a given project. The remaining 80% is evaluated and determined to be archivable, although we might need to access a video or audio clip at a later time. What is the best destination for the Cantemo Portal to route video and audio that optimizes both cost and access? Let’s review each of our choices: on-premise disk, tape, and cloud storage.

Data Destinations

To compare the three solutions, we’ve considered the cost of each system over a five-year period, including initial purchase cost, ongoing supplies, maintenance costs, personnel costs for operations, and subscription costs.

  • On-Premise Disk Storage – On-premise storage can range from a 1 petabyte NAS (Network Attached Storage) system to a multi-petabyte SAN (Storage Area Network). The cost ranges from $12/terabyte/month to $20/terabyte/month (or more). These figures assume new equipment at “street” prices where available. These systems are used for instant access to the data over a high-speed network connection. The data, or a proxy, can be altered multiple times and versioning is required.
  • Tape Storage – Typically these are LTO (Linear Tape-Open) systems with a minimum of two local tape systems, operational costs, etc. The data is stored, typically in batch mode, and accessed infrequently. The tapes can be stored on-site or off-site. Off-site storage costs more. The cost for LTO tape ranges from $7/terabyte/month to $10/terabyte/month, with much of that being the ongoing operational costs. The design includes one incremental tape per day, 2-week retention, first week on-site, second week off-site, with weekly pickup/drop-off. Also included are weekly, monthly, and yearly full backups, rotated on/off site as needed for tape testing, data recovery, etc.
  • Cloud Storage – The cost of cloud storage has come down over the last few years and currently ranges from $5/terabyte/month to $25/terabyte/month for storage depending on the vendor. Video and audio stored in cloud storage is typically easy to locate and readily available for recovery if needed. In most cases, there are minimal operational costs as, for example, the Cantemo Portal software is designed to locate and recover files that are required, but not present on the on-premise storage system.

Of course, a given organization will have their own costs, but in general they should fall within the ranges noted above.

Comparing Storage Costs

In comparing the costs of the different methods noted above, we’ll present three scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table uses the same format; all we’ve changed is how the data is distributed: on-premise, tape, or cloud. The math can be adapted for any set of numbers you wish to use.
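As a worked example of that math, here is a short sketch that reproduces the scenario tables below from the per-terabyte ranges quoted above; the rates and splits are the same hypothetical figures, so you can swap in your own numbers.

```python
# $/TB/month ranges quoted above, as (low, high) pairs
RATES = {"on_premise": (12, 20), "tape": (7, 10), "cloud": (5, 25)}

def monthly_cost(split_tb, end="low"):
    """split_tb maps destination -> terabytes, e.g. {'on_premise': 200, 'cloud': 800}."""
    i = 0 if end == "low" else 1
    return sum(tb * RATES[dest][i] for dest, tb in split_tb.items())

# Scenario 3 at 1 PB: 20% on-premise, 80% cloud
print(monthly_cost({"on_premise": 200, "cloud": 800}, "low"))   # 2400 + 4000  = 6400
print(monthly_cost({"on_premise": 200, "cloud": 800}, "high"))  # 4000 + 20000 = 24000
```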

SCENARIO 1 – 100% of data is in on-premise storage

Scenario 1                        Data Stored    Data Stored    Data Stored
Data stored On-Premise: 100%      100 TB         1,000 TB       2,000 TB
On-premise cost range             Monthly Cost   Monthly Cost   Monthly Cost
Low – $12/TB/Month                $1,200         $12,000        $24,000
High – $20/TB/Month               $2,000         $20,000        $40,000

SCENARIO 2 – 20% of data is in on-premise storage and 80% of data is on LTO Tape

Scenario 2                        Data Stored    Data Stored    Data Stored
Data stored On-Premise: 20%       20 TB          200 TB         400 TB
Data stored Tape: 80%             80 TB          800 TB         1,600 TB
On-premise cost range             Monthly Cost   Monthly Cost   Monthly Cost
Low – $12/TB/Month                $240           $2,400         $4,800
High – $20/TB/Month               $400           $4,000         $8,000
LTO Tape cost range               Monthly Cost   Monthly Cost   Monthly Cost
Low – $7/TB/Month                 $560           $5,600         $11,200
High – $10/TB/Month               $800           $8,000         $16,000
TOTAL Cost of Scenario 2          Monthly Cost   Monthly Cost   Monthly Cost
Low                               $800           $8,000         $16,000
High                              $1,200         $12,000        $24,000

Using tape to store 80% of the data can reduce the cost by 33% compared with using on-premise storage alone.

SCENARIO 3 – 20% of data is in on-premise storage and 80% of data is in cloud storage

Scenario 3                        Data Stored    Data Stored    Data Stored
Data stored On-Premise: 20%       20 TB          200 TB         400 TB
Data stored in Cloud: 80%         80 TB          800 TB         1,600 TB
On-premise cost range             Monthly Cost   Monthly Cost   Monthly Cost
Low – $12/TB/Month                $240           $2,400         $4,800
High – $20/TB/Month               $400           $4,000         $8,000
Cloud storage cost range          Monthly Cost   Monthly Cost   Monthly Cost
Low – $5/TB/Month                 $400           $4,000         $8,000
High – $25/TB/Month               $2,000         $20,000        $40,000
TOTAL Cost of Scenario 3          Monthly Cost   Monthly Cost   Monthly Cost
Low                               $640           $6,400         $12,800
High                              $2,400         $24,000        $48,000

Storing 80% of the data in the cloud can lead to a 46% savings on the low end, but it could actually be more expensive than on-premise storage alone, depending on the vendor selected.

Separate the Costs

Often, cloud storage costs are combined with cloud computing costs in the Hybrid Cloud model, hiding the true cost of the cloud storage, perhaps until it’s too late. The savings gained by using cloud computing services a few times a day may be completely offset by the high cost of cloud storage, which you would be using the entire time. Here are some recommendations.

  1. Ask to have your Hybrid Cloud costs broken out into computing and storage costs; it should be clear what you are paying for each service.
  2. Consider moving the cloud data storage cost to a low cost provider such as Backblaze B2 Cloud Storage, which charges only $5/terabyte/month for cloud storage. This is particularly useful for archived data that still needs to be accessible as Backblaze cloud storage is readily available.
  3. If compute, data distribution, and data archiving services are required, the Cantemo Portal allows you to designate different cloud storage vendors depending on the usage. For example, data requiring computing services can be stored with Amazon S3 and data designated for archiving can be stored in Backblaze. This allows you to optimize access while minimizing costs.

Considering Hybrid Data Storage

Today, most companies in the Media and Entertainment industry have large amounts of data. The hybrid cloud has the potential to change how the industry does business by moving to cloud-based platforms that allow for global collaboration around the clock. In these scenarios, the amount of data created and stored will be staggering, even by today’s standards. As a consequence, it will be paramount for you to know the most cost efficient way to store and access your data.

The latest version of Cantemo Portal includes native integration to Backblaze B2 Cloud Storage, making it easy to create custom rules for archiving to the cloud and access archived files when needed.

(Author’s note: I used on-premise throughout this document as it is the common vernacular used in the tech industry. Apologies to those grammatically offended.)

The post The Economics of Hybrid Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security and the Internet of Things

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/security_and_th.html

Last year, on October 21, your digital video recorder – or at least a DVR like yours – knocked Twitter off the internet. Someone used your DVR, along with millions of insecure webcams, routers, and other connected devices, to launch an attack that started a chain reaction, resulting in Twitter, Reddit, Netflix, and many other sites going off the internet. You probably didn’t realize that your DVR had that kind of power. But it does.

All computers are hackable. This has as much to do with the computer market as it does with the technologies. We prefer our software full of features and inexpensive, at the expense of security and reliability. That your computer can affect the security of Twitter is a market failure. The industry is filled with market failures that, until now, have been largely ignorable. As computers continue to permeate our homes, cars, businesses, these market failures will no longer be tolerable. Our only solution will be regulation, and that regulation will be foisted on us by a government desperate to “do something” in the face of disaster.

In this article I want to outline the problems, both technical and political, and point to some regulatory solutions. Regulation might be a dirty word in today’s political climate, but security is the exception to our small-government bias. And as the threats posed by computers become greater and more catastrophic, regulation will be inevitable. So now’s the time to start thinking about it.

We also need to reverse the trend of connecting everything to the internet. And where the risks include harm and even death, we need to think twice about what we connect and what we deliberately leave uncomputerized.

If we get this wrong, the computer industry will look like the pharmaceutical industry, or the aircraft industry. But if we get this right, we can maintain the innovative environment of the internet that has given us so much.

**********

We no longer have things with computers embedded in them. We have computers with things attached to them.

Your modern refrigerator is a computer that keeps things cold. Your oven, similarly, is a computer that makes things hot. An ATM is a computer with money inside. Your car is no longer a mechanical device with some computers inside; it’s a computer with four wheels and an engine. Actually, it’s a distributed system of over 100 computers with four wheels and an engine. And, of course, your phones became full-power general-purpose computers in 2007, when the iPhone was introduced.

We wear computers: fitness trackers and computer-enabled medical devices – and, of course, we carry our smartphones everywhere. Our homes have smart thermostats, smart appliances, smart door locks, even smart light bulbs. At work, many of those same smart devices are networked together with CCTV cameras, sensors that detect customer movements, and everything else. Cities are starting to embed smart sensors in roads, streetlights, and sidewalk squares, as well as smart energy grids and smart transportation networks. A nuclear power plant is really just a computer that produces electricity, and – like everything else we’ve just listed – it’s on the internet.

The internet is no longer a web that we connect to. Instead, it’s a computerized, networked, and interconnected world that we live in. This is the future, and what we’re calling the Internet of Things.

Broadly speaking, the Internet of Things has three parts. There are the sensors that collect data about us and our environment: smart thermostats, street and highway sensors, and those ubiquitous smartphones with their motion sensors and GPS location receivers. Then there are the “smarts” that figure out what the data means and what to do about it. This includes all the computer processors on these devices and – increasingly – in the cloud, as well as the memory that stores all of this information. And finally, there are the actuators that affect our environment. The point of a smart thermostat isn’t to record the temperature; it’s to control the furnace and the air conditioner. Driverless cars collect data about the road and the environment to steer themselves safely to their destinations.

You can think of the sensors as the eyes and ears of the internet. You can think of the actuators as the hands and feet of the internet. And you can think of the stuff in the middle as the brain. We are building an internet that senses, thinks, and acts.

This is the classic definition of a robot. We’re building a world-size robot, and we don’t even realize it.

To be sure, it’s not a robot in the classical sense. We think of robots as discrete autonomous entities, with sensors, brain, and actuators all together in a metal shell. The world-size robot is distributed. It doesn’t have a singular body, and parts of it are controlled in different ways by different people. It doesn’t have a central brain, and it has nothing even remotely resembling a consciousness. It doesn’t have a single goal or focus. It’s not even something we deliberately designed. It’s something we have inadvertently built out of the everyday objects we live with and take for granted. It is the extension of our computers and networks into the real world.

This world-size robot is actually more than the Internet of Things. It’s a combination of several decades-old computing trends: mobile computing, cloud computing, always-on computing, huge databases of personal information, the Internet of Things – or, more precisely, cyber-physical systems – autonomy, and artificial intelligence. And while it’s still not very smart, it’ll get smarter. It’ll get more powerful and more capable through all the interconnections we’re building.

It’ll also get much more dangerous.

**********

Computer security has been around for almost as long as computers have been. And while it’s true that security wasn’t part of the design of the original internet, it’s something we have been trying to achieve since its beginning.

I have been working in computer security for over 30 years: first in cryptography, then more generally in computer and network security, and now in general security technology. I have watched computers become ubiquitous, and have seen firsthand the problems – and solutions – of securing these complex machines and systems. I’m telling you all this because what used to be a specialized area of expertise now affects everything. Computer security is now everything security. There’s one critical difference, though: The threats have become greater.

Traditionally, computer security is divided into three categories: confidentiality, integrity, and availability. For the most part, our security concerns have centered on confidentiality. We’re concerned about our data and who has access to it - the world of privacy and surveillance, of data theft and misuse.

But threats come in many forms. Availability threats: computer viruses that delete our data, or ransomware that encrypts our data and demands payment for the unlock key. Integrity threats: hackers who can manipulate data entries can do things ranging from changing grades in a class to changing the amount of money in bank accounts. Some of these threats are pretty bad. Hospitals have paid tens of thousands of dollars to criminals whose ransomware encrypted critical medical files. JPMorgan Chase spends half a billion dollars a year on cybersecurity.

Today, the integrity and availability threats are much worse than the confidentiality threats. Once computers start affecting the world in a direct and physical manner, there are real risks to life and property. There is a fundamental difference between crashing your computer and losing your spreadsheet data, and crashing your pacemaker and losing your life. This isn’t hyperbole; recently researchers found serious security vulnerabilities in St. Jude Medical’s implantable heart devices. Give the internet hands and feet, and it will have the ability to punch and kick.

Take a concrete example: modern cars, those computers on wheels. The steering wheel no longer turns the axles, nor does the accelerator pedal change the speed. Every move you make in a car is processed by a computer, which does the actual controlling. A central computer controls the dashboard. There’s another in the radio. The engine has 20 or so computers. These are all networked, and increasingly autonomous.

Now, let’s start listing the security threats. We don’t want car navigation systems to be used for mass surveillance, or the microphone for mass eavesdropping. We might want it to be used to determine a car’s location in the event of a 911 call, and possibly to collect information about highway congestion. We don’t want people to hack their own cars to bypass emissions-control limitations. We don’t want manufacturers or dealers to be able to do that, either, as Volkswagen did for years. We can imagine wanting to give police the ability to remotely and safely disable a moving car; that would make high-speed chases a thing of the past. But we definitely don’t want hackers to be able to do that. We definitely don’t want them disabling the brakes in every car without warning, at speed. As we make the transition from driver-controlled cars to cars with various driver-assist capabilities to fully driverless cars, we don’t want any of those critical components subverted. We don’t want someone to be able to accidentally crash your car, let alone do it on purpose. And equally, we don’t want them to be able to manipulate the navigation software to change your route, or the door-lock controls to prevent you from opening the door. I could go on.

That’s a lot of different security requirements, and the effects of getting them wrong range from illegal surveillance to extortion by ransomware to mass death.

**********

Our computers and smartphones are as secure as they are because companies like Microsoft, Apple, and Google spend a lot of time testing their code before it’s released, and quickly patch vulnerabilities when they’re discovered. Those companies can support large, dedicated teams because they make a huge amount of money, either directly or indirectly, from their software and, in part, compete on its security. Unfortunately, this isn’t true of embedded systems like digital video recorders or home routers. Those systems are sold at a much lower margin, and are often built by offshore third parties. The companies involved simply don’t have the expertise to make them secure.

At a recent hacker conference, a security researcher analyzed 30 home routers and was able to break into half of them, including some of the most popular and common brands. The denial-of-service attacks that forced popular websites like Reddit and Twitter off the internet last October were enabled by vulnerabilities in devices like webcams and digital video recorders. In August, two security researchers demonstrated a ransomware attack on a smart thermostat.

Even worse, most of these devices don’t have any way to be patched. Companies like Microsoft and Apple continuously deliver security patches to your computers. Some home routers are technically patchable, but in a complicated way that only an expert would attempt. And the only way for you to update the firmware in your hackable DVR is to throw it away and buy a new one.

The market can’t fix this because neither the buyer nor the seller cares. The owners of the webcams and DVRs used in the denial-of-service attacks don’t care. Their devices were cheap to buy, they still work, and they don’t know any of the victims of the attacks. The sellers of those devices don’t care: They’re now selling newer and better models, and the original buyers only cared about price and features. There is no market solution, because the insecurity is what economists call an externality: It’s an effect of the purchasing decision that affects other people. Think of it kind of like invisible pollution.

**********

Security is an arms race between attacker and defender. Technology perturbs that arms race by changing the balance between attacker and defender. Understanding how this arms race has unfolded on the internet is essential to understanding why the world-size robot we’re building is so insecure, and how we might secure it. To that end, I have five truisms, born from what we’ve already learned about computer and internet security. They will soon affect the security arms race everywhere.

Truism No. 1: On the internet, attack is easier than defense.

There are many reasons for this, but the most important is the complexity of these systems. More complexity means more people involved, more parts, more interactions, more mistakes in the design and development process, more of everything where hidden insecurities can be found. Computer-security experts like to speak about the attack surface of a system: all the possible points an attacker might target and that must be secured. A complex system means a large attack surface. The defender has to secure the entire attack surface. The attacker just has to find one vulnerability - one unsecured avenue for attack - and gets to choose how and when to attack. It’s simply not a fair battle.

There are other, more general, reasons why attack is easier than defense. Attackers have a natural agility that defenders often lack. They don’t have to worry about laws, and often not about morals or ethics. They don’t have a bureaucracy to contend with, and can more quickly make use of technical innovations. Attackers also have a first-mover advantage. As a society, we’re generally terrible at proactive security; we rarely take preventive security measures until an attack actually happens. So more advantages go to the attacker.

Truism No. 2: Most software is poorly written and insecure.

If complexity isn’t enough, we compound the problem by producing lousy software. Well-written software, like the kind found in airplane avionics, is both expensive and time-consuming to produce. We don’t want that. For the most part, poorly written software has been good enough. We’d all rather live with buggy software than pay the prices good software would require. We don’t mind if our games crash regularly, or our business applications act weird once in a while. Because software has been largely benign, it hasn’t mattered. This has permeated the industry at all levels. At universities, we don’t teach how to code well. Companies don’t reward quality code in the same way they reward fast and cheap. And we consumers don’t demand it.

But poorly written software is riddled with bugs, sometimes as many as one per 1,000 lines of code. Some of them are inherent in the complexity of the software, but most are programming mistakes. Not all bugs are vulnerabilities, but some are.

Truism No. 3: Connecting everything to each other via the internet will expose new vulnerabilities.

The more we network things together, the more vulnerabilities on one thing will affect other things. On October 21, vulnerabilities in a wide variety of embedded devices were all harnessed together to create what hackers call a botnet. This botnet was used to launch a distributed denial-of-service attack against a company called Dyn. Dyn provided a critical internet function for many major internet sites. So when Dyn went down, so did all those popular websites.

These chains of vulnerabilities are everywhere. In 2012, journalist Mat Honan suffered a massive personal hack because of one of them. A vulnerability in his Amazon account allowed hackers to get into his Apple account, which allowed them to get into his Gmail account. And in 2013, the Target Corporation was hacked by someone stealing credentials from its HVAC contractor.

Vulnerabilities like these are particularly hard to fix, because no one system might actually be at fault. It might be the insecure interaction of two individually secure systems.

Truism No. 4: Everybody has to stop the best attackers in the world.

One of the most powerful properties of the internet is that it allows things to scale. This is true for our ability to access data or control systems or do any of the cool things we use the internet for, but it’s also true for attacks. In general, fewer attackers can do more damage because of better technology. It’s not just that these modern attackers are more efficient, it’s that the internet allows attacks to scale to a degree impossible without computers and networks.

This is fundamentally different from what we’re used to. When securing my home against burglars, I am only worried about the burglars who live close enough to my home to consider robbing me. The internet is different. When I think about the security of my network, I have to be concerned about the best attacker possible, because he’s the one who’s going to create the attack tool that everyone else will use. The attacker that discovered the vulnerability used to attack Dyn released the code to the world, and within a week there were a dozen attack tools using it.

Truism No. 5: Laws inhibit security research.

The Digital Millennium Copyright Act is a terrible law that fails at its purpose of preventing widespread piracy of movies and music. To make matters worse, it contains a provision that has critical side effects. According to the law, it is a crime to bypass security mechanisms that protect copyrighted work, even if that bypassing would otherwise be legal. Since all software can be copyrighted, it is arguably illegal to do security research on these devices and to publish the result.

Although the exact contours of the law are arguable, many companies are using this provision of the DMCA to threaten researchers who expose vulnerabilities in their embedded systems. This instills fear in researchers, and has a chilling effect on research, which means two things: (1) Vendors of these devices are more likely to leave them insecure, because no one will notice and they won’t be penalized in the market, and (2) security engineers don’t learn how to do security better.
Unfortunately, companies generally like the DMCA. The provisions against reverse-engineering spare them the embarrassment of having their shoddy security exposed. It also allows them to build proprietary systems that lock out competition. (This is an important one. Right now, your toaster cannot force you to only buy a particular brand of bread. But because of this law and an embedded computer, your Keurig coffee maker can force you to buy a particular brand of coffee.)

**********
In general, there are two basic paradigms of security. We can either try to secure something well the first time, or we can make our security agile. The first paradigm comes from the world of dangerous things: from planes, medical devices, buildings. It’s the paradigm that gives us secure design and secure engineering, security testing and certifications, professional licensing, detailed preplanning and complex government approvals, and long times-to-market. It’s security for a world where getting it right is paramount because getting it wrong means people dying.

The second paradigm comes from the fast-moving and heretofore largely benign world of software. In this paradigm, we have rapid prototyping, on-the-fly updates, and continual improvement. In this paradigm, new vulnerabilities are discovered all the time and security disasters regularly happen. Here, we stress survivability, recoverability, mitigation, adaptability, and muddling through. This is security for a world where getting it wrong is okay, as long as you can respond fast enough.

These two worlds are colliding. They’re colliding in our cars - literally - in our medical devices, our building control systems, our traffic control systems, and our voting machines. And although these paradigms are wildly different and largely incompatible, we need to figure out how to make them work together.

So far, we haven’t done very well. We still largely rely on the first paradigm for the dangerous computers in cars, airplanes, and medical devices. As a result, there are medical systems that can’t have security patches installed because that would invalidate their government approval. In 2015, Chrysler recalled 1.4 million cars to fix a software vulnerability. In September 2016, Tesla remotely sent a security patch to all of its Model S cars overnight. Tesla sure sounds like it’s doing things right, but what vulnerabilities does this remote patch feature open up?

**********
Until now we’ve largely left computer security to the market. Because the computer and network products we buy and use are so lousy, an enormous after-market industry in computer security has emerged. Governments, companies, and people buy the security they think they need to secure themselves. We’ve muddled through well enough, but the market failures inherent in trying to secure this world-size robot will soon become too big to ignore.

Markets alone can’t solve our security problems. Markets are motivated by profit and short-term goals at the expense of society. They can’t solve collective-action problems. They won’t be able to deal with economic externalities, like the vulnerabilities in DVRs that resulted in Twitter going offline. And we need a counterbalancing force to corporate power.

This all points to policy. While the details of any computer-security system are technical, getting the technologies broadly deployed is a problem that spans law, economics, psychology, and sociology. And getting the policy right is just as important as getting the technology right because, for internet security to work, law and technology have to work together. This is probably the most important lesson of Edward Snowden’s NSA disclosures. We already knew that technology can subvert law. Snowden demonstrated that law can also subvert technology. Both fail unless each works. It’s not enough to just let technology do its thing.

Any policy changes to secure this world-size robot will mean significant government regulation. I know it’s a sullied concept in today’s world, but I don’t see any other possible solution. It’s going to be especially difficult on the internet, where its permissionless nature is one of the best things about it and the underpinning of its most world-changing innovations. But I don’t see how that can continue when the internet can affect the world in a direct and physical manner.

**********

I have a proposal: a new government regulatory agency. Before dismissing it out of hand, please hear me out.

We have a practical problem when it comes to internet regulation. There’s no government structure to tackle this at a systemic level. Instead, there’s a fundamental mismatch between the way government works and the way this technology works that makes dealing with this problem impossible at the moment.

Government operates in silos. In the U.S., the FAA regulates aircraft. The NHTSA regulates cars. The FDA regulates medical devices. The FCC regulates communications devices. The FTC protects consumers in the face of “unfair” or “deceptive” trade practices. Even worse, who regulates data can depend on how it is used. If data is used to influence a voter, it’s the Federal Election Commission’s jurisdiction. If that same data is used to influence a consumer, it’s the FTC’s. Use those same technologies in a school, and the Department of Education is now in charge. Robotics will have its own set of problems, and no one is sure how that is going to be regulated. Each agency has a different approach and different rules. They have no expertise in these new issues, and they are not quick to expand their authority for all sorts of reasons.

Compare that with the internet. The internet is a freewheeling system of integrated objects and networks. It grows horizontally, demolishing old technological barriers so that people and systems that never previously communicated now can. Already, apps on a smartphone can log health information, control your energy use, and communicate with your car. That’s a set of functions that crosses jurisdictions of at least four different government agencies, and it’s only going to get worse.

Our world-size robot needs to be viewed as a single entity with millions of components interacting with each other. Any solutions here need to be holistic. They need to work everywhere, for everything. Whether we’re talking about cars, drones, or phones, they’re all computers.

This has lots of precedent. Many new technologies have led to the formation of new government regulatory agencies. Trains did, cars did, airplanes did. Radio led to the formation of the Federal Radio Commission, which became the FCC. Nuclear power led to the formation of the Atomic Energy Commission, which eventually became the Department of Energy. The reasons were the same in every case. New technologies need new expertise because they bring with them new challenges. Governments need a single agency to house that new expertise, because its applications cut across several preexisting agencies. It’s less that the new agency needs to regulate - although that’s often a big part of it - and more that governments recognize the importance of the new technologies.

The internet has famously eschewed formal regulation, instead adopting a multi-stakeholder model of academics, businesses, governments, and other interested parties. My hope is that we can keep the best of this approach in any regulatory agency, looking more like the new U.S. Digital Service or the 18F office inside the General Services Administration. Both of those organizations are dedicated to providing digital government services, both have collected significant expertise by bringing people in from outside of government, and both have learned how to work closely with existing agencies. Any internet regulatory agency will similarly need to engage in a high level of collaborative regulation - both a challenge and an opportunity.

I don’t think any of us can predict the totality of the regulations we need to ensure the safety of this world, but here are a few. We need government to ensure companies follow good security practices: testing, patching, secure defaults - and we need to be able to hold companies liable when they fail to do these things. We need government to mandate strong personal data protections, and limitations on data collection and use. We need to ensure that responsible security research is legal and well-funded. We need to enforce transparency in design, some sort of code escrow in case a company goes out of business, and interoperability between devices of different manufacturers, to counterbalance the monopolistic effects of interconnected technologies. Individuals need the right to take their data with them. And internet-enabled devices should retain some minimal functionality if disconnected from the internet.

I’m not the only one talking about this. I’ve seen proposals for a National Institutes of Health analog for cybersecurity. University of Washington law professor Ryan Calo has proposed a Federal Robotics Commission. I think it needs to be broader: maybe a Department of Technology Policy.

Of course there will be problems. There’s a lack of expertise in these issues inside government. There’s a lack of willingness in government to do the hard regulatory work. Industry is worried about any new bureaucracy: both that it will stifle innovation by regulating too much and that it will be captured by industry and regulate too little. A domestic regulatory agency will have to deal with the fundamentally international nature of the problem.

But government is the entity we use to solve problems like this. Governments have the scope, scale, and balance of interests to address the problems. It’s the institution we’ve built to adjudicate competing social interests and internalize market externalities. Left to its own devices, the market simply can’t. That we’re currently in the middle of an era of low government trust, where many of us can’t imagine government doing anything positive in an area like this, is to our detriment.

Here’s the thing: Governments will get involved, regardless. The risks are too great, and the stakes are too high. Government already regulates dangerous physical systems like cars and medical devices. And nothing motivates the U.S. government like fear. Remember 2001? A nominally small-government Republican president created the Office of Homeland Security 11 days after the terrorist attacks: a rushed and ill-thought-out decision that we’ve been trying to fix for over a decade. A fatal disaster will similarly spur our government into action, and it’s unlikely to be well-considered and thoughtful action. Our choice isn’t between government involvement and no government involvement. Our choice is between smarter government involvement and stupider government involvement. We have to start thinking about this now. Regulations are necessary, important, and complex; and they’re coming. We can’t afford to ignore these issues until it’s too late.

We also need to start disconnecting systems. If we cannot secure complex systems to the level required by their real-world capabilities, then we must not build a world where everything is computerized and interconnected.

There are other models. We can enable local communications only. We can set limits on collected and stored data. We can deliberately design systems that don’t interoperate with each other. We can deliberately fetter devices, reversing the current trend of turning everything into a general-purpose computer. And, most important, we can move toward less centralization and more distributed systems, which is how the internet was first envisioned.

This might be a heresy in today’s race to network everything, but large, centralized systems are not inevitable. The technical elites are pushing us in that direction, but they really don’t have any good supporting arguments other than the profits of their ever-growing multinational corporations.

But this will change. It will change not only because of security concerns, it will also change because of political concerns. We’re starting to chafe under the worldview of everything producing data about us and what we do, and that data being available to both governments and corporations. Surveillance capitalism won’t be the business model of the internet forever. We need to change the fabric of the internet so that evil governments don’t have the tools to create a horrific totalitarian state. And while good laws and regulations in Western democracies are a great second line of defense, they can’t be our only line of defense.

My guess is that we will soon reach a high-water mark of computerization and connectivity, and that afterward we will make conscious decisions about what and how we decide to interconnect. But we’re still in the honeymoon phase of connectivity. Governments and corporations are punch-drunk on our data, and the rush to connect everything is driven by an even greater desire for power and market share. One of the presentations released by Edward Snowden contained the NSA mantra: “Collect it all.” A similar mantra for the internet today might be: “Connect it all.”

The inevitable backlash will not be driven by the market. It will be deliberate policy decisions that put the safety and welfare of society above individual corporations and industries. It will be deliberate policy decisions that prioritize the security of our systems over the demands of the FBI to weaken them in order to make their law-enforcement jobs easier. It’ll be hard policy for many to swallow, but our safety will depend on it.

**********

The scenarios I’ve outlined, both the technological and economic trends that are causing them and the political changes we need to make to start to fix them, come from my years of working in internet-security technology and policy. All of this is informed by an understanding of both technology and policy. That turns out to be critical, and there aren’t enough people who understand both.

This brings me to my final plea: We need more public-interest technologists.

Over the past couple of decades, we’ve seen examples of getting internet-security policy badly wrong. I’m thinking of the FBI’s “going dark” debate about its insistence that computer devices be designed to facilitate government access, the “vulnerability equities process” about when the government should disclose and fix a vulnerability versus when it should use it to attack other systems, the debacle over paperless touch-screen voting machines, and the DMCA that I discussed above. If you watched any of these policy debates unfold, you saw policy-makers and technologists talking past each other.

Our world-size robot will exacerbate these problems. The historical divide between Washington and Silicon Valley - the mistrust of governments by tech companies and the mistrust of tech companies by governments - is dangerous.

We have to fix this. Getting IoT security right depends on the two sides working together and, even more important, having people who are experts in each working on both. We need technologists to get involved in policy, and we need policy-makers to get involved in technology. We need people who are experts in making both technology and technological policy. We need technologists on congressional staffs, inside federal agencies, working for NGOs, and as part of the press. We need to create a viable career path for public-interest technologists, much as there already is one for public-interest attorneys. We need courses, and degree programs in colleges, for people interested in careers in public-interest technology. We need fellowships in organizations that need these people. We need technology companies to offer sabbaticals for technologists wanting to go down this path. We need an entire ecosystem that supports people bridging the gap between technology and law. We need a viable career path that ensures that even though people in this field won’t make as much as they would in a high-tech start-up, they will have viable careers. The security of our computerized and networked future - meaning the security of ourselves, families, homes, businesses, and communities - depends on it.

This plea is bigger than security, actually. Pretty much all of the major policy debates of this century will have a major technological component. Whether it’s weapons of mass destruction, robots drastically affecting employment, climate change, food safety, or the increasing ubiquity of ever-shrinking drones, understanding the policy means understanding the technology. Our society desperately needs technologists working on the policy. The alternative is bad policy.

**********

The world-size robot is less designed than created. It’s coming without any forethought or architecting or planning; most of us are completely unaware of what we’re building. In fact, I am not convinced we can actually design any of this. When we try to design complex sociotechnical systems like this, we are regularly surprised by their emergent properties. The best we can do is observe and channel these properties as best we can.

Market thinking sometimes makes us lose sight of the human choices and autonomy at stake. Before we get controlled - or killed - by the world-size robot, we need to rebuild confidence in our collective governance institutions. Law and policy may not seem as cool as digital tech, but they’re also places of critical innovation. They’re where we collectively bring about the world we want to live in.

While I might sound like a Cassandra, I’m actually optimistic about our future. Our society has tackled bigger problems than this one. It takes work and it’s not easy, but we eventually find our way clear to make the hard choices necessary to solve our real problems.

The world-size robot we’re building can only be managed responsibly if we start making real choices about the interconnected world we live in. Yes, we need security systems as robust as the threat landscape. But we also need laws that effectively regulate these dangerous technologies. And, more generally, we need to make moral, ethical, and political decisions on how those systems should work. Until now, we’ve largely left the internet alone. We gave programmers a special right to code cyberspace as they saw fit. This was okay because cyberspace was separate and relatively unimportant: That is, it didn’t matter. Now that that’s changed, we can no longer give programmers and the companies they work for this power. Those moral, ethical, and political decisions need, somehow, to be made by everybody. We need to link people with the same zeal that we are currently linking machines. “Connect it all” must be countered with “connect us all.”

This essay previously appeared in New York Magazine.

Five Mistakes Everyone Makes With Cloud Backup

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/5-common-cloud-backup-mistakes/


Cloud-based storage and file sync services are ubiquitous: Everywhere we turn new services pop up (and often shut down), promising free or low-cost storage of everything and anything on our computers and mobile devices.

When you depend on the cloud it’s very easy to get lulled into a false sense of security. Don’t. Here are five common mistakes all of us make with cloud backup and sync services. I’ve added suggestions for how to avoid these pitfalls.

Assuming the Cloud Is Backing Things Up

“I have iCloud or Google Drive, so everything’s backed up.”

Some cloud backup and file sync services make it really easy to put files online, but what they put there may not be all the files you need. Don’t just assume the cloud services you use are doing a complete backup of your device – check to see what is actually being backed up. The services you use may only back up specific folders or directories on your computer’s hard drive.

Read this for more info on how Backblaze backs up.

There’s a big difference between file backup services and sync services, by the way. Which brings me to my next point:

Confusing Sync for Backup

“I don’t need backup. I’ve got my files synced.”

Sync services enable you to keep contents consistent across multiple devices – think Dropbox or iCloud Drive, for example. Make one change to the shared contents, and the same change happens across all devices, including file changes and deletions. Depending on how you have syncing and sharing set up, you can delete a file on one device and have it disappear on all the other shared devices.

I’ve also found it handy to have a backup service that enables you to restore multiple versions. In point of fact, Dropbox lets you restore previous versions. Apple’s Time Machine, built into the Mac, does this too. So does Backblaze (we keep track of multiple versions up to 30 days). That’s not to say you shouldn’t use Dropbox – we do! We wrote about how we are complementary services.

Thinking One Backup Is Enough

“Hey, I’m backing up to the cloud. That’s better than nothing, right?”

It’s better than nothing but it’s not enough. You want a local backup too. That’s why I recommend a 3-2-1 Backup strategy. In addition to the “live” copy of the data on your hard drive, make sure you have a local backup, and use the cloud for offsite storage. Likewise, if you’re only storing data on a local backup, you’re putting all your eggs in that basket. Add offsite backup to complete your backup strategy. Conversely, if you only store your data in the cloud, you’re susceptible to those services being down as well. So having a local copy can keep you productive even if your favorite service is temporarily down.

Leaving Things Insecure

“I’m not backing up anything important enough for hackers to bother with.”

With identity theft on the rise, the security of all of your data online should be paramount. Strong encryption is important, so make sure it’s supported by the services you depend on.

Even if a bad actor doesn’t want your data, they may still want your computer for nefarious purposes, like driving a botnet used to launch a DDoS (Distributed Denial of Service) attack. That’s exactly what recently happened to Dyn, a company that provides core Internet services for other popular Internet services like Twitter and Spotify.

Make sure to protect your computer with strong passwords, practice safe surfing and keep your computer updated with the latest software. Also check periodically for malware and get rid of it when you find it.

Thinking That it’s Taken Care Of

“I have a backup strategy in place, so I don’t have to think about it anymore.”

I think it’s wise to observe an old aphorism: “Trust but verify.”

There’s absolutely nothing wrong with developing an automated backup strategy. But it’s vitally important to periodically test your backups to make sure they’re doing what they’re supposed to.

You should test your most important, mission-critical data first. Tax returns? Important legal documents? Irreplaceable baby pictures? Make sure the files that are important to you are retrievable and intact by actually trying to recover them. Find out more about how to test your backup.
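
One low-tech way to “trust but verify” is to restore a file to a separate folder and confirm it matches the live copy byte for byte. Here’s a minimal Python sketch of that check; the file paths are purely illustrative:

import hashlib

def sha256(path, chunk_size=1024 * 1024):
	# Return the SHA-256 hex digest of a file, read in chunks
	h = hashlib.sha256()
	with open(path, "rb") as f:
		for chunk in iter(lambda: f.read(chunk_size), b""):
			h.update(chunk)
	return h.hexdigest()

# Hypothetical paths: the live copy and the copy you just restored from backup
original = "Documents/2016_tax_return.pdf"
restored = "RestoreTest/2016_tax_return.pdf"

if sha256(original) == sha256(restored):
	print("Restore verified: checksums match")
else:
	print("Restore FAILED: checksums differ")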

This applies to Backblaze too. Test all your backups – we even recommend it in our Best Practices.

Got more cloud backup myths to bust? Share them with me in the comments!

 

The post Five Mistakes Everyone Makes With Cloud Backup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics

Post Syndicated from Chris Marshall original https://blogs.aws.amazon.com/bigdata/post/Tx1XNQPQ2ARGT81/Real-time-Clickstream-Anomaly-Detection-with-Amazon-Kinesis-Analytics

Chris Marshall is a Solutions Architect for Amazon Web Services

Analyzing web log traffic to gain insights that drive business decisions has historically been performed using batch processing.  While effective, this approach results in delayed responses to emerging trends and user activities.  There are solutions to deal with processing data in real time using streaming and micro-batching technologies, but they can be complex to set up and maintain.  Amazon Kinesis Analytics is a managed service that makes it very easy to identify and respond to changes in behavior in real-time.   

One use case where it’s valuable to have immediate insights is analyzing clickstream data.   In the world of digital advertising, an impression is when an ad is displayed in a browser and a clickthrough represents a user clicking on that ad.  A clickthrough rate (CTR) is one way to monitor the ad’s effectiveness.  CTR is calculated in the form of: CTR = Clicks / Impressions * 100.  Digital marketers are interested in monitoring CTR to know when particular ads perform better than normal, giving them a chance to optimize placements within the ad campaign.  They may also be interested in anomalous low-end CTR that could be a result of a bad image or bidding model.
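
As a quick illustration, the CTR formula can be sanity-checked in a couple of lines of Python; the click and impression counts below are made up to match the rates used later in the walkthrough:

def ctr(clicks, impressions):
	# CTR = Clicks / Impressions * 100, expressed as a percentage
	return clicks / impressions * 100

print(ctr(10, 100))   # 10.0 -> the normal rate in the example traffic
print(ctr(50, 100))   # 50.0 -> the injected anomaly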

In this post, I show an analytics pipeline which detects anomalies in real time for a web traffic stream, using the RANDOM_CUT_FOREST function available in Amazon Kinesis Analytics. 

RANDOM_CUT_FOREST 

Amazon Kinesis Analytics includes a powerful set of analytics functions to analyze streams of data.  One such function is RANDOM_CUT_FOREST.  This function detects anomalies by scoring data flowing through a dynamic data stream. This novel approach identifies a normal pattern in streaming data, then compares new data points in reference to it. For more information, see Robust Random Cut Forest Based Anomaly Detection On Streams.

Analytics pipeline components

To demonstrate how the RANDOM_CUT_FOREST function can be used to detect anomalies in real-time click through rates, I will walk you through how to build an analytics pipeline and generate web traffic using a simple Python script.  When your injected anomaly is detected, you get an email or SMS message to alert you to the event. 

This post first walks through the components of the analytics pipeline, then discusses how to build and test it in your own AWS account.  The accompanying AWS CloudFormation script builds the Amazon API Gateway API, Amazon Kinesis streams, AWS Lambda function, Amazon SNS components, and IAM roles required for this walkthrough. Then you manually create the Amazon Kinesis Analytics application that uses the RANDOM_CUT_FOREST function in SQL. 

Web Client

Because Python is a popular scripting language available on many clients, we’ll use it to generate web impressions and click data by making HTTP GET requests with the requests library.  This script mimics beacon traffic generated by web browsers.  The GET request contains a query string value of click or impression to indicate the type of beacon being captured.  The script has a while loop that iterates 2500 times.  For each pass through the loop, an Impression beacon is sent.  Then, the random function is used to potentially send an additional click beacon.  For most of the passes through the loop, a click request is generated roughly 10% of the time.  However, for some web requests, the CTR jumps to a rate of 50%, representing an anomaly.  These are artificially high CTR rates used for illustration in this post, but the functionality would be the same using small fractional values.

import requests
import random
import sys
import argparse

def getClicked(rate):
	# Return True roughly `rate` fraction of the time
	if random.random() <= rate:
		return True
	else:
		return False

def httpGetImpression():
	# Send an impression beacon to the target endpoint
	url = args.target + '?browseraction=Impression'
	r = requests.get(url)

def httpGetClick():
	# Send a click beacon to the target endpoint
	url = args.target + '?browseraction=Click'
	r = requests.get(url)
	sys.stdout.write('+')

parser = argparse.ArgumentParser()
parser.add_argument("target", help="the http(s) location to send the GET request")
args = parser.parse_args()
i = 0
while (i < 2500):
	httpGetImpression()
	if(i<1950 or i>=2000):
		# Normal traffic: roughly 10% click-through rate
		clicked = getClicked(.1)
		sys.stdout.write('_')
	else:
		# Injected anomaly (iterations 1950-1999): roughly 50% click-through rate
		clicked = getClicked(.5)
		sys.stdout.write('-')
	if(clicked):
		httpGetClick()
	i = i + 1
	sys.stdout.flush()

Web “front door”

Client web requests are handled by Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. It gets data into your analytics pipeline by acting as a service proxy to an Amazon Kinesis stream.  API Gateway takes click and impression web traffic, converts the HTTP header and query string data into a JSON message, and places it into your stream.  The CloudFormation script creates an API endpoint with a body mapping template similar to the following:

#set($inputRoot = $input.path('$'))
{
  "Data": "$util.base64Encode("{ ""browseraction"" : ""$input.params('browseraction')"", ""site"" : ""$input.params('Host')"" }")",
  "PartitionKey" : "$input.params('Host')",
  "StreamName" : "CSEKinesisBeaconInputStream"
}

(API Gateway service proxy body mapping template)

This tells API Gateway to take the host header and query string inputs and convert them into a payload to be put into a stream using a service proxy for Amazon Kinesis Streams.
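
To see what that payload looks like, here is a rough Python sketch that renders the same structure the template produces for a single click beacon; the host value is purely illustrative:

import base64
import json

browseraction = "Click"                                   # from the ?browseraction= query string
host = "abc123.execute-api.us-east-1.amazonaws.com"       # from the Host header (illustrative)

inner = '{ "browseraction" : "%s", "site" : "%s" }' % (browseraction, host)
payload = {
	"Data": base64.b64encode(inner.encode()).decode(),
	"PartitionKey": host,
	"StreamName": "CSEKinesisBeaconInputStream",
}
print(json.dumps(payload, indent=2))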

Input stream

Amazon Kinesis Streams can capture terabytes of data per hour from thousands of sources.  In this case, you use it to deliver your JSON messages to Amazon Kinesis Analytics. 
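
If you ever want to exercise the downstream pieces without going through API Gateway, you could put the same style of JSON message onto the stream directly with boto3. This is only a sketch; the stream name below is illustrative, and the stream that CloudFormation actually creates is prefixed with your stack name:

import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Assumed stream name; substitute the full name created by your CloudFormation stack
record = {"browseraction": "Impression", "site": "test-client"}
kinesis.put_record(
	StreamName="CSEKinesisBeaconInputStream",
	Data=json.dumps(record).encode("utf-8"),
	PartitionKey=record["site"],
)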

Analytics application

Amazon Kinesis Analytics processes the incoming messages and allows you to perform analytics on the dynamic stream of data.  You use SQL to calculate the CTR and pass that into your RANDOM_CUT_FOREST function to calculate an anomaly score.

First, calculate the counts of impressions and clicks using SQL with a tumbling window in Amazon Kinesis Analytics.  Analytics uses streams and pumps in SQL to process data.  See our earlier post to get an overview of streaming data with Kinesis Analytics. Inside the Analytics SQL, a stream is analogous to a table and a pump is the flow of data into those tables.  Here’s the description of streams for the impression and click data.

CREATE OR REPLACE STREAM "CLICKSTREAM" ( 
   "CLICKCOUNT" DOUBLE
);


CREATE OR REPLACE STREAM "IMPRESSIONSTREAM" ( 
   "IMPRESSIONCOUNT" DOUBLE
);

Later, you calculate the clicks divided by impressions; you are using a DOUBLE data type to contain the count values.  If you left them as integers, the division below would yield only a 0 or 1.  Next, you define the pumps that populate the stream using a tumbling window, a window of time during which all records received are considered as part of the statement. 

CREATE OR REPLACE PUMP "CLICKPUMP" STARTED AS 
INSERT INTO "CLICKSTREAM" ("CLICKCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Click'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);
CREATE OR REPLACE PUMP "IMPRESSIONPUMP" STARTED AS 
INSERT INTO "IMPRESSIONSTREAM" ("IMPRESSIONCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Impression'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);

The GROUP BY statement uses FLOOR and ROWTIME to create a tumbling window of 10 seconds.  This tumbling window will output one record every ten seconds assuming we are receiving data from the corresponding pump.  In this case, you output the total number of records that were clicks or impressions. 
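
Conceptually, the FLOOR expression maps each record’s ROWTIME onto the start of its 10-second bucket, so all records that land in the same bucket are aggregated together. A rough Python analogy of that bucketing (illustrative only, not how Kinesis Analytics implements it internally):

from datetime import datetime, timezone

def tumbling_window_start(ts, size_seconds=10):
	# Map a timestamp to the start of its tumbling window
	epoch = ts.timestamp()
	return datetime.fromtimestamp(epoch - epoch % size_seconds, tz=timezone.utc)

t = datetime(2016, 10, 21, 12, 0, 17, tzinfo=timezone.utc)
print(tumbling_window_start(t))   # 2016-10-21 12:00:10+00:00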

Next, use the output of these pumps to calculate the CTR:

CREATE OR REPLACE STREAM "CTRSTREAM" (
  "CTR" DOUBLE
);

CREATE OR REPLACE PUMP "CTRPUMP" STARTED AS 
INSERT INTO "CTRSTREAM" ("CTR")
SELECT STREAM "CLICKCOUNT" / "IMPRESSIONCOUNT" * 100 as "CTR"
FROM "IMPRESSIONSTREAM",
  "CLICKSTREAM"
WHERE "IMPRESSIONSTREAM".ROWTIME = "CLICKSTREAM".ROWTIME;

Finally, these CTR values are used in RANDOM_CUT_FOREST to detect anomalous values.  The DESTINATION_SQL_STREAM is the output stream of data from the Amazon Kinesis Analytics application. 

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "CTRPERCENT" DOUBLE,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "OUTPUT_PUMP" STARTED AS 
INSERT INTO "DESTINATION_SQL_STREAM" 
SELECT STREAM * FROM
TABLE (RANDOM_CUT_FOREST( 
             CURSOR(SELECT STREAM "CTR" FROM "CTRSTREAM"), --inputStream
             100, --numberOfTrees (default)
             12, --subSampleSize 
             100000, --timeDecay (default)
             1) --shingleSize (default)
)
WHERE ANOMALY_SCORE > 2; 

The RANDOM_CUT_FOREST function greatly simplifies the programming required for anomaly detection.  However, understanding your data domain is paramount when performing data analytics.  The RANDOM_CUT_FOREST function is a tool for data scientists, not a replacement for them.  Knowing whether your data is logarithmic, circadian rhythmic, linear, etc. will provide the insights necessary to select the right parameters for RANDOM_CUT_FOREST.  For more information about parameters, see the RANDOM_CUT_FOREST Function.

Fortunately, the default values work in a wide variety of cases. In this case, use the default values for all but the subSampleSize parameter.  Typically, you would use a larger sample size to increase the pool of random samples used to calculate the anomaly score; for this post, use 12 samples so as to start evaluating the anomaly scores sooner.  

Your SQL query outputs one record every ten seconds from the tumbling window so you’ll have enough evaluation values after two minutes to start calculating the anomaly score.  You are also using a cutoff value where records are only output to “DESTINATION_SQL_STREAM” if the anomaly score from the function is greater than 2 using the WHERE clause. To help visualize the cutoff point, here are the data points from a few runs through the pipeline using the sample Python script:

As you can see, the majority of the CTR data points are grouped around a 10% rate.  As the values move away from the 10% rate, their anomaly scores go up because they are more infrequent.  In the graph, an anomaly score above the cutoff is shaded orange. A cutoff value of 2 works well for this data set.  

Output stream

The Amazon Kinesis Analytics application outputs the values from “DESTINATION_SQL_STREAM” to an Amazon Kinesis stream when the record has an anomaly score greater than 2.

Anomaly execution function

The output stream is an input event for an AWS Lambda function with a batch size of 1, so the function gets called one time for each message put in the output stream.  This function represents a place where you could react to anomalous events.  In a real world scenario, you may want to instead update a real-time bidding system to react to current events.  In this example, you send a message to SNS to see a tangible response to the changing CTR.  The CloudFormation template creates a Lambda function similar to the following:

var AWS = require('aws-sdk');
var sns = new AWS.SNS({ region: "<region>" });

exports.handler = function(event, context) {
	// Our batch size is 1 record, so this loop is expected to run only once per invocation
	event.Records.forEach(function(record) {
		// Kinesis data is base64 encoded, so decode it here
		var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
		// The Analytics application emits CSV: "<CTR percent>,<anomaly score>"
		var rec = payload.split(',');
		var ctr = rec[0];
		var anomaly_score = rec[1];
		var detail = 'Anomaly detected with a click through rate of ' + ctr + '% and an anomaly score of ' + anomaly_score;
		var subject = 'Anomaly Detected';
		var params = {
			Message: detail,
			MessageStructure: 'String',
			Subject: subject,
			TopicArn: 'arn:aws:sns:us-east-1::ClickStreamEvent'
		};
		sns.publish(params, function(err, data) {
			if (err) context.fail(err.stack);
			else     context.succeed('Published Notification');
		});
	});
};
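
For reference, each record the function receives carries a base64-encoded CSV row ("<CTR percent>,<anomaly score>") from the Analytics output stream. A quick Python illustration of the same decoding step, using made-up values:

import base64

# Made-up output row in the CSV format emitted by the Analytics application
encoded = base64.b64encode(b"50.0,2.83").decode()

ctr, anomaly_score = base64.b64decode(encoded).decode("ascii").split(",")
print(ctr, anomaly_score)   # 50.0 2.83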

Notification system

Amazon SNS is a fully managed, highly scalable messaging platform.  For this walkthrough, you send messages to SNS with subscribers that send email and text messages to a mobile phone.

Analytics pipeline walkthrough

Sometimes it’s best to build out a solution so you can see all the parts working and get a good sense of how it works.   Here are the steps to build out the entire pipeline as described above in your own account and perform real-time clickstream analysis yourself.  Following this walkthrough creates resources in your account that you can use to test the example pipeline.

Prerequisites

To complete the walkthrough, you’ll need an AWS account and a place to execute Python scripts.

Configure the CloudFormation template

Download the CloudFormation template named CSE-cfn.json from the AWS Big Data blog Github repository. In the CloudFormation console, set your region to N. Virginia, Oregon, or Ireland and then browse to that file.

When your script executes and an anomaly is detected, SNS sends a notification.  To see these notifications in an email or text message, enter your email address and mobile phone number in the Parameters section and choose Next.

This CloudFormation script creates the policies and roles necessary to process the incoming messages.  Acknowledge this by selecting the acknowledgement checkbox, or you will get a CAPABILITY_IAM error.

If you entered a valid email or SMS number, you will receive a validation message when the SNS subscriptions are created by CloudFormation.  If you don’t wish to receive email or texts from your pipeline, you don’t need to complete the verification process.  

Run the Python script

Copy ClickImpressionGenerator.py to a local folder.

This script uses the requests library to make HTTP requests.  If you don’t have that Python package globally installed, you can install it locally by running the following command in the same folder as the Python script:

pip install requests -t .

When the CloudFormation stack is complete, choose Outputs, ExecutionCommand, and run the command from the folder with the Python script. 

This will start generating web traffic and send it to the API Gateway in the front of your analytics pipeline.  It is important to have data flowing into your Amazon Kinesis Analytics application for the next section so that a schema can be generated based on the incoming data.  If the script completes before you finish with the next section, simply restart the script with the same command.

Create the Analytics pipeline application

Open the Amazon Kinesis console and choose Go to Analytics.

Choose Create new application to create the Kinesis Analytics application, enter an application name and description, and then choose Save and continue

Choose Connect to a source to see a list of streams.  Select the stream that starts with the name of your CloudFormation stack and contains CSEKinesisBeaconInputStream, in the form of: <stack name>-CSEKinesisBeaconInputStream-<random string>.  This is the stream that your click and impression requests will be sent to from API Gateway.

Scroll down and choose Choose an IAM role.  Select the role with the name in the form of <stack name>-CSEKinesisAnalyticsRole-<random string>.  If your ClickImpressionGenerator.py script is still running and generating data, you should see a stream sample.  Scroll to the bottom and choose Save and continue

Next, add the SQL for your real-time analytics.  Choose Go to SQL editor, choose Yes, start application, and paste the SQL contents from CSE-SQL.sql into the editor. Choose Save and run SQL.  After a minute or so, you should see data showing up every ten seconds in the CLICKSTREAM.  Choose Exit when you are done editing.

Choose Connect to a destination and select the <stack name>-CSEKinesisBeaconOutputStream-<random string> from the list of streams.  Change the output format to CSV. 

Choose Choose an IAM role and select <stack name>-CSEKinesisAnalyticsRole-<random string> again.  Choose Save and continue.  Now your analytics pipeline is complete.  When your script gets to the injected anomaly section, you should get a text message or email to notify you of the anomaly.  To clean up the resources created here, delete the stack in CloudFormation, stop the Kinesis Analytics application, and then delete it.

Summary

A pipeline like this can be used for many use cases where anomaly detection is valuable. What solutions have you enabled with this architecture? If you have questions or suggestions, please comment below.

—————————–

Related

Process Amazon Kinesis Aggregated Data with AWS Lambda

Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics

Post Syndicated from Chris Marshall original https://aws.amazon.com/blogs/big-data/real-time-clickstream-anomaly-detection-with-amazon-kinesis-analytics/

Chris Marshall is a Solutions Architect for Amazon Web Services

Analyzing web log traffic to gain insights that drive business decisions has historically been performed using batch processing.  While effective, this approach results in delayed responses to emerging trends and user activities.  There are solutions to deal with processing data in real time using streaming and micro-batching technologies, but they can be complex to set up and maintain.  Amazon Kinesis Analytics is a managed service that makes it very easy to identify and respond to changes in behavior in real-time.

One use case where it’s valuable to have immediate insights is analyzing clickstream data.   In the world of digital advertising, an impression is when an ad is displayed in a browser and a clickthrough represents a user clicking on that ad.  A clickthrough rate (CTR) is one way to monitor the ad’s effectiveness.  CTR is calculated in the form of: CTR = Clicks / Impressions * 100.  Digital marketers are interested in monitoring CTR to know when particular ads perform better than normal, giving them a chance to optimize placements within the ad campaign.  They may also be interested in anomalous low-end CTR that could be a result of a bad image or bidding model.

In this post, I show an analytics pipeline which detects anomalies in real time for a web traffic stream, using the RANDOM_CUT_FOREST function available in Amazon Kinesis Analytics.

RANDOM_CUT_FOREST 

Amazon Kinesis Analytics includes a powerful set of analytics functions to analyze streams of data.  One such function is RANDOM_CUT_FOREST.  This function detects anomalies by scoring data flowing through a dynamic data stream. This novel approach identifies a normal pattern in streaming data, then compares new data points in reference to it. For more information, see Robust Random Cut Forest Based Anomaly Detection On Streams.

Analytics pipeline components

To demonstrate how the RANDOM_CUT_FOREST function can be used to detect anomalies in real-time click through rates, I will walk you through how to build an analytics pipeline and generate web traffic using a simple Python script.  When your injected anomaly is detected, you get an email or SMS message to alert you to the event.

This post first walks through the components of the analytics pipeline, then discusses how to build and test it in your own AWS account.  The accompanying AWS CloudFormation script builds the Amazon API Gateway API, Amazon Kinesis streams, AWS Lambda function, Amazon SNS components, and IAM roles required for this walkthrough. Then you manually create the Amazon Kinesis Analytics application that uses the RANDOM_CUT_FOREST function in SQL.

Web Client

Because Python is a popular scripting language available on many clients, we’ll use it to generate web impressions and click data by making HTTP GET requests with the requests library.  This script mimics beacon traffic generated by web browsers.  The GET request contains a query string value of click or impression to indicate the type of beacon being captured.  The script has a while loop that iterates 2500 times.  For each pass through the loop, an Impression beacon is sent.  Then, the random function is used to potentially send an additional click beacon.  For most of the passes through the loop, the click request is generated at the rate of roughly 10% of the time.  However, for some web requests, the CTR jumps to a rate of 50% representing an anomaly.  These are artificially high CTR rates used for the purpose of illustration for this post, but the functionality would be the same using small fractional values

import requests
import random
import sys
import argparse

def getClicked(rate):
	if random.random() <= rate:
		return True
	else:
		return False

def httpGetImpression():
	url = args.target + '?browseraction=Impression' 
	r = requests.get(url)

def httpGetClick():
	url = args.target + '?browseraction=Click' 
	r = requests.get(url)
	sys.stdout.write('+')

parser = argparse.ArgumentParser()
parser.add_argument("target",help=" the http(s) location to send the GET request")
args = parser.parse_args()
i = 0
while (i < 2500):
	httpGetImpression()
	if(i<1950 or i>=2000):
		clicked = getClicked(.1)
		sys.stdout.write('_')
	else:
		clicked = getClicked(.5)
		sys.stdout.write('-')
	if(clicked):
		httpGetClick()
	i = i + 1
	sys.stdout.flush()

Web “front door”

Client web requests are handled by Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. This handles the web request traffic to get data into your analytics pipeline by being a service proxy to an Amazon Kinesis stream.  API Gateway takes click and impression web traffic and converts the HTTP header and query string data into a JSON message and place it into your stream.  The CloudFormation script creates an API endpoint with a body mapping template similar to the following:

#set($inputRoot = $input.path('$'))
{
  "Data": "$util.base64Encode("{ ""browseraction"" : ""$input.params('browseraction')"", ""site"" : ""$input.params('Host')"" }")",
  "PartitionKey" : "$input.params('Host')",
  "StreamName" : "CSEKinesisBeaconInputStream"
}

(API Gateway service proxy body mapping template)

This tells API Gateway to take the host header and query string inputs and convert them into a payload to be put into a stream using a service proxy for Amazon Kinesis Streams.
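To make the service proxy’s behavior concrete, here is a minimal boto3 sketch (illustration only, not part of the pipeline) of the equivalent PutRecord call that the mapping template produces; the region and the unprefixed stream name are assumptions, since the CloudFormation stack creates a stream with a stack-specific name.

# Illustration only: the PutRecord call that the API Gateway service proxy
# effectively makes on your behalf. The region and plain stream name are
# assumptions; your stack creates a stream with a stack-specific name.
import json
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')

payload = {"browseraction": "Click", "site": "example.execute-api.us-east-1.amazonaws.com"}

kinesis.put_record(
    StreamName='CSEKinesisBeaconInputStream',
    Data=json.dumps(payload).encode('utf-8'),  # boto3 handles the base64 encoding that $util.base64Encode does in the template
    PartitionKey=payload['site']
)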

Input stream

Amazon Kinesis Streams can capture terabytes of data per hour from thousands of sources.  In this case, you use it to deliver your JSON messages to Amazon Kinesis Analytics.

Analytics application

Amazon Kinesis Analytics processes the incoming messages and allows you to perform analytics on the dynamic stream of data.  You use SQL to calculate the CTR and pass that into your RANDOM_CUT_FOREST function to calculate an anomaly score.

First, calculate the counts of impressions and clicks using SQL with a tumbling window in Amazon Kinesis Analytics.  Analytics uses streams and pumps in SQL to process data.  See our earlier post to get an overview of streaming data with Kinesis Analytics. Inside the Analytics SQL, a stream is analogous to a table and a pump is the flow of data into those tables.  Here’s the description of streams for the impression and click data.

CREATE OR REPLACE STREAM "CLICKSTREAM" ( 
   "CLICKCOUNT" DOUBLE
);


CREATE OR REPLACE STREAM "IMPRESSIONSTREAM" ( 
   "IMPRESSIONCOUNT" DOUBLE
);

Later, you divide clicks by impressions, so you use a DOUBLE data type to contain the count values; if you left them as integers, the division below would yield only a 0 or 1.  Next, you define the pumps that populate the streams using a tumbling window, a window of time during which all records received are considered as part of the statement.

CREATE OR REPLACE PUMP "CLICKPUMP" STARTED AS 
INSERT INTO "CLICKSTREAM" ("CLICKCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Click'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);
CREATE OR REPLACE PUMP "IMPRESSIONPUMP" STARTED AS 
INSERT INTO "IMPRESSIONSTREAM" ("IMPRESSIONCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Impression'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);

The GROUP BY statement uses FLOOR and ROWTIME to create a tumbling window of 10 seconds.  This tumbling window will output one record every ten seconds assuming we are receiving data from the corresponding pump.  In this case, you output the total number of records that were clicks or impressions.
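To make the bucketing arithmetic concrete, here is a small Python illustration of the same idea (a sketch of the grouping logic, not how Kinesis Analytics is implemented): timestamps that fall within the same 10-second interval map to the same group key, so each COUNT(*) emits one row per interval.

# Illustration of the 10-second tumbling-window bucketing that the GROUP BY performs.
from datetime import datetime, timezone

def window_key(ts, window_seconds=10):
    # Integer index of the 10-second window the timestamp falls in.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return int((ts - epoch).total_seconds() // window_seconds)

t1 = datetime(2017, 3, 1, 12, 0, 3, tzinfo=timezone.utc)
t2 = datetime(2017, 3, 1, 12, 0, 9, tzinfo=timezone.utc)
t3 = datetime(2017, 3, 1, 12, 0, 11, tzinfo=timezone.utc)

print(window_key(t1) == window_key(t2))  # True: same window, counted together
print(window_key(t1) == window_key(t3))  # False: next window, counted separately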

Next, use the output of these pumps to calculate the CTR:

CREATE OR REPLACE STREAM "CTRSTREAM" (
  "CTR" DOUBLE
);

CREATE OR REPLACE PUMP "CTRPUMP" STARTED AS 
INSERT INTO "CTRSTREAM" ("CTR")
SELECT STREAM "CLICKCOUNT" / "IMPRESSIONCOUNT" * 100 as "CTR"
FROM "IMPRESSIONSTREAM",
  "CLICKSTREAM"
WHERE "IMPRESSIONSTREAM".ROWTIME = "CLICKSTREAM".ROWTIME;

Finally, these CTR values are used in RANDOM_CUT_FOREST to detect anomalous values.  The DESTINATION_SQL_STREAM is the output stream of data from the Amazon Kinesis Analytics application.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "CTRPERCENT" DOUBLE,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "OUTPUT_PUMP" STARTED AS 
INSERT INTO "DESTINATION_SQL_STREAM" 
SELECT STREAM * FROM
TABLE (RANDOM_CUT_FOREST( 
             CURSOR(SELECT STREAM "CTR" FROM "CTRSTREAM"), --inputStream
             100, --numberOfTrees (default)
             12, --subSampleSize 
             100000, --timeDecay (default)
             1) --shingleSize (default)
)
WHERE ANOMALY_SCORE > 2; 

The RANDOM_CUT_FOREST function greatly simplifies the programming required for anomaly detection.  However, understanding your data domain is paramount when performing data analytics.  The RANDOM_CUT_FOREST function is a tool for data scientists, not a replacement for them.  Knowing whether your data is logarithmic, circadian rhythmic, linear, etc. will provide the insights necessary to select the right parameters for RANDOM_CUT_FOREST.  For more information about parameters, see the RANDOM_CUT_FOREST Function.

Fortunately, the default values work in a wide variety of cases. In this case, use the default values for all but the subSampleSize parameter.  Typically, you would use a larger sample size to increase the pool of random samples used to calculate the anomaly score; for this post, use 12 samples so as to start evaluating the anomaly scores sooner.

Your SQL query outputs one record every ten seconds from the tumbling window so you’ll have enough evaluation values after two minutes to start calculating the anomaly score.  You are also using a cutoff value where records are only output to “DESTINATION_SQL_STREAM” if the anomaly score from the function is greater than 2 using the WHERE clause. To help visualize the cutoff point, here are the data points from a few runs through the pipeline using the sample Python script:

As you can see, the majority of the CTR data points are grouped around a 10% rate.  As the values move away from the 10% rate, their anomaly scores go up because they are more infrequent.  In the graph, an anomaly score above the cutoff is shaded orange. A cutoff value of 2 works well for this data set.

Output stream

The Amazon Kinesis Analytics application outputs the values from “DESTINATION_SQL_STREAM” to an Amazon Kinesis stream when the record has an anomaly score greater than 2.

Anomaly execution function

The output stream is an input event source for an AWS Lambda function with a batch size of 1, so the function gets called once for each message put in the output stream.  This function represents a place where you could react to anomalous events.  In a real-world scenario, you might instead update a real-time bidding system to react to current events.  In this example, you send a message to SNS to see a tangible response to the changing CTR.  The CloudFormation template creates a Lambda function similar to the following:

var AWS = require('aws-sdk');
var sns = new AWS.SNS( { region: "<region>" });

exports.handler = function(event, context) {
	// Our batch size is 1 record, so the loop is expected to process only once.
	event.Records.forEach(function(record) {
		// Kinesis data is base64 encoded, so decode it here.
		var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
		// The Analytics application outputs CSV: CTRPERCENT,ANOMALY_SCORE
		var rec = payload.split(',');
		var ctr = rec[0];
		var anomaly_score = rec[1];
		var detail = 'Anomaly detected with a click through rate of ' + ctr + '% and an anomaly score of ' + anomaly_score;
		var subject = 'Anomaly Detected';
		var params = {
			// Plain-text message delivered to all topic subscribers (email and SMS).
			Message: detail,
			Subject: subject,
			TopicArn: 'arn:aws:sns:us-east-1::ClickStreamEvent'
		};
		sns.publish(params, function(err, data) {
			if (err) context.fail(err.stack);
			else     context.succeed('Published Notification');
		});
	});
};

Notification system

Amazon SNS is a fully managed, highly scalable messaging platform.  For this walkthrough, you publish messages to an SNS topic whose subscribers receive them as email and text messages on a mobile phone.
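The CloudFormation template creates the topic and its subscriptions from the parameters you enter during stack creation; purely for illustration, a boto3 sketch of equivalent email and SMS subscriptions might look like the following (the topic name, region, and endpoints are placeholder assumptions).

# Illustration only: the CloudFormation template creates these subscriptions for you.
import boto3

sns = boto3.client('sns', region_name='us-east-1')  # assumed region

topic_arn = sns.create_topic(Name='ClickStreamEvent')['TopicArn']  # assumed topic name

# Email subscribers must confirm the subscription via the link SNS emails them.
sns.subscribe(TopicArn=topic_arn, Protocol='email', Endpoint='you@example.com')

# SMS subscribers receive text messages directly (phone number in E.164 format).
sns.subscribe(TopicArn=topic_arn, Protocol='sms', Endpoint='+15555550100')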

Analytics pipeline walkthrough

Sometimes it’s best to build out a solution yourself so you can see all the parts working together and get a good sense of how they fit.  Here are the steps to build the entire pipeline described above in your own account and perform real-time clickstream analysis yourself.  Following this walkthrough creates resources in your account that you can use to test the example pipeline.

Prerequisites

To complete the walkthrough, you’ll need an AWS account and a place to execute Python scripts.

Configure the CloudFormation template

Download the CloudFormation template named CSE-cfn.json from the AWS Big Data blog Github repository. In the CloudFormation console, set your region to N. Virginia, Oregon, or Ireland and then browse to that file.

When your script executes and an anomaly is detected, SNS sends a notification.  To see these in an email or text message, enter your email address and mobile phone number in the Parameters section and choose Next.

This CloudFormation script creates the policies and roles necessary to process the incoming messages.  Acknowledge this by selecting the checkbox; otherwise, stack creation fails with an error about the CAPABILITY_IAM capability.

If you entered a valid email or SMS number, you will receive a validation message when the SNS subscriptions are created by CloudFormation.  If you don’t wish to receive email or texts from your pipeline, you don’t need to complete the verification process.

Run the Python script

Copy ClickImpressionGenerator.py to a local folder.

This script uses the requests library to make HTTP requests.  If you don’t have that Python package globally installed, you can install it locally by running the following command in the same folder as the Python script:

pip install requests -t .

When the CloudFormation stack is complete, choose Outputs, ExecutionCommand, and run the command from the folder with the Python script.

This starts generating web traffic and sends it to the API Gateway endpoint at the front of your analytics pipeline.  It is important to have data flowing into your Amazon Kinesis Analytics application for the next section so that a schema can be generated based on the incoming data.  If the script completes before you finish the next section, simply restart it with the same command.
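If you want to confirm that beacons are actually reaching the input stream before building the Analytics application, a boto3 sketch like the following reads a few records from the first shard; this step is optional and purely illustrative, and you must substitute your stack’s actual stream name.

# Optional sanity check: read a few records from the input stream to confirm
# that beacons are arriving. Replace the stream name with your stack's value.
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')  # assumed region
stream_name = '<stack name>-CSEKinesisBeaconInputStream-<random string>'

shard_id = kinesis.describe_stream(StreamName=stream_name)['StreamDescription']['Shards'][0]['ShardId']
iterator = kinesis.get_shard_iterator(
    StreamName=stream_name,
    ShardId=shard_id,
    ShardIteratorType='TRIM_HORIZON'  # start from the oldest available records
)['ShardIterator']

for record in kinesis.get_records(ShardIterator=iterator, Limit=5)['Records']:
    print(record['Data'])  # boto3 returns the payload already base64-decoded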

Create the Analytics pipeline application

Open the Amazon Kinesis console and choose Go to Analytics.

Choose Create new application to create the Kinesis Analytics application, enter an application name and description, and then choose Save and continue.

Choose Connect to a source to see a list of streams.  Select the stream that starts with the name of your CloudFormation stack and contains CSEKinesisBeaconInputStream, in the form <stack name>-CSEKinesisBeaconInputStream-<random string>.  This is the stream that your click and impression requests are sent to from API Gateway.

Scroll down and choose Choose an IAM role.  Select the role with the name in the form of <stack name>-CSEKinesisAnalyticsRole-<random string>.  If your ClickImpressionGenerator.py script is still running and generating data, you should see a stream sample.  Scroll to the bottom and choose Save and continue.

Next, add the SQL for your real-time analytics.  Choose Go to SQL editor, choose Yes, start application, and paste the SQL contents from CSE-SQL.sql into the editor. Choose Save and run SQL.  After a minute or so, you should see data showing up every ten seconds in the CLICKSTREAM.  Choose Exit when you are done editing.

Choose Connect to a destination and select the <stack name>-CSEKinesisBeaconOutputStream-<random string> from the list of streams.  Change the output format to CSV.

Choose Choose an IAM role and select <stack name>-CSEKinesisAnalyticsRole-<random string> again.  Choose Save and continue.  Your analytics pipeline is now complete.  When your script gets to the injected anomaly section, you should receive a text message or email notifying you of the anomaly.  To clean up the resources created here, delete the stack in CloudFormation, then stop and delete the Kinesis Analytics application.

Summary

A pipeline like this can be used for many use cases where anomaly detection is valuable. What solutions have you enabled with this architecture? If you have questions or suggestions, please comment below.

 



Recovering an iPhone 5c Passcode

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/09/recovering_an_i.html

Remember the San Bernardino killer’s iPhone, and how the FBI maintained that they couldn’t get the encryption key without Apple providing them with a universal backdoor? Many of us computer-security experts said that they were wrong, and there were several possible techniques they could use. One of them was manually removing the flash chip from the phone, extracting the memory, and then running a brute-force attack without worrying about the phone deleting the key.

The FBI said it was impossible. We all said they were wrong. Now, Sergei Skorobogatov has proved them wrong. Here’s his paper:

Abstract: This paper is a short summary of a real world mirroring attack on the Apple iPhone 5c passcode retry counter under iOS 9. This was achieved by desoldering the NAND Flash chip of a sample phone in order to physically access its connection to the SoC and partially reverse engineering its proprietary bus protocol. The process does not require any expensive and sophisticated equipment. All needed parts are low cost and were obtained from local electronics distributors. By using the described and successful hardware mirroring process it was possible to bypass the limit on passcode retry attempts. This is the first public demonstration of the working prototype and the real hardware mirroring process for iPhone 5c. Although the process can be improved, it is still a successful proof-of-concept project. Knowledge of the possibility of mirroring will definitely help in designing systems with better protection. Also some reliability issues related to the NAND memory allocation in iPhone 5c are revealed. Some future research directions are outlined in this paper and several possible countermeasures are suggested. We show that claims that iPhone 5c NAND mirroring was infeasible were ill-advised.

Susan Landau explains why this is important:

The moral of the story? It’s not, as the FBI has been requesting, a bill to make it easier to access encrypted communications, as in the proposed revised Burr-Feinstein bill. Such “solutions” would make us less secure, not more so. Instead we need to increase law enforcement’s capabilities to handle encrypted communications and devices. This will also take more funding as well as redirection of efforts. Increased security of our devices and simultaneous increased capabilities of law enforcement are the only sensible approach to a world where securing the bits, whether of health data, financial information, or private emails, has become of paramount importance.

Or: The FBI needs computer-security expertise, not backdoors.

Patrick Ball writes about the dangers of backdoors.

EDITED TO ADD (9/23): Good article from the Economist.

Surviving the Zombie Apocalypse with Serverless Microservices

Post Syndicated from Aaron Kao original https://aws.amazon.com/blogs/compute/surviving-the-zombie-apocalypse-with-serverless-microservices/

Run Apps without the Bite!

by: Kyle Somers – Associate Solutions Architect

Let’s face it, managing servers is a pain! Capacity management and scaling is even worse. Now imagine dedicating your time to SysOps during a zombie apocalypse — barricading the door from flesh eaters with one arm while patching an OS with the other.

This sounds like something straight out of a nightmare. Lucky for you, this doesn’t have to be the case. Over at AWS, we’re making it easier than ever to build and power apps at scale with powerful managed services, so you can focus on your core business – like surviving – while we handle the infrastructure management that helps you do so.

Join the AWS Lambda Signal Corps!

At AWS re:Invent in 2015, we piloted a workshop where participants worked in groups to build a serverless chat application for zombie apocalypse survivors, using Amazon S3, Amazon DynamoDB, Amazon API Gateway, and AWS Lambda. Participants learned about microservices design patterns and best practices. They then extended the functionality of the serverless chat application with various add-on functionalities – such as mobile SMS integration, and zombie motion detection – using additional services like Amazon SNS and Amazon Elasticsearch Service.

Given the widespread interest in serverless architectures and AWS Lambda among our customers, we’ve recognized the excitement around this subject. Therefore, we are happy to announce that we’ll be taking this event on the road in the U.S. and abroad to recruit new developers for the AWS Lambda Signal Corps!

 

Help us save humanity! Learn More and Register Here!

 

Washington, DC | March 10 – Mission Accomplished!

San Francisco, CA @ AWS Loft | March 24 – Mission Accomplished!

New York City, NY @ AWS Loft | April 13 – Mission Accomplished!

London, England @ AWS Loft | April 25

Austin, TX | April 26

Atlanta, GA | May 4

Santa Monica, CA | June 7

Berlin, Germany | July 19

San Francisco, CA @ AWS Loft | August 16

New York City, NY @ AWS Loft | August 18

 

If you’re unable to join us at one of these workshops, that’s OK! In this post, I’ll show you how our survivor chat application incorporates some important microservices design patterns and how you can power your apps in the same way using a serverless architecture.


 

What Are Serverless Architectures?

At AWS, we know that infrastructure management can be challenging. We also understand that customers prefer to focus on delivering value to their business and customers. There’s a lot of undifferentiated heavy lifting involved in building and running applications, such as installing software, managing servers, coordinating patch schedules, and scaling to meet demand. Serverless architectures allow you to build and run applications and services without having to manage infrastructure. Your application still runs on servers, but all the server management is done for you by AWS. Serverless architectures can make it easier to build, manage, and scale applications in the cloud by eliminating much of the heavy lifting involved with server management.

Key Benefits of Serverless Architectures

  • No Servers to Manage: There are no servers for you to provision and manage. All the server management is done for you by AWS.
  • Increased Productivity: You can now fully focus your attention on building new features and apps because you are freed from the complexities of server management, allowing you to iterate faster and reduce your development time.
  • Continuous Scaling: Your applications and services automatically scale up and down based on the size of the workload.

What Should I Expect to Learn at a Zombie Microservices Workshop?

The workshop content we developed is designed to demonstrate best practices for serverless architectures using AWS. In this post we’ll discuss the following topics:

  • Which services are useful when designing a serverless application on AWS (see below!)
  • Design considerations for messaging, data transformation, and business or app-tier logic when building serverless microservices.
  • Best practices demonstrated in the design of our zombie survivor chat application.
  • Next steps for you to get started building your own serverless microservices!

Several AWS services were used to design our zombie survivor chat application.  Each of these services is managed and highly scalable.  Let’s take a quick look at which ones we incorporated in the architecture:

  • AWS Lambda allows you to run your code without provisioning or managing servers. Just upload your code (currently Node.js, Python, or Java) and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app. Lambda is used to power many use cases, such as application back ends, scheduled administrative tasks, and even big data workloads via integration with other AWS services such as Amazon S3, DynamoDB, Redshift, and Kinesis.
  • Amazon Simple Storage Service (Amazon S3) is our object storage service, which provides developers and IT teams with secure, durable, and scalable storage in the cloud. S3 is used to support a wide variety of use cases and is easy to use with a simple interface for storing and retrieving any amount of data. In the case of our survivor chat application, it can even be used to host static websites with CORS and DNS support.
  • Amazon API Gateway makes it easy to build RESTful APIs for your applications. API Gateway is scalable and simple to set up, allowing you to build integrations with back-end applications, including code running on AWS Lambda, while the service handles the scaling of your API requests.
  • Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

Overview of the Zombie Survivor Chat App

The survivor chat application represents a completely serverless architecture that delivers to workshop participants a baseline chat application (written using AngularJS) upon which additional functionality can be added. In order to deliver this baseline chat application, an AWS CloudFormation template is provided to participants, which spins up the environment in their account. The following diagram represents a high-level architecture of the components that are launched automatically:

High-Level Architecture of Survivor Serverless Chat App

  • Amazon S3 bucket is created to store the static web app contents of the chat application.
  • AWS Lambda functions are created to serve as the back-end business logic tier for processing reads/writes of chat messages.
  • API endpoints are created using API Gateway and mapped to Lambda functions. The API Gateway POST method points to a WriteMessages Lambda function. The GET method points to a GetMessages Lambda function.
  • A DynamoDB messages table is provisioned to act as our data store for the messages from the chat application.

Serverless Survivor Chat App Hosted on Amazon S3

With the CloudFormation stack launched and the components built out, the end result is a fully functioning chat app hosted in S3, using API Gateway and Lambda to process requests, and DynamoDB as the persistence for our chat messages.
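The workshop functions themselves are created for you by the CloudFormation template, so the following is only a hedged sketch of the pattern: a hypothetical GetMessages-style handler, written here in Python for illustration, that reads recent items from a DynamoDB messages table. The table name, attribute handling, and response shape are assumptions, not the workshop’s actual code.

# Hypothetical sketch of a GetMessages-style Lambda handler. The real workshop
# functions are provisioned by the CloudFormation template; the table name and
# response shape here are illustrative assumptions.
import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('messages')  # assumed table name

def handler(event, context):
    # A scan keeps the sketch simple; a real implementation would query
    # by channel and timestamp instead of scanning the whole table.
    response = table.scan(Limit=50)
    return {
        'statusCode': 200,
        'headers': {'Access-Control-Allow-Origin': '*'},  # CORS for the S3-hosted web app
        'body': json.dumps(response['Items'], default=str)  # default=str handles DynamoDB Decimals
    }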

With this baseline app, participants join in teams to build out additional functionality, including the following:

  • Integration of SMS/MMS via Twilio. Send messages to chat from SMS.
  • Motion sensor detection of nearby zombies with Amazon SNS and Intel® Edison and Grove IoT Starter Kit. AWS provides a shared motion sensor for the workshop, and you consume its messages from SNS.
  • Help-me panic button with IoT.
  • Integration with Slack for messaging from another platform.
  • Typing indicator to see which survivors are typing.
  • Serverless analytics of chat messages using Amazon Elasticsearch Service (Amazon ES).
  • Any other functionality participants can think of!

As a part of the workshop, AWS provides guidance for most of these tasks. With these add-ons completed, the architecture of the chat system begins to look quite a bit more sophisticated, as shown below:

Architecture of Survivor Chat with Additional Add-on Functionality

Architectural Tenants of the Serverless Survivor Chat

For the most part, the design patterns you’d see in a traditional “server-yes” environment also apply in a serverless environment. No surprises there. With that said, it never hurts to revisit best practices while learning new ones. So let’s review some key patterns we incorporated in our serverless application.

Decoupling Is Paramount

In the survivor chat application, Lambda functions serve as our tier for business logic. Since users interact with Lambda at the function level, it serves you well to split the logic into separate functions as much as possible so that you can scale the logic tier independently from the sources and destinations it serves.

As you’ll see in the architecture diagram in the above section, the application has separate Lambda functions for the chat service, the search service, the indicator service, etc. Decoupling is also incorporated through the use of API Gateway, which exposes our back-end logic via a unified RESTful interface. This model allows us to design our back-end logic with potentially different programming languages, systems, or communications channels, while keeping the requesting endpoints unaware of the implementation. Use this pattern and you won’t cry for help when you need to scale, update, add, or remove pieces of your environment.

Separate Your Data Stores

Treat each data store as an isolated application component of the service it supports. One common pitfall when following microservices architectures is to forget about the data layer. By keeping the data stores specific to the service they support, you can better manage the resources needed at the data layer specifically for that service. This is the true value in microservices.

In the survivor chat application, this practice is illustrated with the Activity and Messages DynamoDB tables. The activity indicator service has its own data store (Activity table) while the chat service has its own (Messages). These tables can scale independently along with their respective services. This scenario also represents a good example of statelessness: the implementation of the typing indicator add-on uses DynamoDB via the Activity table to track state information about which users are typing, rather than holding that state in the function itself. Remember, many of the benefits of microservices are lost if the components are still all glued together at the data layer in the end, creating a messy common denominator for scaling.

Leverage Data Transformations up the Stack

When designing a service, data transformation and compatibility are big considerations. How will you handle inputs from many different clients, users, and systems for your service? Will you run different flavors of your environment to correspond with different incoming request standards?  Absolutely not!

With API Gateway, data transformation becomes significantly easier through built-in models and mapping templates. With these features you can build data transformation and mapping logic into the API layer for requests and responses. This results in less work for you since API Gateway is a managed service. In the case of our survivor chat app, AWS Lambda and our survivor chat app require JSON while Twilio likes XML for the SMS integration. This type of transformation can be offloaded to API Gateway, leaving you with a cleaner business tier and one less thing to design around!

Use API Gateway as your interface and Lambda as your common back-end implementation. API Gateway uses Apache Velocity Template Language (VTL) and JSONPath for transformation logic. Of course, there is a trade-off to consider: a lot of transformation logic could be handled in your business-logic tier (Lambda). But why manage that yourself in application code when you can transparently handle it in a fully managed service through API Gateway? Here are a few things to keep in mind when handling transformations using API Gateway and Lambda (a minimal sketch follows the list):

  • Transform first; then call your common back-end logic.
  • Use API Gateway VTL transformations first when possible.
  • Use Lambda to preprocess data in ways that VTL can’t.

Using API Gateway VTL for Input/Output Data Transformations
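As a hedged sketch of the third point above (preprocessing in Lambda where VTL falls short), a small handler could normalize an XML payload into the JSON shape the common back end expects; the element names below are illustrative assumptions, not Twilio’s actual schema.

# Hypothetical sketch: normalize an XML request body into JSON before calling
# the shared back-end logic. Element names are assumptions, not Twilio's schema.
import json
import xml.etree.ElementTree as ET

def normalize(event, context):
    root = ET.fromstring(event['body'])           # raw XML passed through by API Gateway
    message = {
        'name': root.findtext('From', default='unknown'),
        'text': root.findtext('Body', default='')
    }
    return write_message(message)                 # the common back-end write path

def write_message(message):
    # Stand-in for the shared chat write logic (for example, a DynamoDB put_item).
    return {'statusCode': 200, 'body': json.dumps(message)}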

 

Security Through Service Isolation and Least Privilege

As a general recommendation when designing your services, always utilize least privilege and isolate components of your application to provide control over access. In the survivor chat application, a permissions-based model is used via AWS Identity and Access Management (IAM). IAM is integrated in every service on the AWS platform and provides the capability for services and applications to assume roles with strict permission sets to perform their least-privileged access needs. Along with access controls, you should implement audit and access logging to provide the best visibility into your microservices. This is made easy with Amazon CloudWatch Logs and AWS CloudTrail. CloudTrail enables audit capability of API calls made on the platform while CloudWatch Logs enables you to ship custom log data to AWS. Although our implementation of Amazon Elasticsearch in the survivor chat is used for analyzing chat messages, you can easily ship your log data to it and perform analytics on your application. You can incorporate security best practices in the following ways with the survivor chat application:

  • Each Lambda function should have an IAM role to access only the resources it needs. For example, the GetMessages function can read from the Messages table while the WriteMessages function can write to it. But they cannot access the Activities table that is used to track who is typing for the indicator service. (A minimal sketch of this pattern follows the list.)
  • Each API Gateway endpoint must have IAM permissions to execute the Lambda function(s) it is tied to. This model ensures that Lambda is only executed from the principal that is allowed to execute it, in this case the API Gateway method that triggers the back-end function.
  • DynamoDB requires read/write permissions via IAM, which limits anonymous database activity.
  • Use AWS CloudTrail to audit API activity on the platform and among the various services. This provides traceability, especially to see who is invoking your Lambda functions.
  • Design Lambda functions to publish meaningful outputs, as these are logged to CloudWatch Logs on your behalf.
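To make the least-privilege idea concrete, here is a hedged boto3 sketch of an inline policy that lets a GetMessages-style role read the messages table and nothing else; the role name, policy name, account ID, and table ARN are all placeholder assumptions.

# Hypothetical sketch of least privilege: this role may only read the messages
# table; it has no access to the activities table. Names and ARNs are placeholders.
import json
import boto3

iam = boto3.client('iam')

read_only_messages_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:Scan"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/messages"
    }]
}

iam.put_role_policy(
    RoleName='GetMessagesLambdaRole',        # assumed role name
    PolicyName='ReadMessagesTableOnly',
    PolicyDocument=json.dumps(read_only_messages_policy)
)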

FYI, in our application, we allow anonymous access to the chat API Gateway endpoints. We want to encourage all survivors to plug into the service without prior registration and start communicating. We’ve assumed zombies aren’t intelligent enough to hack into our communication channels. Until the apocalypse, though, stay true to API keys and authorization with signatures, which API Gateway supports!

Don’t Abandon Dev/Test

When developing with microservices, you can still leverage separate development and test environments as a part of the deployment lifecycle. AWS provides several features to help you continue building apps along the same trajectory as before, including these:

  • Lambda function versioning and aliases: Use these features to version your functions based on the stages of deployment, such as development, testing, staging, and pre-production, or to make changes to an existing Lambda function in production without downtime. (A minimal sketch appears after this list.)
  • Lambda service blueprints: Lambda comes with dozens of blueprints to get you started with prewritten code that you can use as a skeleton, or a fully functioning solution, to complete your serverless back end. These include blueprints with hooks into Slack, S3, DynamoDB, and more.
  • API Gateway deployment stages: Similar to Lambda versioning, this feature lets you configure separate API stages, along with unique stage variables and deployment versions within each stage. This allows you to test your API with the same or different back ends while it progresses through changes that you make at the API layer.
  • Mock Integrations with API Gateway: Configure dummy responses that developers can use to test their code while the true implementation of your API is being developed. Mock integrations make it faster to iterate through the API portion of a development lifecycle by streamlining pieces that used to be very sequential/waterfall.

Using Mock Integrations with API Gateway
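As a hedged sketch of the versioning-and-aliases point above, the following publishes an immutable version of a function and points a “prod” alias at it; the function name, alias name, and region are placeholder assumptions.

# Hypothetical sketch: publish an immutable Lambda version, then point a "prod"
# alias at it. Function name, alias, and region are placeholders.
import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

version = lambda_client.publish_version(
    FunctionName='survivor-chat-WriteMessages',     # assumed function name
    Description='Candidate build for production'
)['Version']

lambda_client.create_alias(
    FunctionName='survivor-chat-WriteMessages',
    Name='prod',
    FunctionVersion=version,
    Description='Version currently serving traffic'
)

# Later, promote a new version by updating the alias instead of updating callers:
# lambda_client.update_alias(FunctionName='survivor-chat-WriteMessages',
#                            Name='prod', FunctionVersion=new_version)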

Stay Tuned for Updates!

Now that you’ve got the necessary best practices to design your microservices, do you have what it takes to fight against the zombie horde? The serverless options we explored are ready for you to get started with, and the survivors are counting on you!

Be sure to keep an eye on the AWS GitHub repo. Although I didn’t cover each component of the survivor chat app in this post, we’ll be deploying this workshop and code soon for you to launch on your own! Keep an eye out for Zombie Workshops coming to your city, or nominate your city for a workshop here.

For more information on how you can get started with serverless architectures on AWS, refer to the following resources:

Whitepaper – AWS Serverless Multi-Tier Architectures

Reference Architectures and Sample Code

*Special thanks to my colleagues Ben Snively, Curtis Bray, Dean Bryen, Warren Santner, and Aaron Kao at AWS. They were instrumental in helping our team develop the content referenced in this post.