Tag Archives: Conditions

Friday Squid Blogging: Do Cephalopods Contain Alien DNA?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/06/friday_squid_bl_627.html

Maybe not DNA, but biological somethings.

Cause of Cambrian explosion — Terrestrial or Cosmic?“:

Abstract: We review the salient evidence consistent with or predicted by the Hoyle-Wickramasinghe (H-W) thesis of Cometary (Cosmic) Biology. Much of this physical and biological evidence is multifactorial. One particular focus are the recent studies which date the emergence of the complex retroviruses of vertebrate lines at or just before the Cambrian Explosion of ~500 Ma. Such viruses are known to be plausibly associated with major evolutionary genomic processes. We believe this coincidence is not fortuitous but is consistent with a key prediction of H-W theory whereby major extinction-diversification evolutionary boundaries coincide with virus-bearing cometary-bolide bombardment events. A second focus is the remarkable evolution of intelligent complexity (Cephalopods) culminating in the emergence of the Octopus. A third focus concerns the micro-organism fossil evidence contained within meteorites as well as the detection in the upper atmosphere of apparent incoming life-bearing particles from space. In our view the totality of the multifactorial data and critical analyses assembled by Fred Hoyle, Chandra Wickramasinghe and their many colleagues since the 1960s leads to a very plausible conclusion — life may have been seeded here on Earth by life-bearing comets as soon as conditions on Earth allowed it to flourish (about or just before 4.1 Billion years ago); and living organisms such as space-resistant and space-hardy bacteria, viruses, more complex eukaryotic cells, fertilised ova and seeds have been continuously delivered ever since to Earth so being one important driver of further terrestrial evolution which has resulted in considerable genetic diversity and which has led to the emergence of mankind.

Two commentaries.

This is almost certainly not true.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Protecting coral reefs with Nemo-Pi, the underwater monitor

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/coral-reefs-nemo-pi/

The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.

Nemo-Pi — Save Nemo

Save Nemo

The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving saver for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.

reef damaged by anchor
boat anchored at buoy

In addition, they provide dos and don’ts for how to behave on a reef dive.

The Nemo-Pi

To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.

Nemo-Pi schematic — Nemo-Pi — Save Nemo

This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.

Inside the Nemo-Pi device — Save Nemo
Inside the Nemo-Pi device — Save Nemo
Inside the Nemo-Pi device — Save Nemo

The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.

web dashboard — Nemo-Pi — Save Nemo

The web dashboard showing live Nemo-Pi data

Long-term goals

Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.

Coral reefs with fishes

A healthy coral reef

Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.

Bleached coral

A bleached coral reef

Vote now for Save Nemo

If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:

  1. Head to the voting web page
  2. Click “Abstimmen” in the footer of the page to vote
  3. Click “JA” in the footer to confirm

Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.

The post Protecting coral reefs with Nemo-Pi, the underwater monitor appeared first on Raspberry Pi.

Pirate IPTV Sellers Sign Abstention Agreements Under Pressure From BREIN

Post Syndicated from Andy original https://torrentfreak.com/pirate-iptv-sellers-sign-abstention-agreement-under-pressure-from-brein-180528/

Earlier this month, Dutch anti-piracy outfit BREIN revealed details of its case against Netherlands-based company Leaper Beheer BV.

BREIN’s complaint, which was filed at the Limburg District Court in Maastricht, claimed that
Leaper sold access to unlicensed live TV streams and on-demand movies. Around 4,000 live channels and 1,000 movies were included in the package, which was distributed to customers in the form of an .M3U playlist.

BREIN said that distribution of the playlist amounted to a communication to the public in contravention of the EU Copyright Directive. In its defense, Leaper argued that it is not a distributor of content itself and did not make anything available that wasn’t already public.

In a detailed ruling the Court sided with BREIN, noting that Leaper communicated works to a new audience that wasn’t taken into account when the content’s owners initially gave permission for their work to be distributed to the public.

The Court ordered Leaper to stop providing access to the unlicensed streams or face penalties of 5,000 euros per IPTV subscription sold, link offered, or days exceeded, to a maximum of one million euros. Further financial penalties were threatened for non-compliance with other aspects of the ruling.

In a fresh announcement Friday, BREIN revealed that three companies and their directors (Leaper included) have signed agreements to cease-and-desist, in order to avert summary proceedings. According to BREIN, the companies are the biggest sellers of pirate IPTV subscriptions in the Netherlands.

In addition to Leaper Beheer BV, Growler BV, DITisTV and their respective directors are bound by a number of conditions in their agreements but primarily to cease-and-desist offering hyperlinks or other technical means to access protected works belonging to BREIN’s affiliates and their members.

Failure to comply with the terms of the agreement will see the companies face penalties of 10,000 euros per infringement or per day (or part thereof).

DITisTV’s former website now appears to sell shoes and a search for the company using Google doesn’t reveal many flattering results. Consumer website Consumentenbond.nl enjoys the top spot with an article reporting that it received 300 complaints about DITisTV.

“The complainants report that after they have paid, they have not received their order, or that they were not given a refund if they sent back a malfunctioning media player. Some consumers have been waiting for their money for several months,” the article reads.

According to the report, DiTisTV pulled the plug on its website last June, probably in response to the European Court of Justice ruling which found that selling piracy-configured media players is illegal.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

The devil wears Pravda

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/05/the-devil-wears-pravda.html

Classic Bond villain, Elon Musk, has a new plan to create a website dedicated to measuring the credibility and adherence to “core truth” of journalists. He is, without any sense of irony, going to call this “Pravda”. This is not simply wrong but evil.

Musk has a point. Journalists do suck, and many suck consistently. I see this in my own industry, cybersecurity, and I frequently criticize them for their suckage.

But what he’s doing here is not correcting them when they make mistakes (or what Musk sees as mistakes), but questioning their legitimacy. This legitimacy isn’t measured by whether they follow established journalism ethics, but whether their “core truths” agree with Musk’s “core truths”.

An example of the problem is how the press fixates on Tesla car crashes due to its “autopilot” feature. Pretty much every autopilot crash makes national headlines, while the press ignores the other 40,000 car crashes that happen in the United States each year. Musk spies on Tesla drivers (hello, classic Bond villain everyone) so he can see the dip in autopilot usage every time such a news story breaks. He’s got good reason to be concerned about this.

He argues that autopilot is safer than humans driving, and he’s got the statistics and government studies to back this up. Therefore, the press’s fixation on Tesla crashes is illegitimate “fake news”, titillating the audience with distorted truth.

But here’s the thing: that’s still only Musk’s version of the truth. Yes, on a mile-per-mile basis, autopilot is safer, but there’s nuance here. Autopilot is used primarily on freeways, which already have a low mile-per-mile accident rate. People choose autopilot only when conditions are incredibly safe and drivers are unlikely to have an accident anyway. Musk is therefore being intentionally deceptive comparing apples to oranges. Autopilot may still be safer, it’s just that the numbers Musk uses don’t demonstrate this.

And then there is the truth calling it “autopilot” to begin with, because it isn’t. The public is overrating the capabilities of the feature. It’s little different than “lane keeping” and “adaptive cruise control” you can now find in other cars. In many ways, the technology is behind — my Tesla doesn’t beep at me when a pedestrian walks behind my car while backing up, but virtually every new car on the market does.

Yes, the press unduly covers Tesla autopilot crashes, but Musk has only himself to blame by unduly exaggerating his car’s capabilities by calling it “autopilot”.

What’s “core truth” is thus rather difficult to obtain. What the press satisfies itself with instead is smaller truths, what they can document. The facts are in such cases that the accident happened, and they try to get Tesla or Musk to comment on it.

What you can criticize a journalist for is therefore not “core truth” but whether they did journalism correctly. When such stories criticize “autopilot”, but don’t do their diligence in getting Tesla’s side of the story, then that’s a violation of journalistic practice. When I criticize journalists for their poor handling of stories in my industry, I try to focus on which journalistic principles they get wrong. For example, the NYTimes reporters do a lot of stories quoting anonymous government sources in clear violation of journalistic principles.

If “credibility” is the concern, then it’s the classic Bond villain here that’s the problem: Musk himself. His track record on business statements is abysmal. For example, when he announced the Model 3 he claimed production targets that every Wall Street analyst claimed were absurd. He didn’t make those targets, he didn’t come close. Model 3 production is still lagging behind Musk’s twice adjusted targets.

https://www.bloomberg.com/graphics/2018-tesla-tracker/

So who has a credibility gap here, the press, or Musk himself?

Not only is Musk’s credibility problem ironic, so is the name he chose, “Pravada”, the Russian word for truth that was the name of the Soviet Union Communist Party’s official newspaper. This is so absurd this has to be a joke, yet Musk claims to be serious about all this.

Yes, the press has a lot of problems, and if Musk were some journalism professor concerned about journalists meeting the objective standards of their industry (e.g. abusing anonymous sources), then this would be a fine thing. But it’s not. It’s Musk who is upset the press’s version of “core truth” does not agree with his version — a version that he’s proven time and time again differs from “real truth”.

Just in case Musk is serious, I’ve already registered “www.antipravda.com” to start measuring the credibility of statements by billionaire playboy CEOs. Let’s see who blinks first.


I stole the title, with permission, from this tweet:

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:

 

 

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.

 

 

 

10 visualizations to try in Amazon QuickSight with sample data

Post Syndicated from Karthik Kumar Odapally original https://aws.amazon.com/blogs/big-data/10-visualizations-to-try-in-amazon-quicksight-with-sample-data/

If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect yuor data, perform advanced analysis and access the results from any web browser or mobile device.

The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.

Which data sources does Amazon QuickSight support?

At the time of publication, you can use the following data methods:

  • Connect to AWS data sources, including:
    • Amazon RDS
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Athena
    • Amazon S3
  • Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
  • Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
  • Import data from SaaS applications like Salesforce and Snowflake
  • Use big data processing engines like Spark and Presto

This list is constantly growing. For more information, see Supported Data Sources.

Answers in instants

SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.

Typical Amazon QuickSight workflow

When you create an analysis, the typical workflow is as follows:

  1. Connect to a data source, and then create a new dataset or choose an existing dataset.
  2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
  3. Create a new analysis.
  4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
  5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
  6. (Optional) Add more visuals to the analysis.
  7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
  8. (Optional) Publish the analysis as a dashboard to share insights with other users.

The following graphic illustrates a typical Amazon QuickSight workflow.

Visualizations created in Amazon QuickSight with sample datasets

Visualizations for a data analyst

Source:  https://data.worldbank.org/

Download and Resources:  https://datacatalog.worldbank.org/dataset/world-development-indicators

Data catalog:  The World Bank invests into multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.

The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.

The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). Also, you can maneuver over the graph to get detailed statistics at a glance.

Visualizations for a trading analyst

Source:  Deutsche Börse Public Dataset (DBG PDS)

Download and resources:  https://aws.amazon.com/public-datasets/deutsche-boerse-pds/

Data catalog:  The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.

The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.

The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.

Visualizations for a data scientist

Source:  https://catalog.data.gov/

Download and resources:  https://catalog.data.gov/dataset/road-weather-information-stations-788f8

Data catalog:  Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.

The following graph shows the present max air temperature in Seattle from different RWI station sensors.

The following graph shows the minimum temperature of the road surface at different times, which helps predicts road conditions at a particular time of the year.

Visualizations for a data engineer

Source:  https://www.kaggle.com/

Download and resources:  https://www.kaggle.com/datasnaek/youtube-new/data

Data catalog:  Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.

The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.

The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.

Visualizations for a business user

Source:  New York Taxi Data

Download and resources:  https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb

Data catalog: NYC Open data hosts some very popular open data sets for all New Yorkers. This platform allows you to get involved in dive deep into the data set to pull some useful visualizations. 2016 Green taxi trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The following graph presents maximum fare amount grouped by the passenger count during a period of time during a day. This can be further expanded to follow through different day of the month based on the business need.

The following graph shows the NewYork taxi data from January 2016, showing the dip in the number of taxis ridden on January 23, 2016 across all types of taxis.

A quick search for that date and location shows you the following news report:

Summary

Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!

 


Additional Reading

If you found this post useful, be sure to check out Amazon QuickSight Adds Support for Combo Charts and Row-Level Security and Visualize AWS Cloudtrail Logs Using AWS Glue and Amazon QuickSight.


Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.

 

 

 

Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wild life of most divine national parks around the United States alongside his wife.

 

 

 

 

Easier way to control access to AWS regions using IAM policies

Post Syndicated from Sulay Shah original https://aws.amazon.com/blogs/security/easier-way-to-control-access-to-aws-regions-using-iam-policies/

We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.

Condition concepts

Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.

Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.

AWS:RequestedRegion condition key

The new global condition key, , supports all actions across all AWS services. You can use any string operator and specify any AWS region for its value.

Condition keyDescriptionOperator(s)Value
aws:RequestedRegionAllows you to specify the region to which the IAM principal (user or role) can make API callsAll string operators (for example, StringEqualsAny AWS region (for example, us-east-1)

I’ll now demonstrate the use of the new global condition key.

Example: Policy with region-level control

Let’s say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test Amazon Lambda, an event-driven platform, to retrieve data from the MySQL DB instance in RDS for future use.

My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:DescribeKeyPairs",
                "rds:Describe*",
                "iam:ListRolePolicies",
                "iam:ListRoles",
                "iam:GetRole",
                "iam:ListInstanceProfiles",
                "iam:AttachRolePolicy",
                "lambda:GetAccountSettings"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "rds:CreateDBInstance",
                "rds:CreateDBCluster",
                "lambda:CreateFunction",
                "lambda:InvokeFunction"
            ],
            "Resource": "*",
      "Condition": {"StringEquals": {"aws:RequestedRegion": "eu-central-1"}}

        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::account-id:role/*"
        }
    ]
}

The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.

Summary

You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

Want more AWS Security news? Follow us on Twitter.

[$] wait_var_event()

Post Syndicated from corbet original https://lwn.net/Articles/750774/rss

One of the trickiest aspects to concurrency in the kernel is waiting for a
specific event to take place. There is a wide variety of possible events,
including a process exiting, the last reference to a data structure going
away, a device completing an operation, or a timeout occurring.
Waiting is surprisingly hard to get right — race conditions abound to trap
the unwary — so the kernel has
accumulated a
large set of wait_event_*() macros to make the task easier. An
attempt to add a new one, though, has led to the generalization of specific
types of waits for 4.17.

Facebook and Cambridge Analytica

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/03/facebook_and_ca.html

In the wake of the Cambridge Analytica scandal, news articles and commentators have focused on what Facebook knows about us. A lot, it turns out. It collects data from our posts, our likes, our photos, things we type and delete without posting, and things we do while not on Facebook and even when we’re offline. It buys data about us from others. And it can infer even more: our sexual orientation, political beliefs, relationship status, drug use, and other personality traits — even if we didn’t take the personality test that Cambridge Analytica developed.

But for every article about Facebook’s creepy stalker behavior, thousands of other companies are breathing a collective sigh of relief that it’s Facebook and not them in the spotlight. Because while Facebook is one of the biggest players in this space, there are thousands of other companies that spy on and manipulate us for profit.

Harvard Business School professor Shoshana Zuboff calls it “surveillance capitalism.” And as creepy as Facebook is turning out to be, the entire industry is far creepier. It has existed in secret far too long, and it’s up to lawmakers to force these companies into the public spotlight, where we can all decide if this is how we want society to operate and — if not — what to do about it.

There are 2,500 to 4,000 data brokers in the United States whose business is buying and selling our personal data. Last year, Equifax was in the news when hackers stole personal information on 150 million people, including Social Security numbers, birth dates, addresses, and driver’s license numbers.

You certainly didn’t give it permission to collect any of that information. Equifax is one of those thousands of data brokers, most of them you’ve never heard of, selling your personal information without your knowledge or consent to pretty much anyone who will pay for it.

Surveillance capitalism takes this one step further. Companies like Facebook and Google offer you free services in exchange for your data. Google’s surveillance isn’t in the news, but it’s startlingly intimate. We never lie to our search engines. Our interests and curiosities, hopes and fears, desires and sexual proclivities, are all collected and saved. Add to that the websites we visit that Google tracks through its advertising network, our Gmail accounts, our movements via Google Maps, and what it can collect from our smartphones.

That phone is probably the most intimate surveillance device ever invented. It tracks our location continuously, so it knows where we live, where we work, and where we spend our time. It’s the first and last thing we check in a day, so it knows when we wake up and when we go to sleep. We all have one, so it knows who we sleep with. Uber used just some of that information to detect one-night stands; your smartphone provider and any app you allow to collect location data knows a lot more.

Surveillance capitalism drives much of the internet. It’s behind most of the “free” services, and many of the paid ones as well. Its goal is psychological manipulation, in the form of personalized advertising to persuade you to buy something or do something, like vote for a candidate. And while the individualized profile-driven manipulation exposed by Cambridge Analytica feels abhorrent, it’s really no different from what every company wants in the end. This is why all your personal information is collected, and this is why it is so valuable. Companies that can understand it can use it against you.

None of this is new. The media has been reporting on surveillance capitalism for years. In 2015, I wrote a book about it. Back in 2010, the Wall Street Journal published an award-winning two-year series about how people are tracked both online and offline, titled “What They Know.”

Surveillance capitalism is deeply embedded in our increasingly computerized society, and if the extent of it came to light there would be broad demands for limits and regulation. But because this industry can largely operate in secret, only occasionally exposed after a data breach or investigative report, we remain mostly ignorant of its reach.

This might change soon. In 2016, the European Union passed the comprehensive General Data Protection Regulation, or GDPR. The details of the law are far too complex to explain here, but some of the things it mandates are that personal data of EU citizens can only be collected and saved for “specific, explicit, and legitimate purposes,” and only with explicit consent of the user. Consent can’t be buried in the terms and conditions, nor can it be assumed unless the user opts in. This law will take effect in May, and companies worldwide are bracing for its enforcement.

Because pretty much all surveillance capitalism companies collect data on Europeans, this will expose the industry like nothing else. Here’s just one example. In preparation for this law, PayPal quietly published a list of over 600 companies it might share your personal data with. What will it be like when every company has to publish this sort of information, and explicitly explain how it’s using your personal data? We’re about to find out.

In the wake of this scandal, even Mark Zuckerberg said that his industry probably should be regulated, although he’s certainly not wishing for the sorts of comprehensive regulation the GDPR is bringing to Europe.

He’s right. Surveillance capitalism has operated without constraints for far too long. And advances in both big data analysis and artificial intelligence will make tomorrow’s applications far creepier than today’s. Regulation is the only answer.

The first step to any regulation is transparency. Who has our data? Is it accurate? What are they doing with it? Who are they selling it to? How are they securing it? Can we delete it? I don’t see any hope of Congress passing a GDPR-like data protection law anytime soon, but it’s not too far-fetched to demand laws requiring these companies to be more transparent in what they’re doing.

One of the responses to the Cambridge Analytica scandal is that people are deleting their Facebook accounts. It’s hard to do right, and doesn’t do anything about the data that Facebook collects about people who don’t use Facebook. But it’s a start. The market can put pressure on these companies to reduce their spying on us, but it can only do that if we force the industry out of its secret shadows.

This essay previously appeared on CNN.com.

EDITED TO ADD (4/2): Slashdot thread.

Performing Unit Testing in an AWS CodeStar Project

Post Syndicated from Jerry Mathen Jacob original https://aws.amazon.com/blogs/devops/performing-unit-testing-in-an-aws-codestar-project/

In this blog post, I will show how you can perform unit testing as a part of your AWS CodeStar project. AWS CodeStar helps you quickly develop, build, and deploy applications on AWS. With AWS CodeStar, you can set up your continuous delivery (CD) toolchain and manage your software development from one place.

Because unit testing tests individual units of application code, it is helpful for quickly identifying and isolating issues. As a part of an automated CI/CD process, it can also be used to prevent bad code from being deployed into production.

Many of the AWS CodeStar project templates come preconfigured with a unit testing framework so that you can start deploying your code with more confidence. The unit testing is configured to run in the provided build stage so that, if the unit tests do not pass, the code is not deployed. For a list of AWS CodeStar project templates that include unit testing, see AWS CodeStar Project Templates in the AWS CodeStar User Guide.

The scenario

As a big fan of superhero movies, I decided to list my favorites and ask my friends to vote on theirs by using a WebService endpoint I created. The example I use is a Python web service running on AWS Lambda with AWS CodeCommit as the code repository. CodeCommit is a fully managed source control system that hosts Git repositories and works with all Git-based tools.

Here’s how you can create the WebService endpoint:

Sign in to the AWS CodeStar console. Choose Start a project, which will take you to the list of project templates.

create project

For code edits I will choose AWS Cloud9, which is a cloud-based integrated development environment (IDE) that you use to write, run, and debug code.

choose cloud9

Here are the other tasks required by my scenario:

  • Create a database table where the votes can be stored and retrieved as needed.
  • Update the logic in the Lambda function that was created for posting and getting the votes.
  • Update the unit tests (of course!) to verify that the logic works as expected.

For a database table, I’ve chosen Amazon DynamoDB, which offers a fast and flexible NoSQL database.

Getting set up on AWS Cloud9

From the AWS CodeStar console, go to the AWS Cloud9 console, which should take you to your project code. I will open up a terminal at the top-level folder under which I will set up my environment and required libraries.

Use the following command to set the PYTHONPATH environment variable on the terminal.

export PYTHONPATH=/home/ec2-user/environment/vote-your-movie

You should now be able to use the following command to execute the unit tests in your project.

python -m unittest discover vote-your-movie/tests

cloud9 setup

Start coding

Now that you have set up your local environment and have a copy of your code, add a DynamoDB table to the project by defining it through a template file. Open template.yml, which is the Serverless Application Model (SAM) template file. This template extends AWS CloudFormation to provide a simplified way of defining the Amazon API Gateway APIs, AWS Lambda functions, and Amazon DynamoDB tables required by your serverless application.

AWSTemplateFormatVersion: 2010-09-09
Transform:
- AWS::Serverless-2016-10-31
- AWS::CodeStar

Parameters:
  ProjectId:
    Type: String
    Description: CodeStar projectId used to associate new resources to team members

Resources:
  # The DB table to store the votes.
  MovieVoteTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        # Name of the "Candidate" is the partition key of the table.
        Name: Candidate
        Type: String
  # Creating a new lambda function for retrieving and storing votes.
  MovieVoteLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: python3.6
      Environment:
        # Setting environment variables for your lambda function.
        Variables:
          TABLE_NAME: !Ref "MovieVoteTable"
          TABLE_REGION: !Ref "AWS::Region"
      Role:
        Fn::ImportValue:
          !Join ['-', [!Ref 'ProjectId', !Ref 'AWS::Region', 'LambdaTrustRole']]
      Events:
        GetEvent:
          Type: Api
          Properties:
            Path: /
            Method: get
        PostEvent:
          Type: Api
          Properties:
            Path: /
            Method: post

We’ll use Python’s boto3 library to connect to AWS services. And we’ll use Python’s mock library to mock AWS service calls for our unit tests.
Use the following command to install these libraries:

pip install --upgrade boto3 mock -t .

install dependencies

Add these libraries to the buildspec.yml, which is the YAML file that is required for CodeBuild to execute.

version: 0.2

phases:
  install:
    commands:

      # Upgrade AWS CLI to the latest version
      - pip install --upgrade awscli boto3 mock

  pre_build:
    commands:

      # Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
      - python -m unittest discover tests

  build:
    commands:

      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml

artifacts:
  type: zip
  files:
    - template-export.yml

Open the index.py where we can write the simple voting logic for our Lambda function.

import json
import datetime
import boto3
import os

table_name = os.environ['TABLE_NAME']
table_region = os.environ['TABLE_REGION']

VOTES_TABLE = boto3.resource('dynamodb', region_name=table_region).Table(table_name)
CANDIDATES = {"A": "Black Panther", "B": "Captain America: Civil War", "C": "Guardians of the Galaxy", "D": "Thor: Ragnarok"}

def handler(event, context):
    if event['httpMethod'] == 'GET':
        resp = VOTES_TABLE.scan()
        return {'statusCode': 200,
                'body': json.dumps({item['Candidate']: int(item['Votes']) for item in resp['Items']}),
                'headers': {'Content-Type': 'application/json'}}

    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400,
                    'body': 'Invalid input! Expecting a JSON.',
                    'headers': {'Content-Type': 'application/json'}}
        if 'candidate' not in body:
            return {'statusCode': 400,
                    'body': 'Missing "candidate" in request.',
                    'headers': {'Content-Type': 'application/json'}}
        if body['candidate'] not in CANDIDATES.keys():
            return {'statusCode': 400,
                    'body': 'You must vote for one of the following candidates - {}.'.format(get_allowed_candidates()),
                    'headers': {'Content-Type': 'application/json'}}

        resp = VOTES_TABLE.update_item(
            Key={'Candidate': CANDIDATES.get(body['candidate'])},
            UpdateExpression='ADD Votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'statusCode': 200,
                'body': "{} now has {} votes".format(CANDIDATES.get(body['candidate']), resp['Attributes']['Votes']),
                'headers': {'Content-Type': 'application/json'}}

def get_allowed_candidates():
    l = []
    for key in CANDIDATES:
        l.append("'{}' for '{}'".format(key, CANDIDATES.get(key)))
    return ", ".join(l)

What our code basically does is take in the HTTPS request call as an event. If it is an HTTP GET request, it gets the votes result from the table. If it is an HTTP POST request, it sets a vote for the candidate of choice. We also validate the inputs in the POST request to filter out requests that seem malicious. That way, only valid calls are stored in the table.

In the example code provided, we use a CANDIDATES variable to store our candidates, but you can store the candidates in a JSON file and use Python’s json library instead.

Let’s update the tests now. Under the tests folder, open the test_handler.py and modify it to verify the logic.

import os
# Some mock environment variables that would be used by the mock for DynamoDB
os.environ['TABLE_NAME'] = "MockHelloWorldTable"
os.environ['TABLE_REGION'] = "us-east-1"

# The library containing our logic.
import index

# Boto3's core library
import botocore
# For handling JSON.
import json
# Unit test library
import unittest
## Getting StringIO based on your setup.
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
## Python mock library
from mock import patch, call
from decimal import Decimal

@patch('botocore.client.BaseClient._make_api_call')
class TestCandidateVotes(unittest.TestCase):

    ## Test the HTTP GET request flow. 
    ## We expect to get back a successful response with results of votes from the table (mocked).
    def test_get_votes(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'GET'}
        # The mocked values in our DynamoDB table.
        items_in_db = [{'Candidate': 'Black Panther', 'Votes': Decimal('3')},
                        {'Candidate': 'Captain America: Civil War', 'Votes': Decimal('8')},
                        {'Candidate': 'Guardians of the Galaxy', 'Votes': Decimal('8')},
                        {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('1')}
                    ]
        # The mocked DynamoDB response.
        expected_ddb_response = {'Items': items_in_db}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB Scan function during execution with these parameters.
        expected_calls = [call('Scan', {'TableName': os.environ['TABLE_NAME']})]

        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        result_body = json.loads(result.get('body'))
        # Verifying that the results match to that from the table.
        assert len(result_body) == len(items_in_db)
        for i in range(len(result_body)):
            assert result_body.get(items_in_db[i].get("Candidate")) == int(items_in_db[i].get("Votes"))

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for a selected candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_valid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"D\"}"}
        # The mocked response in our DynamoDB table.
        expected_ddb_response = {'Attributes': {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('2')}}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB UpdateItem function during execution with these parameters.
        expected_calls = [call('UpdateItem', {
                                                'TableName': os.environ['TABLE_NAME'], 
                                                'Key': {'Candidate': 'Thor: Ragnarok'},
                                                'UpdateExpression': 'ADD Votes :incr',
                                                'ExpressionAttributeValues': {':incr': 1},
                                                'ReturnValues': 'ALL_NEW'
                                            })]
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        assert result.get('body') == "{} now has {} votes".format(
            expected_ddb_response['Attributes']['Candidate'], 
            expected_ddb_response['Attributes']['Votes'])

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for an non-existant candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_invalid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        # The valid IDs for the candidates are A, B, C, and D
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"E\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'You must vote for one of the following candidates - {}.'.format(index.get_allowed_candidates())

    ## Test the HTTP POST request flow that places a vote for a selected candidate but associated with an invalid key in the POST body.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_invalid_data_vote(self, boto_mock):
        # Input event to our method to test.
        # "name" is not the expected input key.
        expected_event = {'httpMethod': 'POST', 'body': "{\"name\": \"D\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Missing "candidate" in request.'

    ## Test the HTTP POST request flow that places a vote for a selected candidate but not as a JSON string which the body of the request expects.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_malformed_json_vote(self, boto_mock):
        # Input event to our method to test.
        # "body" receives a string rather than a JSON string.
        expected_event = {'httpMethod': 'POST', 'body': "Thor: Ragnarok"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Invalid input! Expecting a JSON.'

if __name__ == '__main__':
    unittest.main()

I am keeping the code samples well commented so that it’s clear what each unit test accomplishes. It tests the success conditions and the failure paths that are handled in the logic.

In my unit tests I use the patch decorator (@patch) in the mock library. @patch helps mock the function you want to call (in this case, the botocore library’s _make_api_call function in the BaseClient class).
Before we commit our changes, let’s run the tests locally. On the terminal, run the tests again. If all the unit tests pass, you should expect to see a result like this:

You:~/environment $ python -m unittest discover vote-your-movie/tests
.....
----------------------------------------------------------------------
Ran 5 tests in 0.003s

OK
You:~/environment $

Upload to AWS

Now that the tests have passed, it’s time to commit and push the code to source repository!

Add your changes

From the terminal, go to the project’s folder and use the following command to verify the changes you are about to push.

git status

To add the modified files only, use the following command:

git add -u

Commit your changes

To commit the changes (with a message), use the following command:

git commit -m "Logic and tests for the voting webservice."

Push your changes to AWS CodeCommit

To push your committed changes to CodeCommit, use the following command:

git push

In the AWS CodeStar console, you can see your changes flowing through the pipeline and being deployed. There are also links in the AWS CodeStar console that take you to this project’s build runs so you can see your tests running on AWS CodeBuild. The latest link under the Build Runs table takes you to the logs.

unit tests at codebuild

After the deployment is complete, AWS CodeStar should now display the AWS Lambda function and DynamoDB table created and synced with this project. The Project link in the AWS CodeStar project’s navigation bar displays the AWS resources linked to this project.

codestar resources

Because this is a new database table, there should be no data in it. So, let’s put in some votes. You can download Postman to test your application endpoint for POST and GET calls. The endpoint you want to test is the URL displayed under Application endpoints in the AWS CodeStar console.

Now let’s open Postman and look at the results. Let’s create some votes through POST requests. Based on this example, a valid vote has a value of A, B, C, or D.
Here’s what a successful POST request looks like:

POST success

Here’s what it looks like if I use some value other than A, B, C, or D:

 

POST Fail

Now I am going to use a GET request to fetch the results of the votes from the database.

GET success

And that’s it! You have now created a simple voting web service using AWS Lambda, Amazon API Gateway, and DynamoDB and used unit tests to verify your logic so that you ship good code.
Happy coding!

New – Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR)

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-continuous-backups-and-point-in-time-recovery-pitr/

The Amazon DynamoDB team is back with another useful feature hot on the heels of encryption at rest. At AWS re:Invent 2017 we launched global tables and on-demand backup and restore of your DynamoDB tables and today we’re launching continuous backups with point-in-time recovery (PITR).

You can enable continuous backups with a single click in the AWS Management Console, a simple API call, or with the AWS Command Line Interface (CLI). DynamoDB can back up your data with per-second granularity and restore to any single second from the time PITR was enabled up to the prior 35 days. We built this feature to protect against accidental writes or deletes. If a developer runs a script against production instead of staging or if someone fat-fingers a DeleteItem call, PITR has you covered. We also built it for the scenarios you can’t normally predict. You can still keep your on-demand backups for as long as needed for archival purposes but PITR works as additional insurance against accidental loss of data. Let’s see how this works.

Continuous Backup

To enable this feature in the console we navigate to our table and select the Backups tab. From there simply click Enable to turn on the feature. I could also turn on continuous backups via the UpdateContinuousBackups API call.

After continuous backup is enabled we should be able to see an Earliest restore date and Latest restore date

Let’s imagine a scenario where I have a lot of old user profiles that I want to delete.

I really only want to send service updates to our active users based on their last_update date. I decided to write a quick Python script to delete all the users that haven’t used my service in a while.

import boto3
table = boto3.resource("dynamodb").Table("VerySuperImportantTable")
items = table.scan(
    FilterExpression="last_update >= :date",
    ExpressionAttributeValues={":date": "2014-01-01T00:00:00"},
    ProjectionExpression="ImportantId"
)['Items']
print("Deleting {} Items! Dangerous.".format(len(items)))
with table.batch_writer() as batch:
    for item in items:
        batch.delete_item(Key=item)

Great! This should delete all those pesky non-users of my service that haven’t logged in since 2013. So,— CTRL+C CTRL+C CTRL+C CTRL+C (interrupt the currently executing command).

Yikes! Do you see where I went wrong? I’ve just deleted my most important users! Oh, no! Where I had a greater-than sign, I meant to put a less-than! Quick, before Jeff Barr can see, I’m going to restore the table. (I probably could have prevented that typo with Boto 3’s handy DynamoDB conditions: Attr("last_update").lt("2014-01-01T00:00:00"))

Restoring

Luckily for me, restoring a table is easy. In the console I’ll navigate to the Backups tab for my table and click Restore to point-in-time.

I’ll specify the time (a few seconds before I started my deleting spree) and a name for the table I’m restoring to.

For a relatively small and evenly distributed table like mine, the restore is quite fast.

The time it takes to restore a table varies based on multiple factors and restore times are not neccesarily coordinated with the size of the table. If your dataset is evenly distributed across your primary keys you’ll be able to take advanatage of parallelization which will speed up your restores.

Learn More & Try It Yourself
There’s plenty more to learn about this new feature in the documentation here.

Pricing for continuous backups varies by region and is based on the current size of the table and all indexes.

A few things to note:

  • PITR works with encrypted tables.
  • If you disable PITR and later reenable it, you reset the start time from which you can recover.
  • Just like on-demand backups, there are no performance or availability impacts to enabling this feature.
  • Stream settings, Time To Live settings, PITR settings, tags, Amazon CloudWatch alarms, and auto scaling policies are not copied to the restored table.
  • Jeff, it turns out, knew I restored the table all along because every PITR API call is recorded in AWS CloudTrail.

Let us know how you’re going to use continuous backups and PITR on Twitter and in the comments.
Randall

Six more companies adopt GPLv3 termination language

Post Syndicated from corbet original https://lwn.net/Articles/749758/rss

Red Hat has announced
that six more companies (CA Technologies, Cisco, HPE, Microsoft, SAP, and
SUSE) have agreed to apply the GPLv3 termination conditions (wherein a
violator’s license is automatically restored if the problem is fixed in a
timely manner) to GPLv2-licensed code. “GPL version 3 (GPLv3)
introduced an approach to termination that offers distributors of the code
an opportunity to correct errors and mistakes in license compliance. This
approach allows for enforcement of license compliance consistent with a
community in which heavy-handed approaches to enforcement, including for
financial gain, are out of place.

The Challenges of Opening a Data Center — Part 2

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/factors-for-choosing-data-center/

Rows of storage pods in a data center

This is part two of a series on the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process.

In Part 1 of this series, we looked at the different types of data centers, the importance of location in planning a data center, data center certification, and the single most expensive factor in running a data center, power.

In Part 2, we continue to look at factors that need to considered both by those interested in a dedicated data center and those seeking to colocate in an existing center.

Power (continued from Part 1)

In part 1, we began our discussion of the power requirements of data centers.

As we discussed, redundancy and failover is a chief requirement for data center power. A redundantly designed power supply system is also a necessity for maintenance, as it enables repairs to be performed on one network, for example, without having to turn off servers, databases, or electrical equipment.

Power Path

The common critical components of a data center’s power flow are:

  • Utility Supply
  • Generators
  • Transfer Switches
  • Distribution Panels
  • Uninterruptible Power Supplies (UPS)
  • PDUs

Utility Supply is the power that comes from one or more utility grids. While most of us consider the grid to be our primary power supply (hats off to those of you who manage to live off the grid), politics, economics, and distribution make utility supply power susceptible to outages, which is why data centers must have autonomous power available to maintain availability.

Generators are used to supply power when the utility supply is unavailable. They convert mechanical energy, usually from motors, to electrical energy.

Transfer Switches are used to transfer electric load from one source or electrical device to another, such as from one utility line to another, from a generator to a utility, or between generators. The transfer could be manually activated or automatic to ensure continuous electrical power.

Distribution Panels get the power where it needs to go, taking a power feed and dividing it into separate circuits to supply multiple loads.

A UPS, as we touched on earlier, ensures that continuous power is available even when the main power source isn’t. It often consists of batteries that can come online almost instantaneously when the current power ceases. The power from a UPS does not have to last a long time as it is considered an emergency measure until the main power source can be restored. Another function of the UPS is to filter and stabilize the power from the main power supply.

Data Center UPS

Data center UPSs

PDU stands for the Power Distribution Unit and is the device that distributes power to the individual pieces of equipment.

Network

After power, the networking connections to the data center are of prime importance. Can the data center obtain and maintain high-speed networking connections to the building? With networking, as with all aspects of a data center, availability is a primary consideration. Data center designers think of all possible ways service can be interrupted or lost, even briefly. Details such as the vulnerabilities in the route the network connections make from the core network (the backhaul) to the center, and where network connections enter and exit a building, must be taken into consideration in network and data center design.

Routers and switches are used to transport traffic between the servers in the data center and the core network. Just as with power, network redundancy is a prime factor in maintaining availability of data center services. Two or more upstream service providers are required to ensure that availability.

How fast a customer can transfer data to a data center is affected by: 1) the speed of the connections the data center has with the outside world, 2) the quality of the connections between the customer and the data center, and 3) the distance of the route from customer to the data center. The longer the length of the route and the greater the number of packets that must be transferred, the more significant a factor will be played by latency in the data transfer. Latency is the delay before a transfer of data begins following an instruction for its transfer. Generally latency, not speed, will be the most significant factor in transferring data to and from a data center. Packets transferred using the TCP/IP protocol suite, which is the conceptual model and set of communications protocols used on the internet and similar computer networks, must be acknowledged when received (ACK’d) and requires a communications roundtrip for each packet. If the data is in larger packets, the number of ACKs required is reduced, so latency will be a smaller factor in the overall network communications speed.

Latency generally will be less significant for data storage transfers than for cloud computing. Optimizations such as multi-threading, which is used in Backblaze’s Cloud Backup service, will generally improve overall transfer throughput if sufficient bandwidth is available.

Those interested in testing the overall speed and latency of their connection to Backblaze’s data centers can use the Check Your Bandwidth tool on our website.
Data center telecommunications equipment

Data center telecommunications equipment

Data center under floor cable runs

Data center under floor cable runs

Cooling

Computer, networking, and power generation equipment generates heat, and there are a number of solutions employed to rid a data center of that heat. The location and climate of the data center is of great importance to the data center designer because the climatic conditions dictate to a large degree what cooling technologies should be deployed that in turn affect the power used and the cost of using that power. The power required and cost needed to manage a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Innovation is strong in this area and many new approaches to efficient and cost-effective cooling are used in the latest data centers.

Switch's uninterruptible, multi-system, HVAC Data Center Cooling Units

Switch’s uninterruptible, multi-system, HVAC Data Center Cooling Units

There are three primary ways data center cooling can be achieved:

Room Cooling cools the entire operating area of the data center. This method can be suitable for small data centers, but becomes more difficult and inefficient as IT equipment density and center size increase.

Row Cooling concentrates on cooling a data center on a row by row basis. In its simplest form, hot aisle/cold aisle data center design involves lining up server racks in alternating rows with cold air intakes facing one way and hot air exhausts facing the other. The rows composed of rack fronts are called cold aisles. Typically, cold aisles face air conditioner output ducts. The rows the heated exhausts pour into are called hot aisles. Typically, hot aisles face air conditioner return ducts.

Rack Cooling tackles cooling on a rack by rack basis. Air-conditioning units are dedicated to specific racks. This approach allows for maximum densities to be deployed per rack. This works best in data centers with fully loaded racks, otherwise there would be too much cooling capacity, and the air-conditioning losses alone could exceed the total IT load.

Security

Data Centers are high-security facilities as they house business, government, and other data that contains personal, financial, and other secure information about businesses and individuals.

This list contains the physical-security considerations when opening or co-locating in a data center:

Layered Security Zones. Systems and processes are deployed to allow only authorized personnel in certain areas of the data center. Examples include keycard access, alarm systems, mantraps, secure doors, and staffed checkpoints.

Physical Barriers. Physical barriers, fencing and reinforced walls are used to protect facilities. In a colocation facility, one customers’ racks and servers are often inaccessible to other customers colocating in the same data center.

Backblaze racks secured in the data center

Backblaze racks secured in the data center

Monitoring Systems. Advanced surveillance technology monitors and records activity on approaching driveways, building entrances, exits, loading areas, and equipment areas. These systems also can be used to monitor and detect fire and water emergencies, providing early detection and notification before significant damage results.

Top-tier providers evaluate their data center security and facilities on an ongoing basis. Technology becomes outdated quickly, so providers must stay-on-top of new approaches and technologies in order to protect valuable IT assets.

To pass into high security areas of a data center requires passing through a security checkpoint where credentials are verified.

Data Center security

The gauntlet of cameras and steel bars one must pass before entering this data center

Facilities and Services

Data center colocation providers often differentiate themselves by offering value-added services. In addition to the required space, power, cooling, connectivity and security capabilities, the best solutions provide several on-site amenities. These accommodations include offices and workstations, conference rooms, and access to phones, copy machines, and office equipment.

Additional features may consist of kitchen facilities, break rooms and relaxation lounges, storage facilities for client equipment, and secure loading docks and freight elevators.

Moving into A Data Center

Moving into a data center is a major job for any organization. We wrote a post last year, Desert To Data in 7 Days — Our New Phoenix Data Center, about what it was like to move into our new data center in Phoenix, Arizona.

Desert To Data in 7 Days — Our New Phoenix Data Center

Visiting a Data Center

Our Director of Product Marketing Andy Klein wrote a popular post last year on what it’s like to visit a data center called A Day in the Life of a Data Center.

A Day in the Life of a Data Center

Would you Like to Know More about The Challenges of Opening and Running a Data Center?

That’s it for part 2 of this series. If readers are interested, we could write a post about some of the new technologies and trends affecting data center design and use. Please let us know in the comments.

Here's a tip!Here’s a tip on finding all the posts tagged with data center on our blog. Just follow https://www.backblaze.com/blog/tag/data-center/.

Don’t miss future posts on data centers and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.

The post The Challenges of Opening a Data Center — Part 2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.