Tag Archives: setup

Migrating .NET Classic Applications to Amazon ECS Using Windows Containers

Post Syndicated from Sundar Narasiman original https://aws.amazon.com/blogs/compute/migrating-net-classic-applications-to-amazon-ecs-using-windows-containers/

This post contributed by Sundar Narasiman, Arun Kannan, and Thomas Fuller.

AWS recently announced the general availability of Windows container management for Amazon Elastic Container Service (Amazon ECS). Docker containers and Amazon ECS make it easy to run and scale applications on a virtual machine by abstracting the complex cluster management and setup needed.

Classic .NET applications are developed with .NET Framework 4.7.1 or older and can run only on a Windows platform. These include Windows Communication Foundation (WCF), ASP.NET Web Forms, and an ASP.NET MVC web app or web API.

Why classic ASP.NET?

ASP.NET MVC 4.6 and older versions of ASP.NET occupy a significant footprint in the enterprise web application space. As enterprises move towards microservices for new or existing applications, containers are one of the stepping stones for migrating from monolithic to microservices architectures. Additionally, the support for Windows containers in Windows 10, Windows Server 2016, and Visual Studio Tooling support for Docker simplifies the containerization of ASP.NET MVC apps.

Getting started

In this post, you pick an ASP.NET 4.6.2 MVC application and get step-by-step instructions for migrating to ECS using Windows containers. The detailed steps, AWS CloudFormation template, Microsoft Visual Studio solution, ECS service definition, and ECS task definition are available in the aws-ecs-windows-aspnet GitHub repository.

To help you get started running Windows containers, a reference architecture for Windows containers is available on GitHub: ecs-refarch-cloudformation-windows. This reference architecture is a layered CloudFormation stack, in that it calls other stacks to create the environment. The CloudFormation YAML templates in this reference architecture served as the basis for a single JSON CloudFormation template, which is used in the steps for the migration.

Steps for Migration

The code and templates to implement this migration can be found on GitHub: https://github.com/aws-samples/aws-ecs-windows-aspnet.

  1. Your development environment needs to have the latest version and updates for Visual Studio 2017, Windows 10, and Docker for Windows Stable.
  2. Next, containerize the ASP.NET application and test it locally. Windows container images are generally larger than Linux container images because the Windows base image itself is large, typically more than 9 GB.
  3. After the application is containerized, the container image needs to be pushed to Amazon Elastic Container Registry (Amazon ECR). Images stored in ECR are compressed to improve pull times and reduce storage costs. In this case, you can see that ECR compresses the image to around 1 GB, for an optimization factor of 90%.
  4. Create a CloudFormation stack using the template in the ‘CloudFormation template’ folder. This creates an ECS service, a task definition (referring to the containerized ASP.NET application), and other related components mentioned in the ECS reference architecture for Windows containers.
  5. After the stack is created, verify the successful creation of the ECS service, ECS instances, running tasks (with the threshold mentioned in the task definition), and the Application Load Balancer’s successful health check against running containers.
  6. Navigate to the Application Load Balancer URL and see the successful rendering of the containerized ASP.NET MVC app in the browser.
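
The size reduction quoted in step 3 can be checked with a little arithmetic. The sketch below assumes the round figures from the text (about 9 GB locally, about 1 GB stored in ECR); real images will differ, and the function name is made up for illustration.

```python
def reduction_percent(original_gb: float, stored_gb: float) -> float:
    """Return how much smaller the stored image is, as a percentage."""
    return (original_gb - stored_gb) / original_gb * 100


# Round figures taken from the steps above; actual sizes vary by image.
base_image_gb = 9.0
ecr_image_gb = 1.0

print(f"{reduction_percent(base_image_gb, ecr_image_gb):.1f}% smaller")
```

With these inputs the reduction comes out just under 90%, which matches the rough figure given in the post.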

Key Notes

  • Generally, Windows container images occupy a large amount of space (on the order of a few GB).
  • Not all task definition parameters for Linux containers are available for Windows containers. For more information, see Windows Task Definitions.
  • An Application Load Balancer can be configured to route requests to one or more ports on each container instance in a cluster. The dynamic port mapping allows you to have multiple tasks from a single service on the same container instance.
  • IAM roles for Windows tasks require extra configuration. For more information, see Windows IAM Roles for Tasks. For this post, configuration was handled by the CloudFormation template.
  • The ECS container agent log file can be accessed for troubleshooting Windows containers: C:\ProgramData\Amazon\ECS\log\ecs-agent.log
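
To make the dynamic port mapping note concrete, here is a minimal sketch of a container definition fragment built as a Python dictionary. The container name, image URI, and memory value are placeholders, not values from the sample repository. The relevant detail is the host port of 0, which in ECS requests a dynamically assigned host port so that several tasks from one service can share a container instance.

```python
import json

# Placeholder values for illustration only; they are not taken from
# the aws-ecs-windows-aspnet repository.
container_definition = {
    "name": "aspnet-mvc-app",
    "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/aspnet-mvc-app:latest",
    "memory": 1024,
    "portMappings": [
        # hostPort 0 asks ECS to choose a free host port for each task,
        # which is what lets multiple tasks share one container instance.
        {"containerPort": 80, "hostPort": 0, "protocol": "tcp"}
    ],
}

print(json.dumps({"containerDefinitions": [container_definition]}, indent=2))
```

The Application Load Balancer then registers each task under whatever host port it was given, so no port needs to be fixed in advance.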

Summary

In this post, you migrated an ASP.NET MVC application to ECS using Windows containers.

The logical next step is to automate the activities for migration to ECS and build a fully automated continuous integration/continuous deployment (CI/CD) pipeline for Windows containers. This can be orchestrated by leveraging services such as AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and Amazon ECS. You can learn more about how this is done in the Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS post.

If you have questions or suggestions, please comment below.

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance, if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency with less cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how do you handle this situation? Take things one step at a time when scaling and you may find horizontal scaling isn’t the right choice, after all.

For example, assume you have a tech news website where your early-look review of an upcoming, highly anticipated smartphone launch went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. If your website is hosted on a traditional Linux server with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s dig into the details of the current scenario:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

  • How is the website handling sessions?
  • Do you have any compliance requirements, such as the Payment Card Industry Data Security Standard (PCI DSS), if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your load balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic generated by activity on the blog post, so let’s reduce server load by moving images and videos to a third-party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.

Step 2: Reduce read load by adding more read replicas. MySQL provides built-in mirror replication for databases, and Oracle has its own plug-in for replication. Amazon RDS provides up to five read replicas, which can span across Regions, and Amazon Aurora can have up to 15 read replicas with Aurora Auto Scaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, Amazon RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to a mirror instance means replicated data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs, and try to move non-critical GET services to a read replica to reduce the load on the master database. In this case, the comments associated with a blog post can be fetched from a read replica, because they can tolerate some delay if asynchronous replication lags.
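
As a rough illustration of that recommendation, the sketch below routes statements to a primary or a replica endpoint. The class, the endpoint names, and the "critical" flag are all hypothetical; a real application would usually rely on its database driver or framework for this.

```python
import random


class ReplicaRouter:
    """Send writes and critical reads to the primary, other reads to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def endpoint_for(self, sql, critical=False):
        # Treat anything starting with SELECT as a read; everything else is a write.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not critical and self.replicas:
            # Non-critical reads tolerate replication lag, so a replica is fine.
            return random.choice(self.replicas)
        return self.primary


# Hypothetical endpoint names for illustration.
router = ReplicaRouter("primary.example.internal",
                       ["replica-1.example.internal", "replica-2.example.internal"])
print(router.endpoint_for("SELECT body FROM comments WHERE post_id = 42"))
print(router.endpoint_for("INSERT INTO comments (post_id, body) VALUES (42, 'hi')"))
```

Loading comments would use the default path, while something like reading an order total right after checkout would pass critical=True to avoid stale data.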

Step 3: Reduce write requests. This can be achieved by introducing a queue to process messages asynchronously. Amazon Simple Queue Service (Amazon SQS) is a highly scalable queue, which can handle any kind of work-message load. You can process data, like ratings and reviews, or calculate a Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, reported through Amazon CloudWatch, as the trigger. For on-premises workloads, you can use the SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS to fan out your message processing in parallel for different purposes like adding a watermark to an image, generating a thumbnail, etc.
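
The job-observer idea boils down to sizing the worker fleet from the queue backlog. The following sketch shows only that calculation; the function name, the per-worker throughput, and the fleet limits are assumptions for illustration, and in AWS the same decision would be made by Auto Scaling reacting to a CloudWatch metric rather than by application code.

```python
import math


def desired_workers(queue_depth, msgs_per_worker, min_workers=1, max_workers=10):
    """Pick a worker count that clears the backlog, clamped to fleet limits."""
    needed = math.ceil(queue_depth / msgs_per_worker)
    return max(min_workers, min(max_workers, needed))


# Assumed throughput: each batch server handles 100 queued messages.
for depth in (0, 250, 5000):
    print(depth, "messages ->", desired_workers(depth, msgs_per_worker=100), "workers")
```

An empty queue falls back to the minimum fleet, a moderate backlog adds a few servers, and a very large backlog is capped at the maximum so costs stay bounded.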

Step 4: Introduce a more robust caching engine. You can use Amazon ElastiCache for Memcached or Redis to reduce write requests. Memcached and Redis have different use cases, so if you can afford to lose your cache and rebuild it from your database, use Memcached. If you are looking for more robust data persistence and complex data structures, use Redis. In AWS, these are managed services, which means AWS takes care of operating them for you. You can also deploy them on your on-premises instances or use a hybrid approach.
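
One common way a cache takes load off the database is the cache-aside pattern, sketched below. A plain dictionary stands in for Memcached or Redis, and the loader function and key format are hypothetical; the point is only the lookup order, not a production client.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the database and remember it."""

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._cache = {}      # stand-in for Memcached or Redis
        self.db_hits = 0      # counts how often the database was queried

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so the next read reloads fresh data.
        self._cache.pop(key, None)


# Hypothetical loader that pretends to query the database.
ratings = CacheAside(lambda post_id: {"post": post_id, "stars": 4.5})
ratings.get("review-1")
ratings.get("review-1")
print("database queries:", ratings.db_hits)
```

Because the cache can always be rebuilt from the database, losing it is harmless, which is the situation in which the post suggests Memcached.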

Step 5: Scale your server. If there are still issues, it’s time to scale your server. For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, for use cases like databases, vertical scaling may be a better choice until you are comfortable with sharding; or use Amazon Aurora Serverless for variable workloads. It is wise to use Auto Scaling to manage your workload effectively for horizontal scaling. To achieve that, you also need to persist the session. Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.
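
With weighted DNS routing, each record receives a share of traffic equal to its weight divided by the sum of all weights. The sketch below works through that arithmetic; the record names and the 90/10 split are example values, not a recommendation.

```python
def traffic_share(weights):
    """Return each record's expected fraction of traffic under weighted routing."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Example weights: keep most users on premises and send a slice to AWS.
shares = traffic_share({"on-premises": 90, "aws": 10})
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Shifting more users to AWS during a spike is then a matter of raising the AWS weight relative to the on-premises one.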

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route 53 to distribute your workload between AWS Regions. CloudFront helps you to scale and distribute static content via an S3 bucket, and DynamoDB maintains your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with multi-AZ standby provides high availability, and read replicas help achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS even if your workload is on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing a replica of your on-premises environment in a public cloud such as the AWS Cloud, and using the Amazon Route 53 DNS service and Elastic Load Balancing to route traffic between the on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments, which helps you scale your cloud environment quickly whenever required, and scale it back down by applying Auto Scaling and placing a threshold on your on-premises traffic using Route 53.

Game night 2: Detention, Viatoree, Paletta

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/16/game-night-2-detention-viatoree-paletta/

Game night continues with:

  • Detention
  • Viatoree
  • Paletta

These are impressions, not reviews. I try to avoid major/ending spoilers, but big plot points do tend to leave impressions.

Detention

longish · inventory horror · jan 2017 · lin/mac/win · $12 on steam · website

“Inventory horror” is a hell of a genre.

I think this one came from a Twitter thread where glip asked for indie horror recommendations. It’s apparently well-known enough to have a Wikipedia article, but I hadn’t heard of it before.

I love love love the aesthetic here. It’s obviously 2Dish from a side view (though there’s plenty of parallax in a lot of places), and it’s all done with… papercraft? I think of it as papercraft. Everything is built out of painted chunks that look like they were cut out of paper. It’s most obvious when watching the protagonist move around; her legs and skirt swivel as she walks.

Less obvious are the occasional places where tiny details repeat in the background because a paper cutout was reused. I don’t bring that up as a dig on the art; on the contrary, I really liked noticing that once or twice. It made the world feel like it was made with a tileset (albeit with very large chunky tiles), like it’s slightly artificial. I’m used to seeing sidescrollers made from tiles, of course, but the tiles are usually colorful and cartoony pixel art; big gritty full-color tiles are unusual and eerie.

And that’s a good thing in a horror game! Detention’s setting is already slightly unreal, and it’s made all the moreso by my Western perspective: it takes place in a Taiwanese school in the 60’s, a time when Taiwan was apparently under martial law. The Steam page tells you this, but I didn’t even know that much when we started playing, so I’d effectively been dropped somewhere on the globe and left to collect the details myself. Even figuring out we were in Taiwan (rather than mainland China) felt like an insight.

Thinking back, it was kind of a breath of fresh air. Games can be pretty heavy-handed about explaining the setting, but I never got that feeling from Detention. There’s more than enough context to get what’s going on, but there are no “stop and look at the camera while monologuing some exposition” moments. The developers are based in Taiwan, so it’s possible the setting is plenty familiar to them, and my perception of it is a complete accident. Either way, it certainly made an impact. Death of the author and whatnot, I suppose.

One thing in particular that stood out: none of the Chinese text in the environment is directly translated. The protagonist’s thoughts still give away what it says — “this is the nurse’s office” and the like — but that struck me as pretty different from simply repeating the text in English as though I were reading a sign in an RPG. The text is there, perfectly legible, but I can’t read it; I can only ask the protagonist to read it and offer her thoughts. It drives home that I’m experiencing the world through the eyes of the protagonist, who is their own person with their own impression of everything. Again, this is largely an emergent property of the game’s being designed in a culture that is not mine, but I’m left wondering how much thought went into this style of localization.

The game itself sees you wandering through a dark and twisted version of the protagonist’s school, collecting items and solving puzzles with them. There’s no direct combat, though some places feature a couple varieties of spirits called lingered which you have to carefully avoid. As the game progresses, the world starts to break down, alternating between increasingly abstract and increasingly concrete as we find out who the protagonist is and why she’s here.

The payoff is very personal and left a lasting impression… though as I look at the Wikipedia page now, it looks like the ending we got was the non-canon bad ending?! Well, hell. The bad ending is still great, then.

The whole game has a huge Silent Hill vibe, only without the combat and fog. Frankly, the genre might work better without combat; personal demons are more intimidating and meaningful when you can’t literally shoot them with a gun until they’re dead.

FINAL SCORE: 拾

Viatoree

short · platformer · sep 2013 · win · free on itch

I found this because @itchio tweeted about it, and the phrase “atmospheric platform exploration game” is the second most beautiful sequence of words in the English language.

The first paragraph on the itch.io page tells you the setup. That paragraph also contains more text than the entire game. In short: there are five things, and you need to find them. You can walk, jump, and extend your arms straight up to lift yourself to the ceiling. That’s it. No enemies, no shooting, no NPCs (more or less).

The result is, indeed, an atmospheric platform exploration game. The foreground is entirely 1-bit pixel art, save for the occasional white pixel to indicate someone’s eyes, and the background is only a few shades of the same purple hue. The game becomes less about playing and more about just looking at the environmental detail, appreciating how much texture the game manages to squeeze out of chunky colorless pixels. The world is still alive, too, much moreso than most platformers; tiny critters appear here and there, doing some wandering of their own, completely oblivious to you.

The game is really short, but it… just… makes me happy. I’m happy that this can exist, that not only is it okay for someone to make a very compact and short game, but that the result can still resonate with me. Not everything needs to be a sprawling epic or ask me to dedicate hours of time. It takes a few tiny ideas, runs with them, does what it came to do, and ends there. I love games like this.

That sounds silly to write out, but it’s been hard to get into my head! I do like experimenting, but I also feel compelled to reach for the grandiose, and grandiose experiment sounds more like mad science than creative exploration. For whatever reason, Viatoree convinced me that it’s okay to do a small thing, in a way that no other jam game has. It was probably the catalyst that led me to make Roguelike Simulator, and I thank it for that.

Unfortunately, we collected four of the five macguffins before hitting upon a puzzle we couldn’t make heads or tails of. After about ten minutes of fruitless searching, I decided to abandon this one unfinished, rather than bore my couch partner to tears. Maybe I’ll go take another stab at it after I post this.

FINAL SCORE: ●●●●○

Paletta

medium · puzzle story · nov 2017 · win · free on itch

Paletta, another RPG Maker work, won second place in the month-long Indie Game Maker Contest 2017. Nice! Apparently MOOP came in fourth in the same jam; also nice! I guess that’s why both of them ended up on the itch front page.

The game is set in a world drained of color, and you have to go restore it. Each land contains one lost color, and each color gives you a corresponding spell, which is generally used for some light puzzle-solving in further lands. It’s a very cute and light-hearted game, and it actually does an impressive job of obscuring its RPG Maker roots.

The world feels a little small to me, despite having fairly spacious maps. The progression is pretty linear: you enter one land, talk to a small handful of NPCs, solve the one puzzle, get the color, and move on. I think all the areas were continuously connected, too, which may have thrown me off a bit — these areas are described as though they were vast regions, but they’re all a hundred feet wide and nestled right next to each other.

I love playing with color as a concept, and I wish the game had run further with it somehow. Rescuing a color does add some color back to the world, but at times it seemed like the color that reappeared was somewhat arbitrary? It’s not like you rescue green and now all the green is back. Thinking back on it now, I wonder if each rescued color actually changed a fixed set of sprites from gray to colorized? But it’s been a month (oops) and now I’m not sure.

I’m not trying to pick on the authors for the brevity of their jam game and also first game they’ve ever finished. I enjoyed playing it and found it plenty charming! It just happens that this time, what left the biggest impression on me was a nebulous feeling that something was missing. I think that’s still plenty important to ponder.

FINAL SCORE: ❤️💛💚💙💜

Raspberry Pi-newood Derby

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pinewood-derby/

Andre Miron’s Pinewood Derby Instant Replay System (sorry, not sorry for the pun in the title) uses a Raspberry Pi to monitor the finishing line and play back a slow-motion instant replay, putting an end to “No, I won!” squabbles once and for all.

Raspberry Pi Based Pinewood Derby Instant Replay Demo

This is the same system I demo in this video (https://youtu.be/-QyMxKfBaAE), but on our actual track with real pinewood derby cars. Glad to report that it works great!

Pinewood Derby

For those unfamiliar with the term, the Pinewood Derby is a racing event for Cub Scouts in the USA. Cub Scouts, often with the help of a guardian, build race cars out of wood according to rules regarding weight, size, materials, etc.

The Cubs then race their cars in heats, with the winners advancing to district and council races.

Who won?

Andre’s Instant Replay System registers the race cars as they cross the finishing line, and it plays back slow-motion video of the crossing on a monitor. As he explains on YouTube:

The Pi is recording a constant stream of video, and when the replay is triggered, it records another half-second of video, then takes the last second and a half and saves it in slow motion (recording is done at 90 fps), before replaying.
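
The buffering Andre describes can be sketched with a fixed-size ring buffer. This is not his code: the function name is made up, integers stand in for video frames, and the 30 fps playback rate is an assumption. Only the 90 fps capture rate, the half-second after the trigger, and the 1.5-second clip length come from the quote.

```python
from collections import deque

FPS = 90               # capture rate from the description
CLIP_SECONDS = 1.5     # length of the saved clip
POST_TRIGGER_S = 0.5   # extra recording after the trigger
PLAYBACK_FPS = 30      # assumed playback rate for slow motion


def capture_clip(frames, trigger_at):
    """Keep a rolling window of frames and return it shortly after the trigger."""
    ring = deque(maxlen=int(FPS * CLIP_SECONDS))  # holds the last 135 frames
    remaining = None  # frames still to record once the trigger has fired
    for frame in frames:
        ring.append(frame)
        if frame == trigger_at:
            remaining = int(FPS * POST_TRIGGER_S)
        elif remaining is not None:
            remaining -= 1
        if remaining == 0:
            break
    return list(ring)


clip = capture_clip(range(1000), trigger_at=500)
print(len(clip), "frames, replayed over", len(clip) / PLAYBACK_FPS, "seconds")
```

The clip ends up with one second of footage before the trigger and half a second after it, and replaying those frames at a lower rate than they were captured is what produces the slow-motion effect.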

The build also uses an attached Arduino, connected to GPIO pin 5, to trigger the recording and playback as it registers the passing cars via a voltage splitter. Additionally, the system announces the finishing places on a rather attractive-looking display above the finishing line.
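
A resistor divider is the usual way to bring a higher-voltage signal down to the 3.3 V level of the Pi's GPIO pins. The calculation below is a sketch under assumptions: the 5 V Arduino output level and both resistor values are illustrative and are not taken from Andre's build.

```python
def divider_output(v_in, r_top, r_bottom):
    """Voltage at the midpoint of a two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)


# Assumed values: a 5 V Arduino output with 1.8 kΩ on top and 3.3 kΩ below.
v_out = divider_output(5.0, r_top=1800, r_bottom=3300)
print(f"{v_out:.2f} V at the GPIO pin")
```

With these example resistors the midpoint stays just below 3.3 V, which is the property to check before connecting any higher-voltage output to a Pi input.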

The result? No more debate about whose car crossed the line first in neck-and-neck races.

Build your own

Andre takes us through the physical setup of the build in the video below, and you’ll find the complete code pasted in the description of the video here. Thanks, Andre!

Raspberry Pi based Pinewood Derby Instant Replay System

See the system on our actual track here: https://youtu.be/B3lcQHWGq88 Raspberry Pi based instant replay system, triggered by Arduino Pinewood Derby Timer. The Pi uses GPIO pin 5 attached to a voltage splitter on Arduino output 11 (and ground-ground) to detect when a car crosses the finish line, which triggers the replay.

Digital making in your club

If you’re a member of an after-school association such as the Scouts or Guides, then using the Raspberry Pi and our free project resources, or visiting a Code Club or CoderDojo, are excellent ways to work towards various badges and awards. So talk to your club leader to discover all the ways in which you can incorporate digital making into your club!

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for those commands. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the @transient annotation on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.
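To make the tuple format concrete, here is a rough plain-Python sketch of what a mapping of this shape does to one nested record. It is not the AWS Glue API: get_path, apply_mapping, and the sample event are hypothetical names and data invented for illustration, and the real transform handles types and missing fields in its own way.

```python
# Illustrative only: plain Python, not the AWS Glue API.

def get_path(record, dotted_path):
    """Follow a dotted path such as 'payload.issue.title' through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def apply_mapping(record, mappings):
    """Keep only the mapped fields, renaming each source path to its target name."""
    casts = {"string": str, "int": int, "double": float}
    out = {}
    for source_path, _source_type, target_name, target_type in mappings:
        value = get_path(record, source_path)
        if value is not None:
            out[target_name] = casts[target_type](value)
    return out


# Hypothetical sample event, shaped loosely like a GitHub timeline record.
event = {
    "id": "42",
    "type": "IssuesEvent",
    "actor": {"login": "octocat"},
    "repo": {"name": "example/repo"},
    "payload": {"action": "opened", "issue": {"title": "Crash on start"}},
}
mappings = [
    ("id", "string", "id", "string"),
    ("actor.login", "string", "actor", "string"),
    ("repo.name", "string", "repo", "string"),
    ("payload.action", "string", "action", "string"),
    ("payload.issue.title", "string", "title", "string"),
]
flat = apply_mapping(event, mappings)
print(flat)
```

In this sketch, fields that are not listed in the mappings (such as type) are dropped, and nested paths are flattened into top-level names.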

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.
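The same once-per-process pattern can be sketched outside Scala. The following plain-Python illustration is an analogy, not part of the Glue script: get_pipeline and init_calls are hypothetical names, and the dictionary stands in for the heavyweight CoreNLP driver.

```python
# Illustrative only: a lazily initialized, process-wide resource in plain Python.
import functools

init_calls = []  # records each time the expensive setup actually runs


@functools.lru_cache(maxsize=None)
def get_pipeline():
    """Stand-in for an expensive, non-serializable object such as an NLP driver."""
    init_calls.append("initialized")
    return {"annotators": "tokenize, ssplit, parse, sentiment"}


def process(text):
    # The first call in a process builds the pipeline; later calls reuse it.
    pipeline = get_pipeline()
    return (text, pipeline["annotators"])


results = [process(t) for t in ["first title", "second title", "third title"]]
print(len(init_calls))
```

Nothing is built until the first record is processed, and every later record in the same process reuses the same object.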

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}
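For readers who want to check the arithmetic, the averaging step can be sketched in plain Python. This is an illustration only: average_sentiment and is_pressing are hypothetical helpers, the per-sentence classes are assumed to be the integers 0 through 4 described above, the 1.5 cutoff is an example value, and no CoreNLP call is involved.

```python
# Illustrative only: the averaging logic, without any NLP library.
import math


def average_sentiment(sentence_classes):
    """Average per-sentence sentiment classes (0 = most negative, 4 = most positive)."""
    if not sentence_classes:
        return float("nan")  # nothing to score
    return sum(sentence_classes) / len(sentence_classes)


def is_pressing(score, threshold=1.5):
    # Comparisons with NaN are always False, so unscored text is never selected.
    return score < threshold


two_sentences = average_sentiment([1, 3])   # one negative, one positive sentence
negative_only = average_sentiment([0, 1])
unscored = average_sentiment([])
print(two_sentences, negative_only, is_pressing(negative_only), is_pressing(unscored))
```

Because any comparison with NaN is false, text that could not be scored is never picked up by a "less than" filter on the score.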

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can edit the script’s input table and output path to run this job on any GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more examples of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

Ben Sowell is a software development engineer at AWS Glue.

Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Zero WH: pre-soldered headers and what to do with them

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/zero-wh/

If you head over to the website of your favourite Raspberry Pi Approved Reseller today, you may find the new Zero WH available to purchase. But what is it? Why is it different, and what can you do with it?

Raspberry Pi Zero WH

“If you like pre-soldered headers, and getting caught in the rain…”

Raspberry Pi Zero WH

Imagine a Raspberry Pi Zero W. Now add a professionally soldered header. Boom, that’s the Raspberry Pi Zero WH! It’s your same great-tasting Pi, with a brand-new…crust? It’s perfect for everyone who doesn’t own a soldering iron or who wants the soldering legwork done for them.

What you can do with the Zero WH

What can’t you do? Am I right?! The small size of the Zero W makes it perfect for projects with minimal wiggle-room. In such projects, some people have no need for GPIO pins — they simply solder directly to the board. However, there are many instances where you do want a header on your Zero W, for example in order to easily take advantage of the GPIO expander tool for Debian Stretch on a PC or Mac.

GPIO expander in clubs and classrooms

As Ben Nuttall explains in his blog post on the topic:

[The GPIO expander tool] is a real game-changer for Raspberry Jams, Code Clubs, CoderDojos, and schools. You can live boot the Raspberry Pi Desktop OS from a USB stick, use Linux PCs, or even install [the Pi OS] on old computers. Then you have really simple access to physical computing without full Raspberry Pi setups, and with no SD cards to configure.

Using the GPIO expander with the Raspberry Pi Zero WH decreases the setup cost for anyone interested in trying out physical computing in the classroom or at home. (And once you’ve stuck your toes in, you’ll obviously fall in love and will soon find yourself with multiple Raspberry Pi models, HATs aplenty, and an area in your home dedicated to your new adventure in Raspberry Pi. Don’t say I didn’t warn you.)

Other uses for a Zero W with a header

The GPIO expander setup is just one of a multitude of uses for a Raspberry Pi Zero W with a header. You may want the header for prototyping before you commit to soldering wires directly to a board. Or you may have a temporary build in mind for your Zero W, in which case you won’t want to commit to soldering wires to the board at all.

Raspberry Pi Zero WH

Your use case may be something else entirely — tell us in the comments below how you’d utilise a pre-soldered Raspberry Pi Zero WH in your project. The best project idea will receive ten imaginary house points of absolutely no practical use, but immense emotional value. Decide amongst yourselves who you believe should win them — I’m going to go waste a few more hours playing SLUG!

The post Zero WH: pre-soldered headers and what to do with them appeared first on Raspberry Pi.

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable, low-cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/GB/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team-oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high-growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into everything from iOS apps to Docker containers. Some customers integrate directly with the API using the programming language of their choice; others want to solve a specific problem using ready-made software that is already integrated with B2.

At the same time, our computer backup product is deepening its integration into enterprise IT systems. We are commonly asked how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to set up complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today; diverse technologies that customers are using day in and out. And more importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • How to set up VM servers for lab environments, both on-prem and using cloud services.
  • Create an automatically “resetting” demo environment for the sales team.
  • Set up Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • Learn the basics of OAuth and web single sign-on (SSO).
  • Archive video workflows from camera to media asset management systems.
  • How to upload/download files from JavaScript by enabling CORS.
  • How to install and monitor online backup installations using RMM tools, like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, and effective Google searching to fix problems excite you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.

Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre- or post-sales engineer.

The right extra credit:
There are literally hundreds of previous experiences you could have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Program arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.

What’s it like working with the Sales team?
The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customers’ experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

We are honest with each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.

If this all sounds like you:

  1. Send an email to [email protected] with the position in the subject line.
  2. Tell us a bit about your Sales Engineering experience.
  3. Include your resume.

The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Datacenter Technician

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-datacenter-technician/

    As we shoot way past 400 petabytes of data under management, we need some help scaling up our datacenters! We’re on the lookout for datacenter technicians who can help us. This role is located near Sacramento, California. If you want to join a dynamic team that helps keep our 90,000+ hard drives spinning, this might be the job for you!

    Responsibilities

    • Work as Backblaze’s physical presence in Sacramento area datacenter(s).
    • Help maintain physical infrastructure including racking equipment, replacing hard drives and other system components.
    • Repair and troubleshoot defective equipment with minimal supervision.
    • Support datacenter’s 24×7 staff to install new equipment, handle after hours emergencies and other tasks.
    • Help manage onsite inventory of hard drives, cables, rails and other spare parts.
    • RMA defective components.
    • Set up, test and activate new equipment via the Linux command line.
    • Help train new Datacenter Technicians as needed.
    • Help with projects to install new systems and services as time allows.
    • Follow and improve Datacenter best practices and documentation.
    • Maintain a clean and well organized work environment.
    • On-call responsibilities require being within an hour of SunGard’s Rancho Cordova/Roseville facility and making occasional trips onsite 24×7 to resolve issues that can’t be handled remotely.
    • Work days may include Saturday and/or Sunday (e.g. working Tuesday – Saturday).

    Requirements

    • Excellent communication, time management, problem solving and organizational skills.
    • Ability to learn quickly.
    • Ability to lift/move 50-75 lbs and work down near the floor on a daily basis.
    • Position based near Sacramento, California and may require periodic visits to the corporate office in San Mateo.
    • May require travel to other Datacenters to provide coverage and/or to assist with new site set-up.

    Backblaze Employees Have:

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Strong desire to work for a small, fast-paced company.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Comfortable with well-behaved pets in the office.
    • This position is located near Sacramento, California.

    Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no-policy vacation policy.

    If This Sounds Like You:
    Send an email to [email protected] with:

    1. Datacenter Tech in the subject line
    2. Your resume attached
    3. An overview of your relevant experience

    The post Wanted: Datacenter Technician appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

    Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

    A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

    With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

    In this post, we describe how to combine data in Aurora with data in Amazon Redshift. Here’s an overview of the solution:

    • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
    • Save data in an Amazon S3 bucket.
    • Query data using Amazon Redshift Spectrum.

    We use the following services:

    Serverless architecture for capturing and analyzing Aurora data changes

    Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

    In this example, you take the data changes in an Aurora database table and save them in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

    By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

    The following diagram shows the flow of data as it occurs in this tutorial:

    The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.
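As a rough sketch of the Lambda step in this flow, the function might look something like the following. The event shape, the JSON-lines encoding, and the helper names are assumptions made for illustration, and the delivery stream name matches the one created later in this post. FakeFirehose is a local stand-in so that the sketch runs without AWS credentials; in Lambda, the boto3 Firehose client's put_record call would do the actual delivery.

```python
# Illustrative sketch only: event shape and helper names are assumptions.
import json

STREAM_NAME = "AuroraChangesToS3"  # delivery stream name used later in this post


def to_firehose_record(event):
    """Serialize one inserted row as a newline-terminated JSON line."""
    line = json.dumps(event, sort_keys=True) + "\n"
    return {"Data": line.encode("utf-8")}


def handler(event, context, client=None):
    """Lambda entry point; `client` is injectable so the sketch runs without AWS."""
    if client is None:
        import boto3  # provided by the Lambda Python runtime
        client = boto3.client("firehose")
    return client.put_record(DeliveryStreamName=STREAM_NAME,
                             Record=to_firehose_record(event))


class FakeFirehose:
    """Local stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_record(self, DeliveryStreamName, Record):
        self.calls.append((DeliveryStreamName, Record))
        return {"RecordId": "fake"}


fake = FakeFirehose()
sample_event = {"ItemID": 7, "Category": "Garden", "Price": 19.0, "Quantity": 2}
response = handler(sample_event, None, client=fake)
print(fake.calls[0][1]["Data"])
```

Terminating each record with a newline keeps rows separable once Kinesis Data Firehose concatenates them into objects in Amazon S3.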

    Creating an Aurora database

    First, create a database by following these steps in the Amazon RDS console:

    1. Sign in to the AWS Management Console, and open the Amazon RDS console.
    2. Choose Launch a DB instance, and choose Next.
    3. For Engine, choose Amazon Aurora.
    4. Choose a DB instance class. This example uses a small instance class, since this is not a production database.
    5. In Multi-AZ deployment, choose No.
    6. Configure DB instance identifier, Master username, and Master password.
    7. Launch the DB instance.

    After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

    The following screenshot shows the MySQL Workbench configuration:

    Next, create a table in the database by running the following SQL statement:

    CREATE TABLE Sales (
    InvoiceID int NOT NULL AUTO_INCREMENT,
    ItemID int NOT NULL,
    Category varchar(255),
    Price double(10,2), 
    Quantity int not NULL,
    OrderDate timestamp,
    DestinationState varchar(2),
    ShippingType varchar(255),
    Referral varchar(255),
    PRIMARY KEY (InvoiceID)
    )

    You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Replace the placeholder values (AURORA_CNAME, DBUSER, DBPASSWORD, and DB) with the appropriate values for your database.

    #!/usr/bin/python
    import MySQLdb
    import random
    import datetime
    
    db = MySQLdb.connect(host="AURORA_CNAME",
                         user="DBUSER",
                         passwd="DBPASSWORD",
                         db="DB")
    
    states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
    "IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
    "NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA",
    "WA","WV","WI","WY")
    
    shipping_types = ("Free", "3-Day", "2-Day")
    
    product_categories = ("Garden", "Kitchen", "Office", "Household")
    referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")
    
    cursor = db.cursor()
    
    add_order = ("INSERT INTO Sales "
                 "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, "
                 "ShippingType, Referral) "
                 "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
    
    for i in range(0,10):
        item_id = random.randint(1,100)
        state = random.choice(states)
        shipping_type = random.choice(shipping_types)
        product_category = random.choice(product_categories)
        quantity = random.randint(1,4)
        referral = random.choice(referrals)
        price = random.randint(1,100)
        # Cap the day at 28 so that every month, including February, yields a valid date.
        order_date = datetime.date(2016,random.randint(1,12),random.randint(1,28)).isoformat()
    
        data_order = (item_id, product_category, price, quantity, order_date, state,
        shipping_type, referral)
    
        cursor.execute(add_order, data_order)
        db.commit()
    
    cursor.close()
    db.close()

    The following screenshot shows how the table appears with the sample data:

    Sending data from Amazon Aurora to Amazon S3

    There are two methods available to send data from Amazon Aurora to Amazon S3:

    • Using a Lambda function
    • Using SELECT INTO OUTFILE S3

    To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

    Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

    Creating a Kinesis data delivery stream

    The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

    To create a delivery stream:

    1. Open the Kinesis Data Firehose console.
    2. Choose Create delivery stream.
    3. For Delivery stream name, type AuroraChangesToS3.
    4. For Source, choose Direct PUT.
    5. For Record transformation, choose Disabled.
    6. For Destination, choose Amazon S3.
    7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
    8. Enter a prefix if needed, and choose Next.
    9. For Data compression, choose GZIP.
    10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
    11. Review all the details on the screen, and choose Create delivery stream when you’re finished.
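
    For readers who prefer to script this step, the console choices above map roughly onto a single CreateDeliveryStream request. The following is a minimal sketch using boto3, not the method used in this post. The bucket name and role ARN are hypothetical placeholders, the CDC/ prefix is an assumption chosen to match the external table location used later, and the buffering values are assumed rather than taken from the post.

```python
def build_delivery_stream_request(bucket_name, role_arn, prefix="CDC/"):
    """Build keyword arguments for firehose.create_delivery_stream()."""
    return {
        "DeliveryStreamName": "AuroraChangesToS3",
        "DeliveryStreamType": "DirectPut",  # the "Direct PUT" source in the console
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": "arn:aws:s3:::" + bucket_name,
            "Prefix": prefix,
            "CompressionFormat": "GZIP",
            # Assumed buffering hints; adjust to your own latency needs.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        },
    }


def create_delivery_stream(bucket_name, role_arn):
    """Send the request. Requires boto3 and AWS credentials."""
    # Imported here so the request builder can be inspected without AWS access.
    import boto3
    firehose = boto3.client("firehose")
    return firehose.create_delivery_stream(
        **build_delivery_stream_request(bucket_name, role_arn))


request = build_delivery_stream_request(
    "my-example-bucket",  # hypothetical bucket name
    "arn:aws:iam::123456789012:role/firehose-delivery-role",  # hypothetical role
)
print(request["ExtendedS3DestinationConfiguration"]["BucketARN"])
```

    Keeping the request in a plain dictionary makes it easy to review the settings before calling create_delivery_stream, which is the only part that touches AWS.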

     

    Creating a Lambda function

    Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

    To create the Lambda function:

    1. Open the AWS Lambda console.
    2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
    3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
    4. Choose Author from scratch.
    5. Give your function a name, and for Runtime, choose Python 3.6.
    6. Choose an existing role or create a new one. The role must have permission to call firehose:PutRecord.
    7. Choose Next on the trigger selection screen.
    8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
    9. Choose File -> Save in the code editor and then choose Save.
    import boto3
    import json
    
    firehose = boto3.client('firehose')
    stream_name = 'AuroraChangesToS3'
    
    
    def Kinesis_publish_message(event, context):
        
        firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
        event['Category'], event['Price'], event['Quantity'],
        event['OrderDate'], event['DestinationState'], event['ShippingType'], 
        event['Referral']))
        
        firehose_data = {'Data': str(firehose_data)}
        print(firehose_data)
        
        firehose.put_record(DeliveryStreamName=stream_name,
        Record=firehose_data)

    Note the Amazon Resource Name (ARN) of this Lambda function.
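
    The Lambda function above joins the field values with commas, so a value that itself contains a comma or a newline would shift the columns that Redshift Spectrum later reads. As a hedged illustration, the formatting step could be pulled into a small helper that rejects such values. The helper name event_to_delimited_line and the sample event are assumptions made for this sketch and are not part of the original function.

```python
FIELDS = ("ItemID", "Category", "Price", "Quantity", "OrderDate",
          "DestinationState", "ShippingType", "Referral")


def event_to_delimited_line(event, delimiter=","):
    """Join the event fields into one line, rejecting values that would break the format."""
    values = []
    for name in FIELDS:
        value = str(event[name])
        if delimiter in value or "\n" in value:
            raise ValueError(
                "Field %s contains a delimiter or newline: %r" % (name, value))
        values.append(value)
    return delimiter.join(values) + "\n"


# Hypothetical event, shaped like the JSON that the stored procedure sends.
sample_event = {
    "ItemID": "42",
    "Category": "Garden",
    "Price": "19.99",
    "Quantity": "2",
    "OrderDate": "2016-03-14 00:00:00",
    "DestinationState": "WA",
    "ShippingType": "2-Day",
    "Referral": "Online Ad",
}

line = event_to_delimited_line(sample_event)
print(line, end="")
```

    Because the helper needs no AWS access, it can be run locally before the function is deployed.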

    Giving Aurora permissions to invoke a Lambda function

    To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

    Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

    Creating a stored procedure and a trigger in Amazon Aurora

    Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

    DROP PROCEDURE IF EXISTS CDC_TO_FIREHOSE;
    DELIMITER ;;
    CREATE PROCEDURE CDC_TO_FIREHOSE (IN ItemID VARCHAR(255),
                                      IN Category varchar(255),
                                      IN Price double(10,2),
                                      IN Quantity int(11),
                                      IN OrderDate timestamp,
                                      IN DestinationState varchar(2),
                                      IN ShippingType varchar(255),
                                      IN Referral varchar(255)) LANGUAGE SQL
    BEGIN
      CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
         CONCAT('{ "ItemID" : "', ItemID, 
                '", "Category" : "', Category,
                '", "Price" : "', Price,
                '", "Quantity" : "', Quantity, 
                '", "OrderDate" : "', OrderDate, 
                '", "DestinationState" : "', DestinationState, 
                '", "ShippingType" : "', ShippingType, 
                '", "Referral" : "', Referral, '"}')
         );
    END
    ;;
    DELIMITER ;
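
    To make the shape of the payload concrete, the following sketch mimics in Python what the CONCAT expression in the stored procedure produces and then parses it the way Lambda would. The helper build_payload and the sample row are illustrative assumptions. The sketch shows two properties worth knowing: every value arrives in the Lambda event as a string, and a double quote inside a value yields invalid JSON because the procedure does no escaping.

```python
import json

FIELDS = ("ItemID", "Category", "Price", "Quantity", "OrderDate",
          "DestinationState", "ShippingType", "Referral")


def build_payload(row):
    """Mimic the CONCAT expression: each value is wrapped in double quotes, unescaped."""
    parts = ['"%s" : "%s"' % (name, row[name]) for name in FIELDS]
    return "{ " + ", ".join(parts) + "}"


# Hypothetical row values for illustration.
row = {
    "ItemID": 42,
    "Category": "Garden",
    "Price": 19.99,
    "Quantity": 2,
    "OrderDate": "2016-03-14 00:00:00",
    "DestinationState": "WA",
    "ShippingType": "2-Day",
    "Referral": "Online Ad",
}

event = json.loads(build_payload(row))
print(type(event["Price"]).__name__, event["Price"])

# A double quote inside a value breaks the JSON document.
broken_row = dict(row, Category='Home "Office"')
try:
    json.loads(build_payload(broken_row))
    payload_is_valid = True
except ValueError:
    payload_is_valid = False
print(payload_is_valid)
```

    Note also that in MySQL, CONCAT returns NULL when any argument is NULL, so rows with NULL in a nullable column such as Category would need handling (for example, with COALESCE) before the payload is built.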

    Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

    DROP TRIGGER IF EXISTS TR_Sales_CDC;
     
    DELIMITER ;;
    CREATE TRIGGER TR_Sales_CDC
      AFTER INSERT ON Sales
      FOR EACH ROW
    BEGIN
      SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
      , New.DestinationState, New.ShippingType, New.Referral
      INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
      , @DestinationState, @ShippingType, @Referral;
      CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
      , @DestinationState, @ShippingType, @Referral);
    END
    ;;
    DELIMITER ;

    If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

    Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.
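
    A rough calculation shows why a handful of test inserts may not appear in Amazon S3 right away. The sketch below assumes buffering hints of 5 MB and 300 seconds (check the values configured on your own delivery stream) and treats 1 MB as 1024 * 1024 bytes. The sample record is hypothetical.

```python
def records_to_fill_buffer(buffer_mb, bytes_per_record):
    """Return how many records of a given size fit in the size-based buffer."""
    return (buffer_mb * 1024 * 1024) // bytes_per_record


# Hypothetical record in the comma-separated layout that the Lambda function emits.
sample_record = "42,Garden,19.99,2,2016-03-14 00:00:00,WA,2-Day,Online Ad\n"
record_size = len(sample_record.encode("utf-8"))

assumed_buffer_mb = 5            # assumption, not taken from the post
assumed_interval_seconds = 300   # assumption, not taken from the post

needed = records_to_fill_buffer(assumed_buffer_mb, record_size)
print("bytes per record:", record_size)
print("records needed to reach the size threshold:", needed)
print("otherwise delivery waits up to", assumed_interval_seconds, "seconds")
```

    Under these assumptions, a small test workload never reaches the size threshold, so the time interval is what triggers delivery to Amazon S3.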

    Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

    Querying data in Amazon Redshift

    In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

    Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

    Redshift Spectrum supports open, common data formats, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data formats and compression methods in the works.

    First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

    Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

    In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security practice of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

    Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

    Next, connect to the Amazon Redshift cluster, and create an external schema and database:

    create external schema if not exists spectrum_schema
    from data catalog 
    database 'spectrum_db' 
    region 'us-east-1'
    IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
    create external database if not exists;

    Don’t forget to replace the IAM role in the statement.

    Then create an external table within the database:

     CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
      ItemID int,
      Category varchar,
      Price DOUBLE PRECISION,
      Quantity int,
      OrderDate TIMESTAMP,
      DestinationState varchar,
      ShippingType varchar,
      Referral varchar)
    ROW FORMAT DELIMITED
          FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    LOCATION 's3://{BUCKET_NAME}/CDC/'

    Query the table, and it should contain data. This is a fact table.

    select top 10 * from spectrum_schema.ecommerce_sales

     

    Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

    CREATE TABLE date_dimension (
      d_datekey           integer       not null sortkey,
      d_dayofmonth        integer       not null,
      d_monthnum          integer       not null,
      d_dayofweek                varchar(10)   not null,
      d_prettydate        date       not null,
      d_quarter           integer       not null,
      d_half              integer       not null,
      d_year              integer       not null,
      d_season            varchar(10)   not null,
      d_fiscalyear        integer       not null)
    diststyle all;

    Populate the table with data:

    copy date_dimension from 's3://reparmar-lab/2016dates' 
    iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
    DELIMITER ','
    dateformat 'auto';
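
    The COPY command above loads a file from the author's bucket. If you need to produce a comparable file yourself, the following sketch generates one comma-separated row per day in the column order of the date_dimension table. Several details are assumptions made for illustration: the d_datekey format (YYYYMMDD), northern-hemisphere meteorological seasons, and a fiscal year equal to the calendar year. The original file may define these differently.

```python
import csv
import datetime
import io

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def season_for(month):
    """Assumption: meteorological seasons for the northern hemisphere."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def date_dimension_rows(year):
    """Yield one tuple per day, in the column order of date_dimension."""
    day = datetime.date(year, 1, 1)
    while day.year == year:
        yield (
            int(day.strftime("%Y%m%d")),   # d_datekey (assumed YYYYMMDD)
            day.day,                       # d_dayofmonth
            day.month,                     # d_monthnum
            DAY_NAMES[day.weekday()],      # d_dayofweek
            day.isoformat(),               # d_prettydate
            (day.month - 1) // 3 + 1,      # d_quarter
            1 if day.month <= 6 else 2,    # d_half
            day.year,                      # d_year
            season_for(day.month),         # d_season
            day.year,                      # d_fiscalyear (assumed = calendar year)
        )
        day += datetime.timedelta(days=1)


rows = list(date_dimension_rows(2016))
output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(rows)
first_line = output.getvalue().splitlines()[0]
print(first_line)
```

    Writing output.getvalue() to a file and uploading it to your own bucket gives the COPY command something to load; the S3 path and IAM role in the command would then need to point to your resources.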

    The date dimension table should look like the following:

    Querying data in local and external tables using Amazon Redshift

    Now that you have the fact and dimension tables populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by season, you can run the following:

    select sum(quantity*price) as total_sales, date_dimension.d_season
    from spectrum_schema.ecommerce_sales 
    join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
    group by date_dimension.d_season

    You get the following results:

    Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

    With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

    Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.
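
    As a hedged illustration of date-based partitioning, the sketch below groups comma-separated sales records under Hive-style key prefixes of the form orderdate=YYYY-MM-DD. It is not part of the pipeline in this post: the delivery stream here writes objects under its own prefix, the external table defined earlier is not partitioned, and the function name and prefix layout are assumptions. A partitioned external table would also need each partition registered before Redshift Spectrum could prune on it.

```python
from collections import defaultdict


def group_by_partition(lines, base_prefix="CDC/"):
    """Group delimited records by a Hive-style orderdate=YYYY-MM-DD prefix."""
    groups = defaultdict(list)
    for line in lines:
        # OrderDate is the fifth field; keep only the date part.
        order_date = line.split(",")[4][:10]
        groups["%sorderdate=%s/" % (base_prefix, order_date)].append(line)
    return dict(groups)


# Hypothetical records in the layout that the Lambda function emits.
records = [
    "42,Garden,19.99,2,2016-03-14 00:00:00,WA,2-Day,Online Ad",
    "7,Kitchen,5.00,1,2016-03-14 00:00:00,CA,Free,Other",
    "13,Office,12.50,3,2016-07-02 00:00:00,NY,3-Day,Repeat Customer",
]

partitions = group_by_partition(records)
for prefix in sorted(partitions):
    print(prefix, len(partitions[prefix]))
```

    With data laid out this way, a query that filters on a single order date would only need to read the objects under the matching prefix.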

    Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

    Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

    After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

    Enter the database connection details, validate the connection, and create the data source.

    Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

    Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

    On the next screen, choose Edit.

    In the data prep screen, choose New Field. Add a new calculated field named Total Sales $, which is the product of the Price and Quantity fields. Then choose Create. Save and visualize it.

    Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and OrderDate formatted as month on the y-axis.

    After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

    Final notes

    Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

    In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE statement.

    Keep the following in mind:

    • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
    • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
    • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

    In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

    For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

    If you have questions or suggestions, please comment below.


    Additional Reading

    If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum.


    About the Authors

    Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.

     

     

     

    A hedgehog cam or two

    Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/a-hedgehog-cam-or-two/

    Here we are, hauling ourselves out of the Christmas and New Year holidays and into January proper. It’s dawning on me that I have to go back to work, even though it’s still very cold and gloomy in northern Europe, and even though my duvet is lovely and warm. I found myself envying beings that hibernate, and thinking about beings that hibernate, and searching for things to do with hedgehogs. And, well, the long and the short of it is, today’s blog post is a short meditation on the hedgehog cam.

    A hedgehog in a garden, photographed in infrared light by a hedgehog cam

    Success! It’s a hedgehog!
    Photo by Andrew Wedgbury

    Hedgehog watching

    Someone called Barker has installed a Raspberry Pi–based hedgehog cam in a location with a distant view of a famous Alp, and as well as providing live views by visible and infrared light for the dedicated and the insomniac, they also make a sped-up version of the previous night’s activity available. With hedgehogs usually being in hibernation during January, you mightn’t see them in any current feed — but don’t worry! You’re guaranteed a few hedgehogs on Barker’s website, because they have also thrown in some lovely GIFs of hoggy (and foxy) divas that their camera captured in the past.

    A Hedgehog eating from a bowl on a patio, captured by a hedgehog cam

    Nom nom nom!
    GIF by Barker’s Site

    Build your own hedgehog cam

    For pointers on how to replicate this kind of setup, you could do worse than turn to Andrew Wedgbury’s hedgehog cam write-up. Andrew’s Twitter feed reveals that he’s a Cambridge local, and there are hints that he was behind RealVNC’s hoggy mascot for Pi Wars 2017.

    RealVNC on Twitter

    Another day at the office: testing our #PiWars mascot using a @Raspberry_Pi 3, #VNC Connect and @4tronix_uk Picon Zero. Name suggestions? https://t.co/iYY3xAX9Bk

    Our infrared bird box and time-lapse camera resources will also set you well on the way towards your own custom wildlife camera. For a kit that wraps everything up in a weatherproof enclosure made with love, time, and serious amounts of design and testing, take a look at Naturebytes’ wildlife cam kit.

    Or, if you’re thinking that a robot mascot is more dependable than real animals for the fluffiness you need in order to start your January with something like productivity and with your soul intact, you might like to put your own spin on our robot buggy.

    Happy 2018

    While we’re on the subject of getting to grips with the new year, do take a look at yesterday’s blog post, in which we suggest a New Year’s project that’s different from the usual resolutions. However you tackle 2018, we wish you an excellent year of creative computing.

    The post A hedgehog cam or two appeared first on Raspberry Pi.

    Is Your Kodi Setup Being Spied On?

    Post Syndicated from Andy original https://torrentfreak.com/is-your-kodi-setup-being-spied-on-180101/

    As quite possibly the most popular media player on earth, Kodi is installed on millions of machines – around 38 million according to the MPAA. The software has a seriously impressive range of features but one, if not configured properly, raises security issues for Kodi users.

    For many years, Kodi has had a remote control feature, whereby the software can be remotely managed via a web interface.

    This means that you’re able to control your Kodi setup installed on a computer or set-top box using a convenient browser-based interface on another device, from the same room or indeed anywhere in the world. Earlier versions of the web interface look like the one in the image below.

    The old Kodi web-interface – functional but basic

    But while this is a great feature, people don’t always password-protect the web-interface, meaning that outsiders can access their Kodi setups, if they have that person’s IP address and a web-browser. In fact, the image shown above is from a UK Kodi user’s setup that was found in seconds using a specialist search engine.

    While the old web-interface for Kodi was basically a remote control, things got more interesting in late 2016 when the much more functional Chorus2 interface was included in Kodi by default. It’s shown in the image below.

    Chorus 2 Kodi Web-Interface

    Again, the screenshot above was taken from the setup of a Kodi user whose setup was directly open to the Internet. In every way the web-interface of Kodi acts as a web page, allowing anyone with the user’s IP address (with :8080 appended to the end) to access the user’s setup. It’s no different than accessing Google with an IP address (216.58.216.142), instead of Google.com.

    However, Chorus 2 is much more comprehensive than its predecessors, which means that outsiders can browse potentially sensitive items, including the user’s addons, if a password hasn’t been enabled in the appropriate section in Kodi.

    Kodi users probably don’t want this seen in public

    While browsing someone’s addons isn’t the most engaging thing in the world, things get decidedly spicier when one learns that the Chorus 2 interface allows both authorized and unauthorized users to go much further.

    For example, it’s possible to change Kodi’s system settings from the interface, including mischievous things such as disabling keyboards and mice. As seen (or not seen) in the redacted section in the image below, it can also give away system usernames, for example.

    Access to Kodi settings – and more

    But aside from screwing with people’s settings (which is both pointless and malicious), the Chorus 2 interface has a trick up its sleeve. If people’s Kodi setups contain video or music files (which is what Kodi was originally designed for), in many cases it’s possible to play these over the web interface.

    In basic terms, someone with your IP address can view the contents of your video library on the other side of the world, with just a couple of clicks.

    The image below shows that a Kodi setup has been granted access to some kind of storage (network or local disk, for example) and it can be browsed, revealing movies. (To protect the user, redactions have been made to remove home video titles, network, and drive names)

    Network storage accessed via Chorus 2

    The big question is, however, whether someone accessing a Kodi setup remotely can view these videos via a web browser. Answer: Absolutely.

    Clicking through on each piece of media reveals a button to the right of its title. Clicking that reveals two options – ‘Queue in Kodi’ (to play on the installation itself) or ‘Download’, which plays/stores the content via a remote browser located anywhere in the world. Chrome works like a charm.

    Queue to Kodi or watch remotely in a browser

    While this is ‘fun’ and potentially useful for outsiders looking for content, it’s not great if it’s your system that’s open to the world. The good news is that something can be done about it.

    In their description for Chorus 2, the Kodi team explain all of the benefits of the interface, but it appears many people don’t take their advice to introduce a new password. The default password and username are both ‘kodi’, which is terrible for security if people leave things the way they are.

    If you run Kodi, now is probably the time to fix the settings, disable the web interface if you don’t use it, or enable stronger password protection if you do.

    Change that password – now

    Just recently, Kodi addon repository TVAddons issued a warning to people using jailbroken Apple TV 2 devices. That too was a default password issue and one that can be solved relatively easily.

    “People need to realize that their Kodi boxes are actually mini computers and need to be treated as such,” a TVAddons spokesperson told TF.

    “When you install a build, or follow a guide from an unreputable source, you’re opening yourself up to potential risk. Since Kodi boxes aren’t normally used to handle sensitive data, people seem to disregard the potential risks that are posed to their network.”

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    2017’s “Piracy is Dangerous” Rhetoric Was Digital Reefer Madness

    Post Syndicated from Andy original https://torrentfreak.com/2017s-piracy-is-dangerous-rhetoric-was-digital-reefer-madness-171230/

    On dozens of occasions during the past year, TF has been compelled to cover the latest entertainment industry anti-piracy scare campaigns. We never have a problem doing so since news is to be reported and we’re all adults with our own minds to evaluate what we’re reading.

    Unfortunately, many people behind these efforts seem to be under the impression that their target audience is comprised of simpletons, none of whom are blessed with a brain of their own. Frankly it’s insulting but before we go on, let’s get a few things clear.

    Copyright infringement – including uploading, downloading, sharing or streaming – is illegal in most countries. That means that copyright holders are empowered under law to do something about those offenses, either through the civil or criminal courts. While unpalatable to some, most people accept that position and understand that should they be caught in the act, there might be some consequences.

    With that said, there are copyright holders out there that need to stop treating people like children at best, idiots at worst. At this point in 2017, there’s no adult out there with the ability to pirate that truly believes that obtaining or sharing the latest movies, TV shows and sports is likely to be completely legal.

    If you don’t believe me, ask a pirate why he or she is so excited by their fully-loaded Kodi setup. Hint: It’s because they’re getting content for free and they know full well that isn’t what the copyright holder wants. Then ask them if they want the copyright holder to know their name, address and everything they’ve downloaded. There. That’s your answer.

    The point is that these people are not dumb. They know what they’re doing and understand that getting caught is something that might possibly happen. They may not understand precisely how and they may consider the risk to be particularly small (they’d be right too) but they know that it’s something best kept fairly quiet when they aren’t shouting about it to anyone who will listen down the pub.

    Copyright holders aren’t dumb either. They know only too well that pirates recognize what they’re doing is probably illegal but they’re at a loss as to what to do about it. For reputable content owners, suing is expensive, doesn’t scale, is a public relations nightmare and, moreover, isn’t effective in solving the problem.

    So, we now have a concerted effort to convince pirates that piracy is not only bad for their computers but also bad for their lives. It’s a stated industry aim and we’re going to see more of it in 2018.

    If pirate sites aren’t infecting people’s computers with malware from God-knows-where, they’re stealing their identities and emptying their bank accounts, the industries warned in 2017. And if somehow people manage to run this gauntlet of terror without damaging their technology or their finances, then they’ll probably have their house burnt down by an exploding set-top box.

    Look, the intention is understandable. Entertainment companies need to contain the piracy problem because if they don’t, it only gets worse. Again, there are few people out there who genuinely expect them to do anything different but this current stampede towards blatant scaremongering is disingenuous at best and utterly ridiculous at worst.

    And it won’t work.

    While piracy can be engaged in as a solo activity, it’s inherently a social phenomenon. That things can be pirated from here and there, in this way and that, is the stuff of conversations between friends and colleagues, in person and via social media. The information is passed around today like VHS and compact cassettes were passed around three decades ago and people really aren’t talking about malware or their houses catching fire.

    In the somewhat unlikely event these topics do get raised for more than a minute, they get dealt with in the same way as anything else.

    People inquire whether their friends have ever had their bank accounts emptied or houses burnt down, or if they know anyone who has. When the answer comes back as “no” from literally everyone, people are likely to conclude that the stories are being spread by people trying to stop them getting movies, TV shows, and live sports for free. And they would be right.

    That’s not to say that these scare stories don’t have at least some basis in fact, they do.

    Many pirate sites do have low-tier advertising which can put users at risk. However, it’s nothing that a decent anti-virus program and/or ad blocker can’t handle, which is something everyone should be running when accessing untrusted sites. Also, being cautious about all electronics imported from overseas is something people should be aware of too, despite the tiny risk these devices appear to pose in the scheme of things.

    So, what we have here is the modern-day equivalent of Reefer Madness, the 1930s propaganda movie that tried to scare people away from marijuana with tales of car accidents, suicide, attempted rape and murder.

    While somewhat more refined, these modern-day cautionary messages over piracy are destined to fall on ears that are far more shrewd and educated than their 20th-century counterparts. Yet they’re all born out of the same desire, to stop people from getting involved in an activity by warning them that it’s dangerous to them, rather than it having a negative effect on someone else – an industry executive, for example.

    It’s all designed to appeal to the selfish nature of people, rather than their empathy for others, but that’s a big mistake.

    Most people really do want to do the right thing, as the staggering success of Netflix, iTunes, Spotify, and Amazon show. But the ridiculous costs and/or inaccessibility of live sports, latest movies, or packaged TV shows mean that no matter what warnings get thrown out there, some people will still cut corners if they feel they’re being taken advantage of.

    Worse still, if they believe the scare stories are completely ridiculous, eventually they’ll also discount the credibility of the messenger. When that happens, what little trust remains will be eroded.

    Then, let’s face it, who wants to buy something from people you can’t trust?

    Our ‘Kodi Box’ Is Legal & Our Users Don’t Break the Law, TickBox Tells Hollywood

    Post Syndicated from Andy original https://torrentfreak.com/our-kodi-box-is-legal-our-users-dont-break-the-law-tickbox-tells-hollywood-171229/

    Georgia-based TickBox TV is a provider of set-top boxes that allow users to stream all kinds of popular content. Like other similar devices, Tickboxes use the popular Kodi media player alongside instructions on how to find and use third-party addons.

    Of course, these types of add-ons are considered a thorn in the side of the entertainment industries and as a result, Tickbox found itself on the receiving end of a lawsuit in the United States.

    Filed in a California federal court in October, Universal, Columbia Pictures, Disney, 20th Century Fox, Paramount Pictures, Warner Bros, Amazon, and Netflix accused Tickbox of inducing and contributing to copyright infringement.

    “TickBox sells ‘TickBox TV,’ a computer hardware device that TickBox urges its customers to use as a tool for the mass infringement of Plaintiffs’ copyrighted motion pictures and television shows,” the complaint reads.

    “TickBox promotes the use of TickBox TV for overwhelmingly, if not exclusively, infringing purposes, and that is how its customers use TickBox TV. TickBox advertises TickBox TV as a substitute for authorized and legitimate distribution channels such as cable television or video-on-demand services like Amazon Prime and Netflix.”

    The copyright holders reference a TickBox TV video which informs customers how to install ‘themes’, more commonly known as ‘builds’. These ‘builds’ are custom Kodi-setups which contain many popular add-ons that specialize in supplying pirate content. Is that illegal? TickBox TV believes not.

    In a response filed yesterday, TickBox underlined its position that its device is not sold with any unauthorized or illegal content and complains that just because users may choose to download and install third-party programs through which they can search for and view unauthorized content, that’s not its fault. It goes on to attack the lawsuit on several fronts.

    TickBox argues that the plaintiffs’ claim that TickBox can be held secondarily liable under the theory of contributory infringement or inducement liability, as described in the famous Grokster and isoHunt cases, is unlikely to succeed. TickBox says the studios need to show four elements – distribution of a device or product, acts of infringement by users of Tickbox, an object of promoting its use to infringe copyright, and causation.

    “Plaintiffs have failed to establish any of these four elements,” TickBox’s lawyers write.

    Firstly, TickBox says that while its device can be programmed to infringe, it’s the third-party software (the builds/themes containing addons) that does all the dirty work, and TickBox has nothing to do with it.

    “The Motion spends a great deal of time describing these third-party ‘Themes’ and how they operate to search for and stream videos. But the ‘Themes’ on which Plaintiffs so heavily focus are not the [TickBox], and they have absolutely nothing to do with Defendant. Rather, they are third-party modifications of the open-source media player software [Kodi] which the Box utilizes,” the response reads.

    TickBox says its device is merely a small computer, not unlike a smartphone or tablet. Indeed, when it comes to running the ‘pirate’ builds listed in the lawsuit, a device supplied by one of the plaintiffs can accomplish the same task.

    “Plaintiffs have identified certain of these third-party ‘builds’ or ‘Themes’ which are available on the internet and which can be downloaded by users to view content streamed by third-party websites; however, this same software can be installed on many different types of devices, even one distributed by affiliates of Plaintiff Amazon Content Services, LLC,” the company adds.

    Referencing the Grokster case, TickBox states that particular company was held liable for distributing a device (the Grokster software) “with the object of promoting its use to infringe copyright.” In the isoHunt case, it argues that the provision of torrent files satisfied the first element of inducement liability.

    “In contrast, Defendant’s product – the Box – is not software through which users can access unauthorized content, as in Grokster, or even a necessary component of accessing unauthorized content, as in Fung [isoHunt],” TickBox writes.

    “Defendant offers a computer, onto which users can voluntarily install legitimate or illegitimate software. The product about which Plaintiffs complain is third-party software which can be downloaded onto a myriad of devices, and which Defendant neither created nor supplies.”

    From defending itself, TickBox switches track to highlight weaknesses in the studios’ case against users of its TickBox device. The company states that the plaintiffs have not presented any evidence that buyers of the TickBox streaming unit have actually accessed any copyrighted material.

    Interestingly, however, the company also notes that even if people had streamed ‘pirate’ content, that might not constitute infringement.

    First up, the company notes that there are no allegations that anyone – from TickBox itself to TickBox device owners – ever violated the plaintiffs’ exclusive right to perform its copyrighted works.

    TickBox then further argues that copyright law does not impose liability for viewing streaming content, stating that an infringer is one who violates any of the exclusive rights of the copyright holder, in this case, the right to “perform the copyrighted work publicly.”

    “Plaintiffs do not allege that Defendant, Defendant’s product, or the users of Defendant’s product ‘transmit or otherwise communicate a performance’ to the public; instead, Plaintiffs allege that users view streaming material on the Box.

    “It is clear precedent [Perfect 10 v Google] in this Circuit that merely viewing copyrighted material online, without downloading, copying, or retransmitting such material, is not actionable.”

    Taking this argument to its logical conclusion, TickBox insists that if its users aren’t infringing copyright, it’s impossible to argue that TickBox induced its customers to violate the plaintiffs’ rights. In that respect, plaintiffs’ complaints that TickBox failed to develop “filtering tools” to diminish its customers’ infringing activity are moot, since in TickBox’s eyes no infringement took place.

    TickBox also argues that unlike in Grokster, where the defendant profited when users accessed infringing content, it does not. And, just to underline the earlier point, it claims that its place in the market is not to compete with entertainment companies but to compete with devices such as Amazon’s Firestick – another similar Android-powered device.

    Finally, TickBox notes that it has zero connection with any third-party sites that transmit copyrighted works in violation of the plaintiffs’ rights.

    “Plaintiff has not alleged any element of contributory infringement vis-à-vis these unknown third-parties. Plaintiff has not alleged that Defendant has distributed any product to those third parties, that Defendant has committed any act which encourages those third parties’ infringement, or that any act of Defendant has, in fact, caused those third parties to infringe,” its response adds.

    But even given the above defenses, TickBox says that it “voluntarily took steps” to remove links to the allegedly infringing Kodi builds from its device, following the plaintiffs’ lawsuit. It also claims to have modified its advertising and webpage “to attempt to appease Plaintiffs and resolve their complaint amicably.”

    Given the above, TickBox says that the plaintiffs’ application for injunction is both vague and overly broad and would impose “impermissible hardship” on the company by effectively shutting it down while requiring it to “hack into and delete content” which TickBox users may have downloaded to their boxes.

    TickBox raises some very interesting points around some obvious weaknesses, so it will be intriguing to see how the Court handles its claims and what effect that has on the market for these devices in the US. Of particular interest is the thorny issue of how they are advertised and promoted, which is nearly always the final stumbling block.

    A copy of Tickbox’s response is available here (pdf), via Variety

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    OWASP Dependency Check Maven Plugin – a Must-Have

    Post Syndicated from Bozho original https://techblog.bozho.net/owasp-dependency-check-maven-plugin-must/

    I have to admit with a high degree of shame that I didn’t know about the OWASP dependency check maven plugin. It seems to have been around since 2013, and apparently a thousand projects on GitHub are using it already.

    In the past I’ve gone manually through dependencies to check them against vulnerability databases, or in many cases I was just blissfully ignorant about any vulnerabilities that my dependencies had.

    The purpose of this post is just that – to recommend the OWASP dependency check maven plugin as a must-have in practically every maven project. (There are dependency-check tools for other build systems as well.)

    When you add the plugin it generates a report. Initially you can go and manually upgrade the problematic dependencies (I upgraded two of those in my current project), or suppress the false positives (e.g. the cassandra library is marked as vulnerable, whereas the actual vulnerability is that Cassandra binds an unauthenticated RMI endpoint, which I’ve addressed via my stack setup, so the library isn’t an issue).

    Then you can configure a threshold for vulnerabilities and fail the build if new ones appear – either by you adding a vulnerable dependency, or in case a vulnerability is discovered in an existing dependency.

    All of that is shown in the examples page and is pretty straightforward. I’d suggest adding the plugin immediately, it’s a must-have:

    <plugin>
    	<groupId>org.owasp</groupId>
    	<artifactId>dependency-check-maven</artifactId>
    	<version>3.0.2</version>
    	<executions>
    		<execution>
    			<goals>
    				<goal>check</goal>
    			</goals>
    		</execution>
    	</executions>
    </plugin>
    

    Now, checking dependencies for vulnerabilities is just one small aspect of having your software secure and it shouldn’t give you a false sense of security (a sort-of “I have my dependencies checked, therefore my system is secure” fallacy). But it’s an important aspect. And having that check automated is a huge gain.

    The post OWASP Dependency Check Maven Plugin – a Must-Have appeared first on Bozho's tech blog.

    Thank you for my new Raspberry Pi, Santa! What next?

    Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/thank-you-for-my-new-raspberry-pi-santa-what-next/

    Note: the Pi Towers team have peeled away from their desks to spend time with their families over the festive season, and this blog will be quiet for a while as a result. We’ll be back in the New Year with a bushel of amazing projects, awesome resources, and much merriment and fun times. Happy holidays to all!

    Now back to the matter at hand. Your brand new Christmas Raspberry Pi.

    Your new Raspberry Pi

    Did you wake up this morning to find a new Raspberry Pi under the tree? Congratulations, and welcome to the Raspberry Pi community! You’re one of us now, and we’re happy to have you on board.

    But what if you’ve never seen a Raspberry Pi before? What are you supposed to do with it? What’s all the fuss about, and why does your new computer look so naked?

    Setting up your Raspberry Pi

    Are you comfy? Good. Then let us begin.

    Download our free operating system

    First of all, you need to make sure you have an operating system on your micro SD card: we suggest Raspbian, the Raspberry Pi Foundation’s officially supported operating system. If your Pi is part of a starter kit, you might find that it comes with a micro SD card that already has Raspbian preinstalled. If not, you can download Raspbian for free from our website.

    An easy way to get Raspbian onto your SD card is to use a free tool called Etcher. Watch The MagPi’s Lucy Hattersley show you what you need to do. You can also use NOOBS to install Raspbian on your SD card, and our Getting Started guide explains how to do that.

    Plug it in and turn it on

    Your new Raspberry Pi 3 comes with four USB ports and an HDMI port. These allow you to plug in a keyboard, a mouse, and a television or monitor. If you have a Raspberry Pi Zero, you may need adapters to connect your devices to its micro USB and mini HDMI ports. Both the Raspberry Pi 3 and the Raspberry Pi Zero W have onboard wireless LAN, so you can connect to your home network, and you can also plug an Ethernet cable into the Pi 3.

    Make sure to plug the power cable in last. There’s no ‘on’ switch, so your Pi will turn on as soon as you connect the power. Raspberry Pi uses a micro USB power supply, so you can use a phone charger if you didn’t receive one as part of a kit.

    Learn with our free projects

    If you’ve never used a Raspberry Pi before, or you’re new to the world of coding, the best place to start is our projects site. It’s packed with free projects that will guide you through the basics of coding and digital making. You can create projects right on your screen using Scratch and Python, connect a speaker to make music with Sonic Pi, and upgrade your skills to physical making using items from around your house.

    Here’s James to show you how to build a whoopee cushion using a Raspberry Pi, paper plates, tin foil and a sponge:

    Whoopee cushion PRANK with a Raspberry Pi: HOW-TO

    Diving deeper

    You’ve plundered our projects, you’ve successfully rigged every chair in the house to make rude noises, and now you want to dive deeper into digital making. Good! While you’re digesting your Christmas dinner, take a moment to skim through the Raspberry Pi blog for inspiration. You’ll find projects from across our worldwide community, with everything from home automation projects and retrofit upgrades, to robots, gaming systems, and cameras.

    You’ll also find bucketloads of ideas in The MagPi magazine, the official monthly Raspberry Pi publication, available in both print and digital format. You can download every issue for free. If you subscribe, you’ll get a Raspberry Pi Zero W to add to your new collection. HackSpace magazine is another fantastic place to turn for Raspberry Pi projects, along with other maker projects and tutorials.

    And, of course, simply typing “Raspberry Pi projects” into your preferred search engine will find thousands of ideas. Sites like Hackster, Hackaday, Instructables, Pimoroni, and Adafruit all have plenty of fab Raspberry Pi tutorials that they’ve devised themselves and that community members like you have created.

    And finally

    If you make something marvellous with your new Raspberry Pi – and we know you will – don’t forget to share it with us! Our Twitter, Facebook, Instagram and Google+ accounts are brimming with chatter, projects, and events. And our forums are a great place to visit if you have questions about your Raspberry Pi or if you need some help.

    It’s good to get together with like-minded folks, so check out the growing Raspberry Jam movement. Raspberry Jams are community-run events where makers and enthusiasts can meet other makers, show off their projects, and join in with workshops and discussions. Find your nearest Jam here.

    Have a great festive holiday and welcome to the community. We’ll see you in 2018!

    The post Thank you for my new Raspberry Pi, Santa! What next? appeared first on Raspberry Pi.

    Using Amazon CloudWatch and Amazon SNS to Notify when AWS X-Ray Detects Elevated Levels of Latency, Errors, and Faults in Your Application

    Post Syndicated from Bharath Kumar original https://aws.amazon.com/blogs/devops/using-amazon-cloudwatch-and-amazon-sns-to-notify-when-aws-x-ray-detects-elevated-levels-of-latency-errors-and-faults-in-your-application/

    AWS X-Ray helps developers analyze and debug production applications built using microservices or serverless architectures and quantify customer impact. With X-Ray, you can understand how your application and its underlying services are performing and identify and troubleshoot the root cause of performance issues and errors. You can use these insights to identify issues and opportunities for optimization.

    In this blog post, I will show you how you can use Amazon CloudWatch and Amazon SNS to get notified when X-Ray detects high latency, errors, and faults in your application. Specifically, I will show you how to use this sample app to get notified through an email or SMS message when your end users observe high latencies or server-side errors when they use your application. You can customize the alarms and events by updating the sample app code.

    Sample App Overview

    The sample app uses the X-Ray GetServiceGraph API to get the following information:

    • Aggregated response time.
    • Requests that failed with a 4xx status code (errors).
    • Requests that failed with a 429 status code (throttles).
    • Requests that failed with a 5xx status code (faults).

    Overview of sample app architecture
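
    As a rough illustration of what can be done with this data, the sketch below reduces a GetServiceGraph response to the four figures listed above. The field names follow the documented response shape of the API; the function name, the sample values, and the service name checkout are hypothetical, and the sample app's own code may look quite different.

```python
def summarize_service_graph(response):
    """Reduce a GetServiceGraph response to per-service health figures."""
    summary = {}
    for service in response.get("Services", []):
        stats = service.get("SummaryStatistics")
        if not stats:
            continue  # some nodes carry no summary statistics
        total = stats.get("TotalCount", 0)
        errors = stats.get("ErrorStatistics", {})
        faults = stats.get("FaultStatistics", {})
        summary[service.get("Name", "unknown")] = {
            # TotalResponseTime is the aggregate over all completed requests
            "avg_response_time": stats.get("TotalResponseTime", 0.0) / total if total else 0.0,
            "errors": errors.get("TotalCount", 0),        # 4xx responses
            "throttles": errors.get("ThrottleCount", 0),  # 429 responses
            "faults": faults.get("TotalCount", 0),        # 5xx responses
        }
    return summary


# Hypothetical response fragment; with boto3 the real one would come from
# boto3.client("xray").get_service_graph(StartTime=..., EndTime=...)
sample_response = {
    "Services": [
        {
            "Name": "checkout",
            "SummaryStatistics": {
                "OkCount": 90,
                "ErrorStatistics": {"ThrottleCount": 2, "OtherCount": 5, "TotalCount": 7},
                "FaultStatistics": {"OtherCount": 3, "TotalCount": 3},
                "TotalCount": 100,
                "TotalResponseTime": 50.0,
            },
        }
    ]
}

print(summarize_service_graph(sample_response))
```

    Comparing these figures against the thresholds in the configuration file is then a matter of a few comparisons per service.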

    Getting started

    The sample app uses AWS CloudFormation to deploy the required resources.
    To install the sample app:

    1. Run git clone to get the sample app.
    2. Update the JSON file in the Setup folder with threshold limits and notification details.
    3. Run the install.py script to install the sample app.

    For more information about the installation steps, see the readme file on GitHub.

    You can update the app configuration to include your phone number or email to get notified when your application in X-Ray breaches the latency, error, and fault limits you set in the configuration. If you prefer to not provide your phone number and email, then you can use the CloudWatch alarm deployed by the sample app to monitor your application in X-Ray.

    The sample app deploys resources with the sample app namespace you provided during setup. This enables you to have multiple sample apps in the same region.

    CloudWatch rules

    The sample app uses two CloudWatch rules:

    1. SCHEDULEDLAMBDAFOR-sample_app_name to trigger at regular intervals the AWS Lambda function that queries the GetServiceGraph API.
    2. XRAYALERTSFOR-sample_app_name to look for published CloudWatch events that match the pattern defined in this rule.

    CloudWatch rules created for the sample app

    CloudWatch alarms

    If you did not provide your phone number or email in the JSON file, the sample app uses a CloudWatch alarm named XRayCloudWatchAlarm-sample_app_name in combination with the CloudWatch event that you can use for monitoring.

    CloudWatch alarm created for the sample app

    Amazon SNS messages

    The sample app creates two SNS topics:

    • sample_app_name-cloudwatcheventsnstopic to send out an SMS message when the CloudWatch event matches a pattern published from the Lambda function.
    • sample_app_name-cloudwatchalarmsnstopic to send out an email message when the CloudWatch alarm goes into an ALARM state.

    Amazon SNS created for the sample app

    Getting notifications

    The CloudWatch event looks for the following matching pattern:

    {
      "detail-type": [
        "XCW Notification for Alerts"
      ],
      "source": [
        "<sample_app_name>-xcw.alerts"
      ]
    }
    

    The event then invokes an SNS topic that sends out an SMS message.

    SMS that is sent when CloudWatch Event invokes Amazon SNS topic
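
    To make the matching step concrete, here is a deliberately simplified sketch of how a pattern like the one above selects events. It only handles top-level fields with lists of allowed values; the real CloudWatch Events matcher supports nested fields and more operators. The app name myapp, the function name, and the contents of detail are assumptions made for illustration.

```python
def matches(pattern, event):
    """Simplified matcher: every field named in the pattern must be present
    in the event, and its value must be one of the values the pattern lists."""
    for field, allowed_values in pattern.items():
        if event.get(field) not in allowed_values:
            return False
    return True


# The pattern from the rule, with a hypothetical app name filled in.
pattern = {
    "detail-type": ["XCW Notification for Alerts"],
    "source": ["myapp-xcw.alerts"],
}

# An event as the Lambda function might publish it (detail is made up).
alert_event = {
    "source": "myapp-xcw.alerts",
    "detail-type": "XCW Notification for Alerts",
    "detail": {"status": "latency threshold breached"},
}

# An unrelated event that the rule should ignore.
other_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {},
}

print(matches(pattern, alert_event), matches(pattern, other_event))
```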

    The CloudWatch alarm looks for the TriggeredRules metric that is published whenever the CloudWatch event matches the event pattern. It goes into the ALARM state whenever TriggeredRules > 0 for the specified evaluation period and invokes an SNS topic that sends an email message.

    Email that is sent when CloudWatch Alarm goes to ALARM state
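
    An alarm of this kind could be described roughly as follows. The sketch only builds the keyword arguments for the CloudWatch PutMetricAlarm call; the alarm and rule names follow the naming shown earlier, while the one-minute period, the single evaluation period, the function name, and the topic ARN are assumptions rather than the sample app's actual settings.

```python
def alarm_parameters(app_name, topic_arn, period_seconds=60, evaluation_periods=1):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"XRayCloudWatchAlarm-{app_name}",
        # CloudWatch Events publishes TriggeredRules per rule in this namespace.
        "Namespace": "AWS/Events",
        "MetricName": "TriggeredRules",
        "Dimensions": [{"Name": "RuleName", "Value": f"XRAYALERTSFOR-{app_name}"}],
        "Statistic": "Sum",
        "Period": period_seconds,                  # assumed value
        "EvaluationPeriods": evaluation_periods,   # assumed value
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # ALARM when TriggeredRules > 0
        "AlarmActions": [topic_arn],
    }


# Hypothetical app name and topic ARN.
params = alarm_parameters(
    "myapp",
    "arn:aws:sns:us-east-1:123456789012:myapp-cloudwatchalarmsnstopic",
)
print(params["AlarmName"])

# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```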

    Stopping notifications

    If you provided your phone number or email address, but would like to stop getting notified, change the SUBSCRIBE_TO_EMAIL_SMS environment variable in the Lambda function to No. Then, go to the Amazon SNS console and delete the subscriptions. You can still monitor your application for elevated levels of latency, errors, and faults by using the CloudWatch console.

    Change environment variable in Lambda

     

    Delete subscriptions to stop getting notified

    Uninstalling the sample app

    To uninstall the sample app, run the uninstall.py script in the Setup folder.

    Extending the sample app

    The sample app notifies you when X-Ray detects high latency, errors, and faults in your application. You can extend it to provide more value for your use cases (for example, to perform an action on a resource when the state of a CloudWatch alarm changes).

    To summarize, after this setup you will be able to get notified through Amazon SNS when X-Ray detects high latency, errors, and faults in your application.

    I hope you found this information about setting up alarms and alerts for your application in AWS X-Ray helpful. Feel free to leave questions or other feedback in the comments, and to learn more about AWS X-Ray, Amazon SNS, and Amazon CloudWatch.

    About the Author

    Bharath Kumar is a Sr. Product Manager with AWS X-Ray. He has developed and launched mobile games and web applications on microservices and serverless architectures.

    Power data ingestion into Splunk using Amazon Kinesis Data Firehose

    Post Syndicated from Tarik Makota original https://aws.amazon.com/blogs/big-data/power-data-ingestion-into-splunk-using-amazon-kinesis-data-firehose/

    In late September, during the annual Splunk .conf, Splunk and Amazon Web Services (AWS) jointly announced that Amazon Kinesis Data Firehose now supports Splunk Enterprise and Splunk Cloud as a delivery destination. This native integration between Splunk Enterprise, Splunk Cloud, and Amazon Kinesis Data Firehose is designed to make AWS data ingestion setup seamless, while offering a secure and fault-tolerant delivery mechanism. We want to enable customers to monitor and analyze machine data from any source and use it to deliver operational intelligence and optimize IT, security, and business performance.

    With Kinesis Data Firehose, customers can use a fully managed, reliable, and scalable data streaming solution to Splunk. In this post, we tell you a bit more about the Kinesis Data Firehose and Splunk integration. We also show you how to ingest large amounts of data into Splunk using Kinesis Data Firehose.

    Push vs. Pull data ingestion

    Presently, customers use a combination of two ingestion patterns, primarily based on data source and volume, in addition to existing company infrastructure and expertise:

    1. Pull-based approach: Using dedicated pollers running the popular Splunk Add-on for AWS to pull data from various AWS services such as Amazon CloudWatch or Amazon S3.
    2. Push-based approach: Streaming data directly from AWS to Splunk HTTP Event Collector (HEC) by using AWS Lambda. Examples of applicable data sources include CloudWatch Logs and Amazon Kinesis Data Streams.

    The pull-based approach offers data delivery guarantees such as retries and checkpointing out of the box. However, it requires more operational work to manage and orchestrate the dedicated pollers, which commonly run on Amazon EC2 instances. With this setup, you pay for the infrastructure even when it’s idle.

    On the other hand, the push-based approach offers a low-latency scalable data pipeline made up of serverless resources like AWS Lambda sending directly to Splunk indexers (by using Splunk HEC). This approach translates into lower operational complexity and cost. However, if you need guaranteed data delivery then you have to design your solution to handle issues such as a Splunk connection failure or Lambda execution failure. To do so, you might use, for example, AWS Lambda Dead Letter Queues.

    How about getting the best of both worlds?

    Let’s go over the new integration’s end-to-end solution and examine how Kinesis Data Firehose and Splunk together expand the push-based approach into a native AWS solution for applicable data sources.

    By using a managed service like Kinesis Data Firehose for data ingestion into Splunk, we provide out-of-the-box reliability and scalability. One of the pain points of the old approach was the overhead of managing the data collection nodes (Splunk heavy forwarders). With the new Kinesis Data Firehose to Splunk integration, there are no forwarders to manage or set up. Data producers (1) are configured through the AWS Management Console to drop data into Kinesis Data Firehose.

    You can also create your own data producers. For example, you can drop data into a Firehose delivery stream by using Amazon Kinesis Agent, or by using the Firehose API (PutRecord(), PutRecordBatch()), or by writing to a Kinesis Data Stream configured to be the data source of a Firehose delivery stream. For more details, refer to Sending Data to an Amazon Kinesis Data Firehose Delivery Stream.
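
    As a minimal sketch of a custom producer using the Firehose API, the code below prepares events for PutRecordBatch. It relies on the documented limit of 500 records per call and ignores the separate per-call size limit; the function name, the newline-delimited JSON encoding, and the stream name in the comment are assumptions.

```python
import json

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_batches(events, batch_size=MAX_RECORDS_PER_BATCH):
    """Encode events as Firehose records and split them into batches."""
    records = [{"Data": (json.dumps(event) + "\n").encode("utf-8")} for event in events]
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Hypothetical events to send.
events = [{"id": n, "message": "hello"} for n in range(1200)]
batches = to_batches(events)
print([len(batch) for batch in batches])

# With boto3 installed and credentials configured, each batch could be sent with:
#   firehose = boto3.client("firehose")
#   response = firehose.put_record_batch(
#       DeliveryStreamName="my-delivery-stream",  # hypothetical name
#       Records=batch,
#   )
# A non-zero response["FailedPutCount"] means some records need to be retried.
```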

    You might need to transform the data before it goes into Splunk for analysis. For example, you might want to enrich it or filter or anonymize sensitive data. You can do so using AWS Lambda. In this scenario, Kinesis Data Firehose buffers data from the incoming source data, sends it to the specified Lambda function (2), and then rebuffers the transformed data to the Splunk Cluster. Kinesis Data Firehose provides the Lambda blueprints that you can use to create a Lambda function for data transformation.
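
    The shape of such a transformation function can be sketched as follows. Firehose passes the function a list of base64-encoded records and expects each one back with the same recordId, a result of Ok, Dropped, or ProcessingFailed, and the transformed data. The user_email field and the redaction rule are hypothetical, and this is not one of the provided blueprints.

```python
import base64
import json


def lambda_handler(event, context):
    """Anonymize one field in each record and return it to Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Not valid base64 or JSON: hand the record back marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Hypothetical anonymization rule: blank out a user_email field.
        if isinstance(payload, dict) and "user_email" in payload:
            payload["user_email"] = "REDACTED"

        data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    return {"records": output}
```

    Records marked ProcessingFailed are treated by Firehose as unsuccessfully processed, so they are worth monitoring once error logging is enabled.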

    Systems fail all the time. Let’s see how this integration handles outside failures to guarantee data durability. In cases when Kinesis Data Firehose can’t deliver data to the Splunk Cluster, data is automatically backed up to an S3 bucket. You can configure this feature while creating the Firehose delivery stream (3). You can choose to back up all data or only the data that’s failed during delivery to Splunk.

    In addition to using S3 for data backup, this Firehose integration with Splunk supports Splunk Indexer Acknowledgments to guarantee event delivery. This feature is configured on Splunk’s HTTP Event Collector (HEC) (4). It ensures that HEC returns an acknowledgment to Kinesis Data Firehose only after data has been indexed and is available in the Splunk cluster (5).

    Now let’s look at a hands-on exercise that shows how to forward VPC flow logs to Splunk.

    How-to guide

    To process VPC flow logs, we implement the following architecture.

    Amazon Virtual Private Cloud (Amazon VPC) delivers flow log files into an Amazon CloudWatch Logs group. Using a CloudWatch Logs subscription filter, we set up real-time delivery of CloudWatch Logs to a Kinesis Data Firehose stream.

    Data coming from CloudWatch Logs is compressed with gzip compression. To work with this compression, we need to configure a Lambda-based data transformation in Kinesis Data Firehose to decompress the data and deposit it back into the stream. Firehose then delivers the raw logs to the Splunk HTTP Event Collector (HEC).

    If delivery to the Splunk HEC fails, Firehose deposits the logs into an Amazon S3 bucket. You can then ingest the events from S3 using an alternate mechanism such as a Lambda function.

    When data reaches Splunk (Enterprise or Cloud), Splunk parsing configurations (packaged in the Splunk Add-on for Kinesis Data Firehose) extract and parse all fields. They make data ready for querying and visualization using Splunk Enterprise and Splunk Cloud.

    Walkthrough

    Install the Splunk Add-on for Amazon Kinesis Data Firehose

    The Splunk Add-on for Amazon Kinesis Data Firehose enables Splunk (be it Splunk Enterprise, Splunk App for AWS, or Splunk Enterprise Security) to use data ingested from Amazon Kinesis Data Firehose. Install the Add-on on all the indexers with an HTTP Event Collector (HEC). The Add-on is available for download from Splunkbase.

    HTTP Event Collector (HEC)

    Before you can use Kinesis Data Firehose to deliver data to Splunk, set up the Splunk HEC to receive the data. From Splunk Web, go to the Settings menu, choose Data Inputs, and choose HTTP Event Collector. Choose Global Settings, ensure All tokens is enabled, and then choose Save. Then choose New Token to create a new HEC endpoint and token. When you create a new token, make sure that Enable indexer acknowledgment is checked.

    When prompted to select a source type, select aws:cloudwatch:vpcflow.

    Create an S3 backsplash bucket

    To provide for situations in which Kinesis Data Firehose can’t deliver data to the Splunk Cluster, we use an S3 bucket to back up the data. You can configure this feature to back up all data or only the data that’s failed during delivery to Splunk.

    Note: Bucket names are globally unique, so you can’t reuse tmak-backsplash-bucket. Substitute a bucket name of your own in the following command.

    $ aws s3api create-bucket --bucket tmak-backsplash-bucket --create-bucket-configuration LocationConstraint=ap-northeast-1

    Create an IAM role for the Lambda transform function

    Firehose triggers an AWS Lambda function that transforms the data in the delivery stream. Let’s first create a role for the Lambda function called LambdaBasicRole.

    Note: You can also set this role up when creating your Lambda function.

    $ aws iam create-role --role-name LambdaBasicRole --assume-role-policy-document file://TrustPolicyForLambda.json

    Here is TrustPolicyForLambda.json.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

     

    After the role is created, attach the managed Lambda basic execution policy to it.

    $ aws iam attach-role-policy \
      --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole \
      --role-name LambdaBasicRole
    

     

    Create a Firehose Stream

    On the AWS console, open the Amazon Kinesis service, go to the Firehose console, and choose Create Delivery Stream.

    In the next section, you can specify whether you want to use an inline Lambda function for transformation. Because incoming CloudWatch Logs are gzip compressed, choose Enabled for Record transformation, and then choose Create new.

    From the list of the available blueprint functions, choose Kinesis Data Firehose CloudWatch Logs Processor. This function unzips the data and places it back into the Firehose stream in compliance with the record transformation output model.
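
    For readers curious about what such a processor does, here is a small Python sketch of the same idea; it is not the blueprint's code. It assumes the documented CloudWatch Logs subscription payload (a gzip-compressed JSON document with messageType and logEvents), emits one log line per event, and drops control messages. Size limits and error handling are left out.

```python
import base64
import gzip
import json


def transform_record(record):
    """Decompress one CloudWatch Logs record and emit its raw log lines."""
    payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

    # CloudWatch Logs also sends control messages that carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return {"recordId": record["recordId"], "result": "Dropped"}

    lines = "".join(log_event["message"] + "\n" for log_event in payload["logEvents"])
    return {
        "recordId": record["recordId"],
        "result": "Ok",
        "data": base64.b64encode(lines.encode("utf-8")).decode("utf-8"),
    }


def lambda_handler(event, context):
    """Entry point in the shape Firehose expects for record transformation."""
    return {"records": [transform_record(record) for record in event["records"]]}
```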

    Enter a name for the Lambda function, choose Choose an existing role, and then choose the role you created earlier. Then choose Create Function.

    Go back to the Firehose Stream wizard, choose the Lambda function you just created, and then choose Next.

    Select Splunk as the destination, and enter your Splunk HTTP Event Collector information.

    Note: Amazon Kinesis Data Firehose requires the Splunk HTTP Event Collector (HEC) endpoint to be terminated with a valid CA-signed certificate matching the DNS hostname used to connect to your HEC endpoint. You receive delivery errors if you are using a self-signed certificate.

    In this example, we only back up logs that fail during delivery.

    Enable error logging on your Firehose delivery stream so that you can monitor record delivery errors.

    Create an IAM role for the Firehose stream by choosing Create new or Choose. Either option opens a new screen. There, choose Create a new IAM role, give the role a name, and then choose Allow.

    If you look at the policy document, you can see that the role gives Kinesis Data Firehose permission to publish error logs to CloudWatch, execute your Lambda function, and put records into your S3 backup bucket.

    You now get a chance to review and adjust the Firehose stream settings. When you are satisfied, choose Create Stream. You get a confirmation once the stream is created and active.

    Create a VPC Flow Log

    To send events from Amazon VPC, you need to set up a VPC flow log. If you already have a VPC flow log you want to use, you can skip to the “Publish CloudWatch to Kinesis Data Firehose” section.

    On the AWS console, open the Amazon VPC service. Then choose VPC, Your VPC, and choose the VPC you want to send flow logs from. Choose Flow Logs, and then choose Create Flow Log. If you don’t have an IAM role that allows your VPC to publish logs to CloudWatch, choose Set Up Permissions and Create new role. Use the defaults when presented with the screen to create the new IAM role.

    Once active, your VPC flow log should look like the following.

    Publish CloudWatch to Kinesis Data Firehose

    When you generate traffic to or from your VPC, the log group is created in Amazon CloudWatch. The new log group has no subscription filter, so set up a subscription filter. Setting this up establishes a real-time data feed from the log group to your Firehose delivery stream.
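Each log event that flows through the subscription is one flow log record: a single space-separated line. The Python sketch below shows how such a record breaks down, assuming the default flow log format with its 14 fields. The helper name `parse_flow_log` and the sample record (with a made-up account ID and interface ID) are illustrative only.

```python
# Field order of the default VPC flow log format (assumed here).
FLOW_LOG_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
INTEGER_FIELDS = {
    "version", "srcport", "dstport", "protocol",
    "packets", "bytes", "start", "end",
}


def parse_flow_log(line):
    """Split one default-format flow log record into a dict of named fields."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(
            f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}"
        )
    record = {}
    for name, value in zip(FLOW_LOG_FIELDS, values):
        if value == "-":
            # A dash marks a field with no data (for example in NODATA records).
            record[name] = None
        elif name in INTEGER_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


if __name__ == "__main__":
    sample = (
        "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
        "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
    )
    print(parse_flow_log(sample))
```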

    At present, you have to use the AWS Command Line Interface (AWS CLI) to create a CloudWatch Logs subscription to a Kinesis Data Firehose stream. However, you can use the AWS console to create subscriptions to Lambda and Amazon Elasticsearch Service.

    To allow CloudWatch to publish to your Firehose stream, you need to give it permissions.

    $ aws iam create-role --role-name CWLtoKinesisFirehoseRole --assume-role-policy-document file://TrustPolicyForCWLToFireHose.json


    Here is the content for TrustPolicyForCWLToFireHose.json.

    {
      "Statement": {
        "Effect": "Allow",
        "Principal": { "Service": "logs.us-east-1.amazonaws.com" },
        "Action": "sts:AssumeRole"
      }
    }
    

     

    Attach the policy to the newly created role.

    $ aws iam put-role-policy \
        --role-name CWLtoKinesisFirehoseRole \
        --policy-name Permissions-Policy-For-CWL \
        --policy-document file://PermissionPolicyForCWLToFireHose.json

    Here is the content for PermissionPolicyForCWLToFireHose.json.

    {
        "Statement":[
          {
            "Effect":"Allow",
            "Action":["firehose:*"],
            "Resource":["arn:aws:firehose:us-east-1:YOUR-AWS-ACCT-NUM:deliverystream/FirehoseSplunkDeliveryStream"]
          },
          {
            "Effect":"Allow",
            "Action":["iam:PassRole"],
            "Resource":["arn:aws:iam::YOUR-AWS-ACCT-NUM:role/CWLtoKinesisFirehoseRole"]
          }
        ]
    }
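A stray space or a leftover placeholder in either ARN makes the policy point at the wrong resource. As an optional convenience, the Python sketch below generates the same policy document from an account ID, stream name, and role name. The function name `cwl_to_firehose_policy` and the account ID `123456789012` are hypothetical; the statements mirror the policy shown above.

```python
import json


def cwl_to_firehose_policy(account_id, stream_name, role_name, region="us-east-1"):
    """Build the permissions policy that lets CloudWatch Logs write to Firehose."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit AWS account number")
    for name in (stream_name, role_name):
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid resource name: {name!r}")

    stream_arn = f"arn:aws:firehose:{region}:{account_id}:deliverystream/{stream_name}"
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return {
        "Statement": [
            {"Effect": "Allow", "Action": ["firehose:*"], "Resource": [stream_arn]},
            {"Effect": "Allow", "Action": ["iam:PassRole"], "Resource": [role_arn]},
        ]
    }


if __name__ == "__main__":
    # Hypothetical account ID; substitute your own before using the output.
    policy = cwl_to_firehose_policy(
        "123456789012", "FirehoseSplunkDeliveryStream", "CWLtoKinesisFirehoseRole"
    )
    print(json.dumps(policy, indent=2))
```

Redirect the printed JSON to PermissionPolicyForCWLToFireHose.json to use it with the put-role-policy command.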

    Finally, create a subscription filter.

    $ aws logs put-subscription-filter \
       --log-group-name "/vpc/flowlog/FirehoseSplunkDemo" \
       --filter-name "Destination" \
       --filter-pattern "" \
       --destination-arn "arn:aws:firehose:us-east-1:YOUR-AWS-ACCT-NUM:deliverystream/FirehoseSplunkDeliveryStream" \
       --role-arn "arn:aws:iam::YOUR-AWS-ACCT-NUM:role/CWLtoKinesisFirehoseRole"

    When you run the preceding AWS CLI command, you don’t get any acknowledgment. To validate that your CloudWatch log group is subscribed to your Firehose stream, check the CloudWatch console.

    As soon as the subscription filter is created, the real-time log data from the log group goes into your Firehose delivery stream. Your stream then delivers it to your Splunk Enterprise or Splunk Cloud environment for querying and visualization. The screenshot following is from Splunk Enterprise.

    In addition, you can monitor and view metrics associated with your delivery stream using the AWS console.

    Conclusion

    Although our walkthrough uses VPC Flow Logs, the pattern can be used in many other scenarios. These include ingesting data from AWS IoT, other CloudWatch logs and events, Kinesis Streams or other data sources using the Kinesis Agent or Kinesis Producer Library. We also used Lambda blueprint Kinesis Data Firehose CloudWatch Logs Processor to transform streaming records from Kinesis Data Firehose. However, you might need to use a different Lambda blueprint or disable record transformation entirely depending on your use case. For an additional use case using Kinesis Data Firehose, check out This is My Architecture Video, which discusses how to securely centralize cross-account data analytics using Kinesis and Splunk.

     


    Additional Reading

    If you found this post useful, be sure to check out Integrating Splunk with Amazon Kinesis Streams and Using Amazon EMR and Hunk for Rapid Response Log Analysis and Review.


    About the Authors

    Tarik Makota is a solutions architect with the Amazon Web Services Partner Network. He provides technical guidance, design advice, and thought leadership to AWS’ most strategic software partners. His career includes work in an extremely broad range of software development and architecture roles across ERP, financial printing, benefit delivery and administration, and financial services. He holds an M.S. in Software Development and Management from Rochester Institute of Technology.

     

     

     

    Roy Arsan is a solutions architect in the Splunk Partner Integrations team. He has a background in product development, cloud architecture, and building consumer and enterprise cloud applications. More recently, he has architected Splunk solutions on major cloud providers, including an AWS Quick Start for Splunk that enables AWS users to easily deploy distributed Splunk Enterprise straight from their AWS console. He’s also the co-author of the AWS Lambda blueprints for Splunk. He holds an M.S. in Computer Science Engineering from the University of Michigan.

     

     

     

    timeShift(GrafanaBuzz, 1w) Issue 26

    Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/12/15/timeshiftgrafanabuzz-1w-issue-26/

    Welcome to TimeShift

    Big news this week: Grafana v5.0 has been merged into master and is available in the nightly builds! We are really excited to share this with the community, and look forward to receiving community feedback (good or bad) on the new features and enhancements. As you see in the video below, there are some big changes that aim to improve workflow, team organization, permissions, and overall user experience. Check out the video below to see it in action, and give it a spin yourself.

    • New Grid Layout Engine: Make it easier to build dashboards and enable more complex layouts
    • Dashboard Folders & Permissions
    • User Teams
    • Improved Dashboard Settings UX
    • Improved Page Design and Navigation

    NOTE: That’s actually Torkel Odegaard, creator of Grafana shredding on the soundtrack!


    Latest Stable Release

    Grafana 4.6.3 is available and includes some bug fixes:

    • Gzip: Fixes bug with Gravatar images when gzip was enabled #5952
    • Alert list: Now shows alert state changes even after adding manual annotations on dashboard #99513
    • Alerting: Fixes bug where rules evaluated as firing when all conditions were false and using the OR operator. #93183
    • Cloudwatch: CloudWatch no longer displays metrics’ default alias #101514, thx @mtanda

    Download Grafana 4.6.3 Now


    From the Blogosphere

    Monitoring MySQL with Prometheus and Grafana: Julien Pivotto (who will be speaking at GrafanaCon EU), gave a great presentation last month on Monitoring MySQL with Prometheus and Grafana. You can also check out his slides.

    Monitor your Docker Containers: docker stats doesn’t often give you the level of insight you need to effectively manage your containers. This article discusses how to use cAdvisor, Prometheus and Grafana to get a handle on your Docker performance.

    Magento Performance Monitoring with Grafana Dashboards and Alerts: This Christmas-themed post walks you through how to monitor the performance of Magento, start building dashboards, and set up Slack alerts, all while sitting in your rocking chair, sipping eggnog.

    Icinga Web2 and Grafana Working Together: This is a follow-up post about displaying service performance data from Icinga2 in Grafana. Now that we know how to list the services on a dashboard, it would be helpful to filter this list so that specific teams can know the status of services they specifically manage.

    Setup of sitespeed in AWS with Peter Hedenskog: In this video, Peter Hedenskog from Wikimedia and Stefan Judis set up a video call to go over setting up sitespeed in AWS. They create a fully functional Grafana dashboard, including web performance metrics from Stefan’s personal website running in the cloud.

    Deploying Grafana to Access Zabbix in Alibaba Cloud ECS: This article walks you through how to deploy Grafana on Alibaba Cloud ECS to access Zabbix to visualize performance data for your website or application.

    Let’s Summarize the Test Results with Grafana Annotations + Prometheus: The engineers of NTT Communications Corporation have created something of an Advent Calendar, with new posts each day. December 14th’s post focused on Grafana’s new annotation functionality via the UI and the API.


    New Speakers Added!

    We have added new speakers and talk titles to the lineup at grafanacon.org. Only a few are left to include, and they should be added in the next few days.

    Join us March 1-2, 2018 in Amsterdam for 2 days of talks centered around Grafana and the surrounding monitoring ecosystem including Graphite, Prometheus, InfluxData, Elasticsearch, Kubernetes, and many other topics.

    This year we have speakers from Bloomberg, CERN, Tinder, Red Hat, Prometheus, InfluxData, Fastly, Automattic, Percona, and more!

    Get Your Ticket Now


    Grafana Plugins

    This week we have a new plugin for the popular IoT platform DeviceHive, and an update to our own Kubernetes App. To install or update any plugin in an on-prem Grafana instance, use the grafana-cli tool, or install and update with 1 click on Hosted Grafana.

    NEW PLUGIN

    DeviceHive is an IoT platform and now has a data source plugin, which means you can visualize the live commands and notifications from a device.


    Install Now

    UPDATED PLUGIN

    Kubernetes App – The Grafana Kubernetes App allows you to monitor your Kubernetes cluster’s performance. It includes 4 dashboards, Cluster, Node, Pod/Container and Deployment, and also comes with Intel Snap collectors that are deployed to your cluster to collect health metrics.


    Update


    Upcoming Events:

    In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We also like to make sure we mention other Grafana-related events happening all over the world. If you’re putting on just such an event, let us know and we’ll list it here.

    FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. Carl Bergquist is managing the Cloud and Monitoring Devroom, and we’ve heard there were some great talks submitted. There is no need to register; all are welcome.


    Tweet of the Week

    We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove


    Ok, ok – This tweet isn’t showing off a dashboard, but we can’t help but be thrilled when someone posts about our poster series. We’ll be working on the fourth poster to be unveiled at GrafanaCon EU!


    Grafana Labs is Hiring!

    We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

    Check out our Open Positions


    How are we doing?

    Let us know what you think about timeShift. Submit a comment on this article below, or post something at our community forum. Find an article I haven’t included? Send it my way. Help us make timeShift better!

    Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

    New – Amazon CloudWatch Agent with AWS Systems Manager Integration – Unified Metrics & Log Collection for Linux & Windows

    Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-agent-with-aws-systems-manager-integration-unified-metrics-log-collection-for-linux-windows/

    In the past I’ve talked about several agents, daemons, and scripts that you could use to collect system metrics and log files for your Windows and Linux instances and on-premises services and publish them to Amazon CloudWatch. The data collected by this somewhat disparate collection of tools gave you visibility into the status and behavior of your compute resources, along with the power to take action when a value goes out of range and indicates a potential issue. You can graph any desired metrics on CloudWatch Dashboards, initiate actions via CloudWatch Alarms, and search CloudWatch Logs to find error messages, while taking advantage of our support for custom high-resolution metrics.

    New Unified Agent
    Today we are taking a nice step forward and launching a new, unified CloudWatch Agent. It runs in the cloud and on-premises, on Linux and Windows instances and servers, and handles metrics and log files. You can deploy it using AWS Systems Manager (SSM) Run Command, SSM State Manager, or from the CLI. Here are some of the most important features:

    Single Agent – A single agent now collects both metrics and logs. This simplifies the setup process and reduces complexity.

    Cross-Platform / Cross-Environment – The new agent runs in the cloud and on-premises, on 64-bit Linux and 64-bit Windows, and includes HTTP proxy server support.

    Configurable – The new agent captures the most useful system metrics automatically. It can be configured to collect hundreds of others, including fine-grained metrics on sub-resources such as CPU threads, mounted filesystems, and network interfaces.

    CloudWatch-Friendly – The new agent supports standard 1-minute metrics and the newer 1-second high-resolution metrics. It automatically includes EC2 dimensions such as Instance Id, Image Id, and Auto Scaling Group Name, and also supports the use of custom dimensions. All of the dimensions can be used for custom aggregation across Auto Scaling Groups, applications, and so forth.

    Migration – You can easily migrate existing AWS SSM and EC2Config configurations for use with the new agent.

    Installing the Agent
    The CloudWatch Agent uses an IAM role when running on an EC2 instance, and an IAM user when running on an on-premises server. The role or the user must include the AmazonSSMFullAccess and AmazonEC2ReadOnlyAccess policies. Here’s my role:

    I can easily add it to a running instance (this is a relatively new and very handy EC2 feature):

    The SSM Agent is already running on my instance. If it wasn’t, I would follow the steps in Installing and Configuring SSM Agent to set it up.

    Next, I install the CloudWatch Agent using the AWS Systems Manager:

    This takes just a few seconds. Now I can use a simple wizard to set up the configuration file for the agent:

    The wizard also lets me set up the log files to be monitored:

    The wizard generates a JSON-format config file and stores it on the instance. It also offers me the option to upload the file to my Parameter Store so that I can deploy it to my other instances (I can also do fine-grained customization of the metrics and log collection configuration by editing the file):
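For readers who want a feel for the file before running the wizard, here is a trimmed-down Python sketch that builds a configuration of the same general shape and prints it as JSON. It is an illustration, not the wizard's exact output: the function name `build_agent_config`, the log file path, and the log group name are assumptions, and the selection of metrics is deliberately small.

```python
import json


def build_agent_config(log_path, log_group, interval_seconds=60):
    """Return a small CloudWatch Agent configuration as a Python dict."""
    return {
        "metrics": {
            # Attach EC2 dimensions to every metric the agent publishes.
            "append_dimensions": {
                "InstanceId": "${aws:InstanceId}",
                "ImageId": "${aws:ImageId}",
                "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            },
            "metrics_collected": {
                "cpu": {
                    "measurement": ["cpu_usage_idle"],
                    "metrics_collection_interval": interval_seconds,
                    "totalcpu": True,
                },
                "disk": {
                    "measurement": ["used_percent"],
                    "metrics_collection_interval": interval_seconds,
                    "resources": ["*"],  # every mounted filesystem
                },
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_path,
                            "log_group_name": log_group,
                            # One log stream per instance for this file.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical log file and log group, chosen only for illustration.
    print(json.dumps(build_agent_config("/var/log/messages", "messages"), indent=2))
```

Lowering `interval_seconds` below 60 is how a sketch like this would opt into the high-resolution metrics mentioned earlier, at the usual custom-metric prices.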

    Now I can start the CloudWatch Agent using Run Command, supplying the name of my configuration in the Parameter Store:

    This runs in a few seconds and the agent begins to publish metrics right away. As I mentioned earlier, the agent can publish fine-grained metrics on the resources inside of or attached to an instance. For example, here are the metrics for each filesystem:

    There’s a separate log stream for each monitored log file on each instance:

    I can view and search it, just like I can do for any other log stream:

    Now Available
    The new CloudWatch Agent is available now and you can start using it today in all public AWS Regions, with AWS GovCloud (US) and the Regions in China to follow.

    There’s no charge for the agent; you pay the usual CloudWatch prices for logs and custom metrics.

    Jeff;