Tag Archives: location

Cloud Babble: The Jargon of Cloud Storage

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/what-is-cloud-computing/

Cloud Babble

One of the things we in the technology business are good at is coming up with names, phrases, euphemisms, and acronyms for the stuff that we create. The Cloud Storage market is no different, and we’d like to help by illuminating some of the cloud storage related terms that you might come across. We know this is just a start, so please feel free to add in your favorites in the comments section below and we’ll update this post accordingly.


The cloud is really just a collection of purpose built servers. In a public cloud the servers are shared between multiple unrelated tenants. In a private cloud, the servers are dedicated to a single tenant or sometimes a group of related tenants. A public cloud is off-site, while a private cloud can be on-site or off-site – or on-prem or off-prem, if you prefer.

Both Sides Now: Hybrid Clouds

Speaking of on-prem and off-prem, there are Hybrid Clouds or Hybrid Data Clouds depending on what you need. Both are based on the idea that you extend your local resources (typically on-prem) to the cloud (typically off-prem) as needed. This extension is controlled by software that decides, based on rules you define, what needs to be done where.

A Hybrid Data Cloud is specific to data. For example, you can set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.

A Hybrid Cloud is similar to a Hybrid Data Cloud except it also extends compute. For example, at the end of the quarter, you can spin up order processing application instances off-prem as needed to add to your on-prem capacity. Of course, determining where the transactional data used and created by these applications resides can be an interesting systems design challenge.

Clouds in my Coffee: Fog

Typically, public and private clouds live in large buildings called data centers. Full of servers, networking equipment, and clean air, data centers need lots of power, lots of networking bandwidth, and lots of space. This often limits where data centers are located. The further away you are from a data center, the longer it generally takes to get your data to and from there. This is known as latency. That’s where “Fog” comes in.

Fog is often referred to as clouds close to the ground. Fog, in our cloud world, is basically having a “little” data center near you. This can make data storage and even cloud based processing faster for everyone nearby. Data, and less so processing, can be transferred to/from the Fog to the Cloud when time is less a factor. Data could also be aggregated in the Fog and sent to the Cloud. For example, your electric meter could report its minute-by-minute status to the Fog for diagnostic purposes. Then once a day the aggregated data could be send to the power company’s Cloud for billing purposes.

Another term used in place of Fog is Edge, as in computing at the Edge. In either case, a given cloud (data center) usually has multiple Edges (little data centers) connected to it. The connection between the Edge and the Cloud is sometimes known as the middle-mile. The network in the middle-mile can be less robust than that required to support a stand-alone data center. For example, the middle-mile can use 1 Gbps lines, versus a data center, which would require multiple 10 Gbps lines.

Heavy Clouds No Rain: Data

We’re all aware that we are creating, processing, and storing data faster than ever before. All of this data is stored in either a structured or more likely an unstructured way. Databases and data warehouses are structured ways to store data, but a vast amount of data is unstructured – meaning the schema and data access requirements are not known until the data is queried. A large pool of unstructured data in a flat architecture can be referred to as a Data Lake.

A Data Lake is often created so we can perform some type of “big data” analysis. In an over simplified example, let’s extend the lake metaphor a bit and ask the question; “how many fish are in our lake?” To get an answer, we take a sufficient sample of our lake’s water (data), count the number of fish we find, and extrapolate based on the size of the lake to get an answer within a given confidence interval.

A Data Lake is usually found in the cloud, an excellent place to store large amounts of non-transactional data. Watch out as this can lead to our data having too much Data Gravity or being locked in the Hotel California. This could also create a Data Silo, thereby making a potential data Lift-and-Shift impossible. Let me explain:

  • Data Gravity — Generally, the more data you collect in one spot, the harder it is to move. When you store data in a public cloud, you have to pay egress and/or network charges to download the data to another public cloud or even to your own on-premise systems. Some public cloud vendors charge a lot more than others, meaning that depending on your public cloud provider, your data could financially have a lot more gravity than you expected.
  • Hotel California — This is like Data Gravity but to a lesser scale. Your data is in the Hotel California if, to paraphrase, “your data can check out any time you want, but it can never leave.” If the cost of downloading your data is limiting the things you want to do with that data, then your data is in the Hotel California. Data is generally most valuable when used, and with cloud storage that can include archived data. This assumes of course that the archived data is readily available, and affordable, to download. When considering a cloud storage project always figure in the cost of using your own data.
  • Data Silo — Over the years, businesses have suffered from organizational silos as information is not shared between different groups, but instead needs to travel up to the top of the silo before it can be transferred to another silo. If your data is “trapped” in a given cloud by the cost it takes to share such data, then you may have a Data Silo, and that’s exactly opposite of what the cloud should do.
  • Lift-and-Shift — This term is used to define the movement of data or applications from one data center to another or from on-prem to off-prem systems. The move generally occurs all at once and once everything is moved, systems are operational and data is available at the new location with few, if any, changes. If your data has too much gravity or is locked in a hotel, a data lift-and-shift may break the bank.

I Can See Clearly Now

Hopefully, the cloudy terms we’ve covered are well, less cloudy. As we mentioned in the beginning, our compilation is just a start, so please feel free to add in your favorite cloud term in the comments section below and we’ll update this post with your contributions. Keep your entries “clean,” and please no words or phrases that are really adverts for your company. Thanks.

The post Cloud Babble: The Jargon of Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Judge Tells Movie Company That it Can’t Sue Alleged BitTorrent Pirate

Post Syndicated from Andy original https://torrentfreak.com/judge-tells-movie-company-that-it-cant-sue-alleged-bittorrent-pirate-180118/

Despite a considerable migration towards streaming piracy in recent years, copyright trolls are still finding plenty of potential targets around the world. Alleged BitTorrent pirates are target number one since their activities are most easily tracked. However, it isn’t all plain sailing for the pirate hunters.

Last December we reported on the case of Lingfu Zhang, an Oregan resident accused by the makers of the 2015 drama film Fathers & Daughters (F&D) of downloading and sharing their content without permission. While these kinds of cases often disappear, with targets making confidential settlements to make a legal battle go away, Zhang chose to fight back.

Represented by attorney David Madden, Zhang not only denied downloading the movie in question but argued that the filmmakers had signed away their online distribution rights. He noted that (F&D), via an agent, had sold the online distribution rights to a third party not involved in the case.

So, if F&D no longer held the right to distribute the movie online, suing for an infringement of those rights would be impossible. With this in mind, Zhang’s attorney moved for a summary judgment in his client’s favor.

“ZHANG denies downloading the movie but Defendant’s current motion for summary judgment challenges a different portion of F&D’s case,” Madden wrote.

“Defendant argues that F&D has alienated all of the relevant rights necessary to sue for infringement under the Copyright Act.”

In response, F&D argued that they still held some rights, including the right to exploit the movie on “airlines and oceangoing vessels” but since Zhang wasn’t accused of being on either form of transport when the alleged offense occurred, the defense argued that point was moot.

Judge Michael H. Simon handed down his decision yesterday and it heralds bad news for F&D and celebration time for Zhang and his attorney. In a 17-page ruling first spotted by Fight Copyright Trolls, the Judge agrees that F&D has no standing to sue.

Citing the Righthaven LLC v. Hoehn case from 2013, the Judge notes that under the Copyright Act, only the “legal or beneficial owner of an exclusive right under a copyright” has standing to sue for infringement of that right.

Judge Simon notes that while F&D claims it is the ‘legal owner’ of the copyright to the Fathers & Daughters movie, the company “misstates the law”, adding that F&D also failed to present evidence that it is the ‘beneficial owner’ of the relevant exclusive right. On this basis, both claims are rejected.

The Judge noted that the exclusive rights to the movie were granted to a company called Vertical Entertainment which received the exclusive right to “manufacture, reproduce, sell, rent, exhibit, broadcast, transmit, stream, download, license, sub-license, distribute, sub-distribute, advertise, market, promote, publicize and exploit” the movie in the United States.

An exclusive license means that ownership of a copyright is transferred for the term of the license, meaning that Vertical – not F&D – is the legal owner under the Copyright Act. It matters not, the Judge says, that F&D retained the rights to display the movie “on airlines and ships” since only the transferee (Vertical) has standing to sue and those locations are irrelevant to the lawsuit.

“Under the Copyright Act, F&D is not the ‘legal owner’ with standing to sue for infringement relating to the rights that were transferred to Vertical through its exclusive license granted in the distribution agreement,” the Judge writes.

Also at issue was an undated document presented by F&D titled Anti-Piracy and Rights Enforcement Reservation of Rights Addendum. The document, relied upon by F&D, claimed that F&D is authorized to “enforce copyrights against Internet infringers” including those that use peer-to-peer technologies such as BitTorrent.

However, the Judge found that the peer-to-peer rights apparently reserved to F&D were infringing rights, not the display and distribution (exclusive rights) required to sue under the Copyright Act. Furthermore, the Judge determined that there was no evidence that this document existed before the lawsuit was filed. Zhang and his attorney previously asserted the addendum had been created afterwards and the Judge agrees.

“F&D did not dispute that the undated anti-piracy addendum was created after this lawsuit was filed, or otherwise respond to Defendant’s standing argument relating to the untimeliness of this document,” the Judge notes.

“Accordingly, because the only reasonable inference supported by the evidence is that this document was created after the filing of this lawsuit, it is not appropriate to consider for purposes of standing.”

So, with Vertical Entertainment the only company with the right to sue, could they be added to the lawsuit, F&D asked? Citing an earlier case, the Judge said ‘no’, noting that “summary judgment is not a procedural second chance to flesh out inadequate pleadings.”

With that, Judge Simon granted Lingfu Zhang’s request for summary judgment and dismissed F&D’s claims for lack of standing.

As noted by Fight Copyright Trolls, the movie licensing scheme employed by F&D is complex and, given the fact that notorious copyright troll outfit Guardaley is involved (Guardaley filed 24 cases in eight districts on behalf of F&D), it would be interesting if legal professionals could dig deeper, to see how far the rabbit hole goes.

The summary judgment can be found here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Pirate IPTV Mastermind Owns Raided Bulgarian ISP, Sources Say

Post Syndicated from Andy original https://torrentfreak.com/pirate-iptv-mastermind-owns-raided-bulgarian-isp-sources-say-180117/

Last Tuesday a year-long investigation came to a climax when the Intellectual Property Crime Unit of the Cypriot Police teamed up with the Cybercrime Division of the Greek Police, the Dutch Fiscal Investigative and Intelligence Service (FIOD), the Cybercrime Unit of the Bulgarian Police, Europol’s Intellectual Property Crime Coordinated Coalition (IPC³), and the Audiovisual Anti-Piracy Alliance (AAPA), to raid a ‘pirate’ TV operation.

Official information didn’t become freely available until later in the week but across Cyprus, Bulgaria and Greece there were at least 17 house searches and individuals aged 43, 44, and 53 were arrested in Cyprus and remanded in custody for seven days.

According to Europol, the IPTV operation was considerable, offering 1,200 channels to as many as 500,000 subscribers around the world. Although early financial estimates in cases like these are best taken with a grain of salt, latest claims suggest revenues of five million euros a month, 60 million euros per year.

Part of the IPTV operation (credit:Europol)

As previously reported, so-called ‘front servers’ (servers designed to hide the main servers’ true location) were discovered in the Netherlands. Additionally, it’s now being reported by Cypriot media that nine suspects from an unnamed Internet service provider housing the servers were arrested and taken in for questioning. But the intrigue doesn’t stop there.

Well in advance of Europol’s statement late last week, TorrentFreak was informed by a source that police in Bulgaria had targeted a specific ISP called MegaByte Internet, located in the small town of Petrich. After returning online after a couple of days’ downtime, the ISP responded to some of our questions, detailed in our earlier interview.

“We were informed by the police that some of our clients in Petrich and Sofia were using our service for illegal streaming and actions,” a company spokesperson said.

“Of course, we were not able to know this because our services are unmanaged and root access [to servers] is given to our clients. For this reason any client and anyone that uses our services are responsible for their own actions.”

Other questions went unanswered but yesterday fresh information coming out of Cyprus certainly helped to fill in the gaps – and then some.

Philenews reports that a total of 140 servers were seized in Bulgaria – 60 from the headquarters of MegaByte Internet and four other custom locations, and 80 from two other locations in the Bulgarian capital, Sofia.

At least as far as locations go, this ties in with a statement provided by MegaByte to TF last week which claimed that some of its equipment was seized from Telepoint, Bulgaria’s biggest datacenter.

Viewing cards facilitating feeds…

We now know that ten employees of MegaByte were interrogated by the police but perhaps the biggest revelation is that the owner of the Internet service provider is now being openly named as the brains behind the entire operation.

Philenews reports that 47-year-old businessman Christos Apostolos Samaras from Greece, who has owned and run MegaByte since 2009, is the individual Europol reported as being arrested in Bulgaria last week.

In addition to linking him to MegaByte Internet’s domain, various searches indicate that Samaras is also connected to 1Stream, a hosting company dedicated to providing bandwidth for streaming purposes.

The investigation continues.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency with less cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how to handle this situation? Take things one step at a time when scaling and you may find horizontal scaling isn’t the right choice, after all.

For example, assume you have a tech news website where you did an early-look review of an upcoming—and highly-anticipated—smartphone launch, which went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. For example, if your website is hosted on a traditional Linux with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s get more details on the current scenario and dig out more:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

How is the website handling sessions?

  • Do you have any compliance requests—like the Payment Card Industry Data Security Standard (PCI DSS compliance) —if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your loading balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic, generated by activity on the blog post, so let’s reduce server load by moving image and video to some third -party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides a nice mirror replication for databases. Oracle has its own Oracle plug for replication and AWS RDS provide up to five read replicas, which can span across the region and even the Amazon database Amazon Aurora can have 15 read replicas with Amazon Aurora autoscaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless database  to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, AWS RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to mirror instance means replication data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs and try to move non-critical GET services to read replica and reduce the load on the master database. In this case, loading comments associated with a blog can be fetched from a read replica—as it can handle some delay—in case there is any issue with asynchronous reflection.

Step 3: Reduce write requests. This can be achieved by introducing queue to process the asynchronous message. Amazon Simple Queue Service (Amazon SQS) is a highly-scalable queue, which can handle any kind of work-message load. You can process data, like rating and review; or calculate Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with Amazon CloudWatch, as the trigger.  For on-premises workloads, you can use SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS  to fan out your message processing in parallel for different purposes like adding a watermark in an image, generating a thumbnail, etc.

Step 4: Introduce a more robust caching engine. You can use Amazon Elastic Cache for Memcached or Redis to reduce write requests. Memcached and Redis have different use cases so if you can afford to lose and recover your cache from your database, use Memcached. If you are looking for more robust data persistence and complex data structure, use Redis. In AWS, these are managed services, which means AWS takes care of the workload for you and you can also deploy them in your on-premises instances or use a hybrid approach.

Step 5: Scale your server. If there are still issues, it’s time to scale your server.  For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, use cases like database vertical scaling may be a better choice until you are good with sharding; or use Amazon Aurora Serverless for variable workloads. It will be wise to use Auto Scaling to manage your workload effectively for horizontal scaling. Also, to achieve that, you need to persist the session. Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route53 to distribute your workload between AWS Regions. CloudFront helps you to scale and distribute static content via an S3 bucket and DynamoDB, maintaining your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with multi-AZ standby provides high availability and read replica helps achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS even if your workload in on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing your on-premises environment replica in the public cloud like AWS Cloud, and using Amazon Route53 DNS Service and Elastic Load Balancing to route traffic between on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments to help you scale your cloud environment quickly, whenever required, and reduce it further by applying Amazon auto-scaling and placing a threshold on your on-premises traffic using Route 53.

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:


Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:


import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:


The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.


It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.


Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.


In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.




Ben Sowell is a software development engineer at AWS Glue.




Vinay Vivili is a software development engineer for AWS Glue.




Europol Hits Huge 500,000 Subscriber Pirate IPTV Operation

Post Syndicated from Andy original https://torrentfreak.com/europol-hits-huge-500000-subscriber-pirate-iptv-operation-180111/

Live TV is in massive demand but accessing all content in a particular region can be a hugely expensive proposition, with tradtional broadcasting monopolies demanding large subscription fees.

For millions around the world, this ‘problem’ can be easily circumvented. Pirate IPTV operations, which supply thousands of otherwise subscription channels via the Internet, are on the increase. They’re accessible for just a few dollars, euros, or pounds per month, slashing bills versus official providers on a grand scale.

This week, however, police forces around Europe coordinated to target what they claim is one of the world’s largest illicit IPTV operations. The investigation was launched last February by Europol and on Tuesday coordinated actions were carried out in Cyprus, Bulgaria, Greece, and the Netherlands.

Three suspects were arrested in Cyprus – two in Limassol (aged 43 and 44) and one in Larnaca (aged 53). All are alleged to be part of an international operation to illegally broadcast around 1,200 channels of pirated content worldwide. Some of the channels offered were illegally sourced from Sky UK, Bein Sports, Sky Italia, and Sky DE

If initial reports are to be believed, the reach of the IPTV service was huge. Figures usually need to be taken with a pinch of salt but information suggests the service had more than 500,000 subscribers, each paying around 10 euros per month. (Note: how that relates to the alleged five million euros per year in revenue is yet to be made clear)

Police action was spread across the continent, with at least nine separate raids, including in the Netherlands where servers were uncovered. However, it was determined that these were in place to hide the true location of the operation’s main servers. Similar ‘front’ servers were also deployed in other regions.

The main servers behind the IPTV operation were located in Petrich, a small town in Blagoevgrad Province, southwestern Bulgaria. No details have been provided by the authorities but TF is informed that the website of a local ISP, Megabyte-Internet, from where pirate IPTV has been broadcast for at least the past several months, disappeared on Tuesday. It remains offline this morning.

The company did not respond to our request for comment and there’s no suggestion that it’s directly involved in any illegal activity. However, its Autonomous System (AS) number reveals linked IPTV services, none of which appear to be operational today. The ISP is also listed on sites where ‘pirate’ IPTV channel playlists are compiled by users.

According to sources in Cyprus, police requested permission from the Larnaca District Court to detain the arrested individuals for eight days. However, local news outlet Philenews said that any decision would be postponed until this morning, since one of the three suspects, an English Cypriot, required an interpreter which caused a delay.

In addition to prosecutors and defense lawyers, two Dutch investigators from Europol were present in court yesterday. The hearing lasted for six hours and was said to be so intensive that the court stenographer had to be replaced due to overwork.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

In this post, we describe how to combine data in Aurora in Amazon Redshift. Here’s an overview of the solution:

  • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
  • Save data in an Amazon S3
  • Query data using Amazon Redshift Spectrum.

We use the following services:

Serverless architecture for capturing and analyzing Aurora data changes

Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

The following diagram shows the flow of data as it occurs in this tutorial:

The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

Creating an Aurora database

First, create a database by following these steps in the Amazon RDS console:

  1. Sign in to the AWS Management Console, and open the Amazon RDS console.
  2. Choose Launch a DB instance, and choose Next.
  3. For Engine, choose Amazon Aurora.
  4. Choose a DB instance class. This example uses a small, since this is not a production database.
  5. In Multi-AZ deployment, choose No.
  6. Configure DB instance identifier, Master username, and Master password.
  7. Launch the DB instance.

After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

The following screenshot shows the MySQL Workbench configuration:

Next, create a table in the database by running the following SQL statement:

Create Table
ItemID int NOT NULL,
Category varchar(255),
Price double(10,2), 
Quantity int not NULL,
OrderDate timestamp,
DestinationState varchar(2),
ShippingType varchar(255),
Referral varchar(255),

You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the highlighted (bold) variables are replaced with appropriate values.

import MySQLdb
import random
import datetime

db = MySQLdb.connect(host="AURORA_CNAME",

states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",

shipping_types = ("Free", "3-Day", "2-Day")

product_categories = ("Garden", "Kitchen", "Office", "Household")
referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")

for i in range(0,10):
    item_id = random.randint(1,100)
    state = states[random.randint(0,len(states)-1)]
    shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
    product_category = product_categories[random.randint(0,len(product_categories)-1)]
    quantity = random.randint(1,4)
    referral = referrals[random.randint(0,len(referrals)-1)]
    price = random.randint(1,100)
    order_date = datetime.date(2016,random.randint(1,12),random.randint(1,30)).isoformat()

    data_order = (item_id, product_category, price, quantity, order_date, state,
    shipping_type, referral)

    add_order = ("INSERT INTO Sales "
                   "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                   ShippingType, Referral) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

    cursor = db.cursor()
    cursor.execute(add_order, data_order)



The following screenshot shows how the table appears with the sample data:

Sending data from Amazon Aurora to Amazon S3

There are two methods available to send data from Amazon Aurora to Amazon S3:

  • Using a Lambda function

To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

Creating a Kinesis data delivery stream

The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

To create a delivery stream:

  1. Open the Kinesis Data Firehose console
  2. Choose Create delivery stream.
  3. For Delivery stream name, type AuroraChangesToS3.
  4. For Source, choose Direct PUT.
  5. For Record transformation, choose Disabled.
  6. For Destination, choose Amazon S3.
  7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
  8. Enter a prefix if needed, and choose Next.
  9. For Data compression, choose GZIP.
  10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
  11. Review all the details on the screen, and choose Create delivery stream when you’re finished.


Creating a Lambda function

Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

To create the Lambda function:

  1. Open the AWS Lambda console.
  2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
  3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
  4. Choose Author from scratch.
  5. Give your function a name and select Python 3.6 for Runtime
  6. Choose and existing or create a new Role, the role would need to have access to call firehose:PutRecord
  7. Choose Next on the trigger selection screen.
  8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
  9. Choose File -> Save in the code editor and then choose Save.
import boto3
import json

firehose = boto3.client('firehose')
stream_name = ‘AuroraChangesToS3’

def Kinesis_publish_message(event, context):
    firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
    event['Category'], event['Price'], event['Quantity'],
    event['OrderDate'], event['DestinationState'], event['ShippingType'], 
    firehose_data = {'Data': str(firehose_data)}

Note the Amazon Resource Name (ARN) of this Lambda function.

Giving Aurora permissions to invoke a Lambda function

To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

Creating a stored procedure and a trigger in Amazon Aurora

Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

									IN Category varchar(255), 
									IN Price double(10,2),
                                    IN Quantity int(11),
                                    IN OrderDate timestamp,
                                    IN DestinationState varchar(2),
                                    IN ShippingType varchar(255),
                                    IN Referral  varchar(255)) LANGUAGE SQL 
  CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
     CONCAT('{ "ItemID" : "', ItemID, 
            '", "Category" : "', Category,
            '", "Price" : "', Price,
            '", "Quantity" : "', Quantity, 
            '", "OrderDate" : "', OrderDate, 
            '", "DestinationState" : "', DestinationState, 
            '", "ShippingType" : "', ShippingType, 
            '", "Referral" : "', Referral, '"}')

Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

  SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
  , New.DestinationState, New.ShippingType, New.Referral
  INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral;
  CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral);

If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.

Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

Querying data in Amazon Redshift

In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

Redshift Spectrum supports open, common data types, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data types and compression methods in the works.

First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

Next, connect to the Amazon Redshift cluster, and create an external schema and database:

create external schema if not exists spectrum_schema
from data catalog 
database 'spectrum_db' 
region 'us-east-1'
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
create external database if not exists;

Don’t forget to replace the IAM role in the statement.

Then create an external table within the database:

 CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
  ItemID int,
  Category varchar,
  Quantity int,
  OrderDate TIMESTAMP,
  DestinationState varchar,
  ShippingType varchar,
  Referral varchar)

Query the table, and it should contain data. This is a fact table.

select top 10 * from spectrum_schema.ecommerce_sales


Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

CREATE TABLE date_dimension (
  d_datekey           integer       not null sortkey,
  d_dayofmonth        integer       not null,
  d_monthnum          integer       not null,
  d_dayofweek                varchar(10)   not null,
  d_prettydate        date       not null,
  d_quarter           integer       not null,
  d_half              integer       not null,
  d_year              integer       not null,
  d_season            varchar(10)   not null,
  d_fiscalyear        integer       not null)
diststyle all;

Populate the table with data:

copy date_dimension from 's3://reparmar-lab/2016dates' 
iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
dateformat 'auto';

The date dimension table should look like the following:

Querying data in local and external tables using Amazon Redshift

Now that you have the fact and dimension table populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by weekday, you can run the following:

select sum(quantity*price) as total_sales, date_dimension.d_season
from spectrum_schema.ecommerce_sales 
join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
group by date_dimension.d_season

You get the following results:

Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.

Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

Enter the database connection details, validate the connection, and create the data source.

Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

On the next screen, choose Edit.

In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Data formatted as month on the y-axis.

After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

Final notes

Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE trigger.

Keep the following in mind:

  • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
  • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
  • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

If you have questions or suggestions, please comment below.

Additional Reading

If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum

About the Authors

Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.




Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown

Post Syndicated from Eben Upton original https://www.raspberrypi.org/blog/why-raspberry-pi-isnt-vulnerable-to-spectre-or-meltdown/

Over the last couple of days, there has been a lot of discussion about a pair of security vulnerabilities nicknamed Spectre and Meltdown. These affect all modern Intel processors, and (in the case of Spectre) many AMD processors and ARM cores. Spectre allows an attacker to bypass software checks to read data from arbitrary locations in the current address space; Meltdown allows an attacker to read arbitrary data from the operating system kernel’s address space (which should normally be inaccessible to user programs).

Both vulnerabilities exploit performance features (caching and speculative execution) common to many modern processors to leak data via a so-called side-channel attack. Happily, the Raspberry Pi isn’t susceptible to these vulnerabilities, because of the particular ARM cores that we use.

To help us understand why, here’s a little primer on some concepts in modern processor design. We’ll illustrate these concepts using simple programs in Python syntax like this one:

t = a+b
u = c+d
v = e+f
w = v+g
x = h+i
y = j+k

While the processor in your computer doesn’t execute Python directly, the statements here are simple enough that they roughly correspond to a single machine instruction. We’re going to gloss over some details (notably pipelining and register renaming) which are very important to processor designers, but which aren’t necessary to understand how Spectre and Meltdown work.

For a comprehensive description of processor design, and other aspects of modern computer architecture, you can’t do better than Hennessy and Patterson’s classic Computer Architecture: A Quantitative Approach.

What is a scalar processor?

The simplest sort of modern processor executes one instruction per cycle; we call this a scalar processor. Our example above will execute in six cycles on a scalar processor.

Examples of scalar processors include the Intel 486 and the ARM1176 core used in Raspberry Pi 1 and Raspberry Pi Zero.

What is a superscalar processor?

The obvious way to make a scalar processor (or indeed any processor) run faster is to increase its clock speed. However, we soon reach limits of how fast the logic gates inside the processor can be made to run; processor designers therefore quickly began to look for ways to do several things at once.

An in-order superscalar processor examines the incoming stream of instructions and tries execute more than one at once, in one of several “pipes”, subject to dependencies between the instructions. Dependencies are important: you might think that a two-way superscalar processor could just pair up (or dual-issue) the six instructions in our example like this:

t, u = a+b, c+d
v, w = e+f, v+g
x, y = h+i, j+k

But this doesn’t make sense: we have to compute v before we can compute w, so the third and fourth instructions can’t be executed at the same time. Our two-way superscalar processor won’t be able to find anything to pair with the third instruction, so our example will execute in four cycles:

t, u = a+b, c+d
v    = e+f                   # second pipe does nothing here
w, x = v+g, h+i
y    = j+k

Examples of superscalar processors include the Intel Pentium, and the ARM Cortex-A7 and Cortex-A53 cores used in Raspberry Pi 2 and Raspberry Pi 3 respectively. Raspberry Pi 3 has only a 33% higher clock speed than Raspberry Pi 2, but has roughly double the performance: the extra performance is partly a result of Cortex-A53’s ability to dual-issue a broader range of instructions than Cortex-A7.

What is an out-of-order processor?

Going back to our example, we can see that, although we have a dependency between v and w, we have other independent instructions later in the program that we could potentially have used to fill the empty pipe during the second cycle. An out-of-order superscalar processor has the ability to shuffle the order of incoming instructions (again subject to dependencies) in order to keep its pipelines busy.

An out-of-order processor might effectively swap the definitions of w and x in our example like this:

t = a+b
u = c+d
v = e+f
x = h+i
w = v+g
y = j+k

allowing it to execute in three cycles:

t, u = a+b, c+d
v, x = e+f, h+i
w, y = v+g, j+k

Examples of out-of-order processors include the Intel Pentium 2 (and most subsequent Intel and AMD x86 processors), and many recent ARM cores, including Cortex-A9, -A15, -A17, and -A57.

What is speculation?

Reordering sequential instructions is a powerful way to recover more instruction-level parallelism, but as processors become wider (able to triple- or quadruple-issue instructions) it becomes harder to keep all those pipes busy. Modern processors have therefore grown the ability to speculate. Speculative execution lets us issue instructions which might turn out not to be required (because they are branched over): this keeps a pipe busy, and if it turns out that the instruction isn’t executed, we can just throw the result away.

To demonstrate the benefits of speculation, let’s look at another example:

t = a+b
u = t+c
v = u+d
if v:
   w = e+f
   x = w+g
   y = x+h

Now we have dependencies from t to u to v, and from w to x to y, so a two-way out-of-order processor without speculation won’t ever be able to fill its second pipe. It spends three cycles computing t, u, and v, after which it knows whether the body of the if statement will execute, in which case it then spends three cycles computing w, x, and y. Assuming the if (a branch instruction) takes one cycle, our example takes either four cycles (if v turns out to be zero) or seven cycles (if v is non-zero).

Speculation effectively shuffles the program like this:

t = a+b
u = t+c
v = u+d
w_ = e+f
x_ = w_+g
y_ = x_+h
if v:
   w, x, y = w_, x_, y_

so we now have additional instruction level parallelism to keep our pipes busy:

t, w_ = a+b, e+f
u, x_ = t+c, w_+g
v, y_ = u+d, x_+h
if v:
   w, x, y = w_, x_, y_

Cycle counting becomes less well defined in speculative out-of-order processors, but the branch and conditional update of w, x, and y are (approximately) free, so our example executes in (approximately) three cycles.

What is a cache?

In the good old days*, the speed of processors was well matched with the speed of memory access. My BBC Micro, with its 2MHz 6502, could execute an instruction roughly every 2µs (microseconds), and had a memory cycle time of 0.25µs. Over the ensuing 35 years, processors have become very much faster, but memory only modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main memory.

At first glance, this sounds like a disaster: every time we access memory, we’ll end up waiting for 100ns to get the result back. In this case, this example:

a = mem[0]
b = mem[1]

would take 200ns.

In practice, programs tend to access memory in relatively predictable ways, exhibiting both temporal locality (if I access a location, I’m likely to access it again soon) and spatial locality (if I access a location, I’m likely to access a nearby location soon). Caching takes advantage of these properties to reduce the average cost of access to memory.

A cache is a small on-chip memory, close to the processor, which stores copies of the contents of recently used locations (and their neighbours), so that they are quickly available on subsequent accesses. With caching, the example above will execute in a little over 100ns:

a = mem[0]    # 100ns delay, copies mem[0:15] into cache
b = mem[1]    # mem[1] is in the cache

From the point of view of Spectre and Meltdown, the important point is that if you can time how long a memory access takes, you can determine whether the address you accessed was in the cache (short time) or not (long time).

What is a side channel?

From Wikipedia:

“… a side-channel attack is any attack based on information gained from the physical implementation of a cryptosystem, rather than brute force or theoretical weaknesses in the algorithms (compare cryptanalysis). For example, timing information, power consumption, electromagnetic leaks or even sound can provide an extra source of information, which can be exploited to break the system.”

Spectre and Meltdown are side-channel attacks which deduce the contents of a memory location which should not normally be accessible by using timing to observe whether another location is present in the cache.

Putting it all together

Now let’s look at how speculation and caching combine to permit the Meltdown attack. Consider the following example, which is a user program that sometimes reads from an illegal (kernel) address:

t = a+b
u = t+c
v = u+d
if v:
   w = kern_mem[address]   # if we get here crash
   x = w&0x100
   y = user_mem[x]

Now our out-of-order two-way superscalar processor shuffles the program like this:

t, w_ = a+b, kern_mem[address]
u, x_ = t+c, w_&0x100
v, y_ = u+d, user_mem[x_]

if v:
   # crash
   w, x, y = w_, x_, y_      # we never get here

Even though the processor always speculatively reads from the kernel address, it must defer the resulting fault until it knows that v was non-zero. On the face of it, this feels safe because either:

  • v is zero, so the result of the illegal read isn’t committed to w
  • v is non-zero, so the program crashes before the read is committed to w

However, suppose we flush our cache before executing the code, and arrange a, b, c, and d so that v is zero. Now, the speculative load in the third cycle:

v, y_ = u+d, user_mem[x_]

will read from either address 0x000 or address 0x100 depending on the eighth bit of the result of the illegal read. Because v is zero, the results of the speculative instructions will be discarded, and execution will continue. If we time a subsequent access to one of those addresses, we can determine which address is in the cache. Congratulations: you’ve just read a single bit from the kernel’s address space!

The real Meltdown exploit is more complex than this, but the principle is the same. Spectre uses a similar approach to subvert software array bounds checks.


Modern processors go to great lengths to preserve the abstraction that they are in-order scalar machines that access memory directly, while in fact using a host of techniques including caching, instruction reordering, and speculation to deliver much higher performance than a simple processor could hope to achieve. Meltdown and Spectre are examples of what happens when we reason about security in the context of that abstraction, and then encounter minor discrepancies between the abstraction and reality.

The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in Raspberry Pi render us immune to attacks of the sort.

* days may not be that old, or that good

The post Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown appeared first on Raspberry Pi.

Dish Network Files Two Lawsuits Against Pirate IPTV Providers

Post Syndicated from Andy original https://torrentfreak.com/dish-network-files-two-lawsuits-against-pirate-iptv-providers-180103/

In broad terms, there are two types of unauthorized online streaming of live TV. The first is via open-access websites where users can view for free. The second features premium services to which viewers are required to subscribe.

Usually available for a few dollars, euros, or pounds per month, the latter are gaining traction all around the world. Service levels are relatively high and the majority of illicit packages offer a dazzling array of programming, often putting official providers in the shade.

For this reason, commercial IPTV providers are considered a huge threat to broadcasters’ business models, since they offer a broadly comparable and accessible service at a much cheaper price. This is forcing companies such as US giant Dish Networks to court, seeking relief.

Following on from a lawsuit filed last year against Kodi add-on ZemTV and TVAddons.ag, Dish has just filed two more lawsuits targeting a pair of unauthorized pirate IPTV services.

Filed in Maryland and Texas respectively, the actions are broadly similar, with the former targeting a provider known as Spider-TV.

The suit, filed against Dima Furniture Inc. and Mohammad Yusif (individually and collectively doing business as Spider-TV), claims that the defendants are “capturing
broadcasts of television channels exclusively licensed to DISH and are unlawfully retransmitting these channels over the Internet to their customers throughout the United States, 24 hours per day, 7 days per week.”

Dish claim that the defendants profit from the scheme by selling set-top boxes along with subscriptions, charging around $199 per device loaded with 13 months of service.

Dima Furniture is a Maryland corporation, registered at Takoma Park, Maryland 20912, an address that is listed on the Spider-TV website. The connection between the defendants is further supported by FCC references which identify Spider devices in the market. Mohammad Yusif is claimed to be the president, executive director, general manager, and sole shareholder of Dima Furniture.

Dish describes itself as the fourth largest pay-television provider in the United States, delivering copyrighted programming to millions of subscribers nationwide by means of satellite delivery and over-the-top services. Dish has acquired the rights to do this, the defendants have not, the broadcaster states.

“Defendants capture live broadcast signals of the Protected Channels, transcode these signals into a format useful for streaming over the Internet, transfer the transcoded content to one or more servers provided, controlled, and maintained by Defendants, and then transmit the Protected Channels to users of the Service through
OTT delivery, including users in the United States,” the lawsuit reads.

It’s claimed that in July 2015, Yusif registered Spider-TV as a trade name of Dima Furniture with the Department of Assessments and Taxation Charter Division, describing the business as “Television Channel Installation”. Since then, the defendants have been illegally retransmitting Dish channels to customers in the United States.

The overall offer from Spider-TV appears to be considerable, with a claimed 1,300 channels from major regions including the US, Canada, UK, Europe, Middle East, and Africa.

Importantly, Dish state that the defendants know that their activities are illegal, since the provider sent at least 32 infringement notices since January 20, 2017 demanding an end to the unauthorized retransmission of its channels. It went on to send even more to the defendants’ ISPs.

“DISH and Networks sent at least thirty-three additional notices requesting the
removal of infringing content to Internet service providers associated with the Service from February 16, 2017 to the filing of this Complaint. Upon information and belief, at least some of these notices were forwarded to Defendants,” the lawsuit reads.

But while Dish says that the takedowns responded to by the ISPs were initially successful, the defendants took evasive action by transmitting the targeted channels from other locations.

Describing the defendants’ actions as “willful, malicious, intentional [and] purposeful”, Dish is suing for Direct Copyright Infringement, demanding a permanent injunction preventing the promotion and provision of the service plus statutory damages of $150,000 per registered work. The final amount isn’t specified but the numbers are potentially enormous. In addition, Dish demands attorneys’ fees, costs, and the seizure of all infringing articles.

The second lawsuit, filed in Texas, is broadly similar. It targets Mo’ Ayad Al
Zayed Trading Est., and Mo’ Ayad Fawzi Al Zayed (individually and collectively doing business as Tiger International Company), and Shenzhen Tiger Star Electronical Co., Ltd, otherwise known as Shenzhen Tiger Star.

Dish claims that these defendants also illegally capture and retransmit channels to customers in the United States. IPTV boxes costing up to $179 including one year’s service are the method of delivery.

In common with the Maryland case, Dish says it sent almost two dozen takedown notices to ISPs utilized by the defendants. These were also countered by the unauthorized service retransmitting Dish channels from other servers.

The biggest difference between the Maryland and Texas cases is that while Yusif/Spider/Dima Furniture are said to be in the US, Zayed is said to reside in Amman, Jordan, and Tiger Star is registered in Shenzhen, China. However, since the unauthorized service is targeted at customers in Texas, Dish states that the Texas court has jurisdiction.

Again, Dish is suing for Direct Infringement, demanding damages, costs, and a permanent injunction.

The complaints can be found here and here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

12 B2 Power Tips for New Users

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/newbie-cloud-storage-guide/

B2 Tips for Beginners
You probably know that B2 is Backblaze’s fast and economical general purpose cloud storage, but do you know everything that you can do with it?

If you’re a B2 newbie, here are some blazing power tips to help you get the most out of B2 Cloud Storage.

If you’re a B2 expert or a developer, stay tuned. We’ll be publishing power tips for you in the near future. Enter your email address using the Join button at the top of the page and you won’t miss any upcoming blog posts.
Backblaze logo

1    Drag and Drop Files to B2

Use Backblaze’s drag-and-drop web interface to store, restore, and share B2 files.

Backblaze logo

2    Share Files You Have in B2

You can designate a B2 bucket as private or public. If the bucket is public and you’d like to share a file with others, you can create and copy a Friendly URL and paste it into an email or message.

Backblaze logo

3    Use B2 Just Like Any Other Drive

Use B2 just as if it were a drive on your computer — drag and drop files and folders, save files to it — using one of a number of integrations that let you mount B2 as a volume in your Windows or Macintosh file system (Mountain Duck, ExpanDrive, odrive). Pick the files you want to save, drop them in a desktop folder, and they are automatically saved to B2.

Backblaze logo

4    Drag and Drop To and From B2 from the Desktop, Too

Use Cyberduck, a B2 integration partner, to drag-and-drop files to and from B2 right from the Windows or Macintosh desktop.

Backblaze logo

5    Determine the Speed of your Connection to B2

You can check the speed and latency of your internet connection between your location and Backblaze’s data centers, and see how much data you could theoretically transfer in a day, at https://www.backblaze.com/speedtest/.

Backblaze logo

6    No Matter What Type of Data you Have, B2 Can Handle It

You can transfer any type or amount of data to B2 from any device that can connect to the internet, including Windows, Macintosh, Linux, servers, mobile devices, external drives, and NAS.

Backblaze logo

7    Get Your Files from B2 by Mail

You have a choice of how to receive your data from B2. You can download data directly or request that your data be shipped to you via FedEx.

Backblaze logo

8    Back Up Your Backups to B2

You can automatically back up your Apple Time Machine backup or Windows backup to a NAS and then back that up to B2 to give you both local and cloud backups for a 3-2-1 backup solution.

Backblaze logo

9    Protect Your B2 Account with Two-Factor Verification

You can (and should) protect your Backblaze account with two-factor verification (such as using an app on your smartphone), and you can use backup codes and SMS verification in case you lose access to your smartphone.

Backblaze logo

10    Preview Photos Stored on B2 from the Web

Preview your photos as thumbnails (and optionally download individual photos) in common image formats (including jpg, png, img, tiff, and gif) with the B2 web interface.

Backblaze logo

11    B2 Has Group Management, Too

Backblaze Groups works for B2, too — just like Backblaze Personal Backup and Business Backup. You can manage billing, group membership, and control access using Group Management in your Backblaze account dashboard.

Backblaze logo

12    B2 Integrations Make B2 More Powerful and Useful

There are over 30+ software and hardware integrations that make B2 more powerful. You can visit our integrations page to find a solution that works for you.

Want to Learn More About B2?

You can find more information on B2 on our website and in our help pages.

The post 12 B2 Power Tips for New Users appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A hedgehog cam or two

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/a-hedgehog-cam-or-two/

Here we are, hauling ourselves out of the Christmas and New Year holidays and into January proper. It’s dawning on me that I have to go back to work, even though it’s still very cold and gloomy in northern Europe, and even though my duvet is lovely and warm. I found myself envying beings that hibernate, and thinking about beings that hibernate, and searching for things to do with hedgehogs. And, well, the long and the short of it is, today’s blog post is a short meditation on the hedgehog cam.

A hedgehog in a garden, photographed in infrared light by a hedgehog cam

Success! It’s a hedgehog!
Photo by Andrew Wedgbury

Hedgehog watching

Someone called Barker has installed a Raspberry Pi–based hedgehog cam in a location with a distant view of a famous Alp, and as well as providing live views by visible and infrared light for the dedicated and the insomniac, they also make a sped-up version of the previous night’s activity available. With hedgehogs usually being in hibernation during January, you mightn’t see them in any current feed — but don’t worry! You’re guaranteed a few hedgehogs on Barker’s website, because they have also thrown in some lovely GIFs of hoggy (and foxy) divas that their camera captured in the past.

A Hedgehog eating from a bowl on a patio, captured by a hedgehog cam

Nom nom nom!
GIF by Barker’s Site

Build your own hedgehog cam

For pointers on how to replicate this kind of setup, you could do worse than turn to Andrew Wedgbury’s hedgehog cam write-up. Andrew’s Twitter feed reveals that he’s a Cambridge local, and there are hints that he was behind RealVNC’s hoggy mascot for Pi Wars 2017.

RealVNC on Twitter

Another day at the office: testing our #PiWars mascot using a @Raspberry_Pi 3, #VNC Connect and @4tronix_uk Picon Zero. Name suggestions? https://t.co/iYY3xAX9Bk

Our infrared bird box and time-lapse camera resources will also set you well on the way towards your own custom wildlife camera. For a kit that wraps everything up in a weatherproof enclosure made with love, time, and serious amounts of design and testing, take a look at Naturebytes’ wildlife cam kit.

Or, if you’re thinking that a robot mascot is more dependable than real animals for the fluffiness you need in order to start your January with something like productivity and with your soul intact, you might like to put your own spin on our robot buggy.

Happy 2018

While we’re on the subject of getting to grips with the new year, do take a look at yesterday’s blog post, in which we suggest a New Year’s project that’s different from the usual resolutions. However you tackle 2018, we wish you an excellent year of creative computing.

The post A hedgehog cam or two appeared first on Raspberry Pi.

AWS Direct Connect Update – Ten New Locations Added in Late 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-direct-connect-update-ten-new-locations-added-in-late-2017/

Happy 2018! I am looking forward to getting back to my usual routine, working with our teams to learn about their upcoming launches and then writing blog posts to bring the news to you. Right now I am still catching up on a few launches and announcements from late 2017.

First on the list for today is our most recent round of new cities for AWS Direct Connect. AWS customers all over the world use Direct Connect to create dedicated network connections from their premises to AWS in order to reduce their network costs, increase throughput, and to pursue a more consistent network experience.

We added ten new locations to our Direct Connect roster in December, all of which offer both 1 Gbps and 10 Gbps connectivity, along with partner-supplied options for speeds below 1 Gbps. Here are the newest locations, along withe the data centers and associated AWS Regions:

  • Bangalore, India – NetMagic DC2Asia Pacific (Mumbai).
  • Cape Town, South Africa – Teraco Ct1EU (Ireland).
  • Johannesburg, South Africa – Teraco JB1EU (Ireland).
  • London, UK – Telehouse North TwoEU (London).
  • Miami, Florida, US – Equinix MI1US East (Northern Virginia).
  • Minneapolis, Minnesota, US – Cologix MIN3US East (Ohio)
  • Ningxia, China – Shapotou IDC – China (Ningxia).
  • Ningxia, China – Industrial Park IDC – China (Ningxia).
  • Rio de Janeiro, Brazil – Equinix RJ2South America (São Paulo).
  • Tokyo, Japan – AT Tokyo ChuoAsia Pacific (Tokyo).

You can use these new locations in conjunction with the AWS Direct Connect Gateway to set up connectivity that spans Virtual Private Clouds (VPCs) spread across multiple AWS Regions (this does not apply to the AWS Regions in China).

If you are interested in putting Direct Connect to use, be sure to check out our ever-growing list of Direct Connect Partners.


Weekly roundup: Anise’s very own video game

Post Syndicated from Eevee original https://eev.ee/dev/2018/01/01/weekly-roundup-anises-very-own-video-game/

Happy new year! 🎆

In an unprecedented move, I did one thing for an entire calendar week. I say “unprecedented” but I guess the same thing happened with fox flux. And NEON PHASE. Hmm. Sensing a pattern. See if you can guess what the one thing was!

  • anise!!: Wow! It’s Anise! The game has come so far that I can’t even believe that any of this was a recent change. I made monster AI vastly more sensible, added a boatload of mechanics, fleshed out more than half the map (and sketched out the rest), and drew and implemented most of a menu with a number of excellent goodies. Also, FINALLY (after a full year of daydreaming about it), eliminated the terrible “clock” structure I invented for collision detection, as well as cut down on a huge source of completely pointless allocations, which sped physics up in general by at least 10% and cut GC churn significantly. Hooray! And I’ve done even more just in the last day and a half. Still a good bit of work left, but this game is gonna be fantastic.

  • art: Oh right I tried drawing a picture but I didn’t like it so I stopped.

I have some writing to catch up on — I have several things 80% written, but had to stop because I was just starting to get a cold and couldn’t even tell if my own writing was sensible any more. And then I had to work on a video game about my cat. Sorry. Actually, not sorry, video games about my cat are always top priority. You knew what you were signing up for.

IPTV Provider Stops Selling New Subscriptions Under Pressure From “UK Authorities”

Post Syndicated from Andy original https://torrentfreak.com/iptv-provider-stops-selling-new-subscriptions-under-pressure-from-uk-authorities-171224/

Over the past couple of decades, piracy of live TV has broadly taken two forms. That which relies on breaking broadcaster encryption (such as card sharing and hacked set-top boxes), and the more recent developments of P2P and IPTV-style transmission.

With the former under pressure and P2P systems such as Sopcast and AceTorrent moving along in the background, streaming from servers is now the next big thing, whether that’s for free via third-party Kodi plugins or for a small fee from premium IPTV providers.

Of course, copyright holders don’t like any of this usage but with their for-profit strategy, commercial IPTV providers have a big target on their backs. More evidence of this was revealed recently when UK-based IPTV service ACE TV announced they were taking action to avoid problems in the country.

In a message to prospective and existing customers, ACE TV said that potential legal issues were behind its decision to accept no new customers while locking down its service.

“It saddens me to announce this, but due to pressure from the authorities in the UK, we are no longer selling new subscriptions. This obviously includes trials,” the announcement reads.

Noting that it would take new order for just 24 hours more, ACE TV insisted that it wasn’t shutting down but would lock down the service while closing Facebook.

TF sources and unconfirmed rumors online suggest that the Federation Against Copyright Theft and partners the Premier League are involved. However, ACE TV didn’t respond to TorrentFreak’s request for comment so we’re unable to confirm or deny the allegations.

That being said, even if the threats came directly from the police, it’s likely that the approach would’ve been initially prompted by companies connected to FACT, since the anti-piracy outfit often puts forward names of services for investigation on behalf of its partners.

Perhaps surprisingly, ACE TV is legally incorporated in the UK as Ace Hosting Limited, a fact it makes clear on its website. While easy to find, the company’s registered address is shared by dozens of other companies, indicating a mail forwarding operation rather than a place servers or staff can be found.

This proxy location may well be the reason the company feels emboldened to carry on some level of service rather than shutting down completely, but its legal basis for doing so is interesting at best, precarious at worst.

“This website, any content contained herein and any contract brought into being as a result of usage of this website are governed by and construed in accordance with English Law,” ACE TV’s website reads.

“The parties to any such contract agree to submit to the exclusive jurisdiction of the courts of England and Wales. All contracts are concluded in English.”

It seems likely that ACE TV has been threatened under UK law, since that’s where it’s incorporated. That would seem to explain why its concerned about UK authorities and their potential effect on the business. On the other hand, however, the service claims to operate entirely legally, but under the laws of the United States. It even has a repeat infringer policy.

“Ace Hosting operates as an intermediary to cache and deliver content hosted by others at the instruction of our subscribers. We cannot remove content hosted by others,” the company says.

“As an intermediary, we are entitled to rely upon (among other things) the DMCA safe harbor available to system caching service providers and we maintain policies and procedures to terminate subscribers that would be considered repeat infringers under the DMCA.”

Whether the notices on the site have been advised by a legal professional or are there to present an air of authenticity is unclear but it’s precarious for a service of this nature to rely solely on conduit status in order to avoid liability.

Marketing, prior conduct, and overall intent play a major role in such cases and when all of that is aired in the cold light of day, the situation can look very different to a judge, particularly in the UK, where no similar cases have been successfully defended to date.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/set-up-a-continuous-delivery-pipeline-for-containers-using-aws-codepipeline-and-amazon-ecs/

This post contributed by Abby FullerAWS Senior Technical Evangelist

Last week, AWS announced support for Amazon Elastic Container Service (ECS) targets (including AWS Fargate) in AWS CodePipeline. This support makes it easier to create a continuous delivery pipeline for container-based applications and microservices.

Building and deploying containerized services manually is slow and prone to errors. Continuous delivery with automated build and test mechanisms helps detect errors early, saves time, and reduces failures, making this a popular model for application deployments. Previously, to automate your container workflows with ECS, you had to build your own solution using AWS CloudFormation. Now, you can integrate CodePipeline and CodeBuild with ECS to automate your workflows in just a few steps.

A typical continuous delivery workflow with CodePipeline, CodeBuild, and ECS might look something like the following:

  • Choosing your source
  • Building your project
  • Deploying your code

We also have a continuous deployment reference architecture on GitHub for this workflow.

Getting Started

First, create a new project with CodePipeline and give the project a name, such as “demo”.

Next, choose a source location where the code is stored. This could be AWS CodeCommit, GitHub, or Amazon S3. For this example, enter GitHub and then give CodePipeline access to the repository.

Next, add a build step. You can import an existing build, such as a Jenkins server URL or CodeBuild project, or create a new step with CodeBuild. If you don’t have an existing build project in CodeBuild, create one from within CodePipeline:

  • Build provider: AWS CodeBuild
  • Configure your project: Create a new build project
  • Environment image: Use an image managed by AWS CodeBuild
  • Operating system: Ubuntu
  • Runtime: Docker
  • Version: aws/codebuild/docker:1.12.1
  • Build specification: Use the buildspec.yml in the source code root directory

Now that you’ve created the CodeBuild step, you can use it as an existing project in CodePipeline.

Next, add a deployment provider. This is where your built code is placed. It can be a number of different options, such as AWS CodeDeploy, AWS Elastic Beanstalk, AWS CloudFormation, or Amazon ECS. For this example, connect to Amazon ECS.

For CodeBuild to deploy to ECS, you must create an image definition JSON file. This requires adding some instructions to the pre-build, build, and post-build phases of the CodeBuild build process in your buildspec.yml file. For help with creating the image definition file, see Step 1 of the Tutorial: Continuous Deployment with AWS CodePipeline.

  • Deployment provider: Amazon ECS
  • Cluster name: enter your project name from the build step
  • Service name: web
  • Image filename: enter your image definition filename (“web.json”).

You are almost done!

You can now choose an existing IAM service role that CodePipeline can use to access resources in your account, or let CodePipeline create one. For this example, use the wizard, and go with the role that it creates (AWS-CodePipeline-Service).

Finally, review all of your changes, and choose Create pipeline.

After the pipeline is created, you’ll have a model of your entire pipeline where you can view your executions, add different tests, add manual approvals, or release a change.

You can learn more in the AWS CodePipeline User Guide.

Happy automating!

Power Tips for Backblaze Backup

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/data-backup-tips/

Backup Power Tips

2017 has been a busy year for Backblaze. We’ve reached a total of over 400 petabytes of data stored for our customers — that’s a lot!, released a major upgrade to our backup product — Backblaze Cloud Backup 5.0, added Groups to our consumer and business backup products, further enhanced account security, and welcomed a whole lot of new customers to Backblaze.

For all of our new users (and maybe some of you more experienced ones, too), we’d like to share some power tips that will help you get the most out of Backblaze Backup for home and business.

Blazing Power Tips for Backblaze Backup

Back Up All of Your Valuable Data

Backblaze logo

Include Directly-Attached External Drives in Your Backup

Backblaze can back up external drives attached via USB, Thunderbolt, or Firewire.

Backblaze logo

Back Up Virtual Machines Installed on Your Computer

Virtual machines, such as those created by Parallels, VMware Fusion, VirtualBox, Hyper-V, or other programs, can be backed up with Backblaze.

Backblaze logo

You Can Back Up Your Mobile Phone to Backblaze

Gain extra peace-of-mind by backing up your iPhone or Android phone to your computer and including that in your computer backup.

Backblaze logo

Bring on Your Big Files

By default, Backblaze has no restrictions on the size of the files you are backing up, even that large high school reunion video you want to be sure to keep.

Backblaze logo

Rescan Your Hard Drive to Check for Changes

Backblaze works quietly and continuously in the background to keep you backed up, but you can ask Backblaze to immediately check whether anything needs backing up by holding down the Alt key and clicking on the Restore Options button in the Backblaze client.

Manage and Restore Your Backed Up Files

Backblaze logo

You Can Share Files You’ve Backed Up

You can share files with anyone directly from your Backblaze account.

Backblaze logo

Select and Restore Individual Files

You can restore a single file without zipping it using the Backblaze web interface.

Backblaze logo

Receive Your Restores from Backblaze by Mail

You have a choice of how to receive your data from Backblaze. You can download individual files, download a ZIP of the files you choose, or request that your data be shipped to you anywhere in the world via FedEx.

Backblaze logo

Put Your Account on Hold for Six Months

As long as your account is current, all the data you’ve backed up is maintained for up to six months if you’re traveling or not using your computer and don’t connect to our servers. (For active accounts, data is maintained up to 30 days.)

Backblaze logo

Groups Make Managing Business or Family Members Easy

For businesses, families, or organizations, our Groups feature makes it easy to manage billing, group membership, and individual user access to files and accounts — all at no incremental charge.

Backblaze logo

You Can Browse and Restore Previous Versions of a File

Visit the View/Restore Files page to go back in time to earlier or deleted versions of your files.

Backblaze logo

Mass Deploy Backblaze Remotely to Many Computers

Companies, organizations, schools, non-profits, and others can deploy Backblaze computer backup remotely across all their computers without any end-user interaction.

Backblaze logo

Move Your Account and Preserve Backups on a New or Restored Computer

You can move your Backblaze account to a new or restored computer with the same data — and preserve the backups you have already completed — using the Inherit Backup State feature.

Backblaze logo

Reinstall Backblaze under a Different Account

Backblaze remembers the account information when it is uninstalled and reinstalled. To install Backblaze under a different account, hold down the ALT key and click the Install Now button.

Keep Your Data Secure

Backblaze logo

Protect Your Account with Two-Factor Verification

You can (and should) protect your Backblaze account with two-factor verification. You can use backup codes and SMS verification in case you lose access to your smartphone and the authentication app. Sign in to your account to set that up.

Backblaze logo

Add Additional Security to Your Data

All transmissions of your data between your system and our servers is encrypted. For extra account security, you can add an optional private encryption key (PEK) to the data on our servers. Just be sure to remember your encryption key because it’s required to restore your data.

Get the Best Data Transfer Speeds

Backblaze logo

How Fast is your Connection to Backblaze?

You can check the speed and latency of your internet connection between your location and Backblaze’s data centers at https://www.backblaze.com/speedtest/.

Backblaze logo

Fine-Tune Your Upload Speed with Multiple Threads

Our auto-threading feature adjusts Backblaze’s CPU usage to give you the best upload speeds, but for those of you who like to tinker, the Backblaze client on Windows and Macintosh lets you fine-tune the number of threads our client is using to upload your files to our data centers.

Backblaze logo

Use the Backblaze Downloader To Get Your Restores Faster

If you are downloading a large ZIP restore, we recommend that you use the Backblaze Downloader application for Macintosh or Windows for maximum speed.

Want to Learn More About Backblaze Backup?

You can find more information on Backblaze Backup (including a free trial) on our website, and more tips about backing up in our help pages and in our Backup Guide.

Do you have a friend who should be backing up, but doesn’t? Why not give the gift of Backblaze?

The post Power Tips for Backblaze Backup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

MQTT 5: Is it time to upgrade to MQTT 5 yet?

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/mqtt-5-time-to-upgrade-yet/

MQTT 5 - Is it time to upgrade yet?

Is it time to upgrade to MQTT 5 yet?

Welcome to this week’s blog post! After last week’s Introduction to MQTT 5, many readers wondered when the successor to MQTT 3.1.1 is ready for prime time and can be used in future and existing projects.

Before we try to answer the question in more detail, we’d love to hear your thoughts about upgrading to MQTT 5. We prepared a small survey below. Let us know how your MQTT 5 upgrading plans are!

The MQTT 5 OASIS Standard

As of late December 2017, the MQTT 5 specification is not available as an official “Committee Specification” yet. In other words: MQTT 5 is not available yet officially. The foundation for every implementation of the standard is, that the Technical Committee at OASIS officially releases the standard.

The good news: Although no official version of the standard is available yet, fundamental changes to the current state of the specification are not expected.
The Public Review phase of the “Committee Specification Draft 2” finished without any major comments or issues. We at HiveMQ expect the MQTT 5 standard to be released in very late December 2017 or January 2018.

Current state of client libraries

To start using MQTT 5, you need two participants: An MQTT 5 client library implementation in your programming language(s) of choice and an MQTT 5 broker implementation (like HiveMQ). If both components support the new standard, you are good to go and can use the new version in your projects.

When it comes to MQTT libraries, Eclipse Paho is the one-stop shop for MQTT clients in most programming languages. A recent Paho mailing list entry stated that Paho plans to release MQTT 5 client libraries end of June 2018 for the following programming languages:

  • C (+ embedded C)
  • Java
  • Go
  • C++

If you’re feeling adventurous, at least the Java Paho client has preliminary MQTT 5 support available. You can play around with the API and get a feel about the upcoming Paho version. Just build the library from source and test it, but be aware that this is not safe for production use.

There is also a very basic test broker implementation available at Eclipse Paho which can be used for playing around. This is of course only for very basic tests and does not support all MQTT 5 features yet. If you’re planning to write your own library, this may be a good tool to test your implementation against.

There are of other individual MQTT library projects which may be worth to check out. As of December 2017 most of these libraries don’t have an MQTT 5 roadmap published yet.

HiveMQ and MQTT 5

You can’t use the new version of the MQTT protocol only by having a client that is ready for MQTT 5. The counterpart, the MQTT broker, also needs to fully support the new protocol version. At the time of writing, no broker is MQTT 5 ready yet.

HiveMQ was the first broker to fully support version 3.1.1 of MQTT and of course here at HiveMQ we are committed to give our customers the advantage of the new features of version 5 of the protocol as soon as possible and viable.

We are going to provide an Early Access version of the upcoming HiveMQ generation with MQTT 5 support by May/June 2018. If you’re a library developer or want to go live with the new protocol version as soon as possible: The Early Access version is for you. Add yourself to the Early Access Notification List and we’ll notify you when the Early Access version is available.

We expect to release the upcoming HiveMQ generation in the third quarter of 2018 with full support of ALL MQTT 5 features at scale in an interoperable way with previous MQTT versions.

When is MQTT 5 ready for prime time?

MQTT is typically used in mission critical environments where it’s not acceptable that parts of the infrastructure, broker or client, are unreliable or have some rough edges. So it’s typically not advisable to be the very first to try out new things in a critical production environment.

Here at HiveMQ we expect that the first users will go live to production in late Q3 (September) 2018 and in the subsequent months. After the releases of the Paho library in June and the HiveMQ Early Access version, the adoption of MQTT 5 is expected to increase rapidly.

So, is MQTT 5 ready for prime time yet (as of December 2017)? No.

Will the new version of the protocol be suitable for production environments in the second half of 2018: Yes, definitely.

Upcoming topics in this series

We will continue this blog post series in January after the European Christmas Holidays. To kick-off the technical part of the series, we will take a look at the foundational changes of the MQTT protocol. And after that, we will release one blog post per week that will thoroughly review and inspect one new feature in detail together with best practices and fun trivia.

If you want us to send the next and all upcoming articles directly into your inbox, just use the newsletter subscribe form below.

P.S. Don’t forget to let us know if MQTT 5 is of interest for you by participating in this quick poll.

Have an awesome week,
The HiveMQ Team

VPN Server Seized to Investigate Russian Ambassador’s Assassination

Post Syndicated from Ernesto original https://torrentfreak.com/vpn-server-seized-to-investigate-russian-ambassadors-assassination-1171219/

VPNs are valuable tools for people who want to use the Internet securely and maintain their anonymity. They are vital for whistleblowers and people who rebel against Government oppression.

As with any online service, they can also be used for criminal purposes. According to Turkish news sources, this is also what happened following the assassination of Andrei Karlov, the Russian Ambassador to Turkey, exactly one year ago.

Karlov was shot dead in Ankara by Mevlüt Mert Altıntaş, an off-duty Turkish police officer. While that much is clear, the investigation into the assassination is not closed yet.

When the authorities tried to find links to other people that may have been involved, they found out that the policeman’s Gmail and Facebook had been deleted. This happened remotely over a VPN connection, operated by ExpressVPN.

To find out more, the authorities raided the datacenter and seized the server through which the connection went. This all happened last January, but the information just came out today.

Like many other VPN services nowadays, ExpressVPN doesn’t store any logs, and this is what the investigators soon found out as well. An inspection of the server in question yielded no useful information.

Following the seizure, an investigator also reached out to ExpressVPN directly, asking for logs. The VPN provider is incorporated in the British Virgin Islands and only responds to local court orders, but the investigator was informed that they don’t store connection or activity logs.

“As we stated to Turkish authorities in January 2017, ExpressVPN does not and has never possessed any customer connection logs that would enable us to know which customer was using the specific IPs cited by the investigators,” ExpressVPN writes in a statement.

“Furthermore, we were unable to see which customers accessed Gmail or Facebook during the time in question, as we do not keep activity logs. We believe that the investigators’ seizure and inspection of the VPN server in question confirmed these points.”

Speaking with TorrentFreak, the VPN provider mentions that they’ve had physical server seizures in the past, but generally not more than a few times per year.

These seizures are not announced in public, but the company stresses that user anonymity is their highest priority.

“While we don’t have a policy of announcing such incidents, we’ve designed our technology to ensure that VPN servers do not possess logs which would enable a third party to determine sensitive information about our users, such as their VPN activity or connections.

“A physical server seizure is therefore highly unlikely to provide relevant information to someone trying to determine data about specific usage,” ExpressVPN tells us.

Incidents like these show that decent VPNs do what they’re set out to. They safeguard the privacy of users which, like the Internet in general, can be used for good and bad.

It also highlights the importance of the server location. When servers are operated by third-party companies in foreign jurisdictions, they can be easily targeted, or perhaps even worse, monitored.

ExpressVPN told TorrentFreak that after the seizure incident in Turkey, the company decided to no longer use physical servers in Turkey. Instead, they provide a virtual location with Turkey-registered IP addresses pointing to VPN servers hosted in the Netherlands.

The VPN provider regrets that its services were used for unlawful purposes but says that its policies will remain the same.

“While it’s unfortunate that security tools like VPNs can be abused for illicit purposes, they are critical for our safety and the preservation of our right to privacy online. ExpressVPN is fundamentally opposed to any efforts to install ‘backdoors’ or attempts by governments to otherwise undermine such technologies,” the company concludes.

Disclosure: ExpressVPN is a TorrentFreak sponsor

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons