Tag Archives: React

Cabinet of Secret Documents from Australia

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/cabinet_of_secr.html

This story of leaked Australian government secrets is unlike any other I’ve heard:

It begins at a second-hand shop in Canberra, where ex-government furniture is sold off cheaply.

The deals can be even cheaper when the items in question are two heavy filing cabinets to which no-one can find the keys.

They were purchased for small change and sat unopened for some months until the locks were attacked with a drill.

Inside was the trove of documents now known as The Cabinet Files.

The thousands of pages reveal the inner workings of five separate governments and span nearly a decade.

Nearly all the files are classified, some as “top secret” or “AUSTEO”, which means they are to be seen by Australian eyes only.

Yes, that really happened. The person who bought and opened the file cabinets contacted the Australian Broadcasting Corp, who is now publishing a bunch of it.

There’s lots of interesting (and embarassing) stuff in the documents, although most of it is local politics. I am more interested in the government’s reaction to the incident: they’re pushing for a law making it illegal for the press to publish government secrets it received through unofficial channels.

“The one thing I would point out about the legislation that does concern me particularly is that classified information is an element of the offence,” he said.

“That is to say, if you’ve got a filing cabinet that is full of classified information … that means all the Crown has to prove if they’re prosecuting you is that it is classified ­ nothing else.

“They don’t have to prove that you knew it was classified, so knowledge is beside the point.”

[…]

Many groups have raised concerns, including media organisations who say they unfairly target journalists trying to do their job.

But really anyone could be prosecuted just for possessing classified information, regardless of whether they know about it.

That might include, for instance, if you stumbled across a folder of secret files in a regular skip bin while walking home and handed it over to a journalist.

This illustrates a fundamental misunderstanding of the threat. The Australian Broadcasting Corp gets their funding from the government, and was very restrained in what they published. They waited months before publishing as they coordinated with the Australian government. They allowed the government to secure the files, and then returned them. From the government’s perspective, they were the best possible media outlet to receive this information. If the government makes it illegal for the Australian press to publish this sort of material, the next time it will be sent to the BBC, the Guardian, the New York Times, or Wikileaks. And since people no longer read their news from newspapers sold in stores but on the Internet, the result will be just as many people reading the stories with far fewer redactions.

The proposed law is older than this leak, but the leak is giving it new life. The Australian opposition party is being cagey on whether they will support the law. They don’t want to appear weak on national security, so I’m not optimistic.

EDITED TO ADD (2/8): The Australian government backed down on that new security law.

EDITED TO ADD (2/13): Excellent political cartoon.

Reactive Microservices Architecture on AWS

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/reactive-microservices-architecture-on-aws/

Microservice-application requirements have changed dramatically in recent years. These days, applications operate with petabytes of data, need almost 100% uptime, and end users expect sub-second response times. Typical N-tier applications can’t deliver on these requirements.

Reactive Manifesto, published in 2014, describes the essential characteristics of reactive systems including: responsiveness, resiliency, elasticity, and being message driven.

Being message driven is perhaps the most important characteristic of reactive systems. Asynchronous messaging helps in the design of loosely coupled systems, which is a key factor for scalability. In order to build a highly decoupled system, it is important to isolate services from each other. As already described, isolation is an important aspect of the microservices pattern. Indeed, reactive systems and microservices are a natural fit.

Implemented Use Case
This reference architecture illustrates a typical ad-tracking implementation.

Many ad-tracking companies collect massive amounts of data in near-real-time. In many cases, these workloads are very spiky and heavily depend on the success of the ad-tech companies’ customers. Typically, an ad-tracking-data use case can be separated into a real-time part and a non-real-time part. In the real-time part, it is important to collect data as fast as possible and ask several questions including:,  “Is this a valid combination of parameters?,””Does this program exist?,” “Is this program still valid?”

Because response time has a huge impact on conversion rate in advertising, it is important for advertisers to respond as fast as possible. This information should be kept in memory to reduce communication overhead with the caching infrastructure. The tracking application itself should be as lightweight and scalable as possible. For example, the application shouldn’t have any shared mutable state and it should use reactive paradigms. In our implementation, one main application is responsible for this real-time part. It collects and validates data, responds to the client as fast as possible, and asynchronously sends events to backend systems.

The non-real-time part of the application consumes the generated events and persists them in a NoSQL database. In a typical tracking implementation, clicks, cookie information, and transactions are matched asynchronously and persisted in a data store. The matching part is not implemented in this reference architecture. Many ad-tech architectures use frameworks like Hadoop for the matching implementation.

The system can be logically divided into the data collection partand the core data updatepart. The data collection part is responsible for collecting, validating, and persisting the data. In the core data update part, the data that is used for validation gets updated and all subscribers are notified of new data.

Components and Services

Main Application
The main application is implemented using Java 8 and uses Vert.x as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level IO library Netty. You can write applications in Java, JavaScript, Groovy, Ruby, Kotlin, Scala, and Ceylon. The framework offers a simple and scalable actor-like concurrency model. Vert.x calls handlers by using a thread known as an event loop. To use this model, you have to write code known as “verticles.” Verticles share certain similarities with actors in the actor model. To use them, you have to implement the verticle interface. Verticles communicate with each other by generating messages in  a single event bus. Those messages are sent on the event bus to a specific address, and verticles can register to this address by using handlers.

With only a few exceptions, none of the APIs in Vert.x block the calling thread. Similar to Node.js, Vert.x uses the reactor pattern. However, in contrast to Node.js, Vert.x uses several event loops. Unfortunately, not all APIs in the Java ecosystem are written asynchronously, for example, the JDBC API. Vert.x offers a possibility to run this, blocking APIs without blocking the event loop. These special verticles are called worker verticles. You don’t execute worker verticles by using the standard Vert.x event loops, but by using a dedicated thread from a worker pool. This way, the worker verticles don’t block the event loop.

Our application consists of five different verticles covering different aspects of the business logic. The main entry point for our application is the HttpVerticle, which exposes an HTTP-endpoint to consume HTTP-requests and for proper health checking. Data from HTTP requests such as parameters and user-agent information are collected and transformed into a JSON message. In order to validate the input data (to ensure that the program exists and is still valid), the message is sent to the CacheVerticle.

This verticle implements an LRU-cache with a TTL of 10 minutes and a capacity of 100,000 entries. Instead of adding additional functionality to a standard JDK map implementation, we use Google Guava, which has all the features we need. If the data is not in the L1 cache, the message is sent to the RedisVerticle. This verticle is responsible for data residing in Amazon ElastiCache and uses the Vert.x-redis-client to read data from Redis. In our example, Redis is the central data store. However, in a typical production implementation, Redis would just be the L2 cache with a central data store like Amazon DynamoDB. One of the most important paradigms of a reactive system is to switch from a pull- to a push-based model. To achieve this and reduce network overhead, we’ll use Redis pub/sub to push core data changes to our main application.

Vert.x also supports direct Redis pub/sub-integration, the following code shows our subscriber-implementation:

vertx.eventBus().<JsonObject>consumer(REDIS_PUBSUB_CHANNEL_VERTX, received -> {

JsonObject value = received.body().getJsonObject("value");

String message = value.getString("message");

JsonObject jsonObject = new JsonObject(message);

eb.send(CACHE_REDIS_EVENTBUS_ADDRESS, jsonObject);

});

redis.subscribe(Constants.REDIS_PUBSUB_CHANNEL, res -> {

if (res.succeeded()) {

LOGGER.info("Subscribed to " + Constants.REDIS_PUBSUB_CHANNEL);

} else {

LOGGER.info(res.cause());

}

});

The verticle subscribes to the appropriate Redis pub/sub-channel. If a message is sent over this channel, the payload is extracted and forwarded to the cache-verticle that stores the data in the L1-cache. After storing and enriching data, a response is sent back to the HttpVerticle, which responds to the HTTP request that initially hit this verticle. In addition, the message is converted to ByteBuffer, wrapped in protocol buffers, and send to an Amazon Kinesis Data Stream.

The following example shows a stripped-down version of the KinesisVerticle:

public class KinesisVerticle extends AbstractVerticle {

private static final Logger LOGGER = LoggerFactory.getLogger(KinesisVerticle.class);

private AmazonKinesisAsync kinesisAsyncClient;

private String eventStream = "EventStream";

@Override

public void start() throws Exception {

EventBus eb = vertx.eventBus();

kinesisAsyncClient = createClient();

eventStream = System.getenv(STREAM_NAME) == null ? "EventStream" : System.getenv(STREAM_NAME);

eb.consumer(Constants.KINESIS_EVENTBUS_ADDRESS, message -> {

try {

TrackingMessage trackingMessage = Json.decodeValue((String)message.body(), TrackingMessage.class);

String partitionKey = trackingMessage.getMessageId();

byte [] byteMessage = createMessage(trackingMessage);

ByteBuffer buf = ByteBuffer.wrap(byteMessage);

sendMessageToKinesis(buf, partitionKey);

message.reply("OK");

}

catch (KinesisException exc) {

LOGGER.error(exc);

}

});

}

Kinesis Consumer
This AWS Lambda function consumes data from an Amazon Kinesis Data Stream and persists the data in an Amazon DynamoDB table. In order to improve testability, the invocation code is separated from the business logic. The invocation code is implemented in the class KinesisConsumerHandler and iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to protocol buffers and converted into a Java object. Those Java objects are passed to the business logic, which persists the data in a DynamoDB table. In order to improve duration of successive Lambda calls, the DynamoDB-client is instantiated lazily and reused if possible.

Redis Updater
From time to time, it is necessary to update core data in Redis. A very efficient implementation for this requirement is using AWS Lambda and Amazon Kinesis. New core data is sent over the AWS Kinesis stream using JSON as data format and consumed by a Lambda function. This function iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to String and converted into a Java object. The Java object is passed to the business logic and stored in Redis. In addition, the new core data is also sent to the main application using Redis pub/sub in order to reduce network overhead and converting from a pull- to a push-based model.

The following example shows the source code to store data in Redis and notify all subscribers:

public void updateRedisData(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

Map<String, String> map = marshal(jsonString);

String statusCode = jedis.hmset(trackingMessage.getProgramId(), map);

}

catch (Exception exc) {

if (null == logger)

exc.printStackTrace();

else

logger.log(exc.getMessage());

}

}

public void notifySubscribers(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

jedis.publish(Constants.REDIS_PUBSUB_CHANNEL, jsonString);

}

catch (final IOException e) {

log(e.getMessage(), logger);

}

}

Similarly to our Kinesis Consumer, the Redis-client is instantiated somewhat lazily.

Infrastructure as Code
As already outlined, latency and response time are a very critical part of any ad-tracking solution because response time has a huge impact on conversion rate. In order to reduce latency for customers world-wide, it is common practice to roll out the infrastructure in different AWS Regions in the world to be as close to the end customer as possible. AWS CloudFormation can help you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.

You create a template that describes all the AWS resources that you want (for example, Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. Our reference architecture can be rolled out in different Regions using an AWS CloudFormation template, which sets up the complete infrastructure (for example, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Container Service (Amazon ECS) cluster, Lambda functions, DynamoDB table, Amazon ElastiCache cluster, etc.).

Conclusion
In this blog post we described reactive principles and an example architecture with a common use case. We leveraged the capabilities of different frameworks in combination with several AWS services in order to implement reactive principles—not only at the application-level but also at the system-level. I hope I’ve given you ideas for creating your own reactive applications and systems on AWS.

About the Author

Sascha Moellering is a Senior Solution Architect. Sascha is primarily interested in automation, infrastructure as code, distributed computing, containers and JVM. He can be reached at [email protected]

 

 

Progressing from tech to leadership

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2018/02/on-leadership.html

I’ve been a technical person all my life. I started doing vulnerability research in the late 1990s – and even today, when I’m not fiddling with CNC-machined robots or making furniture, I’m probably clobbering together a fuzzer or writing a book about browser protocols and APIs. In other words, I’m a geek at heart.

My career is a different story. Over the past two decades and a change, I went from writing CGI scripts and setting up WAN routers for a chain of shopping malls, to doing pentests for institutional customers, to designing a series of network monitoring platforms and handling incident response for a big telco, to building and running the product security org for one of the largest companies in the world. It’s been an interesting ride – and now that I’m on the hook for the well-being of about 100 folks across more than a dozen subteams around the world, I’ve been thinking a bit about the lessons learned along the way.

Of course, I’m a bit hesitant to write such a post: sometimes, your efforts pan out not because of your approach, but despite it – and it’s possible to draw precisely the wrong conclusions from such anecdotes. Still, I’m very proud of the culture we’ve created and the caliber of folks working on our team. It happened through the work of quite a few talented tech leads and managers even before my time, but it did not happen by accident – so I figured that my observations may be useful for some, as long as they are taken with a grain of salt.

But first, let me start on a somewhat somber note: what nobody tells you is that one’s level on the leadership ladder tends to be inversely correlated with several measures of happiness. The reason is fairly simple: as you get more senior, a growing number of people will come to you expecting you to solve increasingly fuzzy and challenging problems – and you will no longer be patted on the back for doing so. This should not scare you away from such opportunities, but it definitely calls for a particular mindset: your motivation must come from within. Look beyond the fight-of-the-day; find satisfaction in seeing how far your teams have come over the years.

With that out of the way, here’s a collection of notes, loosely organized into three major themes.

The curse of a techie leader

Perhaps the most interesting observation I have is that for a person coming from a technical background, building a healthy team is first and foremost about the subtle art of letting go.

There is a natural urge to stay involved in any project you’ve started or helped improve; after all, it’s your baby: you’re familiar with all the nuts and bolts, and nobody else can do this job as well as you. But as your sphere of influence grows, this becomes a choke point: there are only so many things you could be doing at once. Just as importantly, the project-hoarding behavior robs more junior folks of the ability to take on new responsibilities and bring their own ideas to life. In other words, when done properly, delegation is not just about freeing up your plate; it’s also about empowerment and about signalling trust.

Of course, when you hand your project over to somebody else, the new owner will initially be slower and more clumsy than you; but if you pick the new leads wisely, give them the right tools and the right incentives, and don’t make them deathly afraid of messing up, they will soon excel at their new jobs – and be grateful for the opportunity.

A related affliction of many accomplished techies is the conviction that they know the answers to every question even tangentially related to their domain of expertise; that belief is coupled with a burning desire to have the last word in every debate. When practiced in moderation, this behavior is fine among peers – but for a leader, one of the most important skills to learn is knowing when to keep your mouth shut: people learn a lot better by experimenting and making small mistakes than by being schooled by their boss, and they often try to read into your passing remarks. Don’t run an authoritarian camp focused on total risk aversion or perfectly efficient resource management; just set reasonable boundaries and exit conditions for experiments so that they don’t spiral out of control – and be amazed by the results every now and then.

Death by planning

When nothing is on fire, it’s easy to get preoccupied with maintaining the status quo. If your current headcount or budget request lists all the same projects as last year’s, or if you ever find yourself ending an argument by deferring to a policy or a process document, it’s probably a sign that you’re getting complacent. In security, complacency usually ends in tears – and when it doesn’t, it leads to burnout or boredom.

In my experience, your goal should be to develop a cadre of managers or tech leads capable of coming up with clever ideas, prioritizing them among themselves, and seeing them to completion without your day-to-day involvement. In your spare time, make it your mission to challenge them to stay ahead of the curve. Ask your vendor security lead how they’d streamline their work if they had a 40% jump in the number of vendors but no extra headcount; ask your product security folks what’s the second line of defense or containment should your primary defenses fail. Help them get good ideas off the ground; set some mental success and failure criteria to be able to cut your losses if something does not pan out.

Of course, malfunctions happen even in the best-run teams; to spot trouble early on, instead of overzealous project tracking, I found it useful to encourage folks to run a data-driven org. I’d usually ask them to imagine that a brand new VP shows up in our office and, as his first order of business, asks “why do you have so many people here and how do I know they are doing the right things?”. Not everything in security can be quantified, but hard data can validate many of your assumptions – and will alert you to unseen issues early on.

When focusing on data, it’s important not to treat pie charts and spreadsheets as an art unto itself; if you run a security review process for your company, your CSAT scores are going to reach 100% if you just rubberstamp every launch request within ten minutes of receiving it. Make sure you’re asking the right questions; instead of “how satisfied are you with our process”, try “is your product better as a consequence of talking to us?”

Whenever things are not progressing as expected, it is a natural instinct to fall back to micromanagement, but it seldom truly cures the ill. It’s probable that your team disagrees with your vision or its feasibility – and that you’re either not listening to their feedback, or they don’t think you’d care. It’s good to assume that most of your employees are as smart or smarter than you; barking your orders at them more loudly or more frequently does not lead anyplace good. It’s good to listen to them and either present new facts or work with them on a plan you can all get behind.

In some circumstances, all that’s needed is honesty about the business trade-offs, so that your team feels like your “partner in crime”, not a victim of circumstance. For example, we’d tell our folks that by not falling behind on basic, unglamorous work, we earn the trust of our VPs and SVPs – and that this translates into the independence and the resources we need to pursue more ambitious ideas without being told what to do; it’s how we game the system, so to speak. Oh: leading by example is a pretty powerful tool at your disposal, too.

The human factor

I’ve come to appreciate that hiring decent folks who can get along with others is far more important than trying to recruit conference-circuit superstars. In fact, hiring superstars is a decidedly hit-and-miss affair: while certainly not a rule, there is a proportion of folks who put the maintenance of their celebrity status ahead of job responsibilities or the well-being of their peers.

For teams, one of the most powerful demotivators is a sense of unfairness and disempowerment. This is where tech-originating leaders can shine, because their teams usually feel that their bosses understand and can evaluate the merits of the work. But it also means you need to be decisive and actually solve problems for them, rather than just letting them vent. You will need to make unpopular decisions every now and then; in such cases, I think it’s important to move quickly, rather than prolonging the uncertainty – but it’s also important to sincerely listen to concerns, explain your reasoning, and be frank about the risks and trade-offs.

Whenever you see a clash of personalities on your team, you probably need to respond swiftly and decisively; being right should not justify being a bully. If you don’t react to repeated scuffles, your best people will probably start looking for other opportunities: it’s draining to put up with constant pie fights, no matter if the pies are thrown straight at you or if you just need to duck one every now and then.

More broadly, personality differences seem to be a much better predictor of conflict than any technical aspects underpinning a debate. As a boss, you need to identify such differences early on and come up with creative solutions. Sometimes, all you need is taking some badly-delivered but valid feedback and having a conversation with the other person, asking some questions that can help them reach the same conclusions without feeling that their worldview is under attack. Other times, the only path forward is making sure that some folks simply don’t run into each for a while.

Finally, dealing with low performers is a notoriously hard but important part of the game. Especially within large companies, there is always the temptation to just let it slide: sideline a struggling person and wait for them to either get over their issues or leave. But this sends an awful message to the rest of the team; for better or worse, fairness is important to most. Simply firing the low performers is seldom the best solution, though; successful recovery cases are what sets great managers apart from the average ones.

Oh, one more thought: people in leadership roles have their allegiance divided between the company and the people who depend on them. The obligation to the company is more formal, but the impact you have on your team is longer-lasting and more intimate. When the obligations to the employer and to your team collide in some way, make sure you can make the right call; it might be one of the the most consequential decisions you’ll ever make.

AWS Online Tech Talks – January 2018

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-january-2018/

Happy New Year! Kick of 2018 right by expanding your AWS knowledge with a great batch of new Tech Talks. We’re covering some of the biggest launches from re:Invent including Amazon Neptune, Amazon Rekognition Video, AWS Fargate, AWS Cloud9, Amazon Kinesis Video Streams, AWS PrivateLink, AWS Single-Sign On and more!

January 2018– Schedule

Noted below are the upcoming scheduled live, online technical sessions being held during the month of January. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.

Webinars featured this month are:

Monday January 22

Analytics & Big Data
11:00 AM – 11:45 AM PT Analyze your Data Lake, Fast @ Any Scale  Lvl 300

Database
01:00 PM – 01:45 PM PT Deep Dive on Amazon Neptune Lvl 200

Tuesday, January 23

Artificial Intelligence
9:00 AM – 09:45 AM PT  How to get the most out of Amazon Rekognition Video, a deep learning based video analysis service Lvl 300

Containers

11:00 AM – 11:45 AM Introducing AWS Fargate Lvl 200

Serverless
01:00 PM – 02:00 PM PT Overview of Serverless Application Deployment Patterns Lvl 400

Wednesday, January 24

DevOps
09:00 AM – 09:45 AM PT Introducing AWS Cloud9  Lvl 200

Analytics & Big Data
11:00 AM – 11:45 AM PT Deep Dive: Amazon Kinesis Video Streams
Lvl 300
Database
01:00 PM – 01:45 PM PT Introducing Amazon Aurora with PostgreSQL Compatibility Lvl 200

Thursday, January 25

Artificial Intelligence
09:00 AM – 09:45 AM PT Introducing Amazon SageMaker Lvl 200

Mobile
11:00 AM – 11:45 AM PT Ionic and React Hybrid Web/Native Mobile Applications with Mobile Hub Lvl 200

IoT
01:00 PM – 01:45 PM PT Connected Product Development: Secure Cloud & Local Connectivity for Microcontroller-based Devices Lvl 200

Monday, January 29

Enterprise
11:00 AM – 11:45 AM PT Enterprise Solutions Best Practices 100 Achieving Business Value with AWS Lvl 100

Compute
01:00 PM – 01:45 PM PT Introduction to Amazon Lightsail Lvl 200

Tuesday, January 30

Security, Identity & Compliance
09:00 AM – 09:45 AM PT Introducing Managed Rules for AWS WAF Lvl 200

Storage
11:00 AM – 11:45 AM PT  Improving Backup & DR – AWS Storage Gateway Lvl 300

Compute
01:00 PM – 01:45 PM PT  Introducing the New Simplified Access Model for EC2 Spot Instances Lvl 200

Wednesday, January 31

Networking
09:00 AM – 09:45 AM PT  Deep Dive on AWS PrivateLink Lvl 300

Enterprise
11:00 AM – 11:45 AM PT Preparing Your Team for a Cloud Transformation Lvl 200

Compute
01:00 PM – 01:45 PM PT  The Nitro Project: Next-Generation EC2 Infrastructure Lvl 300

Thursday, February 1

Security, Identity & Compliance
09:00 AM – 09:45 AM PT  Deep Dive on AWS Single Sign-On Lvl 300

Storage
11:00 AM – 11:45 AM PT How to Build a Data Lake in Amazon S3 & Amazon Glacier Lvl 300

Detecting Adblocker Blockers

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/detecting_adblo.html

Interesting research on the prevalence of adblock blockers: “Measuring and Disrupting Anti-Adblockers Using Differential Execution Analysis“:

Abstract: Millions of people use adblockers to remove intrusive and malicious ads as well as protect themselves against tracking and pervasive surveillance. Online publishers consider adblockers a major threat to the ad-powered “free” Web. They have started to retaliate against adblockers by employing anti-adblockers which can detect and stop adblock users. To counter this retaliation, adblockers in turn try to detect and filter anti-adblocking scripts. This back and forth has prompted an escalating arms race between adblockers and anti-adblockers.

We want to develop a comprehensive understanding of anti-adblockers, with the ultimate aim of enabling adblockers to bypass state-of-the-art anti-adblockers. In this paper, we present a differential execution analysis to automatically detect and analyze anti-adblockers. At a high level, we collect execution traces by visiting a website with and without adblockers. Through differential execution analysis, we are able to pinpoint the conditions that lead to the differences caused by anti-adblocking code. Using our system, we detect anti-adblockers on 30.5% of the Alexa top-10K websites which is 5-52 times more than reported in prior literature. Unlike prior work which is limited to detecting visible reactions (e.g., warning messages) by anti-adblockers, our system can discover attempts to detect adblockers even when there is no visible reaction. From manually checking one third of the detected websites, we find that the websites that have no visible reactions constitute over 90% of the cases, completely dominating the ones that have visible warning messages. Finally, based on our findings, we further develop JavaScript rewriting and API hooking based solutions (the latter implemented as a Chrome extension) to help adblockers bypass state-of-the-art anti-adblockers.

News article.

Goodbye, net neutrality—Ajit Pai’s FCC votes to allow blocking and throttling (Ars Technica)

Post Syndicated from jake original https://lwn.net/Articles/741482/rss

In a vote that was not any kind of surprise, the US Federal Communications Commission (FCC) voted to end the “net neutrality” rules that stop internet service providers (ISPs) and others from blocking or throttling certain kinds of traffic to try to force consumers and content providers to pay more for “fast lanes”. Ars Technica covers the vote and the reaction to it, including the fact that the fight is not yet over: “Plenty of organizations might appeal, said consumer advocate Gigi Sohn, who was a top counselor to then-FCC Chairman Tom Wheeler when the commission imposed its rules.

‘I think you’ll see public interest groups, trade associations, and small and mid-sized tech companies filing the petitions for review,’ Sohn told Ars. One or two ‘big companies’ could also challenge the repeal, she thinks.

Lawsuit filers can challenge the repeal on numerous respects, she said. They can argue that the public record doesn’t support the FCC’s claim that broadband isn’t a telecommunications service, that ‘throwing away all protections for consumers and innovators for the first time since this issue has been debated is arbitrary and capricious,’ and that the FCC cannot preempt state net neutrality laws, she said.”

How to Make Your Web App More Reliable and Performant Using webpack: a Yahoo Mail Case Study

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/168508200981

yahoodevelopers:

image

By Murali Krishna Bachhu, Anurag Damle, and Utkarsh Shrivastava

As engineers on the Yahoo Mail team at Oath, we pride ourselves on the things that matter most to developers: faster development cycles, more reliability, and better performance. Users don’t necessarily see these elements, but they certainly feel the difference they make when significant improvements are made. Recently, we were able to upgrade all three of these areas at scale by adopting webpack® as Yahoo Mail’s underlying module bundler, and you can do the same for your web application.

What is webpack?

webpack is an open source module bundler for modern JavaScript applications. When webpack processes your application, it recursively builds a dependency graph that includes every module your application needs. Then it packages all of those modules into a small number of bundles, often only one, to be loaded by the browser.

webpack became our choice module bundler not only because it supports on-demand loading, multiple bundle generation, and has a relatively low runtime overhead, but also because it is better suited for web platforms and NodeJS apps and has great community support.

image

Comparison of webpack to other open source bundlers


How did we integrate webpack?

Like any developer does when integrating a new module bundler, we started integrating webpack into Yahoo Mail by looking at its basic config file. We explored available default webpack plugins as well as third-party webpack plugins and then picked the plugins most suitable for our application. If we didn’t find a plugin that suited a specific need, we wrote the webpack plugin ourselves (e.g., We wrote a plugin to execute Atomic CSS scripts in the latest Yahoo Mail experience in order to decrease our overall CSS payload**).

During the development process for Yahoo Mail, we needed a way to make sure webpack would continuously run in the background. To make this happen, we decided to use the task runner Grunt. Not only does Grunt keep the connection to webpack alive, but it also gives us the ability to pass different parameters to the webpack config file based on the given environment. Some examples of these parameters are source map options, enabling HMR, and uglification.

Before deployment to production, we wanted to optimize the javascript bundles for size to make the Yahoo Mail experience faster. webpack provides good default support for this with the UglifyJS plugin. Although the default options are conservative, they give us the ability to configure the options. Once we modified the options to our specifications, we saved approximately 10KB.

image

Code snippet showing the configuration options for the UglifyJS plugin


Faster development cycles for developers

While developing a new feature, engineers ideally want to see their code changes reflected on their web app instantaneously. This allows them to maintain their train of thought and eventually results in more productivity. Before we implemented webpack, it took us around 30 seconds to 1 minute for changes to reflect on our Yahoo Mail development environment. webpack helped us reduce the wait time to 5 seconds.

More reliability

Consumers love a reliable product, where all the features work seamlessly every time. Before we began using webpack, we were generating javascript bundles on demand or during run-time, which meant the product was more prone to exceptions or failures while fetching the javascript bundles. With webpack, we now generate all the bundles during build time, which means that all the bundles are available whenever consumers access Yahoo Mail. This results in significantly fewer exceptions and failures and a better experience overall.

Better Performance

We were able to attain a significant reduction of payload after adopting webpack.

  1. Reduction of about 75 KB gzipped Javascript payload
  2. 50% reduction on server-side render time
  3. 10% improvement in Yahoo Mail’s launch performance metrics, as measured by render time above the fold (e.g., Time to load contents of an email).

Below are some charts that demonstrate the payload size of Yahoo Mail before and after implementing webpack.

image

Payload before using webpack (JavaScript Size = 741.41KB)


image

Payload after switching to webpack (JavaScript size = 669.08KB)


image

Conclusion

Shifting to webpack has resulted in significant improvements. We saw a common build process go from 30 seconds to 5 seconds, large JavaScript bundle size reductions, and a halving in server-side rendering time. In addition to these benefits, our engineers have found the community support for webpack to have been impressive as well. webpack has made the development of Yahoo Mail more efficient and enhanced the product for users. We believe you can use it to achieve similar results for your web application as well.

**Optimized CSS generation with Atomizer

Before we implemented webpack into the development of Yahoo Mail, we looked into how we could decrease our CSS payload. To achieve this, we developed an in-house solution for writing modular and scoped CSS in React. Our solution is similar to the Atomizer library, and our CSS is written in JavaScript like the example below:

image

Sample snippet of CSS written with Atomizer


Every React component creates its own styles.js file with required style definitions. React-Atomic-CSS converts these files into unique class definitions. Our total CSS payload after implementing our solution equaled all the unique style definitions in our code, or only 83KB (21KB gzipped).

During our migration to webpack, we created a custom plugin and loader to parse these files and extract the unique style definitions from all of our CSS files. Since this process is tied to bundling, only CSS files that are part of the dependency chain are included in the final CSS.

Libertarians are against net neutrality

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/12/libertarians-are-against-net-neutrality.html

This post claims to be by a libertarian in support of net neutrality. As a libertarian, I need to debunk this. “Net neutrality” is a case of one-hand clapping, you rarely hear the competing side, and thus, that side may sound attractive. This post is about the other side, from a libertarian point of view.

That post just repeats the common, and wrong, left-wing talking points. I mean, there might be a libertarian case for some broadband regulation, but this isn’t it.

This thing they call “net neutrality” is just left-wing politics masquerading as some sort of principle. It’s no different than how people claim to be “pro-choice”, yet demand forced vaccinations. Or, it’s no different than how people claim to believe in “traditional marriage” even while they are on their third “traditional marriage”.

Properly defined, “net neutrality” means no discrimination of network traffic. But nobody wants that. A classic example is how most internet connections have faster download speeds than uploads. This discriminates against upload traffic, harming innovation in upload-centric applications like DropBox’s cloud backup or BitTorrent’s peer-to-peer file transfer. Yet activists never mention this, or other types of network traffic discrimination, because they no more care about “net neutrality” than Trump or Gingrich care about “traditional marriage”.

Instead, when people say “net neutrality”, they mean “government regulation”. It’s the same old debate between who is the best steward of consumer interest: the free-market or government.

Specifically, in the current debate, they are referring to the Obama-era FCC “Open Internet” order and reclassification of broadband under “Title II” so they can regulate it. Trump’s FCC is putting broadband back to “Title I”, which means the FCC can’t regulate most of its “Open Internet” order.

Don’t be tricked into thinking the “Open Internet” order is anything but intensely politically. The premise behind the order is the Democrat’s firm believe that it’s government who created the Internet, and all innovation, advances, and investment ultimately come from the government. It sees ISPs as inherently deceitful entities who will only serve their own interests, at the expense of consumers, unless the FCC protects consumers.

It says so right in the order itself. It starts with the premise that broadband ISPs are evil, using illegitimate “tactics” to hurt consumers, and continues with similar language throughout the order.

A good contrast to this can be seen in Tim Wu’s non-political original paper in 2003 that coined the term “net neutrality”. Whereas the FCC sees broadband ISPs as enemies of consumers, Wu saw them as allies. His concern was not that ISPs would do evil things, but that they would do stupid things, such as favoring short-term interests over long-term innovation (such as having faster downloads than uploads).

The political depravity of the FCC’s order can be seen in this comment from one of the commissioners who voted for those rules:

FCC Commissioner Jessica Rosenworcel wants to increase the minimum broadband standards far past the new 25Mbps download threshold, up to 100Mbps. “We invented the internet. We can do audacious things if we set big goals, and I think our new threshold, frankly, should be 100Mbps. I think anything short of that shortchanges our children, our future, and our new digital economy,” Commissioner Rosenworcel said.

This is indistinguishable from communist rhetoric that credits the Party for everything, as this booklet from North Korea will explain to you.

But what about monopolies? After all, while the free-market may work when there’s competition, it breaks down where there are fewer competitors, oligopolies, and monopolies.

There is some truth to this, in individual cities, there’s often only only a single credible high-speed broadband provider. But this isn’t the issue at stake here. The FCC isn’t proposing light-handed regulation to keep monopolies in check, but heavy-handed regulation that regulates every last decision.

Advocates of FCC regulation keep pointing how broadband monopolies can exploit their renting-seeking positions in order to screw the customer. They keep coming up with ever more bizarre and unlikely scenarios what monopoly power grants the ISPs.

But the never mention the most simplest: that broadband monopolies can just charge customers more money. They imagine instead that these companies will pursue a string of outrageous, evil, and less profitable behaviors to exploit their monopoly position.

The FCC’s reclassification of broadband under Title II gives it full power to regulate ISPs as utilities, including setting prices. The FCC has stepped back from this, promising it won’t go so far as to set prices, that it’s only regulating these evil conspiracy theories. This is kind of bizarre: either broadband ISPs are evilly exploiting their monopoly power or they aren’t. Why stop at regulating only half the evil?

The answer is that the claim “monopoly” power is a deception. It starts with overstating how many monopolies there are to begin with. When it issued its 2015 “Open Internet” order the FCC simultaneously redefined what they meant by “broadband”, upping the speed from 5-mbps to 25-mbps. That’s because while most consumers have multiple choices at 5-mbps, fewer consumers have multiple choices at 25-mbps. It’s a dirty political trick to convince you there is more of a problem than there is.

In any case, their rules still apply to the slower broadband providers, and equally apply to the mobile (cell phone) providers. The US has four mobile phone providers (AT&T, Verizon, T-Mobile, and Sprint) and plenty of competition between them. That it’s monopolistic power that the FCC cares about here is a lie. As their Open Internet order clearly shows, the fundamental principle that animates the document is that all corporations, monopolies or not, are treacherous and must be regulated.

“But corporations are indeed evil”, people argue, “see here’s a list of evil things they have done in the past!”

No, those things weren’t evil. They were done because they benefited the customers, not as some sort of secret rent seeking behavior.

For example, one of the more common “net neutrality abuses” that people mention is AT&T’s blocking of FaceTime. I’ve debunked this elsewhere on this blog, but the summary is this: there was no network blocking involved (not a “net neutrality” issue), and the FCC analyzed it and decided it was in the best interests of the consumer. It’s disingenuous to claim it’s an evil that justifies FCC actions when the FCC itself declared it not evil and took no action. It’s disingenuous to cite the “net neutrality” principle that all network traffic must be treated when, in fact, the network did treat all the traffic equally.

Another frequently cited abuse is Comcast’s throttling of BitTorrent.Comcast did this because Netflix users were complaining. Like all streaming video, Netflix backs off to slower speed (and poorer quality) when it experiences congestion. BitTorrent, uniquely among applications, never backs off. As most applications become slower and slower, BitTorrent just speeds up, consuming all available bandwidth. This is especially problematic when there’s limited upload bandwidth available. Thus, Comcast throttled BitTorrent during prime time TV viewing hours when the network was already overloaded by Netflix and other streams. BitTorrent users wouldn’t mind this throttling, because it often took days to download a big file anyway.

When the FCC took action, Comcast stopped the throttling and imposed bandwidth caps instead. This was a worse solution for everyone. It penalized heavy Netflix viewers, and prevented BitTorrent users from large downloads. Even though BitTorrent users were seen as the victims of this throttling, they’d vastly prefer the throttling over the bandwidth caps.

In both the FaceTime and BitTorrent cases, the issue was “network management”. AT&T had no competing video calling service, Comcast had no competing download service. They were only reacting to the fact their networks were overloaded, and did appropriate things to solve the problem.

Mobile carriers still struggle with the “network management” issue. While their networks are fast, they are still of low capacity, and quickly degrade under heavy use. They are looking for tricks in order to reduce usage while giving consumers maximum utility.

The biggest concern is video. It’s problematic because it’s designed to consume as much bandwidth as it can, throttling itself only when it experiences congestion. This is what you probably want when watching Netflix at the highest possible quality, but it’s bad when confronted with mobile bandwidth caps.

With small mobile devices, you don’t want as much quality anyway. You want the video degraded to lower quality, and lower bandwidth, all the time.

That’s the reasoning behind T-Mobile’s offerings. They offer an unlimited video plan in conjunction with the biggest video providers (Netflix, YouTube, etc.). The catch is that when congestion occurs, they’ll throttle it to lower quality. In other words, they give their bandwidth to all the other phones in your area first, then give you as much of the leftover bandwidth as you want for video.

While it sounds like T-Mobile is doing something evil, “zero-rating” certain video providers and degrading video quality, the FCC allows this, because they recognize it’s in the customer interest.

Mobile providers especially have great interest in more innovation in this area, in order to conserve precious bandwidth, but they are finding it costly. They can’t just innovate, but must ask the FCC permission first. And with the new heavy handed FCC rules, they’ve become hostile to this innovation. This attitude is highlighted by the statement from the “Open Internet” order:

And consumers must be protected, for example from mobile commercial practices masquerading as “reasonable network management.”

This is a clear declaration that free-market doesn’t work and won’t correct abuses, and that that mobile companies are treacherous and will do evil things without FCC oversight.

Conclusion

Ignoring the rhetoric for the moment, the debate comes down to simple left-wing authoritarianism and libertarian principles. The Obama administration created a regulatory regime under clear Democrat principles, and the Trump administration is rolling it back to more free-market principles. There is no principle at stake here, certainly nothing to do with a technical definition of “net neutrality”.

The 2015 “Open Internet” order is not about “treating network traffic neutrally”, because it doesn’t do that. Instead, it’s purely a left-wing document that claims corporations cannot be trusted, must be regulated, and that innovation and prosperity comes from the regulators and not the free market.

It’s not about monopolistic power. The primary targets of regulation are the mobile broadband providers, where there is plenty of competition, and who have the most “network management” issues. Even if it were just about wired broadband (like Comcast), it’s still ignoring the primary ways monopolies profit (raising prices) and instead focuses on bizarre and unlikely ways of rent seeking.

If you are a libertarian who nonetheless believes in this “net neutrality” slogan, you’ve got to do better than mindlessly repeating the arguments of the left-wing. The term itself, “net neutrality”, is just a slogan, varying from person to person, from moment to moment. You have to be more specific. If you truly believe in the “net neutrality” technical principle that all traffic should be treated equally, then you’ll want a rewrite of the “Open Internet” order.

In the end, while libertarians may still support some form of broadband regulation, it’s impossible to reconcile libertarianism with the 2015 “Open Internet”, or the vague things people mean by the slogan “net neutrality”.

Stretch for PCs and Macs, and a Raspbian update

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/stretch-pcs-macs-raspbian-update/

Today, we are launching the first Debian Stretch release of the Raspberry Pi Desktop for PCs and Macs, and we’re also releasing the latest version of Raspbian Stretch for your Pi.

Raspberry Pi Desktop Stretch splash screen

For PCs and Macs

When we released our custom desktop environment on Debian for PCs and Macs last year, we were slightly taken aback by how popular it turned out to be. We really only created it as a result of one of those “Wouldn’t it be cool if…” conversations we sometimes have in the office, so we were delighted by the Pi community’s reaction.

Seeing how keen people were on the x86 version, we decided that we were going to try to keep releasing it alongside Raspbian, with the ultimate aim being to make simultaneous releases of both. This proved to be tricky, particularly with the move from the Jessie version of Debian to the Stretch version this year. However, we have now finished the job of porting all the custom code in Raspbian Stretch to Debian, and so the first Debian Stretch release of the Raspberry Pi Desktop for your PC or Mac is available from today.

The new Stretch releases

As with the Jessie release, you can either run this as a live image from a DVD, USB stick, or SD card or install it as the native operating system on the hard drive of an old laptop or desktop computer. Please note that installing this software will erase anything else on the hard drive — do not install this over a machine running Windows or macOS that you still need to use for its original purpose! It is, however, safe to boot a live image on such a machine, since your hard drive will not be touched by this.

We’re also pleased to announce that we are releasing the latest version of Raspbian Stretch for your Pi today. The Pi and PC versions are largely identical: as before, there are a few applications (such as Mathematica) which are exclusive to the Pi, but the user interface, desktop, and most applications will be exactly the same.

For Raspbian, this new release is mostly bug fixes and tweaks over the previous Stretch release, but there are one or two changes you might notice.

File manager

The file manager included as part of the LXDE desktop (on which our desktop is based) is a program called PCManFM, and it’s very feature-rich; there’s not much you can’t do in it. However, having used it for a few years, we felt that it was perhaps more complex than it needed to be — the sheer number of menu options and choices made some common operations more awkward than they needed to be. So to try to make file management easier, we have implemented a cut-down mode for the file manager.

Raspberry Pi Desktop Stretch - file manager

Most of the changes are to do with the menus. We’ve removed a lot of options that most people are unlikely to change, and moved some other options into the Preferences screen rather than the menus. The two most common settings people tend to change — how icons are displayed and sorted — are now options on the toolbar and in a top-level menu rather than hidden away in submenus.

The sidebar now only shows a single hierarchical view of the file system, and we’ve tidied the toolbar and updated the icons to make them match our house style. We’ve removed the option for a tabbed interface, and we’ve stomped a few bugs as well.

One final change was to make it possible to rename a file just by clicking on its icon to highlight it, and then clicking on its name. This is the way renaming works on both Windows and macOS, and it’s always seemed slightly awkward that Unix desktop environments tend not to support it.

As with most of the other changes we’ve made to the desktop over the last few years, the intention is to make it simpler to use, and to ease the transition from non-Unix environments. But if you really don’t like what we’ve done and long for the old file manager, just untick the box for Display simplified user interface and menus in the Layout page of Preferences, and everything will be back the way it was!

Raspberry Pi Desktop Stretch - preferences GUI

Battery indicator for laptops

One important feature missing from the previous release was an indication of the amount of battery life. Eben runs our desktop on his Mac, and he was becoming slightly irritated by having to keep rebooting into macOS just to check whether his battery was about to die — so fixing this was a priority!

We’ve added a battery status icon to the taskbar; this shows current percentage charge, along with whether the battery is charging, discharging, or connected to the mains. When you hover over the icon with the mouse pointer, a tooltip with more details appears, including the time remaining if the battery can provide this information.

Raspberry Pi Desktop Stretch - battery indicator

While this battery monitor is mainly intended for the PC version, it also supports the first-generation pi-top — to see it, you’ll only need to make sure that I2C is enabled in Configuration. A future release will support the new second-generation pi-top.

New PC applications

We have included a couple of new applications in the PC version. One is called PiServer — this allows you to set up an operating system, such as Raspbian, on the PC which can then be shared by a number of Pi clients networked to it. It is intended to make it easy for classrooms to have multiple Pis all running exactly the same software, and for the teacher to have control over how the software is installed and used. PiServer is quite a clever piece of software, and it’ll be covered in more detail in another blog post in December.

We’ve also added an application which allows you to easily use the GPIO pins of a Pi Zero connected via USB to a PC in applications using Scratch or Python. This makes it possible to run the same physical computing projects on the PC as you do on a Pi! Again, we’ll tell you more in a separate blog post this month.

Both of these applications are included as standard on the PC image, but not on the Raspbian image. You can run them on a Pi if you want — both can be installed from apt.

How to get the new versions

New images for both Raspbian and Debian versions are available from the Downloads page.

It is possible to update existing installations of both Raspbian and Debian versions. For Raspbian, this is easy: just open a terminal window and enter

sudo apt-get update
sudo apt-get dist-upgrade

Updating Raspbian on your Raspberry Pi

How to update to the latest version of Raspbian on your Raspberry Pi. Download Raspbian here: More information on the latest version of Raspbian: Buy a Raspberry Pi:

It is slightly more complex for the PC version, as the previous release was based around Debian Jessie. You will need to edit the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list, using sudo to do so. In both files, change every occurrence of the word “jessie” to “stretch”. When that’s done, do the following:

sudo apt-get update 
sudo dpkg --force-depends -r libwebkitgtk-3.0-common
sudo apt-get -f install
sudo apt-get dist-upgrade
sudo apt-get install python3-thonny
sudo apt-get install sonic-pi=2.10.0~repack-rpt1+2
sudo apt-get install piserver
sudo apt-get install usbbootgui

At several points during the upgrade process, you will be asked if you want to keep the current version of a configuration file or to install the package maintainer’s version. In every case, keep the existing version, which is the default option. The update may take an hour or so, depending on your network connection.

As with all software updates, there is the possibility that something may go wrong during the process, which could lead to your operating system becoming corrupted. Therefore, we always recommend making a backup first.

Enjoy the new versions, and do let us know any feedback you have in the comments or on the forums!

The post Stretch for PCs and Macs, and a Raspbian update appeared first on Raspberry Pi.

AWS Systems Manager – A Unified Interface for Managing Your Cloud and Hybrid Resources

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-systems-manager/

AWS Systems Manager is a new way to manage your cloud and hybrid IT environments. AWS Systems Manager provides a unified user interface that simplifies resource and application management, shortens the time to detect and resolve operational problems, and makes it easy to operate and manage your infrastructure securely at scale. This service is absolutely packed full of features. It defines a new experience around grouping, visualizing, and reacting to problems using features from products like Amazon EC2 Systems Manager (SSM) to enable rich operations across your resources.

As I said above, there are a lot of powerful features in this service and we won’t be able to dive deep on all of them but it’s easy to go to the console and get started with any of the tools.

Resource Groupings

Resource Groups allow you to create logical groupings of most resources that support tagging like: Amazon Elastic Compute Cloud (EC2) instances, Amazon Simple Storage Service (S3) buckets, Elastic Load Balancing balancers, Amazon Relational Database Service (RDS) instances, Amazon Virtual Private Cloud, Amazon Kinesis streams, Amazon Route 53 zones, and more. Previously, you could use the AWS Console to define resource groupings but AWS Systems Manager provides this new resource group experience via a new console and API. These groupings are a fundamental building block of Systems Manager in that they are frequently the target of various operations you may want to perform like: compliance management, software inventories, patching, and other automations.

You start by defining a group based on tag filters. From there you can view all of the resources in a centralized console. You would typically use these groupings to differentiate between applications, application layers, and environments like production or dev – but you can make your own rules about how to use them as well. If you imagine a typical 3 tier web-app you might have a few EC2 instances, an ELB, a few S3 buckets, and an RDS instance. You can define a grouping for that application and with all of those different resources simultaneously.

Insights

AWS Systems Manager automatically aggregates and displays operational data for each resource group through a dashboard. You no longer need to navigate through multiple AWS consoles to view all of your operational data. You can easily integrate your exiting Amazon CloudWatch dashboards, AWS Config rules, AWS CloudTrail trails, AWS Trusted Advisor notifications, and AWS Personal Health Dashboard performance and availability alerts. You can also easily view your software inventories across your fleet. AWS Systems Manager also provides a compliance dashboard allowing you to see the state of various security controls and patching operations across your fleets.

Acting on Insights

Building on the success of EC2 Systems Manager (SSM), AWS Systems Manager takes all of the features of SSM and provides a central place to access them. These are all the same experiences you would have through SSM with a more accesible console and centralized interface. You can use the resource groups you’ve defined in Systems Manager to visualize and act on groups of resources.

Automation


Automations allow you to define common IT tasks as a JSON document that specify a list of tasks. You can also use community published documents. These documents can be executed through the Console, CLIs, SDKs, scheduled maintenance windows, or triggered based on changes in your infrastructure through CloudWatch events. You can track and log the execution of each step in the documents and prompt for additional approvals. It also allows you to incrementally roll out changes and automatically halt when errors occur. You can start executing an automation directly on a resource group and it will be able to apply itself to the resources that it understands within the group.

Run Command

Run Command is a superior alternative to enabling SSH on your instances. It provides safe, secure remote management of your instances at scale without logging into your servers, replacing the need for SSH bastions or remote powershell. It has granular IAM permissions that allow you to restrict which roles or users can run certain commands.

Patch Manager, Maintenance Windows, and State Manager

I’ve written about Patch Manager before and if you manage fleets of Windows and Linux instances it’s a great way to maintain a common baseline of security across your fleet.

Maintenance windows allow you to schedule instance maintenance and other disruptive tasks for a specific time window.

State Manager allows you to control various server configuration details like anti-virus definitions, firewall settings, and more. You can define policies in the console or run existing scripts, PowerShell modules, or even Ansible playbooks directly from S3 or GitHub. You can query State Manager at any time to view the status of your instance configurations.

Things To Know

There’s some interesting terminology here. We haven’t done the best job of naming things in the past so let’s take a moment to clarify. EC2 Systems Manager (sometimes called SSM) is what you used before today. You can still invoke aws ssm commands. However, AWS Systems Manager builds on and enhances many of the tools provided by EC2 Systems Manager and allows those same tools to be applied to more than just EC2. When you see the phrase “Systems Manager” in the future you should think of AWS Systems Manager and not EC2 Systems Manager.

AWS Systems Manager with all of this useful functionality is provided at no additional charge. It is immediately available in all public AWS regions.

The best part about these services is that even with their tight integrations each one is designed to be used in isolation as well. If you only need one component of these services it’s simple to get started with only that component.

There’s a lot more than I could ever document in this post so I encourage you all to jump into the console and documentation to figure out where you can start using AWS Systems Manager.

Randall

Sky’s Pirate Site-Blocking Move is Something For North Korea, ISPs Say

Post Syndicated from Andy original https://torrentfreak.com/skys-pirate-site-blocking-move-is-something-for-north-korea-isps-say-171129/

Entertainment companies have been taking legal action to have pirate sites blocked for more than a decade so it was only a matter of time before New Zealand had a taste of the action.

It’s now been revealed that Sky Network Television, the country’s biggest pay-TV service, filed a complaint with the High Court in September, demanding that four local Internet service providers block subscriber access to several ‘pirate’ sites.

At this point, the sites haven’t been named, but it seems almost inevitable that the likes of The Pirate Bay will be present. The ISPs are known, however. Spark, Vodafone, Vocus and Two Degrees control around 90% of the Kiwi market so any injunction handed down will affect almost the entire country.

In its application, Sky states that pirate sites make available unauthorized copies of its entertainment works, something which not only infringes its copyrights but also undermines its business model. But while this is standard fare in such complaints, the Internet industry backlash today is something out of the ordinary.

ISPs in other jurisdictions have fought back against blocking efforts but few have deployed the kind of language being heard in New Zealand this morning.

Vocus Group – which runs the Orcon, Slingshot and Flip brands – is labeling Sky’s efforts as “gross censorship and a breach of net neutrality”, adding that they’re in direct opposition to the idea of a free and open Internet.

“SKY’s call that sites be blacklisted on their say so is dinosaur behavior, something you would expect in North Korea, not in New Zealand. It isn’t our job to police the Internet and it sure as hell isn’t SKY’s either, all sites should be equal and open,” says Vocus Consumer General Manager Taryn Hamilton.

But in response, Sky said Vocus “has got it wrong”, highlighting that site-blocking is now common practice in places such as Australia and the UK.

“Pirate sites like Pirate Bay make no contribution to the development of content, but rather just steal it. Over 40 countries around the world have put in place laws to block such sites, and we’re just looking to do the same,” the company said.

The broadcaster says it will only go to court to have dedicated pirate sites blocked, ones that “pay nothing to the creators” while stealing content for their own gain.

“We’re doing this because illegal streaming and content piracy is a major threat to the entertainment, creative and sporting industries in New Zealand and abroad. With piracy, not only is the sport and entertainment content that we love at risk, but so are the livelihoods of the thousands of people employed by these industries,” the company said.

“Illegally sharing or viewing content impacts a vast number of people and jobs including athletes, actors, artists, production crew, customer service representatives, event planners, caterers and many, many more.”

ISP Spark, which is also being targeted by Sky, was less visibly outraged than some of its competitors. However, the company still feels that controlling what people can see on the Internet is a slippery slope.

“We have some sympathy for this given we invest tens of millions of dollars into content ourselves through Lightbox. However, we don’t think it should be the role of ISPs to become the ‘police of the internet’ on behalf of other parties,” a Spark spokesperson said.

Perhaps unsurprisingly, Sky’s blocking efforts haven’t been well received by InternetNZ, the non-profit organization which protects and promotes Internet use in New Zealand.

Describing the company’s application for an injunction as an “extreme step”, InternetNZ Chief Executive Jordan Carter said that site-blocking works against the “very nature” of the Internet and is a measure that’s unlikely to achieve its goals.

“Site blocking is very easily evaded by people with the right skills or tools. Those who are deliberate pirates will be able to get around site blocking without difficulty,” Carter said.

“If blocking is ordered, it risks driving content piracy further underground, with the help of easily-deployed and common Internet tools. This could well end up making the issues that Sky are facing even harder to police in the future.”

What most of the ISPs and InternetNZ are also agreed on is the need to fight piracy with competitive, attractive legal offerings. Vocus says that local interest in The Pirate Bay has halved since Netflix launched in New Zealand, with traffic to the torrent site sitting at just 23% of its peak 2013 levels.

“The success of Netflix, iTunes and Spotify proves that people are willing to pay to access good-quality content. It’s pretty clear that SKY doesn’t understand the internet, and is trying a Hail Mary to turnaround its sunset business,” Vocus Consumer General Manager Taryn Hamilton said.

The big question now is whether the High Court has the ability to order these kinds of blocks. InternetNZ has its doubts, noting that it should only happen following a parliamentary mandate.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Introducing AWS AppSync – Build data-driven apps with real-time and off-line capabilities

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/introducing-amazon-appsync/

In this day and age, it is almost impossible to do without our mobile devices and the applications that help make our lives easier. As our dependency on our mobile phone grows, the mobile application market has exploded with millions of apps vying for our attention. For mobile developers, this means that we must ensure that we build applications that provide the quality, real-time experiences that app users desire.  Therefore, it has become essential that mobile applications are developed to include features such as multi-user data synchronization, offline network support, and data discovery, just to name a few.  According to several articles, I read recently about mobile development trends on publications like InfoQ, DZone, and the mobile development blog AlleviateTech, one of the key elements in of delivering the aforementioned capabilities is with cloud-driven mobile applications.  It seems that this is especially true, as it related to mobile data synchronization and data storage.

That being the case, it is a perfect time for me to announce a new service for building innovative mobile applications that are driven by data-intensive services in the cloud; AWS AppSync. AWS AppSync is a fully managed serverless GraphQL service for real-time data queries, synchronization, communications and offline programming features. For those not familiar, let me briefly share some information about the open GraphQL specification. GraphQL is a responsive data query language and server-side runtime for querying data sources that allow for real-time data retrieval and dynamic query execution. You can use GraphQL to build a responsive API for use in when building client applications. GraphQL works at the application layer and provides a type system for defining schemas. These schemas serve as specifications to define how operations should be performed on the data and how the data should be structured when retrieved. Additionally, GraphQL has a declarative coding model which is supported by many client libraries and frameworks including React, React Native, iOS, and Android.

Now the power of the GraphQL open standard query language is being brought to you in a rich managed service with AWS AppSync.  With AppSync developers can simplify the retrieval and manipulation of data across multiple data sources with ease, allowing them to quickly prototype, build and create robust, collaborative, multi-user applications. AppSync keeps data updated when devices are connected, but enables developers to build solutions that work offline by caching data locally and synchronizing local data when connections become available.

Let’s discuss some key concepts of AWS AppSync and how the service works.

AppSync Concepts

  • AWS AppSync Client: service client that defines operations, wraps authorization details of requests, and manage offline logic.
  • Data Source: the data storage system or a trigger housing data
  • Identity: a set of credentials with permissions and identification context provided with requests to GraphQL proxy
  • GraphQL Proxy: the GraphQL engine component for processing and mapping requests, handling conflict resolution, and managing Fine Grained Access Control
  • Operation: one of three GraphQL operations supported in AppSync
    • Query: a read-only fetch call to the data
    • Mutation: a write of the data followed by a fetch,
    • Subscription: long-lived connections that receive data in response to events.
  • Action: a notification to connected subscribers from a GraphQL subscription.
  • Resolver: function using request and response mapping templates that converts and executes payload against data source

How It Works

A schema is created to define types and capabilities of the desired GraphQL API and tied to a Resolver function.  The schema can be created to mirror existing data sources or AWS AppSync can create tables automatically based the schema definition. Developers can also use GraphQL features for data discovery without having knowledge of the backend data sources. After a schema definition is established, an AWS AppSync client can be configured with an operation request, like a Query operation. The client submits the operation request to GraphQL Proxy along with an identity context and credentials. The GraphQL Proxy passes this request to the Resolver which maps and executes the request payload against pre-configured AWS data services like an Amazon DynamoDB table, an AWS Lambda function, or a search capability using Amazon Elasticsearch. The Resolver executes calls to one or all of these services within a single network call minimizing CPU cycles and bandwidth needs and returns the response to the client. Additionally, the client application can change data requirements in code on demand and the AppSync GraphQL API will dynamically map requests for data accordingly, allowing prototyping and faster development.

In order to take a quick peek at the service, I’ll go to the AWS AppSync console. I’ll click the Create API button to get started.

 

When the Create new API screen opens, I’ll give my new API a name, TarasTestApp, and since I am just exploring the new service I will select the Sample schema option.  You may notice from the informational dialog box on the screen that in using the sample schema, AWS AppSync will automatically create the DynamoDB tables and the IAM roles for me.It will also deploy the TarasTestApp API on my behalf.  After review of the sample schema provided by the console, I’ll click the Create button to create my test API.

After the TaraTestApp API has been created and the associated AWS resources provisioned on my behalf, I can make updates to the schema, data source, or connect my data source(s) to a resolver. I also can integrate my GraphQL API into an iOS, Android, Web, or React Native application by cloning the sample repo from GitHub and downloading the accompanying GraphQL schema.  These application samples are great to help get you started and they are pre-configured to function in offline scenarios.

If I select the Schema menu option on the console, I can update and view the TarasTestApp GraphQL API schema.


Additionally, if I select the Data Sources menu option in the console, I can see the existing data sources.  Within this screen, I can update, delete, or add data sources if I so desire.

Next, I will select the Query menu option which takes me to the console tool for writing and testing queries. Since I chose the sample schema and the AWS AppSync service did most of the heavy lifting for me, I’ll try a query against my new GraphQL API.

I’ll use a mutation to add data for the event type in my schema. Since this is a mutation and it first writes data and then does a read of the data, I want the query to return values for name and where.

If I go to the DynamoDB table created for the event type in the schema, I will see that the values from my query have been successfully written into the table. Now that was a pretty simple task to write and retrieve data based on a GraphQL API schema from a data source, don’t you think.


 Summary

AWS AppSync is currently in AWS AppSync is in Public Preview and you can sign up today. It supports development for iOS, Android, and JavaScript applications. You can take advantage of this managed GraphQL service by going to the AWS AppSync console or learn more by reviewing more details about the service by reading a tutorial in the AWS documentation for the service or checking out our AWS AppSync Developer Guide.

Tara

 

Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production

Post Syndicated from Rafi Ton original https://aws.amazon.com/blogs/big-data/using-amazon-redshift-spectrum-amazon-athena-and-aws-glue-with-node-js-in-production/

This is a guest post by Rafi Ton, founder and CEO of NUVIAD. NUVIAD is, in their own words, “a mobile marketing platform providing professional marketers, agencies and local businesses state of the art tools to promote their products and services through hyper targeting, big data analytics and advanced machine learning tools.”

At NUVIAD, we’ve been using Amazon Redshift as our main data warehouse solution for more than 3 years.

We store massive amounts of ad transaction data that our users and partners analyze to determine ad campaign strategies. When running real-time bidding (RTB) campaigns in large scale, data freshness is critical so that our users can respond rapidly to changes in campaign performance. We chose Amazon Redshift because of its simplicity, scalability, performance, and ability to load new data in near real time.

Over the past three years, our customer base grew significantly and so did our data. We saw our Amazon Redshift cluster grow from three nodes to 65 nodes. To balance cost and analytics performance, we looked for a way to store large amounts of less-frequently analyzed data at a lower cost. Yet, we still wanted to have the data immediately available for user queries and to meet their expectations for fast performance. We turned to Amazon Redshift Spectrum.

In this post, I explain the reasons why we extended Amazon Redshift with Redshift Spectrum as our modern data warehouse. I cover how our data growth and the need to balance cost and performance led us to adopt Redshift Spectrum. I also share key performance metrics in our environment, and discuss the additional AWS services that provide a scalable and fast environment, with data available for immediate querying by our growing user base.

Amazon Redshift as our foundation

The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time.

The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.

However, this approach required Amazon Redshift to store a lot of data for long periods, and our data grew substantially. In our peak, we maintained a cluster running 65 DC1.large nodes. The impact on our Amazon Redshift cluster was evident, and we saw our CPU utilization grow to 90%.

Why we extended Amazon Redshift to Redshift Spectrum

Redshift Spectrum gives us the ability to run SQL queries using the powerful Amazon Redshift query engine against data stored in Amazon S3, without needing to load the data. With Redshift Spectrum, we store data where we want, at the cost that we want. We have the data available for analytics when our users need it with the performance they expect.

Seamless scalability, high performance, and unlimited concurrency

Scaling Redshift Spectrum is a simple process. First, it allows us to leverage Amazon S3 as the storage engine and get practically unlimited data capacity.

Second, if we need more compute power, we can leverage Redshift Spectrum’s distributed compute engine over thousands of nodes to provide superior performance – perfect for complex queries running against massive amounts of data.

Third, all Redshift Spectrum clusters access the same data catalog so that we don’t have to worry about data migration at all, making scaling effortless and seamless.

Lastly, since Redshift Spectrum distributes queries across potentially thousands of nodes, they are not affected by other queries, providing much more stable performance and unlimited concurrency.

Keeping it SQL

Redshift Spectrum uses the same query engine as Amazon Redshift. This means that we did not need to change our BI tools or query syntax, whether we used complex queries across a single table or joins across multiple tables.

An interesting capability introduced recently is the ability to create a view that spans both Amazon Redshift and Redshift Spectrum external tables. With this feature, you can query frequently accessed data in your Amazon Redshift cluster and less-frequently accessed data in Amazon S3, using a single view.

Leveraging Parquet for higher performance

Parquet is a columnar data format that provides superior performance and allows Redshift Spectrum (or Amazon Athena) to scan significantly less data. With less I/O, queries run faster and we pay less per query. You can read all about Parquet at https://parquet.apache.org/ or https://en.wikipedia.org/wiki/Apache_Parquet.

Lower cost

From a cost perspective, we pay standard rates for our data in Amazon S3, and only small amounts per query to analyze data with Redshift Spectrum. Using the Parquet format, we can significantly reduce the amount of data scanned. Our costs are now lower, and our users get fast results even for large complex queries.

What we learned about Amazon Redshift vs. Redshift Spectrum performance

When we first started looking at Redshift Spectrum, we wanted to put it to the test. We wanted to know how it would compare to Amazon Redshift, so we looked at two key questions:

  1. What is the performance difference between Amazon Redshift and Redshift Spectrum on simple and complex queries?
  2. Does the data format impact performance?

During the migration phase, we had our dataset stored in Amazon Redshift and S3 as CSV/GZIP and as Parquet file formats. We tested three configurations:

  • Amazon Redshift cluster with 28 DC1.large nodes
  • Redshift Spectrum using CSV/GZIP
  • Redshift Spectrum using Parquet

We performed benchmarks for simple and complex queries on one month’s worth of data. We tested how much time it took to perform the query, and how consistent the results were when running the same query multiple times. The data we used for the tests was already partitioned by date and hour. Properly partitioning the data improves performance significantly and reduces query times.

Simple query

First, we tested a simple query aggregating billing data across a month:

SELECT 
  user_id, 
  count(*) AS impressions, 
  SUM(billing)::decimal /1000000 AS billing 
FROM <table_name> 
WHERE 
  date >= '2017-08-01' AND 
  date <= '2017-08-31'  
GROUP BY 
  user_id;

We ran the same query seven times and measured the response times (red marking the longest time and green the shortest time):

Execution Time (seconds)
  Amazon Redshift Redshift Spectrum
CSV
Redshift Spectrum Parquet
Run #1 39.65 45.11 11.92
Run #2 15.26 43.13 12.05
Run #3 15.27 46.47 13.38
Run #4 21.22 51.02 12.74
Run #5 17.27 43.35 11.76
Run #6 16.67 44.23 13.67
Run #7 25.37 40.39 12.75
Average 21.53  44.82 12.61

For simple queries, Amazon Redshift performed better than Redshift Spectrum, as we thought, because the data is local to Amazon Redshift.

What was surprising was that using Parquet data format in Redshift Spectrum significantly beat ‘traditional’ Amazon Redshift performance. For our queries, using Parquet data format with Redshift Spectrum delivered an average 40% performance gain over traditional Amazon Redshift. Furthermore, Redshift Spectrum showed high consistency in execution time with a smaller difference between the slowest run and the fastest run.

Comparing the amount of data scanned when using CSV/GZIP and Parquet, the difference was also significant:

Data Scanned (GB)
CSV (Gzip) 135.49
Parquet 2.83

Because we pay only for the data scanned by Redshift Spectrum, the cost saving of using Parquet is evident and substantial.

Complex query

Next, we compared the same three configurations with a complex query.

Execution Time (seconds)
  Amazon Redshift Redshift Spectrum CSV Redshift Spectrum Parquet
Run #1 329.80 84.20 42.40
Run #2 167.60 65.30 35.10
Run #3 165.20 62.20 23.90
Run #4 273.90 74.90 55.90
Run #5 167.70 69.00 58.40
Average 220.84 71.12 43.14

This time, Redshift Spectrum using Parquet cut the average query time by 80% compared to traditional Amazon Redshift!

Bottom line: For complex queries, Redshift Spectrum provided a 67% performance gain over Amazon Redshift. Using the Parquet data format, Redshift Spectrum delivered an 80% performance improvement over Amazon Redshift. For us, this was substantial.

Optimizing the data structure for different workloads

Because the cost of S3 is relatively inexpensive and we pay only for the data scanned by each query, we believe that it makes sense to keep our data in different formats for different workloads and different analytics engines. It is important to note that we can have any number of tables pointing to the same data on S3. It all depends on how we partition the data and update the table partitions.

Data permutations

For example, we have a process that runs every minute and generates statistics for the last minute of data collected. With Amazon Redshift, this would be done by running the query on the table with something as follows:

SELECT 
  user, 
  COUNT(*) 
FROM 
  events_table 
WHERE 
  ts BETWEEN ‘2017-08-01 14:00:00’ AND ‘2017-08-01 14:00:59’ 
GROUP BY 
  user;

(Assuming ‘ts’ is your column storing the time stamp for each event.)

With Redshift Spectrum, we pay for the data scanned in each query. If the data is partitioned by the minute instead of the hour, a query looking at one minute would be 1/60th the cost. If we use a temporary table that points only to the data of the last minute, we save that unnecessary cost.

Creating Parquet data efficiently

On the average, we have 800 instances that process our traffic. Each instance sends events that are eventually loaded into Amazon Redshift. When we started three years ago, we would offload data from each server to S3 and then perform a periodic copy command from S3 to Amazon Redshift.

Recently, Amazon Kinesis Firehose added the capability to offload data directly to Amazon Redshift. While this is now a viable option, we kept the same collection process that worked flawlessly and efficiently for three years.

This changed, however, when we incorporated Redshift Spectrum. With Redshift Spectrum, we needed to find a way to:

  • Collect the event data from the instances.
  • Save the data in Parquet format.
  • Partition the data effectively.

To accomplish this, we save the data as CSV and then transform it to Parquet. The most effective method to generate the Parquet files is to:

  1. Send the data in one-minute intervals from the instances to Kinesis Firehose with an S3 temporary bucket as the destination.
  2. Aggregate hourly data and convert it to Parquet using AWS Lambda and AWS Glue.
  3. Add the Parquet data to S3 by updating the table partitions.

With this new process, we had to give more attention to validating the data before we sent it to Kinesis Firehose, because a single corrupted record in a partition fails queries on that partition.

Data validation

To store our click data in a table, we considered the following SQL create table command:

create external TABLE spectrum.blog_clicks (
    user_id varchar(50),
    campaign_id varchar(50),
    os varchar(50),
    ua varchar(255),
    ts bigint,
    billing float
)
partitioned by (date date, hour smallint)  
stored as parquet
location 's3://nuviad-temp/blog/clicks/';

The above statement defines a new external table (all Redshift Spectrum tables are external tables) with a few attributes. We stored ‘ts’ as a Unix time stamp and not as Timestamp, and billing data is stored as float and not decimal (more on that later). We also said that the data is partitioned by date and hour, and then stored as Parquet on S3.

First, we need to get the table definitions. This can be achieved by running the following query:

SELECT 
  * 
FROM 
  svv_external_columns 
WHERE 
  tablename = 'blog_clicks';

This query lists all the columns in the table with their respective definitions:

schemaname tablename columnname external_type columnnum part_key
spectrum blog_clicks user_id varchar(50) 1 0
spectrum blog_clicks campaign_id varchar(50) 2 0
spectrum blog_clicks os varchar(50) 3 0
spectrum blog_clicks ua varchar(255) 4 0
spectrum blog_clicks ts bigint 5 0
spectrum blog_clicks billing double 6 0
spectrum blog_clicks date date 7 1
spectrum blog_clicks hour smallint 8 2

Now we can use this data to create a validation schema for our data:

const rtb_request_schema = {
    "name": "clicks",
    "items": {
        "user_id": {
            "type": "string",
            "max_length": 100
        },
        "campaign_id": {
            "type": "string",
            "max_length": 50
        },
        "os": {
            "type": "string",
            "max_length": 50            
        },
        "ua": {
            "type": "string",
            "max_length": 255            
        },
        "ts": {
            "type": "integer",
            "min_value": 0,
            "max_value": 9999999999999
        },
        "billing": {
            "type": "float",
            "min_value": 0,
            "max_value": 9999999999999
        }
    }
};

Next, we create a function that uses this schema to validate data:

function valueIsValid(value, item_schema) {
    if (schema.type == 'string') {
        return (typeof value == 'string' && value.length <= schema.max_length);
    }
    else if (schema.type == 'integer') {
        return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
    }
    else if (schema.type == 'float' || schema.type == 'double') {
        return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
    }
    else if (schema.type == 'boolean') {
        return typeof value == 'boolean';
    }
    else if (schema.type == 'timestamp') {
        return (new Date(value)).getTime() > 0;
    }
    else {
        return true;
    }
}

Near real-time data loading with Kinesis Firehose

On Kinesis Firehose, we created a new delivery stream to handle the events as follows:

Delivery stream name: events
Source: Direct PUT
S3 bucket: nuviad-events
S3 prefix: rtb/
IAM role: firehose_delivery_role_1
Data transformation: Disabled
Source record backup: Disabled
S3 buffer size (MB): 100
S3 buffer interval (sec): 60
S3 Compression: GZIP
S3 Encryption: No Encryption
Status: ACTIVE
Error logging: Enabled

This delivery stream aggregates event data every minute, or up to 100 MB, and writes the data to an S3 bucket as a CSV/GZIP compressed file. Next, after we have the data validated, we can safely send it to our Kinesis Firehose API:

if (validated) {
    let itemString = item.join('|')+'\n'; //Sending csv delimited by pipe and adding new line

    let params = {
        DeliveryStreamName: 'events',
        Record: {
            Data: itemString
        }
    };

    firehose.putRecord(params, function(err, data) {
        if (err) {
            console.error(err, err.stack);        
        }
        else {
            // Continue to your next step 
        }
    });
}

Now, we have a single CSV file representing one minute of event data stored in S3. The files are named automatically by Kinesis Firehose by adding a UTC time prefix in the format YYYY/MM/DD/HH before writing objects to S3. Because we use the date and hour as partitions, we need to change the file naming and location to fit our Redshift Spectrum schema.

Automating data distribution using AWS Lambda

We created a simple Lambda function triggered by an S3 put event that copies the file to a different location (or locations), while renaming it to fit our data structure and processing flow. As mentioned before, the files generated by Kinesis Firehose are structured in a pre-defined hierarchy, such as:

S3://your-bucket/your-prefix/2017/08/01/20/events-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz

All we need to do is parse the object name and restructure it as we see fit. In our case, we did the following (the event is an object received in the Lambda function with all the data about the object written to S3):

/*
	object key structure in the event object:
your-prefix/2017/08/01/20/event-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz
	*/

let key_parts = event.Records[0].s3.object.key.split('/'); 

let event_type = key_parts[0];
let date = key_parts[1] + '-' + key_parts[2] + '-' + key_parts[3];
let hour = key_parts[4];
if (hour.indexOf('0') == 0) {
 		hour = parseInt(hour, 10) + '';
}
    
let parts1 = key_parts[5].split('-');
let minute = parts1[7];
if (minute.indexOf('0') == 0) {
        minute = parseInt(minute, 10) + '';
}

Now, we can redistribute the file to the two destinations we need—one for the minute processing task and the other for hourly aggregation:

    copyObjectToHourlyFolder(event, date, hour, minute)
        .then(copyObjectToMinuteFolder.bind(null, event, date, hour, minute))
        .then(addPartitionToSpectrum.bind(null, event, date, hour, minute))
        .then(deleteOldMinuteObjects.bind(null, event))
        .then(deleteStreamObject.bind(null, event))        
        .then(result => {
            callback(null, { message: 'done' });            
        })
        .catch(err => {
            console.error(err);
            callback(null, { message: err });            
        }); 

Kinesis Firehose stores the data in a temporary folder. We copy the object to another folder that holds the data for the last processed minute. This folder is connected to a small Redshift Spectrum table where the data is being processed without needing to scan a much larger dataset. We also copy the data to a folder that holds the data for the entire hour, to be later aggregated and converted to Parquet.

Because we partition the data by date and hour, we created a new partition on the Redshift Spectrum table if the processed minute is the first minute in the hour (that is, minute 0). We ran the following:

ALTER TABLE 
  spectrum.events 
ADD partition
  (date='2017-08-01', hour=0) 
  LOCATION 's3://nuviad-temp/events/2017-08-01/0/';

After the data is processed and added to the table, we delete the processed data from the temporary Kinesis Firehose storage and from the minute storage folder.

Migrating CSV to Parquet using AWS Glue and Amazon EMR

The simplest way we found to run an hourly job converting our CSV data to Parquet is using Lambda and AWS Glue (and thanks to the awesome AWS Big Data team for their help with this).

Creating AWS Glue jobs

What this simple AWS Glue script does:

  • Gets parameters for the job, date, and hour to be processed
  • Creates a Spark EMR context allowing us to run Spark code
  • Reads CSV data into a DataFrame
  • Writes the data as Parquet to the destination S3 bucket
  • Adds or modifies the Redshift Spectrum / Amazon Athena table partition for the table
import sys
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME','day_partition_key', 'hour_partition_key', 'day_partition_value', 'hour_partition_value' ])

#day_partition_key = "partition_0"
#hour_partition_key = "partition_1"
#day_partition_value = "2017-08-01"
#hour_partition_value = "0"

day_partition_key = args['day_partition_key']
hour_partition_key = args['hour_partition_key']
day_partition_value = args['day_partition_value']
hour_partition_value = args['hour_partition_value']

print("Running for " + day_partition_value + "/" + hour_partition_value)

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark.read.option("delimiter","|").csv("s3://nuviad-temp/events/"+day_partition_value+"/"+hour_partition_value)
df.registerTempTable("data")

df1 = spark.sql("select _c0 as user_id, _c1 as campaign_id, _c2 as os, _c3 as ua, cast(_c4 as bigint) as ts, cast(_c5 as double) as billing from data")

df1.repartition(1).write.mode("overwrite").parquet("s3://nuviad-temp/parquet/"+day_partition_value+"/hour="+hour_partition_value)

client = boto3.client('athena', region_name='us-east-1')

response = client.start_query_execution(
    QueryString='alter table parquet_events add if not exists partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ')  location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
    QueryExecutionContext={
        'Database': 'spectrumdb'
    },
    ResultConfiguration={
        'OutputLocation': 's3://nuviad-temp/convertresults'
    }
)

response = client.start_query_execution(
    QueryString='alter table parquet_events partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ') set location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
    QueryExecutionContext={
        'Database': 'spectrumdb'
    },
    ResultConfiguration={
        'OutputLocation': 's3://nuviad-temp/convertresults'
    }
)

job.commit()

Note: Because Redshift Spectrum and Athena both use the AWS Glue Data Catalog, we could use the Athena client to add the partition to the table.

Here are a few words about float, decimal, and double. Using decimal proved to be more challenging than we expected, as it seems that Redshift Spectrum and Spark use them differently. Whenever we used decimal in Redshift Spectrum and in Spark, we kept getting errors, such as:

S3 Query Exception (Fetch). Task failed due to an internal error. File 'https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parquet has an incompatible Parquet schema for column 's3://nuviad-events/events.lat'. Column type: DECIMAL(18, 8), Parquet schema:\noptional float lat [i:4 d:1 r:0]\n (https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parq

We had to experiment with a few floating-point formats until we found that the only combination that worked was to define the column as double in the Spark code and float in Spectrum. This is the reason you see billing defined as float in Spectrum and double in the Spark code.

Creating a Lambda function to trigger conversion

Next, we created a simple Lambda function to trigger the AWS Glue script hourly using a simple Python code:

import boto3
import json
from datetime import datetime, timedelta
 
client = boto3.client('glue')
 
def lambda_handler(event, context):
    last_hour_date_time = datetime.now() - timedelta(hours = 1)
    day_partition_value = last_hour_date_time.strftime("%Y-%m-%d") 
    hour_partition_value = last_hour_date_time.strftime("%-H") 
    response = client.start_job_run(
    JobName='convertEventsParquetHourly',
    Arguments={
         '--day_partition_key': 'date',
         '--hour_partition_key': 'hour',
         '--day_partition_value': day_partition_value,
         '--hour_partition_value': hour_partition_value
         }
    )

Using Amazon CloudWatch Events, we trigger this function hourly. This function triggers an AWS Glue job named ‘convertEventsParquetHourly’ and runs it for the previous hour, passing job names and values of the partitions to process to AWS Glue.

Redshift Spectrum and Node.js

Our development stack is based on Node.js, which is well-suited for high-speed, light servers that need to process a huge number of transactions. However, a few limitations of the Node.js environment required us to create workarounds and use other tools to complete the process.

Node.js and Parquet

The lack of Parquet modules for Node.js required us to implement an AWS Glue/Amazon EMR process to effectively migrate data from CSV to Parquet. We would rather save directly to Parquet, but we couldn’t find an effective way to do it.

One interesting project in the works is the development of a Parquet NPM by Marc Vertes called node-parquet (https://www.npmjs.com/package/node-parquet). It is not in a production state yet, but we think it would be well worth following the progress of this package.

Timestamp data type

According to the Parquet documentation, Timestamp data are stored in Parquet as 64-bit integers. However, JavaScript does not support 64-bit integers, because the native number type is a 64-bit double, giving only 53 bits of integer range.

The result is that you cannot store Timestamp correctly in Parquet using Node.js. The solution is to store Timestamp as string and cast the type to Timestamp in the query. Using this method, we did not witness any performance degradation whatsoever.

Lessons learned

You can benefit from our trial-and-error experience.

Lesson #1: Data validation is critical

As mentioned earlier, a single corrupt entry in a partition can fail queries running against this partition, especially when using Parquet, which is harder to edit than a simple CSV file. Make sure that you validate your data before scanning it with Redshift Spectrum.

Lesson #2: Structure and partition data effectively

One of the biggest benefits of using Redshift Spectrum (or Athena for that matter) is that you don’t need to keep nodes up and running all the time. You pay only for the queries you perform and only for the data scanned per query.

Keeping different permutations of your data for different queries makes a lot of sense in this case. For example, you can partition your data by date and hour to run time-based queries, and also have another set partitioned by user_id and date to run user-based queries. This results in faster and more efficient performance of your data warehouse.

Storing data in the right format

Use Parquet whenever you can. The benefits of Parquet are substantial. Faster performance, less data to scan, and much more efficient columnar format. However, it is not supported out-of-the-box by Kinesis Firehose, so you need to implement your own ETL. AWS Glue is a great option.

Creating small tables for frequent tasks

When we started using Redshift Spectrum, we saw our Amazon Redshift costs jump by hundreds of dollars per day. Then we realized that we were unnecessarily scanning a full day’s worth of data every minute. Take advantage of the ability to define multiple tables on the same S3 bucket or folder, and create temporary and small tables for frequent queries.

Lesson #3: Combine Athena and Redshift Spectrum for optimal performance

Moving to Redshift Spectrum also allowed us to take advantage of Athena as both use the AWS Glue Data Catalog. Run fast and simple queries using Athena while taking advantage of the advanced Amazon Redshift query engine for complex queries using Redshift Spectrum.

Redshift Spectrum excels when running complex queries. It can push many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer, so that queries use much less of your cluster’s processing capacity.

Lesson #4: Sort your Parquet data within the partition

We achieved another performance improvement by sorting data within the partition using sortWithinPartitions(sort_field). For example:

df.repartition(1).sortWithinPartitions("campaign_id")…

Conclusion

We were extremely pleased with using Amazon Redshift as our core data warehouse for over three years. But as our client base and volume of data grew substantially, we extended Amazon Redshift to take advantage of scalability, performance, and cost with Redshift Spectrum.

Redshift Spectrum lets us scale to virtually unlimited storage, scale compute transparently, and deliver super-fast results for our users. With Redshift Spectrum, we store data where we want at the cost we want, and have the data available for analytics when our users need it with the performance they expect.


About the Author

With 7 years of experience in the AdTech industry and 15 years in leading technology companies, Rafi Ton is the founder and CEO of NUVIAD. He enjoys exploring new technologies and putting them to use in cutting edge products and services, in the real world generating real money. Being an experienced entrepreneur, Rafi believes in practical-programming and fast adaptation of new technologies to achieve a significant market advantage.

 

 

How to Recover From Ransomware

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/complete-guide-ransomware/

Here’s the scenario. You’re working on your computer and you notice that it seems slower. Or perhaps you can’t access document or media files that were previously available.

You might be getting error messages from Windows telling you that a file is of an “Unknown file type” or “Windows can’t open this file.”

Windows error message

If you’re on a Mac, you might see the message “No associated application,” or “There is no application set to open the document.”

MacOS error message

Another possibility is that you’re completely locked out of your system. If you’re in an office, you might be looking around and seeing that other people are experiencing the same problem. Some are already locked out, and others are just now wondering what’s going on, just as you are.

Then you see a message confirming your fears.

wana decrypt0r ransomware message

You’ve been infected with ransomware.

You’ll have lots of company this year. The number of ransomware attacks on businesses tripled in the past year, jumping from one attack every two minutes in Q1 to one every 40 seconds by Q3.There were over four times more new ransomware variants in the first quarter of 2017 than in the first quarter of 2016, and damages from ransomware are expected to exceed $5 billion this year.

Growth in Ransomware Variants Since December 2015

Source: Proofpoint Q1 2017 Quarterly Threat Report

This past summer, our local PBS and NPR station in San Francisco, KQED, was debilitated for weeks by a ransomware attack that forced them to go back to working the way they used to prior to computers. Five months have passed since the attack and they’re still recovering and trying to figure out how to prevent it from happening again.

How Does Ransomware Work?

Ransomware typically spreads via spam or phishing emails, but also through websites or drive-by downloads, to infect an endpoint and penetrate the network. Once in place, the ransomware then locks all files it can access using strong encryption. Finally, the malware demands a ransom (typically payable in bitcoins) to decrypt the files and restore full operations to the affected IT systems.

Encrypting ransomware or “cryptoware” is by far the most common recent variety of ransomware. Other types that might be encountered are:

  • Non-encrypting ransomware or lock screens (restricts access to files and data, but does not encrypt them)
  • Ransomware that encrypts the Master Boot Record (MBR) of a drive or Microsoft’s NTFS, which prevents victims’ computers from being booted up in a live OS environment
  • Leakware or extortionware (exfiltrates data that the attackers threaten to release if ransom is not paid)
  • Mobile Device Ransomware (infects cell-phones through “drive-by downloads” or fake apps)

The typical steps in a ransomware attack are:

1
Infection
After it has been delivered to the system via email attachment, phishing email, infected application or other method, the ransomware installs itself on the endpoint and any network devices it can access.
2
Secure Key Exchange
The ransomware contacts the command and control server operated by the cybercriminals behind the attack to generate the cryptographic keys to be used on the local system.
3
Encryption
The ransomware starts encrypting any files it can find on local machines and the network.
4
Extortion
With the encryption work done, the ransomware displays instructions for extortion and ransom payment, threatening destruction of data if payment is not made.
5
Unlocking
Organizations can either pay the ransom and hope for the cybercriminals to actually decrypt the affected files (which in many cases does not happen), or they can attempt recovery by removing infected files and systems from the network and restoring data from clean backups.

Who Gets Attacked?

Ransomware attacks target firms of all sizes — 5% or more of businesses in the top 10 industry sectors have been attacked — and no no size business, from SMBs to enterprises, are immune. Attacks are on the rise in every sector and in every size of business.

Recent attacks, such as WannaCry earlier this year, mainly affected systems outside of the United States. Hundreds of thousands of computers were infected from Taiwan to the United Kingdom, where it crippled the National Health Service.

The US has not been so lucky in other attacks, though. The US ranks the highest in the number of ransomware attacks, followed by Germany and then France. Windows computers are the main targets, but ransomware strains exist for Macintosh and Linux, as well.

The unfortunate truth is that ransomware has become so wide-spread that for most companies it is a certainty that they will be exposed to some degree to a ransomware or malware attack. The best they can do is to be prepared and understand the best ways to minimize the impact of ransomware.

“Ransomware is more about manipulating vulnerabilities in human psychology than the adversary’s technological sophistication.” — James Scott, expert in Artificial Intelligence

Phishing emails, malicious email attachments, and visiting compromised websites have been common vehicles of infection (we wrote about protecting against phishing recently), but other methods have become more common in past months. Weaknesses in Microsoft’s Server Message Block (SMB) and Remote Desktop Protocol (RDP) have allowed cryptoworms to spread. Desktop applications — in one case an accounting package — and even Microsoft Office (Microsoft’s Dynamic Data Exchange — DDE) have been the agents of infection.

Recent ransomware strains such as Petya, CryptoLocker, and WannaCry have incorporated worms to spread themselves across networks, earning the nickname, “cryptoworms.”

How to Defeat Ransomware

1
Isolate the Infection
Prevent the infection from spreading by separating all infected computers from each other, shared storage, and the network.
2
Identify the Infection
From messages, evidence on the computer, and identification tools, determine which malware strain you are dealing with.
3
Report
Report to the authorities to support and coordinate measures to counter attacks.
4
Determine Your Options
You have a number of ways to deal with the infection. Determine which approach is best for you.
5
Restore and Refresh
Use safe backups and program and software sources to restore your computer or outfit a new platform.
6
Plan to Prevent Recurrence
Make an assessment of how the infection occurred and what you can do to put measures into place that will prevent it from happening again.

1 — Isolate the Infection

The rate and speed of ransomware detection is critical in combating fast moving attacks before they succeed in spreading across networks and encrypting vital data.

The first thing to do when a computer is suspected of being infected is to isolate it from other computers and storage devices. Disconnect it from the network (both wired and Wi-Fi) and from any external storage devices. Cryptoworms actively seek out connections and other computers, so you want to prevent that happening. You also don’t want the ransomware communicating across the network with its command and control center.

Be aware that there may be more than just one patient zero, meaning that the ransomware may have entered your organization or home through multiple computers, or may be dormant and not yet shown itself on some systems. Treat all connected and networked computers with suspicion and apply measures to ensure that all systems are not infected.

This Week in Tech (TWiT.tv) did a videocast showing what happens when WannaCry is released on an isolated system and encrypts files and trys to spread itself to other computers. It’s a great lesson on how these types of cryptoworms operate.

2 — Identify the Infection

Most often the ransomware will identify itself when it asks for ransom. There are numerous sites that help you identify the ransomware, including ID Ransomware. The No More Ransomware! Project provides the Crypto Sheriff to help identify ransomware.

Identifying the ransomware will help you understand what type of ransomware you have, how it propagates, what types of files it encrypts, and maybe what your options are for removal and disinfection. It also will enable you to report the attack to the authorities, which is recommended.

wanna decryptor 2.0 ransomware message

WannaCry Ransomware Extortion Dialog

3 — Report to the Authorities

You’ll be doing everyone a favor by reporting all ransomware attacks to the authorities. The FBI urges ransomware victims to report ransomware incidents regardless of the outcome. Victim reporting provides law enforcement with a greater understanding of the threat, provides justification for ransomware investigations, and contributes relevant information to ongoing ransomware cases. Knowing more about victims and their experiences with ransomware will help the FBI to determine who is behind the attacks and how they are identifying or targeting victims.

You can file a report with the FBI at the Internet Crime Complaint Center.

There are other ways to report ransomware, as well.

4 — Determine Your Options

Your options when infected with ransomware are:

  1. Pay the ransom
  2. Try to remove the malware
  3. Wipe the system(s) and reinstall from scratch

It’s generally considered a bad idea to pay the ransom. Paying the ransom encourages more ransomware, and in most cases the unlocking of the encrypted files is not successful.

In a recent survey, more than three-quarters of respondents said their organization is not at all likely to pay the ransom in order to recover their data (77%). Only a small minority said they were willing to pay some ransom (3% of companies have already set up a Bitcoin account in preparation).

Even if you decide to pay, it’s very possible you won’t get back your data.

5 — Restore or Start Fresh

You have the choice of trying to remove the malware from your systems or wiping your systems and reinstalling from safe backups and clean OS and application sources.

Get Rid of the Infection

There are internet sites and software packages that claim to be able to remove ransomware from systems. The No More Ransom! Project is one. Other options can be found, as well.

Whether you can successfully and completely remove an infection is up for debate. A working decryptor doesn’t exist for every known ransomware, and unfortunately it’s true that the newer the ransomware, the more sophisticated it’s likely to be and a perhaps a decryptor has not yet been created.

It’s Best to Wipe All Systems Completely

The surest way of being certain that malware or ransomware has been removed from a system is to do a complete wipe of all storage devices and reinstall everything from scratch. If you’ve been following a sound backup strategy, you should have copies of all your documents, media, and important files right up to the time of the infection.

Be sure to determine as well as you can from file dates and other information what was the date of infection. Consider that an infection might have been dormant in your system for a while before it activated and made significant changes to your system. Identifying and learning about the particular malware that attacked your systems will enable you to understand how that malware operates and what your best strategy should be for restoring your systems.

Backblaze Backup enables you to go back in time and specify the date prior to which you wish to restore files. That date should precede the date your system was infected.

Choose files to restore from earlier date in Backblaze Backup

If you’ve been following a good backup policy with both local and off-site backups, you should be able to use backup copies that you are sure were not connected to your network after the time of attack and hence protected from infection. Backup drives that were completely disconnected should be safe, as are files stored in the cloud, as with Backblaze Backup.

System Restores Are not the Best Strategy for Dealing with Ransomware and Malware

You might be tempted to use a System Restore point to get your system back up and running. System Restore is not a good solution for removing viruses or other malware. Since malicious software is typically buried within all kinds of places on a system, you can’t rely on System Restore being able to root out all parts of the malware. Instead, you should rely on a quality virus scanner that you keep up to date. Also, System Restore does not save old copies of your personal files as part of its snapshot. It also will not delete or replace any of your personal files when you perform a restoration, so don’t count on System Restore as working like a backup. You should always have a good backup procedure in place for all your personal files.

Local backups can be encrypted by ransomware. If your backup solution is local and connected to a computer that gets hit with ransomware, the chances are good your backups will be encrypted along with the rest of your data.

With a good backup solution that is isolated from your local computers, such as Backblaze Backup, you can easily obtain the files you need to get your system working again. You have the flexility to determine which files to restore, from which date you want to restore, and how to obtain the files you need to restore your system.

Choose how to obtain your backup files

You’ll need to reinstall your OS and software applications from the source media or the internet. If you’ve been managing your account and software credentials in a sound manner, you should be able to reactivate accounts for applications that require it.

If you use a password manager, such as 1Password or LastPass, to store your account numbers, usernames, passwords, and other essential information, you can access that information through their web interface or mobile applications. You just need to be sure that you still know your master username and password to obtain access to these programs.

6 — How to Prevent a Ransomware Attack

“Ransomware is at an unprecedented level and requires international investigation.” — European police agency EuroPol

A ransomware attack can be devastating for a home or a business. Valuable and irreplaceable files can be lost and tens or even hundreds of hours of effort can be required to get rid of the infection and get systems working again.

Security experts suggest several precautionary measures for preventing a ransomware attack.

  1. Use anti-virus and anti-malware software or other security policies to block known payloads from launching.
  2. Make frequent, comprehensive backups of all important files and isolate them from local and open networks. Cybersecurity professionals view data backup and recovery (74% in a recent survey) by far as the most effective solution to respond to a successful ransomware attack.
  3. Keep offline backups of data stored in locations inaccessible from any potentially infected computer, such as external storage drives or the cloud, which prevents them from being accessed by the ransomware.
  4. Install the latest security updates issued by software vendors of your OS and applications. Remember to Patch Early and Patch Often to close known vulnerabilities in operating systems, browsers, and web plugins.
  5. Consider deploying security software to protect endpoints, email servers, and network systems from infection.
  6. Exercise cyber hygiene, such as using caution when opening email attachments and links.
  7. Segment your networks to keep critical computers isolated and to prevent the spread of malware in case of attack. Turn off unneeded network shares.
  8. Turn off admin rights for users who don’t require them. Give users the lowest system permissions they need to do their work.
  9. Restrict write permissions on file servers as much as possible.
  10. Educate yourself, your employees, and your family in best practices to keep malware out of your systems. Update everyone on the latest email phishing scams and human engineering aimed at turning victims into abettors.

It’s clear that the best way to respond to a ransomware attack is to avoid having one in the first place. Other than that, making sure your valuable data is backed up and unreachable by ransomware infection will ensure that your downtime and data loss will be minimal or avoided completely.

Have you endured a ransomware attack or have a strategy to avoid becoming a victim? Please let us know in the comments.

The post How to Recover From Ransomware appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Build a Flick-controlled marble maze

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/flick-marble-maze/

Wiggle your fingers to guide a ball through a 3D-printed marble maze using the Pi Supply Flick board for Raspberry Pi!

Wiggle, wiggle, wiggle, wiggle, yeah

Using the Flick, previously seen in last week’s Hacker House’s gesture-controlled holographic visualiser, South Africa–based Tom Van den Bon has created a touch-free marble maze. He was motivated by, if his Twitter is any indication, his love for game-making and 3D printing.

Tom Van den Bon on Twitter

Day 172 of #3dprint365. #3dprinted Raspberry PI Controlled Maze Thingie Part 3 #3dprint #3dprinter #thingiverse #raspberrypi #pisupply

All non-electronic parts of this build are 3D printed. The marble maze sits atop a motorised structure which moves along two axes thanks to servo motors. Tom controls the movement using gestures which are picked up by the Flick Zero, a Pi Zero–sized 3D-tracking board that can detect movement up to 15cm away.

Find the code for the maze, which takes advantage of the Flick library, on Tom’s GitHub account.

Make your own games

Our free resources are a treasure trove of fun home-brew games that you can build with your friends and family.

If you like physical games such as Tom’s gesture-controlled maze, you should definitely check out our Python quick reaction game! In it, players are pitted against each other to react as quickly as possible to a randomly lighting up LED.

raspberry pi marble maze

You can also play solo with our Lights out game, where it’s you against four erratic lights eager to remain lit.

For games you can build on your computer with no need for any extra tech, Scratch games such as our button-smashing Olympic weightlifter and Hurdler projects are perfect — you can play them just using a keyboard and browser!

raspberry pi marble maze

And if you’d like to really get stuck into learning about game development, then you’re in luck! CoderDojo’s Make your own game book guides you through all the steps of building a game in JavaScript, from creating the world to designing characters.

Cover of CoderDojo Nano Make your own game

And because I just found this while searching for image content for today’s blog, here is a photo of Eben’s and Liz’s cat Mooncake with a Raspberry Pi on her head. Enjoy!

A cat with a Raspberry Pi pin on its head — raspberry pi marble maze

Ras-purry Pi?

The post Build a Flick-controlled marble maze appeared first on Raspberry Pi.

Using taxies to monitor air quality in Peru

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/air-quality-peru/

When James Puderer moved to Lima, Peru, his roadside runs left a rather nasty taste in his mouth. Hit by the pollution from old diesel cars in the area, he decided to monitor the air quality in his new city using Raspberry Pis and the abundant taxies as his tech carriers.

Taxi Datalogger – Assembly

How to assemble the enclosure for my Taxi Datalogger project: https://www.hackster.io/james-puderer/distributed-air-quality-monitoring-using-taxis-69647e

Sensing air quality in Lima

Luckily for James, almost all taxies in Lima are equipped with the standard hollow vinyl roof sign seen in the video above, which makes them ideal for hacking.

Using a Raspberry Pi alongside various Adafuit tech including the BME280 Temperature/Humidity/Pressure Sensor and GPS Antenna, James created a battery-powered retrofit setup that fits snugly into the vinyl sign.

The schematic of the air quality monitor tech inside the taxi sign

With the onboard tech, the device collects data on longitude, latitude, humidity, temperature, pressure, and airborne particle count, feeding it back to an Android Things datalogger. This data is then pushed to Google IoT Core, where it can be remotely accessed.

Next, the data is processed by Google Dataflow and turned into a BigQuery table. Users can then visualize the collected measurements. And while James uses Google Maps to analyse his data, there are many tools online that will allow you to organise and study your figures depending on what final result you’re hoping to achieve.

A heat map of James' local area showing air quality

James hopped in a taxi and took his monitor on the road, collecting results throughout the journey

James has provided the complete build process, including all tech ingredients and code, on his Hackster.io project page, and urges makers to create their own air quality monitor for their local area. He also plans on building upon the existing design by adding a 12V power hookup for connecting to the taxi, functioning lights within the sign, and companion apps for drivers.

Sensing the world around you

We’ve seen a wide variety of Raspberry Pi projects using sensors to track the world around us, such as Kasia Molga’s Human Sensor costume series, which reacts to air pollution by lighting up, and Clodagh O’Mahony’s Social Interaction Dress, which she created to judge how conversation and physical human interaction can be scored and studied.

Human Sensor

Kasia Molga’s Human Sensor — a collection of hi-tech costumes that react to air pollution within the wearer’s environment.

Many people also build their own Pi-powered weather stations, or use the Raspberry Pi Oracle Weather Station, to measure and record conditions in their towns and cities from the roofs of schools, offices, and homes.

Have you incorporated sensors into your Raspberry Pi projects? Share your builds in the comments below or via social media by tagging us.

The post Using taxies to monitor air quality in Peru appeared first on Raspberry Pi.

Spam from Flock

Post Syndicated from Григор original http://www.gatchev.info/blog/?p=2095

Couple of weeks ago I received a mail from a site called Flock. It said that some guy invited me to join their social network. I would expect whoever invites me somewhere to do it in personal mail, without giving my e-mail address around. However, some people don’t think before acting – one should expect such things.

I wasn’t interested in joining and left that mail unanswered. However, during the next few days I got an avalanche of mails from Flock. Apparently they subscribe every e-mail address they lay their hands on to their spam.

One of their e-mails contained an unsubscription link. I clicked on it, only to learn that I have been unsubscribed from this invitation, and will continue to receive other e-mails from Flock. (Probably these, or at least a part of them, can be unsubscribed too. After you make an account with Flock and fill in all your personal info they might like to have. Guess what for.)

Naturally, that was the “enough is enough” line. I blocked all mails from Flock for the entire mail hosting that holds my e-mail – happily, I am the one responsible for it. So, far, the only reaction have been one thank-you from another victim of Flock whose mail is hosted there.

I am not evil. If Flock sends me a notarized legally binding declaration that they stop all spamming activities, I will unblock them happily. Until then, they will stay on my hosting’s blacklist. Unsubscribes, even complete, for me or other specific people don’t count. Any attempts of theirs for communication other than sending such a declaration will be automatically deleted before reaching me.

My suggestion to all mail providers around is to do the same. Think on how much money you lose due to spam, and decide if you want these losses to increase, or to decrease.

(Update: Forgot to add that the “unsubscription” does not unsubscribe you. As expected – spammers are spammers. Strange, eh?)

Welcome Carlo!

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-carlo/

Welcome Carlo!
As Backblaze continues to grow, we need to keep our web experience on point, so we put out a call for creative folks that can help us make the Backblaze experience all that it can be. We found Carlo! He’s a frontend web developer who used to work at Sea World. Lets learn a bit more about Carlo, shall we?

What is your Backblaze Title?
Senior Frontend Developer

Where are you originally from? 
I grew up in San Diego, California.

What attracted you to Backblaze?
I am excited that frontend architecture is approaching parity with the rest of the web services software development ecosystem. Most of my experience has been full stack development, but I have recently started focusing on the front end. Backblaze shares my goal of having a first class user experience using frameworks like React.

What do you expect to learn while being at Backblaze?
I’m interested in building solutions that help customers visualize and work with their data intuitively and efficiently.

Where else have you worked?
GoPro, Sungevity, and Sea World.

What’s your dream job?
Hip Hop dressage choreographer.

Favorite place you’ve traveled? 
The Arctic in Northern Finland, in a train in a boat sailing the gap between Germany and Denmark, and Vieques PR.

Favorite hobby?
Sketching, writing, and dressing up my hairless dogs.

Of what achievement are you most proud?
It’s either helping release a large SOA site, or orchestrating a Morrissey cover band flash mob #squadgoals. OK, maybe one those things didn’t happen…

Star Trek or Star Wars?
Interstellar!

Favorite food?
Mexican food.

Coke or Pepsi?
Ginger beer.

Why do you like certain things? 
Things that I like bring me joy a la Marie Kondo.

Anything else you’d like you’d like to tell us?
¯\_(ツ)_/¯

Wow, hip hop dressage choreographer — that is amazing. Welcome aboard Carlo!

The post Welcome Carlo! appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Prank your friends with the WhooPi Cushion

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/whoopi-cushion/

Learn about using switches and programming GPIO pins while you prank your friends with the Raspberry Pi-powered whoopee WhooPi Cushion!

Whoopee cushion PRANK with a Raspberry Pi: HOW-TO

Explore the world of Raspberry Pi physical computing with our free FutureLearn courses: http://rpf.io/futurelearn Free make your own Whoopi Cushion resource: http://rpf.io/whoopi For more information on Raspberry Pi and the charitable work of the Raspberry Pi Foundation, including Code Club and CoderDojo, visit http://rpf.io Our resources are free to use in schools, clubs, at home and at events.

The WhooPi Cushion

You might remember Carrie Anne and me showing off the WhooPi Cushion live on Facebook last year. The project was created as a simple proof of concept during a Pi Towers maker day. However, our viewers responded so enthusastically that we set about putting together a how-to resource for it.

A cartoon of a man sitting on a whoopee cushion - Raspberry Pi WhooPi Cushion Resource

When we made the resource available, it turned out to be so popular that we decided to include the project in one of our first FutureLearn courses and produced a WhooPi Cushion video tutorial to go with it.

A screen shot from our Raspberry Pi WhooPi Cushion Resource video

Our FutureLearn course attendees love the video, so last week we uploaded it to YouTube! Now everyone can follow along with James Robinson to make their own WhooPi Cushion out of easy-to-gather household items such as tinfoil, paper plates, and spongy material.

Build upon the WhooPi Cushion

Once you’ve completed your prank cushion, you’ll have learnt new skills that you can incorporate into other projects.

For example, you’ll know how to program an action in response to a button press — so how about playing a sound when the button is released instead? Just like that, you’ll have created a simple pressure-based alarm system. Or you could upgrade the functionality of the cushion by including a camera that takes a photo of your unwitting victim’s reaction!

A cartoon showing the stages of the Raspberry Pi Digital Curriculum from Creator to Builder, Developer and Maker

Building upon your skills to increase your knowledge of programming constructs and manufacturing techniques is key to becoming a digital maker. When you use the free Raspberry Pi resources, you’re also working through our digital curriculum, which guides you on this learning journey.

FutureLearn courses for free

Our FutureLearn courses are completely free and cover a variety of topics and skills, including object-oriented programming and teaching physical computing.

A GIF of a man on an island learning with FutureLearn

Regardless of your location, you can learn with us online to improve your knowledge of teaching digital making as well as your own hands-on digital skill set.

The post Prank your friends with the WhooPi Cushion appeared first on Raspberry Pi.