Tag Archives: Sync

Amazon Relational Database Service – Looking Back at 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-relational-database-service-looking-back-at-2017/

The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!

Certification & Security

Features

Engine Versions & Features

Regional Support

Instance Support

Price Reductions

And That’s a Wrap
I’m pretty sure that’s everything. As you can see, 2017 was quite the year! I can’t wait to see what the team delivers in 2018.

Jeff;

 

Server vs Endpoint Backup — Which is Best?

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/endpoint-backup-for-distributed-computing/

server and computer backup to the cloud

How common are these statements in your organization?

  • I know I saved that file. The application must have put it somewhere outside of my documents folder.” — Mike in Marketing
  • I was on the road and couldn’t get a reliable VPN connection. I guess that’s why my laptop wasn’t backed up.” — Sally in Sales
  • I try to follow file policies, but I had a deadline this week and didn’t have time to copy my files to the server.” — Felicia in Finance
  • I just did a commit of my code changes and that was when the coffee mug was knocked over onto the laptop.” — Erin in Engineering
  • If you need a file restored from backup, contact the help desk at [email protected] The IT department will get back to you.” — XYZ corporate intranet
  • Why don’t employees save files on the network drive like they’re supposed to?” — Isaac in IT

If these statements are familiar, most likely you rely on file server backups to safeguard your valuable endpoint data.

The problem is, the workplace has changed. Where server backups might have fit how offices worked at one time in the past, relying solely on server backups today means you could be missing valuable endpoint data from your backups. On top of that, you likely are unnecessarily expending valuable user and IT time in attempting to secure and restore endpoint data.

Times Have Changed, and so have Effective Enterprise Backup Strategies

The ways we use computers and handle files today are vastly different from just five or ten years ago. Employees are mobile, and we no longer are limited to monolithic PC and Mac-based office suites. Cloud applications are everywhere. Company-mandated network drive policies are difficult to enforce as office practices change, devices proliferate, and organizational culture evolves. Besides, your IT staff has other things to do than babysit your employees to make sure they follow your organization’s policies for managing files.

Server Backup has its Place, but Does it Support How People Work Today?

Many organizations still rely on server backup. If your organization works primarily in centralized offices with all endpoints — likely desktops — connected directly to your network, and you maintain tight control of how employees manage their files, it still might work for you.

Your IT department probably has set network drive policies that require employees to save files in standard places that are regularly backed up to your file server. Turns out, though, that even standard applications don’t always save files where IT would like them to be. They could be in a directory or folder that’s not regularly backed up.

As employees have become more mobile, they have adopted practices that enable them to access files from different places, but these practices might not fit in with your organization’s server policies. An employee saving a file to Dropbox might be planning to copy it to an “official” location later, but whether that ever happens could be doubtful. Often people don’t realize until it’s too late that accidentally deleting a file in one sync service directory means that all copies in all locations — even the cloud — are also deleted.

Employees are under increasing demands to produce, which means that network drive policies aren’t always followed; time constraints and deadlines can cause best practices to go out the window. Users will attempt to comply with policies as best they can — and you might get 70% or even 75% effective compliance — but getting even to that level requires training, monitoring, and repeatedly reminding employees of policies they need to follow — none of which leads to a good work environment.

Even if you get to 75% compliance with network file policies, what happens if the critical file needed to close out an end-of-year financial summary isn’t one of the files backed up? The effort required for IT to get from 70% to 80% or 90% of an endpoint’s files effectively backed up could require multiple hours from your IT department, and you still might not have backed up the one critical file you need later.

Your Organization Operates on its Data — And Today That Data Exists in Multiple Locations

Users are no longer tied to one endpoint, and may use different computers in the office, at home, or traveling. The greater the number of endpoints used, the greater the chance of an accidental or malicious device loss or data corruption. The loss of the Sales VP’s laptop at the airport on her way back from meeting with major customers can affect an entire organization and require weeks to resolve.

Even with the best intentions and efforts, following policies when out of the office can be difficult or impossible. Connecting to your private network when remote most likely requires a VPN, and VPN connectivity can be challenging from the lobby Wi-Fi at the Radisson. Server restores require time from the IT staff, which can mean taking resources away from other IT priorities and a growing backlog of requests from users to need their files as soon as possible. When users are dependent on IT to get back files critical to their work, employee productivity and often deadlines are affected.

Managing Finite Server Storage Is an Ongoing Challenge

Network drive backup usually requires on-premises data storage for endpoint backups. Since it is a finite resource, allocating that storage is another burden on your IT staff. To make sure that storage isn’t exceeded, IT departments often ration storage by department and/or user — another oversight duty for IT, and even more choices required by your IT department and department heads who have to decide which files to prioritize for backing up.

Adding Backblaze Endpoint Backup Improves Business Continuity and Productivity

Having an endpoint backup strategy in place can mitigate these problems and improve user productivity, as well. A good endpoint backup service, such as Backblaze Cloud Backup, will ensure that all devices are backed up securely, automatically, without requiring any action by the user or by your IT department.

For 99% of users, no configuration is required for Backblaze Backup. Everything on the endpoint is encrypted and securely backed up to the cloud, including program configuration files and files outside of standard document folders. Even temp files are backed up, which can prove invaluable when recovering a file after a crash or other program interruption. Cloud storage is unlimited with Backblaze Backup, so there are no worries about running out of storage or rationing file backups.

The Backblaze client can be silently and remotely installed to both Macintosh and Windows clients with no user interaction. And, with Backblaze Groups, your IT staff has complete visibility into when files were last backed up. IT staff can recover any backed up file, folder, or entire computer from the admin panel, and even give file restore capability to the user, if desired, which reduces dependency on IT and time spent waiting for restores.

With over 500 petabytes of customer data stored and one million files restored every hour of every day by Backblaze customers, you know that Backblaze Backup works for its users.

You Need Data Security That Matches the Way People Work Today

Both file server and endpoint backup have their places in an organization’s data security plan, but their use and value differ. If you already are using file server backup, adding endpoint backup will make a valuable contribution to your organization by reducing workload, improving productivity, and increasing confidence that all critical files are backed up.

By guaranteeing fast and automatic backup of all endpoint data, and matching the current way organizations and people work with data, Backblaze Backup will enable you to effectively and affordably meet the data security demands of your organization.

The post Server vs Endpoint Backup — Which is Best? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Nextcloud 13 is out

Post Syndicated from ris original https://lwn.net/Articles/746710/rss

Nextcloud 13 has been released. “This release brings improvements to the core File Sync and Share like easier moving of files and a tech preview of our end-to-end encryption for the ultimate protection of your data. It also introduces collaboration and communication capabilities, like auto-complete of comments and integrated real-time chat and video communication. Last but not least, Nextcloud was optimized and tuned to deliver up to 80% faster LDAP, much faster object storage and Windows Network Drive performance and a smoother user interface.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/746701/rss

Security updates have been issued by Debian (xen), Fedora (clamav, community-mysql, dnsmasq, flatpak, libtasn1, mupdf, p7zip, rsync, squid, thunderbird, tomcat, unbound, and zziplib), Mageia (clamav, curl, dovecot, ffmpeg, gcab, kernel, libtiff, libvpx, php-smarty, pure-ftpd, redis, and thunderbird), openSUSE (apache-commons-email), Red Hat (rh-mariadb100-mariadb), SUSE (firefox), and Ubuntu (clamav, squid3, and systemd).

Reactive Microservices Architecture on AWS

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/reactive-microservices-architecture-on-aws/

Microservice-application requirements have changed dramatically in recent years. These days, applications operate with petabytes of data, need almost 100% uptime, and end users expect sub-second response times. Typical N-tier applications can’t deliver on these requirements.

Reactive Manifesto, published in 2014, describes the essential characteristics of reactive systems including: responsiveness, resiliency, elasticity, and being message driven.

Being message driven is perhaps the most important characteristic of reactive systems. Asynchronous messaging helps in the design of loosely coupled systems, which is a key factor for scalability. In order to build a highly decoupled system, it is important to isolate services from each other. As already described, isolation is an important aspect of the microservices pattern. Indeed, reactive systems and microservices are a natural fit.

Implemented Use Case
This reference architecture illustrates a typical ad-tracking implementation.

Many ad-tracking companies collect massive amounts of data in near-real-time. In many cases, these workloads are very spiky and heavily depend on the success of the ad-tech companies’ customers. Typically, an ad-tracking-data use case can be separated into a real-time part and a non-real-time part. In the real-time part, it is important to collect data as fast as possible and ask several questions including:,  “Is this a valid combination of parameters?,””Does this program exist?,” “Is this program still valid?”

Because response time has a huge impact on conversion rate in advertising, it is important for advertisers to respond as fast as possible. This information should be kept in memory to reduce communication overhead with the caching infrastructure. The tracking application itself should be as lightweight and scalable as possible. For example, the application shouldn’t have any shared mutable state and it should use reactive paradigms. In our implementation, one main application is responsible for this real-time part. It collects and validates data, responds to the client as fast as possible, and asynchronously sends events to backend systems.

The non-real-time part of the application consumes the generated events and persists them in a NoSQL database. In a typical tracking implementation, clicks, cookie information, and transactions are matched asynchronously and persisted in a data store. The matching part is not implemented in this reference architecture. Many ad-tech architectures use frameworks like Hadoop for the matching implementation.

The system can be logically divided into the data collection partand the core data updatepart. The data collection part is responsible for collecting, validating, and persisting the data. In the core data update part, the data that is used for validation gets updated and all subscribers are notified of new data.

Components and Services

Main Application
The main application is implemented using Java 8 and uses Vert.x as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level IO library Netty. You can write applications in Java, JavaScript, Groovy, Ruby, Kotlin, Scala, and Ceylon. The framework offers a simple and scalable actor-like concurrency model. Vert.x calls handlers by using a thread known as an event loop. To use this model, you have to write code known as “verticles.” Verticles share certain similarities with actors in the actor model. To use them, you have to implement the verticle interface. Verticles communicate with each other by generating messages in  a single event bus. Those messages are sent on the event bus to a specific address, and verticles can register to this address by using handlers.

With only a few exceptions, none of the APIs in Vert.x block the calling thread. Similar to Node.js, Vert.x uses the reactor pattern. However, in contrast to Node.js, Vert.x uses several event loops. Unfortunately, not all APIs in the Java ecosystem are written asynchronously, for example, the JDBC API. Vert.x offers a possibility to run this, blocking APIs without blocking the event loop. These special verticles are called worker verticles. You don’t execute worker verticles by using the standard Vert.x event loops, but by using a dedicated thread from a worker pool. This way, the worker verticles don’t block the event loop.

Our application consists of five different verticles covering different aspects of the business logic. The main entry point for our application is the HttpVerticle, which exposes an HTTP-endpoint to consume HTTP-requests and for proper health checking. Data from HTTP requests such as parameters and user-agent information are collected and transformed into a JSON message. In order to validate the input data (to ensure that the program exists and is still valid), the message is sent to the CacheVerticle.

This verticle implements an LRU-cache with a TTL of 10 minutes and a capacity of 100,000 entries. Instead of adding additional functionality to a standard JDK map implementation, we use Google Guava, which has all the features we need. If the data is not in the L1 cache, the message is sent to the RedisVerticle. This verticle is responsible for data residing in Amazon ElastiCache and uses the Vert.x-redis-client to read data from Redis. In our example, Redis is the central data store. However, in a typical production implementation, Redis would just be the L2 cache with a central data store like Amazon DynamoDB. One of the most important paradigms of a reactive system is to switch from a pull- to a push-based model. To achieve this and reduce network overhead, we’ll use Redis pub/sub to push core data changes to our main application.

Vert.x also supports direct Redis pub/sub-integration, the following code shows our subscriber-implementation:

vertx.eventBus().<JsonObject>consumer(REDIS_PUBSUB_CHANNEL_VERTX, received -> {

JsonObject value = received.body().getJsonObject("value");

String message = value.getString("message");

JsonObject jsonObject = new JsonObject(message);

eb.send(CACHE_REDIS_EVENTBUS_ADDRESS, jsonObject);

});

redis.subscribe(Constants.REDIS_PUBSUB_CHANNEL, res -> {

if (res.succeeded()) {

LOGGER.info("Subscribed to " + Constants.REDIS_PUBSUB_CHANNEL);

} else {

LOGGER.info(res.cause());

}

});

The verticle subscribes to the appropriate Redis pub/sub-channel. If a message is sent over this channel, the payload is extracted and forwarded to the cache-verticle that stores the data in the L1-cache. After storing and enriching data, a response is sent back to the HttpVerticle, which responds to the HTTP request that initially hit this verticle. In addition, the message is converted to ByteBuffer, wrapped in protocol buffers, and send to an Amazon Kinesis Data Stream.

The following example shows a stripped-down version of the KinesisVerticle:

public class KinesisVerticle extends AbstractVerticle {

private static final Logger LOGGER = LoggerFactory.getLogger(KinesisVerticle.class);

private AmazonKinesisAsync kinesisAsyncClient;

private String eventStream = "EventStream";

@Override

public void start() throws Exception {

EventBus eb = vertx.eventBus();

kinesisAsyncClient = createClient();

eventStream = System.getenv(STREAM_NAME) == null ? "EventStream" : System.getenv(STREAM_NAME);

eb.consumer(Constants.KINESIS_EVENTBUS_ADDRESS, message -> {

try {

TrackingMessage trackingMessage = Json.decodeValue((String)message.body(), TrackingMessage.class);

String partitionKey = trackingMessage.getMessageId();

byte [] byteMessage = createMessage(trackingMessage);

ByteBuffer buf = ByteBuffer.wrap(byteMessage);

sendMessageToKinesis(buf, partitionKey);

message.reply("OK");

}

catch (KinesisException exc) {

LOGGER.error(exc);

}

});

}

Kinesis Consumer
This AWS Lambda function consumes data from an Amazon Kinesis Data Stream and persists the data in an Amazon DynamoDB table. In order to improve testability, the invocation code is separated from the business logic. The invocation code is implemented in the class KinesisConsumerHandler and iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to protocol buffers and converted into a Java object. Those Java objects are passed to the business logic, which persists the data in a DynamoDB table. In order to improve duration of successive Lambda calls, the DynamoDB-client is instantiated lazily and reused if possible.

Redis Updater
From time to time, it is necessary to update core data in Redis. A very efficient implementation for this requirement is using AWS Lambda and Amazon Kinesis. New core data is sent over the AWS Kinesis stream using JSON as data format and consumed by a Lambda function. This function iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to String and converted into a Java object. The Java object is passed to the business logic and stored in Redis. In addition, the new core data is also sent to the main application using Redis pub/sub in order to reduce network overhead and converting from a pull- to a push-based model.

The following example shows the source code to store data in Redis and notify all subscribers:

public void updateRedisData(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

Map<String, String> map = marshal(jsonString);

String statusCode = jedis.hmset(trackingMessage.getProgramId(), map);

}

catch (Exception exc) {

if (null == logger)

exc.printStackTrace();

else

logger.log(exc.getMessage());

}

}

public void notifySubscribers(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

jedis.publish(Constants.REDIS_PUBSUB_CHANNEL, jsonString);

}

catch (final IOException e) {

log(e.getMessage(), logger);

}

}

Similarly to our Kinesis Consumer, the Redis-client is instantiated somewhat lazily.

Infrastructure as Code
As already outlined, latency and response time are a very critical part of any ad-tracking solution because response time has a huge impact on conversion rate. In order to reduce latency for customers world-wide, it is common practice to roll out the infrastructure in different AWS Regions in the world to be as close to the end customer as possible. AWS CloudFormation can help you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.

You create a template that describes all the AWS resources that you want (for example, Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. Our reference architecture can be rolled out in different Regions using an AWS CloudFormation template, which sets up the complete infrastructure (for example, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Container Service (Amazon ECS) cluster, Lambda functions, DynamoDB table, Amazon ElastiCache cluster, etc.).

Conclusion
In this blog post we described reactive principles and an example architecture with a common use case. We leveraged the capabilities of different frameworks in combination with several AWS services in order to implement reactive principles—not only at the application-level but also at the system-level. I hope I’ve given you ideas for creating your own reactive applications and systems on AWS.

About the Author

Sascha Moellering is a Senior Solution Architect. Sascha is primarily interested in automation, infrastructure as code, distributed computing, containers and JVM. He can be reached at [email protected]

 

 

Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

yahoodevelopers:

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then-new DAG based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies be it storage through HDFS, resource management through YARN, job execution frameworks with Tez and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of Yarn ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up on a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates and extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/746078/rss

Security updates have been issued by Debian (chromium-browser, krb5, and smarty3), Fedora (firefox, GraphicsMagick, and moodle), Mageia (rsync), openSUSE (bind, chromium, freeimage, gd, GraphicsMagick, libtasn1, libvirt, nodejs6, php7, systemd, and webkit2gtk3), Red Hat (chromium-browser, systemd, and thunderbird), Scientific Linux (systemd), and Ubuntu (curl, firefox, and ruby2.3).

Jackpotting Attacks Against US ATMs

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/jackpotting_att.html

Brian Krebs is reporting sophisticated jackpotting attacks against US ATMs. The attacker gains physical access to the ATM, plants malware using specialized electronics, and then later returns and forces the machine to dispense all the cash it has inside.

The Secret Service alert explains that the attackers typically use an endoscope — a slender, flexible instrument traditionally used in medicine to give physicians a look inside the human body — to locate the internal portion of the cash machine where they can attach a cord that allows them to sync their laptop with the ATM’s computer.

“Once this is complete, the ATM is controlled by the fraudsters and the ATM will appear Out of Service to potential customers,” reads the confidential Secret Service alert.

At this point, the crook(s) installing the malware will contact co-conspirators who can remotely control the ATMs and force the machines to dispense cash.

“In previous Ploutus.D attacks, the ATM continuously dispensed at a rate of 40 bills every 23 seconds,” the alert continues. Once the dispense cycle starts, the only way to stop it is to press cancel on the keypad. Otherwise, the machine is completely emptied of cash, according to the alert.

Lots of details in the article.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/745775/rss

Security updates have been issued by Arch Linux (curl, lib32-curl, lib32-libcurl-compat, lib32-libcurl-gnutls, libcurl-compat, libcurl-gnutls, and rsync), Debian (curl), Fedora (clamav and java-1.8.0-openjdk), openSUSE (apache2), Oracle (kernel), and Ubuntu (linux-kvm and thunderbird).

500 Petabytes And Counting

Post Syndicated from Yev original https://www.backblaze.com/blog/500-petabytes-and-counting/

500 Petabytes = 500,000,000 Gigabytes

It seems like only yesterday that we crossed the 350 petabyte mark. It was actually June 2017, but boy have we been growing since. In October 2017 we crossed 400 petabytes. Today, we’re proud to announce we’ve crossed the 500 petabyte mark. That’s a very healthy clip, see for yourself!

Whether you have 50 GB, 500 GB or are just an avid blog reader, thank you for being on this incredible journey with us through the years.

…we’re literally moving at 1,000,000 files per hour.

We’re extremely proud of our track record. Throughout these 11 years we’ve striven to be the simplest, fastest, and most affordable online backup (and now cloud storage) solution available. We’re not just focusing on data ingress, but also adhering to our original goal of making sure that “no one ever loses data again.” How quickly are we restoring data? On average, we’re literally moving at 1,000,000 files per hour.

Even after all these years, one of the most frequent questions asked is, “How has Backblaze maintained such affordable pricing, particularly when the industry continues to move away from unlimited data plans?”

The cloud storage industry is very competitive, with cloud sync, storage, and backup providers leaving the unlimited market every single day: OneDrive, Amazon Cloud Storage, and most recently CrashPlan. Other providers either have tiered pricing (iDrive), or charge almost double or even triple for all the features we provide for our unlimited backup service (Carbonite). So how do we do it?

The answer comes down to our relentless pursuit of lowering costs. Our open-source Backblaze Storage Pods comprise our Backblaze Vaults, and the less expensive and more performant our Storage Pods are, the better the service that we can provide. This all directly translates into the service and pricing we can offer you.

A key part of our service is to be as open as possible with our costs and structure. After all, you are entrusting us with some of your most valuable assets. Still, it is very difficult to find an apples to apples comparison to what our competitors are doing. For example, we can gain some insight from a 2011 interview with Carbonite’s CEO, who gave an interview in which he said Carbonite’s cost of storing a petabyte was $250,000. At the time, our cost to store a petabyte was $76,481 (more on that calculation can be found here and here). If Backblaze’s fundamental cost to store data is one-third that of Carbonite’s, it makes sense that Carbonite’s cost to its customers would be more than Backblaze’s. Today, Backblaze backup is $50/year and Carbonite’s equivalent service is $149.99.

Our continued focus on reducing costs has allowed us to maintain a healthy business. And after accepting customer data for almost 10 years, we sincerely want to thank you all for giving us your trust, and allowing us to protect your important data and memories for you. Here’s to the next 500 petabytes; they’ll be here before we know it.


Update 2/5/18

Since publishing this post, we have posted the latest in our series of Hard Drive Stats, in which we summarize the performance of the hard drives we used in our data centers in 2017 and previously.

The post 500 Petabytes And Counting appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The 4.15 kernel is out

Post Syndicated from corbet original https://lwn.net/Articles/744875/rss

Linus has released the 4.15 kernel.
After a release cycle that was unusual in so many (bad) ways, this
last week was really pleasant. Quiet and small, and no last-minute
panics, just small fixes for various issues. I never got a feeling
that I’d need to extend things by yet another week, and 4.15 looks
fine to me.

Some of the more significant features in this release include:
the long-awaited CPU controller for the
version-2 control-group interface,
significant live-patching improvements,
initial support for the RISC-V architecture,
support for AMD’s secure encrypted virtualization feature, and
the MAP_SYNC mechanism for working
with nonvolatile memory.
This release also, of course, includes mitigations for the Meltdown and Spectre variant-2
vulnerabilities
though, as Linus points out in the announcement, the
work of dealing with these issues is not yet done.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/745165/rss

Security updates have been issued by Debian (smarty3), Fedora (bind, bind-dyndb-ldap, dnsperf, glibc, kernel, libtasn1, libvpx, mariadb, python-bottle, ruby, and sox), Red Hat (rh-eclipse46-jackson-databind), SUSE (kernel), and Ubuntu (kernel, linux, linux-aws, linux-euclid, linux-hwe, linux-azure, linux-gcp, linux-oem, linux-lts-trusty, linux-lts-xenial, linux-aws, and rsync).

Recent EC2 Goodies – Launch Templates and Spread Placement

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/recent-ec2-goodies-launch-templates-and-spread-placement/

We launched some important new EC2 instance types and features at AWS re:Invent. I’ve already told you about the M5, H1, T2 Unlimited and Bare Metal instances, and about Spot features such as Hibernation and the New Pricing Model. Randall told you about the Amazon Time Sync Service. Today I would like to tell you about two of the features that we launched: Spread placement groups and Launch Templates. Both features are available in the EC2 Console and from the EC2 APIs, and can be used in all of the AWS Regions in the “aws” partition:

Launch Templates
You can use launch templates to store the instance, network, security, storage, and advanced parameters that you use to launch EC2 instances, and can also include any desired tags. Each template can include any desired subset of the full collection of parameters. You can, for example, define common configuration parameters such as tags or network configurations in a template, and allow the other parameters to be specified as part of the actual launch.

Templates give you the power to set up a consistent launch environment that spans instances launched in On-Demand and Spot form, as well as through EC2 Auto Scaling and as part of a Spot Fleet. You can use them to implement organization-wide standards and to enforce best practices, and you can give your IAM users the ability to launch instances via templates while withholding the ability to do so via the underlying APIs.

Templates are versioned and you can use any desired version when you launch an instance. You can create templates from scratch, base them on the previous version, or copy the parameters from a running instance.

Here’s how you create a launch template in the Console:

Here’s how to include network interfaces, storage volumes, tags, and security groups:

And here’s how to specify advanced and specialized parameters:

You don’t have to specify values for all of these parameters in your templates; enter the values that are common to multiple instances or launches and specify the rest at launch time.

When you click Create launch template, the template is created and can be used to launch On-Demand instances, create Auto Scaling Groups, and create Spot Fleets:

The Launch Instance button now gives you the option to launch from a template:

Simply choose the template and the version, and finalize all of the launch parameters:

You can also manage your templates and template versions from the Console:

To learn more about this feature, read Launching an Instance from a Launch Template.

Spread Placement Groups
Spread placement groups indicate that you do not want the instances in the group to share the same underlying hardware. Applications that rely on a small number of critical instances can launch them in a spread placement group to reduce the odds that one hardware failure will impact more than one instance. Here are a couple of things to keep in mind when you use spread placement groups:

  • Availability Zones – A single spread placement group can span multiple Availability Zones. You can have a maximum of seven running instances per Availability Zone per group.
  • Unique Hardware – Launch requests can fail if there is insufficient unique hardware available. The situation changes over time as overall usage changes and as we add additional hardware; you can retry failed requests at a later time.
  • Instance Types – You can launch a wide variety of M4, M5, C3, R3, R4, X1, X1e, D2, H1, I2, I3, HS1, F1, G2, G3, P2, and P3 instances types in spread placement groups.
  • Reserved Instances – Instances launched into a spread placement group can make use of reserved capacity. However, you cannot currently reserve capacity for a placement group and could receive an ICE (Insufficient Capacity Error) even if you have some RI’s available.
  • Applicability – You cannot use spread placement groups in conjunction with Dedicated Instances or Dedicated Hosts.

You can create and use spread placement groups from the AWS Management Console, the AWS Command Line Interface (CLI), the AWS Tools for Windows PowerShell, and the AWS SDKs. The console has a new feature that will help you to learn how to use the command line:

You can specify an existing placement group or create a new one when you launch an EC2 instance:

To learn more, read about Placement Groups.

Jeff;

Security updates for Friday

Post Syndicated from ris original https://lwn.net/Articles/744791/rss

Security updates have been issued by Arch Linux (bind, irssi, nrpe, perl-xml-libxml, and transmission-cli), CentOS (java-1.8.0-openjdk), Debian (awstats, libgd2, mysql-5.5, rsync, smarty3, and transmission), Fedora (keycloak-httpd-client-install and rootsh), and Red Hat (java-1.7.0-oracle and java-1.8.0-oracle).

Security Breaches Don’t Affect Stock Price

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/security_breach.html

Interesting research: “Long-term market implications of data breaches, not,” by Russell Lange and Eric W. Burger.

Abstract: This report assesses the impact disclosure of data breaches has on the total returns and volatility of the affected companies’ stock, with a focus on the results relative to the performance of the firms’ peer industries, as represented through selected indices rather than the market as a whole. Financial performance is considered over a range of dates from 3 days post-breach through 6 months post-breach, in order to provide a longer-term perspective on the impact of the breach announcement.

Key findings:

  • While the difference in stock price between the sampled breached companies and their peers was negative (1.13%) in the first 3 days following announcement of a breach, by the 14th day the return difference had rebounded to + 0.05%, and on average remained positive through the period assessed.
  • For the differences in the breached companies’ betas and the beta of their peer sets, the differences in the means of 8 months pre-breach versus post-breach was not meaningful at 90, 180, and 360 day post-breach periods.

  • For the differences in the breached companies’ beta correlations against the peer indices pre- and post-breach, the difference in the means of the rolling 60 day correlation 8 months pre- breach versus post-breach was not meaningful at 90, 180, and 360 day post-breach periods.

  • In regression analysis, use of the number of accessed records, date, data sensitivity, and malicious versus accidental leak as variables failed to yield an R2 greater than 16.15% for response variables of 3, 14, 60, and 90 day return differential, excess beta differential, and rolling beta correlation differential, indicating that the financial impact on breached companies was highly idiosyncratic.

  • Based on returns, the most impacted industries at the 3 day post-breach date were U.S. Financial Services, Transportation, and Global Telecom. At the 90 day post-breach date, the three most impacted industries were U.S. Financial Services, U.S. Healthcare, and Global Telecom.

The market isn’t going to fix this. If we want better security, we need to regulate the market.

Note: The article is behind a paywall. An older version is here. A similar article is here.

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/744619/rss

Security updates have been issued by Debian (bind9, wordpress, and xbmc), Fedora (awstats, docker, gifsicle, irssi, microcode_ctl, mupdf, nasm, osc, osc-source_validator, and php), Gentoo (newsbeuter, poppler, and rsync), Mageia (gifsicle), Red Hat (linux-firmware and microcode_ctl), Scientific Linux (linux-firmware and microcode_ctl), SUSE (kernel and openssl), and Ubuntu (bind9, eglibc, glibc, and transmission).

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency with less cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how to handle this situation? Take things one step at a time when scaling and you may find horizontal scaling isn’t the right choice, after all.

For example, assume you have a tech news website where you did an early-look review of an upcoming—and highly-anticipated—smartphone launch, which went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. For example, if your website is hosted on a traditional Linux with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s get more details on the current scenario and dig out more:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

How is the website handling sessions?

  • Do you have any compliance requests—like the Payment Card Industry Data Security Standard (PCI DSS compliance) —if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your loading balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic, generated by activity on the blog post, so let’s reduce server load by moving image and video to some third -party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides a nice mirror replication for databases. Oracle has its own Oracle plug for replication and AWS RDS provide up to five read replicas, which can span across the region and even the Amazon database Amazon Aurora can have 15 read replicas with Amazon Aurora autoscaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless database  to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, AWS RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to mirror instance means replication data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs and try to move non-critical GET services to read replica and reduce the load on the master database. In this case, loading comments associated with a blog can be fetched from a read replica—as it can handle some delay—in case there is any issue with asynchronous reflection.

Step 3: Reduce write requests. This can be achieved by introducing queue to process the asynchronous message. Amazon Simple Queue Service (Amazon SQS) is a highly-scalable queue, which can handle any kind of work-message load. You can process data, like rating and review; or calculate Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with Amazon CloudWatch, as the trigger.  For on-premises workloads, you can use SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS  to fan out your message processing in parallel for different purposes like adding a watermark in an image, generating a thumbnail, etc.

Step 4: Introduce a more robust caching engine. You can use Amazon Elastic Cache for Memcached or Redis to reduce write requests. Memcached and Redis have different use cases so if you can afford to lose and recover your cache from your database, use Memcached. If you are looking for more robust data persistence and complex data structure, use Redis. In AWS, these are managed services, which means AWS takes care of the workload for you and you can also deploy them in your on-premises instances or use a hybrid approach.

Step 5: Scale your server. If there are still issues, it’s time to scale your server.  For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, use cases like database vertical scaling may be a better choice until you are good with sharding; or use Amazon Aurora Serverless for variable workloads. It will be wise to use Auto Scaling to manage your workload effectively for horizontal scaling. Also, to achieve that, you need to persist the session. Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route53 to distribute your workload between AWS Regions. CloudFront helps you to scale and distribute static content via an S3 bucket and DynamoDB, maintaining your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with multi-AZ standby provides high availability and read replica helps achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS even if your workload in on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing your on-premises environment replica in the public cloud like AWS Cloud, and using Amazon Route53 DNS Service and Elastic Load Balancing to route traffic between on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments to help you scale your cloud environment quickly, whenever required, and reduce it further by applying Amazon auto-scaling and placing a threshold on your on-premises traffic using Route 53.