Tag Archives: sast

When Joe Public Becomes a Commercial Pirate, a Little Knowledge is Dangerous

Post Syndicated from Andy original https://torrentfreak.com/joe-public-becomes-commercial-pirate-little-knowledge-dangerous-180603/

Back in March and just a few hours before the Anthony Joshua v Joseph Parker fight, I got chatting with some fellow fans in the local pub. While some were intending to pay for the fight, others were going down the Kodi route.

Soon after the conversation switched to IPTV. One of the guys had a subscription and he said that his supplier would be along shortly if anyone wanted a package to watch the fight at home. Of course, I was curious to hear what he had to say since it’s not often this kind of thing is offered ‘offline’.

The guy revealed that he sold more or less exclusively on eBay and called up the page on his phone to show me. The listing made interesting reading.

In common with hundreds of similar IPTV subscription offers easily findable on eBay, the listing offered “All the sports and films you need plus VOD and main UK channels” for the sum of just under £60 per year, which is fairly cheap in the current market. With a non-committal “hmmm” I asked a bit more about the guy’s business and surprisingly he was happy to provide some details.

Like many people offering such packages, the guy was a reseller of someone else’s product. He also insisted that selling access to copyrighted content is OK because it sits in a “gray area”. It’s also easy to keep listings up on eBay, he assured me, as long as a few simple rules are adhered to. Right, this should be interesting.

First of all, sellers shouldn’t be “too obvious” he advised, noting that individual channels or channel lists shouldn’t be listed on the site. Fair enough, but then he said the most important thing of all is to have a disclaimer like his in any listing, written as follows:

“PLEASE NOTE EBAY: THIS IS NOT A DE SCRAMBLER SERVICE, I AM NOT SELLING ANY ILLEGAL CHANNELS OR CHANNEL LISTS NOR DO I REPRESENT ANY MEDIA COMPANY NOR HAVE ACCESS TO ANY OF THEIR CONTENTS. NO TRADEMARK HAS BEEN INFRINGED. DO NOT REMOVE LISTING AS IT IS IN ACCORDANCE WITH EBAY POLICIES.”

Apparently, this paragraph is crucial to keeping listings up on eBay and is the equivalent of kryptonite when it comes to deflecting copyright holders, police, and Trading Standards. Sure enough, a few seconds with Google reveals the same wording on dozens of eBay listings and those offering IPTV subscriptions on external platforms.

It is, of course, absolutely worthless but the IPTV seller insisted otherwise, noting he’d sold “thousands” of subscriptions through eBay without any problems. While a similar logic can be applied to garlic and vampires, a second disclaimer found on many other illicit IPTV subscription listings treads an even more bizarre path.

“THE PRODUCTS OFFERED CAN NOT BE USED TO DESCRAMBLE OR OTHERWISE ENABLE ACCESS TO CABLE OR SATELLITE TELEVISION PROGRAMS THAT BYPASSES PAYMENT TO THE SERVICE PROVIDER. RECEIVING SUBSCRIPTION/BASED TV AIRTIME IS ILLEGAL WITHOUT PAYING FOR IT.”

This disclaimer (which apparently no sellers displaying it have ever read) seems to be have been culled from the Zgemma site, which advertises a receiving device which can technically receive pirate IPTV services but wasn’t designed for the purpose. In that context, the disclaimer makes sense but when applied to dedicated pirate IPTV subscriptions, it’s absolutely ridiculous.

It’s unclear why so many sellers on eBay, Gumtree, Craigslist and other platforms think that these disclaimers are useful. It leads one to the likely conclusion that these aren’t hardcore pirates at all but regular people simply out to make a bit of extra cash who have received bad advice.

What is clear, however, is that selling access to thousands of otherwise subscription channels without permission from copyright owners is definitely illegal in the EU. The European Court of Justice says so (1,2) and it’s been backed up by subsequent cases in the Netherlands.

While the odds of getting criminally prosecuted or sued for reselling such a service are relatively slim, it’s worrying that in 2018 people still believe that doing so is made legal by the inclusion of a paragraph of text. It’s even more worrying that these individuals apparently have no idea of the serious consequences should they become singled out for legal action.

Even more surprisingly, TorrentFreak spoke with a handful of IPTV suppliers higher up the chain who also told us that what they are doing is legal. A couple claimed to be protected by communication intermediary laws, others didn’t want to go into details. Most stopped responding to emails on the topic. Perhaps most tellingly, none wanted to go on the record.

The big take-home here is that following some important EU rulings, knowingly linking to copyrighted content for profit is nearly always illegal in Europe and leaves people open for targeting by copyright holders and the authorities. People really should be aware of that, especially the little guy making a little extra pocket money on eBay.

Of course, people are perfectly entitled to carry on regardless and test the limits of the law when things go wrong. At this point, however, it’s probably worth noting that IPTV provider Ace Hosting recently handed over £600,000 rather than fight the Premier League (1,2) when they clearly had the money to put up a defense.

Given their effectiveness, perhaps they should’ve put up a disclaimer instead?

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Replacing macOS Server with Synology NAS

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/replacing-macos-server-with-synology-nas/

Synology NAS boxes backed up to the cloud

Businesses and organizations that rely on macOS server for essential office and data services are facing some decisions about the future of their IT services.

Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:

macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.

The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.

Apple assures users who have already configured any of the listed services that they will be able to use them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements to the features it is removing. These alternative services are all FOSS (Free and Open-Source Software).

As difficult as this could be for organizations that use macOS server, this is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May, 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster, but the entire category of SMB office and business servers, which has been undergoing a gradual change in recent years.

Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.

What Comes After macOS Server?

Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.

Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.

The Times They Are A-Changin’

For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.

In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (Saas, PaaS, IaaS) options.

IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.

Synology NAS as a macOS Server Alternative

One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.

Synology NAS box

Synology NAS

Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux as it comes from BSD Unix.

Synology DiskStation Manager Package Center screenshot

Synology DiskStation Manager Package Center

According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.

Business continuity and backup are simplified by the flexible storage capacity of the NAS. Synology has built-in backup to Backblaze B2 Cloud Storage with Synology’s Cloud Sync, as well as a choice of a number of other B2-compatible applications, such as Cloudberry, Comet, and Arq.

Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).

Are You Sticking with macOS Server or Moving to Another Platform?

If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.

The post Replacing macOS Server with Synology NAS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Connect Veeam to the B2 Cloud: Episode 3 — Using OpenDedup

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/opendedup-for-cloud-storage/

Veeam backup to Backblaze B2 logo

In this, the third post in our series on connecting Veeam with Backblaze B2 Cloud Storage, we discuss how to back up your VMs to B2 using Veeam and OpenDedup. In our previous posts, we covered how to connect Veeam to the B2 cloud using Synology, and how to connect Veeam with B2 using StarWind VTL.

Deduplication and OpenDedup

Deduplication is simply the process of eliminating redundant data on disk. Deduplication reduces storage space requirements, improves backup speed, and lowers backup storage costs. The dedup field used to be dominated by a few big-name vendors who sold dedup systems that were too expensive for most of the SMB market. Then an open-source challenger came along in OpenDedup, a project that produced the Space Deduplication File System (SDFS). SDFS provides many of the features of commercial dedup products without their cost.

OpenDedup provides inline deduplication that can be used with applications such as Veeam, Veritas Backup Exec, and Veritas NetBackup.

Features Supported by OpenDedup:

  • Variable Block Deduplication to cloud storage
  • Local Data Caching
  • Encryption
  • Bandwidth Throttling
  • Fast Cloud Recovery
  • Windows and Linux Support

Why use Veeam with OpenDedup to Backblaze B2?

With your VMs backed up to B2, you have a number of options to recover from a disaster. If the unexpected occurs, you can quickly restore your VMs from B2 to the location of your choosing. You also have the option to bring up cloud compute through B2’s compute partners, thereby minimizing any loss of service and ensuring business continuity.

Veeam logo + OpenDedup logo + Backblaze B2 logo

Backblaze’s B2 is an ideal solution for backing up Veeam’s backup repository due to B2’s combination of low-cost and high availability. Users of B2 save up to 75% compared to other cloud solutions such as Microsoft Azure, Amazon AWS, or Google Cloud Storage. When combined with OpenDedup’s no-cost deduplication, you’re got an efficient and economical solution for backing up VMs to the cloud.

How to Use OpenDedup with B2

For step-by-step instructions for how to set up OpenDedup for use with B2 on Windows or Linux, see Backblaze B2 Enabled on the OpenDedup website.

Are you backing up Veeam to B2 using one of the solutions we’ve written about in this series? If you have, we’d love to hear from you in the comments.

View all posts in the Veeam series.

The post Connect Veeam to the B2 Cloud: Episode 3 — Using OpenDedup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

‘Anonymous’ Hackers Deface Russian Govt. Site to Protest Web-Blocking (NSFW)

Post Syndicated from Andy original https://torrentfreak.com/anonymous-hackers-deface-russian-govt-site-to-protest-web-blocking-nsfw-180512/

Last month, Russian authorities demonstrated that when an entity breaks local Internet rules, no stone will be left unturned to make them pay, whatever the cost.

The disaster waiting to happen began when encrypted messaging service Telegram refused to hand over its encryption keys to the state. In response, the Federal Security Service filed a lawsuit, which it won, compelling it Telegram do so. With no response, Roscomnadzor obtained a court order to have Telegram blocked.

In a massive response, Russian ISPs – at Roscomnadzor’s behest – began mass-blocking IP addresses on a massive scale. Millions of IP addresses belong to Amazon, Google and other innocent parties were rendered inaccessible in Russia, causing chaos online.

Even VPN providers were targeted for facilitating access to Telegram but while the service strained under the pressure, it never went down and continues to function today.

In the wake of the operation there has been some attempt at a cleanup job, with Roscomnadzor announcing this week that it had unblocked millions of IP addresses belonging to Google.

“As part of a package of the measures to enforce the court’s decision on Telegram, Roskomnadzor has removed six Google subnets (more than 3.7 million IP-addresses) from the blocklist,” the telecoms watchdog said in a statement.

“In this case, the IP addresses of Telegram, which are part of these subnets, are fully installed and blocked. Subnets are unblocked in order to ensure the correct operation of third-party Internet resources.”

But while Roscomnadzor attempts to calm the seas, those angered by Russia’s carpet-bombing of the Internet were determined to make their voices heard. Hackers attacked the website of the Federal Agency for International Cooperation this week, defacing it with scathing criticism combined with NSFW suggestions and imagery.

“Greetings, Roskomnadzor,” the message began.

“Your recent destructive actions towards the Russian internet sector have led us to believe that you are nothing but a bunch of incompetent mindless worms. You shall not be able to continue this pointless vandalism any further.”

Signing off with advice to consider the defacement as a “final warning”, the hackers disappeared into the night after leaving a simple signature.

“Yours, Anonymous,” they wrote.

But the hackers weren’t done yet. In a NSFW cartoon strip that probably explains itself, ‘Anonymous’ suggested that Roscomnadzor should perhaps consider blocking itself, with the implement depicted in the final frame.

“Anus, block yourself Roscomnadzor”

But while Russia’s attack on Telegram raises eyebrows worldwide, the actions of those in authority continue to baffle.

Last week, Prime Minister Dmitry Medvedev’s press secretary, Natalia Timakova, publicly advised a colleague to circumvent the Telegram blockade using a VPN, effectively undermining the massive efforts of the authorities. This week the head of Roscomnadzor only added to the confusion.

Effectively quashing rumors that he’d resigned due to the Telegram fiasco, Alexander Zharov had a conversation with the editor-in-chief of radio station ‘Says Moscow’.

During the liason, which took place during the Victory Parade in Red Square, Zharov was asked how he could be contacted. When Telegram was presented as a potential method, Zharov confirmed that he could be reached via the platform.

Finally, in a move that’s hoped could bring an end to the attack on the platform and others like it, Telegram filed an appeal this week challenging a decision by the Supreme Court of Russia which allows the Federal Security Service to demand access to encryption keys.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

AWS Online Tech Talks – May and Early June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-may-and-early-june-2018/

AWS Online Tech Talks – May and Early June 2018  

Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

May 21, 2018 | 11:00 AM – 11:45 AM PT Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.

May 23, 2018 | 11:00 AM – 11:45 AM PTData Warehousing and Data Lake Analytics, Together – Learn how to query data across your data warehouse and data lake without moving data.

May 24, 2018 | 11:00 AM – 11:45 AM PTData Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.

Compute

May 29, 2018 | 01:00 PM – 01:45 PM PT – Creating and Managing a WordPress Website with Amazon Lightsail – Learn about Amazon Lightsail and how you can create, run and manage your WordPress websites with Amazon’s simple compute platform.

May 30, 2018 | 01:00 PM – 01:45 PM PTAccelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.

Containers

May 24, 2018 | 01:00 PM – 01:45 PM PT – Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.

Databases

May 21, 2018 | 01:00 PM – 01:45 PM PTHow to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.

May 23, 2018 | 01:00 PM – 01:45 PM PT5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.

DevOps

May 23, 2018 | 09:00 AM – 09:45 AM PT.NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.

Enterprise & Hybrid

May 22, 2018 | 11:00 AM – 11:45 AM PTHybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.

IoT

May 31, 2018 | 11:00 AM – 11:45 AM PTUsing AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.

Machine Learning

May 22, 2018 | 09:00 AM – 09:45 AM PTUsing Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.

May 24, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.

Management Tools

May 21, 2018 | 09:00 AM – 09:45 AM PTGaining Better Observability of Your VMs with Amazon CloudWatch – Learn how CloudWatch Agent makes it easy for customers like Rackspace to monitor their VMs.

Mobile

May 29, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive on Amazon Pinpoint Segmentation and Endpoint Management – See how segmentation and endpoint management with Amazon Pinpoint can help you target the right audience.

Networking

May 31, 2018 | 09:00 AM – 09:45 AM PTMaking Private Connectivity the New Norm via AWS PrivateLink – See how PrivateLink enables service owners to offer private endpoints to customers outside their company.

Security, Identity, & Compliance

May 30, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.

June 1, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.

Serverless

May 22, 2018 | 01:00 PM – 01:45 PM PTBuilding API-Driven Microservices with Amazon API Gateway – Learn how to build a secure, scalable API for your application in our tech talk about API-driven microservices.

Storage

May 30, 2018 | 11:00 AM – 11:45 AM PTAccelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.

June 1, 2018 | 11:00 AM – 11:45 AM PTLearn to Build a Cloud-Scale Website Powered by Amazon EFS – Technical deep dive where you’ll learn tips and tricks for integrating WordPress, Drupal and Magento with Amazon EFS.

 

 

 

 

TSB Bank Disaster

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/tsb_bank_disast.html

This seems like an absolute disaster:

The very short version is that a UK bank, TSB, which had been merged into and then many years later was spun out of Lloyds Bank, was bought by the Spanish bank Banco Sabadell in 2015. Lloyds had continued to run the TSB systems and was to transfer them over to Sabadell over the weekend. It’s turned out to be an epic failure, and it’s not clear if and when this can be straightened out.

It is bad enough that bank IT problem had been so severe and protracted a major newspaper, The Guardian, created a live blog for it that has now been running for two days.

The more serious issue is the fact that customers still can’t access online accounts and even more disconcerting, are sometimes being allowed into other people’s accounts, says there are massive problems with data integrity. That’s a nightmare to sort out.

Even worse, the fact that this situation has persisted strongly suggests that Lloyds went ahead with the migration without allowing for a rollback.

This seems to be a mistake, and not enemy action.

Cloud Empire: Meet the Rebel Alliance

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/cloud-empire-meet-the-rebel-alliance/

Cloud Empire: Meet the Rebel Alliance

Last week Backblaze made the exciting announcement that through partnerships with Packet and ServerCentral, cloud computing is available to Backblaze B2 Cloud Storage customers.

Those of you familiar with cloud computing will understand the significance of this news. We are now offering the least expensive cloud storage + cloud computing available anywhere. You no longer have to submit to the lock-in tactics and exorbitant prices charged by the other big players in the cloud services biz.

As Robin Harris wrote in ZDNet about last week’s computing partners announcement, Cloud Empire: Meet the Rebel Alliance.

We understand that some of our cloud backup and storage customers might be unfamiliar with cloud computing. Backblaze made its name in cloud backup and object storage, and that’s what our customers know us for. In response to customers requests, we’ve directly connected our B2 cloud object storage with cloud compute providers. This adds the ability to use and run programs on data once it’s in the B2 cloud, opening up a world of new uses for B2. Just some of the possibilities include media transcoding and rendering, web hosting, application development and testing, business analytics, disaster recovery, on-demand computing capacity (cloud bursting), AI, and mobile and IoT applications.

The world has been moving to a multi-cloud / hybrid cloud world, and customers are looking for more choices than those offered by the existing cloud players. Our B2 compute partnerships build on our mission to offer cloud storage that’s astonishingly easy and low-cost. They enable our customers to move into a more flexible and affordable cloud services ecosystem that provides a greater variety of choices and costs far less. We believe we are helping to fulfill the promise of the internet by allowing customers to choose the best-of-breed services from the best vendors.

If You’re Not Familiar with Cloud Computing, Here’s a Quick Overview

Cloud computing is another component of cloud services, like object storage, that replicates in the cloud a basic function of a computer system. Think of services that operate in a cloud as an infinitely scalable version of what happens on your desktop computer. In your desktop computer you have computing/processing (CPU), fast storage (like an SSD), data storage (like your disk drive), and memory (RAM). Their counterparts in the cloud are computing (CPU), block storage (fast storage), object storage (data storage), and processing memory (RAM).

Computer building blocks

CPU, RAM, fast internal storage, and a hard drive are the basic building blocks of a computer
They also are the basic building blocks of cloud computing

Some customers require only some of these services, such as cloud storage. B2 as a standalone service has proven to be an outstanding solution for those customers interested in backing up or archiving data. There are many customers that would like additional capabilities, such as performing operations on that data once it’s in the cloud. They need object storage combined with computing.

With the just announced compute partnerships, Backblaze is able to offer computing services to anyone using B2. A direct connection between Backblaze’s and our partners’ data centers means that our customers can process data stored in B2 with high speed, low latency, and zero data transfer costs.

Backblaze, Packet and Server Central cloud compute workflow diagram

Cloud service providers package up CPU, storage, and memory into services that you can rent on an hourly basis
You can scale up and down and add or remove services as you need them

How Does Computing + B2 Work?

Those wanting to use B2 with computing will need to sign up for accounts with Backblaze and either Packet or ServerCentral. Packet customers need only select “SJC1” as their region and then get started. The process is also simple for ServerCentral customers — they just need to register with a ServerCentral account rep.

The direct connection between B2 and our compute partners means customers will experience very low latency (less than 10ms) between services. Even better, all data transfers between B2 and the compute partner are free. When combined with Backblaze B2, customers can obtain cloud computing services for as little as 50% of the cost of Amazon’s Elastic Compute Cloud (EC2).

Opening Up the Cloud “Walled Garden”

Traditionally, cloud vendors charge fees for customers to move data outside the “walled garden” of that particular vendor. These fees reach upwards of $0.12 per gigabyte (GB) for data egress. This large fee for customers accessing their own data restricts users from using a multi-cloud approach and taking advantage of less expensive or better performing options. With free transfers between B2 and Packet or ServerCentral, customers now have a predictable, scalable solution for computing and data storage while avoiding vendor lock in. Dropbox made waves when they saved $75 million by migrating off of AWS. Adding computing to B2 helps anyone interested in moving some or all of their computing off of AWS and thereby cutting their AWS bill by 50% or more.

What are the Advantages of Cloud Storage + Computing?

Using computing and storage in the cloud provide a number of advantages over using in-house resources.

  1. You don’t have to purchase the actual hardware, software licenses, and provide space and IT resources for the systems.
  2. Cloud computing is available with just a few minutes notice and you only pay for whatever period of time you need. You avoid having additional hardware on your balance sheet.
  3. Resources are in the cloud and can provide online services to customers, mobile users, and partners located anywhere in the world.
  4. You can isolate the work on these systems from your normal production environment, making them ideal for testing and trying out new applications and development projects.
  5. Computing resources scale when you need them to, providing temporary or ongoing extra resources for expected or unexpected demand.
  6. They can provide redundant and failover services when and if your primary systems are unavailable for whatever reason.

Where Can I Learn More?

We encourage B2 customers to explore the options available at our partner sites, Packet and ServerCentral. They are happy to help customers understand what services are available and how to get started.

We are excited to see what you build! And please tell us in the comments what you are doing or have planned with B2 + computing.

P.S. May the force be with all of us!

The post Cloud Empire: Meet the Rebel Alliance appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Announces B2 Compute Partnerships

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/introducing-cloud-compute-services/

Backblaze Announces B2 Compute Partnerships

In 2015, we announced Backblaze B2 Cloud Storage — the most affordable, high performance storage cloud on the planet. The decision to release B2 as a service was in direct response to customers asking us if they could use the same cloud storage infrastructure we use for our Computer Backup service. With B2, we entered a market in direct competition with Amazon S3, Google Cloud Services, and Microsoft Azure Storage. Today, we have over 500 petabytes of data from customers in over 150 countries. At $0.005 / GB / month for storage (1/4th of S3) and $0.01 / GB for downloads (1/5th of S3), it turns out there’s a healthy market for cloud storage that’s easy and affordable.

As B2 has grown, customers wanted to use our cloud storage for a variety of use cases that required not only storage but compute. We’re happy to say that through partnerships with Packet & ServerCentral, today we’re announcing that compute is now available for B2 customers.

Cloud Compute and Storage

Backblaze has directly connected B2 with the compute servers of Packet and ServerCentral, thereby allowing near-instant (< 10 ms) data transfers between services. Also, transferring data between B2 and both our compute partners is free.

  • Storing data in B2 and want to run an AI analysis on it? — There are no fees to move the data to our compute partners.
  • Generating data in an application? — Run the application with one of our partners and store it in B2.
  • Transfers are free and you’ll save more than 50% off of the equivalent set of services from AWS.

These partnerships enable B2 customers to use compute, give our compute partners’ customers access to cloud storage, and introduce new customers to industry-leading storage and compute — all with high-performance, low-latency, and low-cost.

Is This a Big Deal? We Think So

Compute is one of the most requested services from our customers Why? Because it unlocks a number of use cases for them. Let’s look at three popular examples:

Transcoding Media Files

B2 has earned wide adoption in the Media & Entertainment (“M&E”) industry. Our affordable storage and download pricing make B2 great for a wide variety of M&E use cases. But many M&E workflows require compute. Content syndicators, like American Public Television, need the ability to transcode files to meet localization and distribution management requirements.

There are a multitude of reasons that transcode is needed — thumbnail and proxy generation enable M&E professionals to work efficiently. Without compute, the act of transcoding files remains cumbersome. Either the files need to be brought down from the cloud, transcoded, and then pushed back up or they must be kept locally until the project is complete. Both scenarios are inefficient.

Starting today, any content producer can spin up compute with one of our partners, pay by the hour for their transcode processing, and return the new media files to B2 for storage and distribution. The company saves money, moves faster, and ensures their files are safe and secure.

Disaster Recovery

Backblaze’s heritage is based on providing outstanding backup services. When you have incredibly affordable cloud storage, it ends up being a great destination for your backup data.

Most enterprises have virtual machines (“VMs”) running in their infrastructure and those VMs need to be backed up. In a disaster scenario, a business wants to know they can get back up and running quickly.

With all data stored in B2, a business can get up and running quickly. Simply restore your backed up VM to one of our compute providers, and your business will be able to get back online.

Since B2 does not place restrictions, delays, or penalties on getting data out, customers can get back up and running quickly and affordably.

Saving $74 Million (aka “The Dropbox Effect”)

Ten years ago, Backblaze decided that S3 was too costly a platform to build its cloud storage business. Instead, we created the Backblaze Storage Pod and our own cloud storage infrastructure. That decision enabled us to offer our customers storage at a previously unavailable price point and maintain those prices for over a decade. It also laid the foundation for Netflix Open Connect and Facebook Open Compute.

Dropbox recently migrated the majority of their cloud services off of AWS and onto Dropbox’s own infrastructure. By leaving AWS, Dropbox was able to build out their own data centers and still save over $74 Million. They achieved those savings by avoiding the fees AWS charges for storing and downloading data, which, incidentally, are five times higher than Backblaze B2.

For Dropbox, being able to realize savings was possible because they have access to enough capital and expertise that they can build out their own infrastructure. For companies that have such resources and scale, that’s a great answer.

“Before this offering, the economics of the cloud would have made our business simply unviable.” — Gabriel Menegatti, SlicingDice

The questions Backblaze and our compute partners pondered was “how can we democratize the Dropbox effect for our storage and compute customers? How can we help customers do more and pay less?” The answer we came up with was to connect Backblaze’s B2 storage with strategic compute partners and remove any transfer fees between them. You may not save $74 million as Dropbox did, but you can choose the optimal providers for your use case and realize significant savings in the process.

This Sounds Good — Tell Me More About Your Partners

We’re very fortunate to be launching our compute program with two fantastic partners in Packet and ServerCentral. These partners allow us to offer a range of computing services.

Packet

We recommend Packet for customers that need on-demand, high performance, bare metal servers available by the hour. They also have robust offerings for private / customized deployments. Their offerings end up costing 50-75% of the equivalent offerings from EC2.

To get started with Packet and B2, visit our partner page on Packet.net.

ServerCentral

ServerCentral is the right partner for customers that have business and IT challenges that require more than “just” hardware. They specialize in fully managed, custom cloud solutions that solve complex business and IT challenges. ServerCentral also has expertise in managed network solutions to address global connectivity and content delivery.

To get started with ServerCentral and B2, visit our partner page on ServerCentral.com.

What’s Next?

We’re excited to find out. The combination of B2 and compute unlocks use cases that were previously impossible or at least unaffordable.

“The combination of performance and price offered by this partnership enables me to create an entirely new business line. Before this offering, the economics of the cloud would have made our business simply unviable,” noted Gabriel Menegatti, co-founder at SlicingDice, a serverless data warehousing service. “Knowing that transfers between compute and B2 are free means I don’t have to worry about my business being successful. And, with download pricing from B2 at just $0.01 GB, I know I’m avoiding a 400% tax from AWS on data I retrieve.”

What can you do with B2 & compute? Please share your ideas with us in the comments. And, for those attending NAB 2018 in Las Vegas next week, please come by and say hello!

The post Backblaze Announces B2 Compute Partnerships appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Serverless Dynamic Web Pages in AWS: Provisioned with CloudFormation

Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/architecture/serverless-dynamic-web-pages-in-aws-provisioned-with-cloudformation/

***This blog is authored by Mike Okner of Monsanto, an AWS customer. It originally appeared on the Monsanto company blog. Minor edits were made to the original post.***

Recently, I was looking to create a status page app to monitor a few important internal services. I wanted this app to be as lightweight, reliable, and hassle-free as possible, so using a “serverless” architecture that doesn’t require any patching or other maintenance was quite appealing.

I also don’t deploy anything in a production AWS environment outside of some sort of template (usually CloudFormation) as a rule. I don’t want to have to come back to something I created ad hoc in the console after 6 months and try to recall exactly how I architected all of the resources. I’ll inevitably forget something and create more problems before solving the original one. So building the status page in a template was a requirement.

The Design
I settled on a design using two Lambda functions, both written in Python 3.6.

The first Lambda function makes requests out to a list of important services and writes their current status to a DynamoDB table. This function is executed once per minute via CloudWatch Event Rule.

The second Lambda function reads each service’s status & uptime information from DynamoDB and renders a Jinja template. This function is behind an API Gateway that has been configured to return text/html instead of its default application/json Content-Type.

The CloudFormation Template
AWS provides a Serverless Application Model template transformer to streamline the templating of Lambda + API Gateway designs, but it assumes (like everything else about the API Gateway) that you’re actually serving an API that returns JSON content. So, unfortunately, it won’t work for this use-case because we want to return HTML content. Instead, we’ll have to enumerate every resource like usual.

The Skeleton
We’ll be using YAML for the template in this example. I find it easier to read than JSON, but you can easily convert between the two with a converter if you disagree.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Serverless status page app
Resources:
  # [...Resources]

The Status-Checker Lambda Resource
This one is triggered on a schedule by CloudWatch, and looks like:

# Status Checker Lambda
CheckerLambda:
  Type: AWS::Lambda::Function
  Properties:
    Code: ./lambda.zip
    Environment:
      Variables:
        TABLE_NAME: !Ref DynamoTable
    Handler: checker.handler
    Role:
      Fn::GetAtt:
      - CheckerLambdaRole
      - Arn
    Runtime: python3.6
    Timeout: 45
CheckerLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    ManagedPolicyArns:
    - arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
    - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
      - Action:
        - sts:AssumeRole
        Effect: Allow
        Principal:
          Service:
          - lambda.amazonaws.com
CheckerLambdaTimer:
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: rate(1 minute)
    Targets:
    - Id: CheckerLambdaTimerLambdaTarget
      Arn:
        Fn::GetAtt:
        - CheckerLambda
        - Arn
CheckerLambdaTimerPermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:invokeFunction
    FunctionName: !Ref CheckerLambda
    SourceArn:
      Fn::GetAtt:
      - CheckerLambdaTimer
      - Arn
    Principal: events.amazonaws.com

Let’s break that down a bit.

The CheckerLambda is the actual Lambda function. The Code section is a local path to a ZIP file containing the code and its dependencies. I’m using CloudFormation’s packaging feature to automatically push the deployable to S3.

The CheckerLambdaRole is the IAM role the Lambda will assume which grants it access to DynamoDB in addition to the usual Lambda logging permissions.

The CheckerLambdaTimer is the CloudWatch Events Rule that triggers the checker to run once per minute.

The CheckerLambdaTimerPermission grants CloudWatch the ability to invoke the checker Lambda function on its interval.

The Web Page Gateway
The API Gateway handles incoming requests for the web page, invokes the Lambda, and then returns the Lambda’s results as HTML content. Its template looks like:

# API Gateway for Web Page Lambda
PageGateway:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: Service Checker Gateway
PageResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref PageGateway
    ParentId:
      Fn::GetAtt:
      - PageGateway
      - RootResourceId
    PathPart: page
PageGatewayMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    AuthorizationType: NONE
    HttpMethod: GET
    Integration:
      Type: AWS
      IntegrationHttpMethod: POST
      Uri:
        Fn::Sub: arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${WebRenderLambda.Arn}/invocations
      RequestTemplates:
        application/json: |
          {
              "method": "$context.httpMethod",
              "body" : $input.json('$'),
              "headers": {
                  #foreach($param in $input.params().header.keySet())
                  "$param": "$util.escapeJavaScript($input.params().header.get($param))"
                  #if($foreach.hasNext),#end
                  #end
              }
          }
      IntegrationResponses:
      - StatusCode: 200
        ResponseParameters:
          method.response.header.Content-Type: "'text/html'"
        ResponseTemplates:
          text/html: "$input.path('$')"
    ResourceId: !Ref PageResource
    RestApiId: !Ref PageGateway
    MethodResponses:
    - StatusCode: 200
      ResponseParameters:
        method.response.header.Content-Type: true
PageGatewayProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    DeploymentId: !Ref PageGatewayDeployment
    RestApiId: !Ref PageGateway
    StageName: Prod
PageGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
  DependsOn: PageGatewayMethod
  Properties:
    RestApiId: !Ref PageGateway
    Description: PageGateway deployment
    StageName: Stage

There’s a lot going on here, but the real meat is in the PageGatewayMethod section. There are a couple properties that deviate from the default which is why we couldn’t use the SAM transformer.

First, we’re passing request headers through to the Lambda in theRequestTemplates section. I’m doing this so I can validate incoming auth headers. The API Gateway can do some types of auth, but I found it easier to check auth myself in the Lambda function since the Gateway is designed to handle API calls and not browser requests.

Next, note that in the IntegrationResponses section we’re defining the Content-Type header to be ‘text/html’ (with single-quotes) and defining the ResponseTemplate to be $input.path(‘$’). This is what makes the request render as a HTML page in your browser instead of just raw text.

Due to the StageName and PathPart values in the other sections, your actual page will be accessible at https://someId.execute-api.region.amazonaws.com/Prod/page. I have the page behind an existing reverse-proxy and give it a saner URL for end-users. The reverse proxy also attaches the auth header I mentioned above. If that header isn’t present, the Lambda will render an error page instead so the proxy can’t be bypassed.

The Web Page Rendering Lambda
This Lambda is invoked by calls to the API Gateway and looks like:

# Web Page Lambda
WebRenderLambda:
  Type: AWS::Lambda::Function
  Properties:
    Code: ./lambda.zip
    Environment:
      Variables:
        TABLE_NAME: !Ref DynamoTable
    Handler: web.handler
    Role:
      Fn::GetAtt:
      - WebRenderLambdaRole
      - Arn
    Runtime: python3.6
    Timeout: 30
WebRenderLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    ManagedPolicyArns:
    - arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess
    - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
      - Action:
        - sts:AssumeRole
        Effect: Allow
        Principal:
          Service:
          - lambda.amazonaws.com
WebRenderLambdaGatewayPermission:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !Ref WebRenderLambda
    Action: lambda:invokeFunction
    Principal: apigateway.amazonaws.com
    SourceArn:
      Fn::Sub:
      - arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${__ApiId__}/*/*/*
      - __ApiId__: !Ref PageGateway

The WebRenderLambda and WebRenderLambdaRole should look familiar.

The WebRenderLambdaGatewayPermission is similar to the Status Checker’s CloudWatch permission, only this time it allows the API Gateway to invoke this Lambda.

The DynamoDB Table
This one is straightforward.

# DynamoDB table
DynamoTable:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
    - AttributeName: name
      AttributeType: S
    ProvisionedThroughput:
      WriteCapacityUnits: 1
      ReadCapacityUnits: 1
    TableName: status-page-checker-results
    KeySchema:
    - KeyType: HASH
      AttributeName: name

The Deployment
We’ve made it this far defining every resource in a template that we can check in to version control, so we might as well script the deployment as well rather than manually manage the CloudFormation Stack via the AWS web console.

Since I’m using the packaging feature, I first run:

$ aws cloudformation package \
    --template-file template.yaml \
    --s3-bucket <some-bucket-name> \
    --output-template-file template-packaged.yaml
Uploading to 34cd6e82c5e8205f9b35e71afd9e1548 1922559 / 1922559.0 (100.00%) Successfully packaged artifacts and wrote output template to file template-packaged.yaml.

Then to deploy the template (whether new or modified), I run:

$ aws cloudformation deploy \
    --region '<aws-region>' \
    --template-file template-packaged.yaml \
    --stack-name '<some-name>' \
    --capabilities CAPABILITY_IAM
Waiting for changeset to be created.. Waiting for stack create/update to complete Successfully created/updated stack - <some-name>

And that’s it! You’ve just created a dynamic web page that will never require you to SSH anywhere, patch a server, recover from a disaster after Amazon terminates your unhealthy EC2, or any other number of pitfalls that are now the problem of some ops person at AWS. And you can reproduce deployments and make changes with confidence because everything is defined in the template and can be tracked in version control.

Best Practices for Running Apache Cassandra on Amazon EC2

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/

Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.

Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.

Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.

AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.

Cassandra resource overview

Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.

ResourceCassandraAWS
Cluster

A single Cassandra deployment.

 

This typically consists of multiple physical locations, keyspaces, and physical servers.

A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
DatacenterA group of nodes configured as a single replication group.

A logical deployment construct in AWS.

 

A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack

A collection of servers.

 

A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.

A single Availability Zone.
Server/nodeA physical virtual machine running Cassandra software.An EC2 instance.
TokenConceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes. Each node being responsible for one or more ranges of the data. Each node gets assigned with a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.Managed within Cassandra.
Virtual node (vnode)Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter.Managed within Cassandra.
Replication factorThe total number of replicas across the cluster.Managed within Cassandra.

Deployment considerations

One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.

We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.

You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.

Deployment patterns

In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.

  • Single AWS Region, 3 Availability Zones
  • Active-active, multi-Region
  • Active-standby, multi-Region

Single region, 3 Availability Zones

In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.

To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).

This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.

ProsCons

●     Highly available, can sustain failure of one Availability Zone.

●     Simple deployment

●     Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.

 

Active-active, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.

ProsCons

●     No data loss during failover.

●     Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.

●     Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.

●     High operational overhead

●     The second Region effectively doubles the cost

 

Active-standby, multi-region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).

ProsCons

●     No data loss during failover.

●     Highly available, can sustain failure or partitioning of one whole Region.

●     High operational overhead.

●     High latency for writes for eventual consistency.

●     The second Region effectively doubles the cost.

Storage options

In on-premises deployments, Cassandra deployments use local disks to store data. There are two storage options for EC2 instances:

Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.

The choice of instance type is generally driven by the type of storage:

  • If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
  • If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
  • Burstable instance types (T2) don’t offer good performance for Cassandra deployments.

Instance store

Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPs) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS in I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.

In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.

As an example, for a cluster with 100 nodes, the loss of 1 node is 3.33% loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is 33% less capacity (with a replication factor of 3).

 Ephemeral storageAmazon EBSComments

IOPS

(translates to higher query performance)

Up to 3.3M on I3

80K/instance

10K/gp2/volume

32K/io1/volume

This results in a higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first. Then, scale vertically to mitigate specific issues.

 

Note: 3.3M IOPS is observed with 100% random read with a 4-KB block size on Amazon Linux.

AWS instance typesI3Compute optimized, C5Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.
Backup/ recoveryCustomBasic building blocks are available from AWS.

Amazon EBS offers distinct advantage here. It is small engineering effort to establish a backup/restore strategy.

a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance.

b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from last snapshot.

Amazon EBS

EBS volumes offer higher resiliency, and IOPs can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.

The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.

Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.

Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.

EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and coping all the data to it.

Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.

EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.

Networking

If your cluster is expected to receive high read/write traffic, select an instance type that offers 10–Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10–Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.

Cassandra generates a universal unique identifier (UUID) for each node based on IP address for the instance. This UUID is used for distributing vnodes on the ring.

In the case of an AWS deployment, IP addresses are assigned automatically to the instance when an EC2 instance is created. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.

To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.

If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.

High availability and resiliency

Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.

Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.

Maintenance

In this section, we look at ways to ensure that your Cassandra cluster is healthy:

  • Scaling
  • Upgrades
  • Backup and restore

Scaling

Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.

Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.

Upgrades

All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.

In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.

Backup and restore

Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.

For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.

Security

We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.

Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.

Encryption in transit

Cassandra uses Transport Layer Security (TLS) for client and internode communications.

Authentication

The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.

Authorization

The authorizer that’s plugged in by default is org.apache.cassandra.auth.Allow AllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.

Conclusion

In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Analyze Your Data on Amazon DynamoDB with Apache Spark and Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight.


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

 

Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.

 

 

 

The Challenges of Opening a Data Center — Part 1

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/choosing-data-center/

Backblaze storage pod in new data center

This is part one of a series. The second part will be posted later this week. Use the Join button above to receive notification of future posts in this series.

Though most of us have never set foot inside of a data center, as citizens of a data-driven world we nonetheless depend on the services that data centers provide almost as much as we depend on a reliable water supply, the electrical grid, and the highway system. Every time we send a tweet, post to Facebook, check our bank balance or credit score, watch a YouTube video, or back up a computer to the cloud we are interacting with a data center.

In this series, The Challenges of Opening a Data Center, we’ll talk in general terms about the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process. Many of the factors to consider will be similar for opening a private data center or seeking space in a public data center, but we’ll assume for the sake of this discussion that our needs are more modest than requiring a data center dedicated solely to our own use (i.e. we’re not Google, Facebook, or China Telecom).

Data center technology and management are changing rapidly, with new approaches to design and operation appearing every year. This means we won’t be able to cover everything happening in the world of data centers in our series, however, we hope our brief overview proves useful.

What is a Data Center?

A data center is the structure that houses a large group of networked computer servers typically used by businesses, governments, and organizations for the remote storage, processing, or distribution of large amounts of data.

While many organizations will have computing services in the same location as their offices that support their day-to-day operations, a data center is a structure dedicated to 24/7 large-scale data processing and handling.

Depending on how you define the term, there are anywhere from a half million data centers in the world to many millions. While it’s possible to say that an organization’s on-site servers and data storage can be called a data center, in this discussion we are using the term data center to refer to facilities that are expressly dedicated to housing computer systems and associated components, such as telecommunications and storage systems. The facility might be a private center, which is owned or leased by one tenant only, or a shared data center that offers what are called “colocation services,” and rents space, services, and equipment to multiple tenants in the center.

A large, modern data center operates around the clock, placing a priority on providing secure and uninterrrupted service, and generally includes redundant or backup power systems or supplies, redundant data communication connections, environmental controls, fire suppression systems, and numerous security devices. Such a center is an industrial-scale operation often using as much electricity as a small town.

Types of Data Centers

There are a number of ways to classify data centers according to how they will be used, whether they are owned or used by one or multiple organizations, whether and how they fit into a topology of other data centers; which technologies and management approaches they use for computing, storage, cooling, power, and operations; and increasingly visible these days: how green they are.

Data centers can be loosely classified into three types according to who owns them and who uses them.

Exclusive Data Centers are facilities wholly built, maintained, operated and managed by the business for the optimal operation of its IT equipment. Some of these centers are well-known companies such as Facebook, Google, or Microsoft, while others are less public-facing big telecoms, insurance companies, or other service providers.

Managed Hosting Providers are data centers managed by a third party on behalf of a business. The business does not own data center or space within it. Rather, the business rents IT equipment and infrastructure it needs instead of investing in the outright purchase of what it needs.

Colocation Data Centers are usually large facilities built to accommodate multiple businesses within the center. The business rents its own space within the data center and subsequently fills the space with its IT equipment, or possibly uses equipment provided by the data center operator.

Backblaze, for example, doesn’t own its own data centers but colocates in data centers owned by others. As Backblaze’s storage needs grow, Backblaze increases the space it uses within a given data center and/or expands to other data centers in the same or different geographic areas.

Availability is Key

When designing or selecting a data center, an organization needs to decide what level of availability is required for its services. The type of business or service it provides likely will dictate this. Any organization that provides real-time and/or critical data services will need the highest level of availability and redundancy, as well as the ability to rapidly failover (transfer operation to another center) when and if required. Some organizations require multiple data centers not just to handle the computer or storage capacity they use, but to provide alternate locations for operation if something should happen temporarily or permanently to one or more of their centers.

Organizations operating data centers that can’t afford any downtime at all will typically operate data centers that have a mirrored site that can take over if something happens to the first site, or they operate a second site in parallel to the first one. These data center topologies are called Active/Passive, and Active/Active, respectively. Should disaster or an outage occur, disaster mode would dictate immediately moving all of the primary data center’s processing to the second data center.

While some data center topologies are spread throughout a single country or continent, others extend around the world. Practically, data transmission speeds put a cap on centers that can be operated in parallel with the appearance of simultaneous operation. Linking two data centers located apart from each other — say no more than 60 miles to limit data latency issues — together with dark fiber (leased fiber optic cable) could enable both data centers to be operated as if they were in the same location, reducing staffing requirements yet providing immediate failover to the secondary data center if needed.

This redundancy of facilities and ensured availability is of paramount importance to those needing uninterrupted data center services.

Active/Passive Data Centers

Active/Active Data Centers

LEED Certification

Leadership in Energy and Environmental Design (LEED) is a rating system devised by the United States Green Building Council (USGBC) for the design, construction, and operation of green buildings. Facilities can achieve ratings of certified, silver, gold, or platinum based on criteria within six categories: sustainable sites, water efficiency, energy and atmosphere, materials and resources, indoor environmental quality, and innovation and design.

Green certification has become increasingly important in data center design and operation as data centers require great amounts of electricity and often cooling water to operate. Green technologies can reduce costs for data center operation, as well as make the arrival of data centers more amenable to environmentally-conscious communities.

The ACT, Inc. data center in Iowa City, Iowa was the first data center in the U.S. to receive LEED-Platinum certification, the highest level available.

ACT Data Center exterior

ACT Data Center exterior

ACT Data Center interior

ACT Data Center interior

Factors to Consider When Selecting a Data Center

There are numerous factors to consider when deciding to build or to occupy space in a data center. Aspects such as proximity to available power grids, telecommunications infrastructure, networking services, transportation lines, and emergency services can affect costs, risk, security and other factors that need to be taken into consideration.

The size of the data center will be dictated by the business requirements of the owner or tenant. A data center can occupy one room of a building, one or more floors, or an entire building. Most of the equipment is often in the form of servers mounted in 19 inch rack cabinets, which are usually placed in single rows forming corridors (so-called aisles) between them. This allows staff access to the front and rear of each cabinet. Servers differ greatly in size from 1U servers (i.e. one “U” or “RU” rack unit measuring 44.50 millimeters or 1.75 inches), to Backblaze’s Storage Pod design that fits a 4U chassis, to large freestanding storage silos that occupy many square feet of floor space.

Location

Location will be one of the biggest factors to consider when selecting a data center and encompasses many other factors that should be taken into account, such as geological risks, neighboring uses, and even local flight paths. Access to suitable available power at a suitable price point is often the most critical factor and the longest lead time item, followed by broadband service availability.

With more and more data centers available providing varied levels of service and cost, the choices increase each year. Data center brokers can be employed to find a data center, just as one might use a broker for home or other commercial real estate.

Websites listing available colocation space, such as upstack.io, or entire data centers for sale or lease, are widely used. A common practice is for a customer to publish its data center requirements, and the vendors compete to provide the most attractive bid in a reverse auction.

Business and Customer Proximity

The center’s closeness to a business or organization may or may not be a factor in the site selection. The organization might wish to be close enough to manage the center or supervise the on-site staff from a nearby business location. The location of customers might be a factor, especially if data transmission speeds and latency are important, or the business or customers have regulatory, political, tax, or other considerations that dictate areas suitable or not suitable for the storage and processing of data.

Climate

Local climate is a major factor in data center design because the climatic conditions dictate what cooling technologies should be deployed. In turn this impacts uptime and the costs associated with cooling, which can total as much as 50% or more of a center’s power costs. The topology and the cost of managing a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Nevertheless, data centers are located in both extremely cold regions and extremely hot ones, with innovative approaches used in both extremes to maintain desired temperatures within the center.

Geographic Stability and Extreme Weather Events

A major obvious factor in locating a data center is the stability of the actual site as regards weather, seismic activity, and the likelihood of weather events such as hurricanes, as well as fire or flooding.

Backblaze’s Sacramento data center describes its location as one of the most stable geographic locations in California, outside fault zones and floodplains.

Sacramento Data Center

Sometimes the location of the center comes first and the facility is hardened to withstand anticipated threats, such as Equinix’s NAP of the Americas data center in Miami, one of the largest single-building data centers on the planet (six stories and 750,000 square feet), which is built 32 feet above sea level and designed to withstand category 5 hurricane winds.

Equinix Data Center in Miami

Equinix “NAP of the Americas” Data Center in Miami

Most data centers don’t have the extreme protection or history of the Bahnhof data center, which is located inside the ultra-secure former nuclear bunker Pionen, in Stockholm, Sweden. It is buried 100 feet below ground inside the White Mountains and secured behind 15.7 in. thick metal doors. It prides itself on its self-described “Bond villain” ambiance.

Bahnhof Data Center under White Mountain in Stockholm

Usually, the data center owner or tenant will want to take into account the balance between cost and risk in the selection of a location. The Ideal quadrant below is obviously favored when making this compromise.

Cost vs Risk in selecting a data center

Cost = Construction/lease, power, bandwidth, cooling, labor, taxes
Risk = Environmental (seismic, weather, water, fire), political, economic

Risk mitigation also plays a strong role in pricing. The extent to which providers must implement special building techniques and operating technologies to protect the facility will affect price. When selecting a data center, organizations must make note of the data center’s certification level on the basis of regulatory requirements in the industry. These certifications can ensure that an organization is meeting necessary compliance requirements.

Power

Electrical power usually represents the largest cost in a data center. The cost a service provider pays for power will be affected by the source of the power, the regulatory environment, the facility size and the rate concessions, if any, offered by the utility. At higher level tiers, battery, generator, and redundant power grids are a required part of the picture.

Fault tolerance and power redundancy are absolutely necessary to maintain uninterrupted data center operation. Parallel redundancy is a safeguard to ensure that an uninterruptible power supply (UPS) system is in place to provide electrical power if necessary. The UPS system can be based on batteries, saved kinetic energy, or some type of generator using diesel or another fuel. The center will operate on the UPS system with another UPS system acting as a backup power generator. If a power outage occurs, the additional UPS system power generator is available.

Many data centers require the use of independent power grids, with service provided by different utility companies or services, to prevent against loss of electrical service no matter what the cause. Some data centers have intentionally located themselves near national borders so that they can obtain redundant power from not just separate grids, but from separate geopolitical sources.

Higher redundancy levels required by a company will of invariably lead to higher prices. If one requires high availability backed by a service-level agreement (SLA), one can expect to pay more than another company with less demanding redundancy requirements.

Stay Tuned for Part 2 of The Challenges of Opening a Data Center

That’s it for part 1 of this post. In subsequent posts, we’ll take a look at some other factors to consider when moving into a data center such as network bandwidth, cooling, and security. We’ll take a look at what is involved in moving into a new data center (including stories from Backblaze’s experiences). We’ll also investigate what it takes to keep a data center running, and some of the new technologies and trends affecting data center design and use. You can discover all posts on our blog tagged with “Data Center” by following the link https://www.backblaze.com/blog/tag/data-center/.

The second part of this series on The Challenges of Opening a Data Center will be posted later this week. Use the Join button above to receive notification of future posts in this series.

The post The Challenges of Opening a Data Center — Part 1 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Connect Veeam to the B2 Cloud: Episode 2 — Using StarWind VTL

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/hybrid-cloud-example-veem-vtl-cloud/

Connect Veeam to the B2 Cloud

View all posts in the Veeam series.

In the first post in this series, we discussed how to connect Veeam to the B2 cloud using Synology. In this post, we continue our Veeam/B2 series with a discussion of how to back up Veeam to the Backblaze B2 Cloud using StarWind VTL.

StarWind provides “VTL” (Virtual Tape Library) technology that enables users to back up their “VMs” (virtual machines) from Veeam to on-premise or cloud storage. StarWind does this using standard “LTO” (Linear Tape-Open) protocols. This appeals to organizations that have LTO in place since it allows adoption of more scalable, cost efficient cloud storage without having to update the internal backup infrastructure.

Why An Additional Backup in the Cloud?

Common backup strategy, known as 3-2-1, dictates having three copies at a minimum of active data. Two copies are stored locally and one copy is in another location.

Relying solely on on-site redundancy does not guarantee data protection after a catastrophic or temporary loss of service affecting the primary data center. To reach maximum data security, an on-premises private cloud backup combined with an off-site public cloud backup, known as hybrid cloud, provides the best combination of security and rapid recovery when required.

Why Consider a Hybrid Cloud Solution?

The Hybrid Cloud Provides Superior Disaster Recovery and Business Continuity

Having a backup strategy that combines on-premise storage with public cloud storage in a single or multi-cloud configuration is becoming the solution of choice for organizations that wish to eliminate dependence on vulnerable on-premises storage. It also provides reliable and rapidly deployed recovery when needed.

If an organization requires restoration of service as quickly as possible after an outage or disaster, it needs to have a backup that isn’t dependent on the same network. That means a backup stored in the cloud that can be restored to another location or cloud-based compute service and put into service immediately after an outage.

Hybrid Cloud Example: VTL and the Cloud

Some organizations will already have made a significant investment in software and hardware that supports LTO protocols. Specifically, they are using Veeam to back up their VMs onto physical tape. Using StarWind to act as a VTL with Veeam enables users to save time and money by connecting their on-premises Veeam Backup & Replication archives to Backblaze B2 Cloud Storage.

Why Veeam, StarWind VTL, and Backblaze B2?

What are the primary reasons that an organization would want to adopt Veeam + StarWind VTL + B2 as a hybrid cloud backup solution?

  1. You are already invested in Veeam along with LTO software and hardware.

Using Veeam plus StarWind VTL with already-existing LTO infrastructure enables organizations to quickly and cost-effectively benefit from cloud storage.

  1. You require rapid and reliable recovery of service should anything disrupt your primary data center.

Having a backup in the cloud with B2 provides an economical primary or secondary cloud storage solution and enables fast restoration to a current or alternate location, as well as providing the option to quickly bring online a cloud-based compute service, thereby minimizing any loss of service and ensuring business continuity. Backblaze’s B2 is an ideal solution for backing up Veeam’s backup repository due to B2’s combination of low-cost and high availability compared to other cloud solutions such as Microsoft Azure or Amazon AWS.

Using Veeam, StarWind VTL, and Backblaze B2 cloud storage is a superior alternative to tape as B2 offers better economics, instant access, and faster recovery.

 

Workflow for how to connect Veeam to the Backblaze B2 Cloud using StarWind VTL

Connect Veeam to the Backblaze B2 Cloud using StarWind VTL (graphic courtesy of StarWind)

View all posts in the Veeam series.

The post Connect Veeam to the B2 Cloud: Episode 2 — Using StarWind VTL appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Preining: In memoriam Staszek Wawrykiewicz

Post Syndicated from ris original https://lwn.net/Articles/747123/rss

Norbert Preining reports
the sad news
that Staszek Wawrykiewicz has died. “Staszek was an
active member of the Polish TeX community, and an incredibly valuable TeX
Live Team member. His insistence and perseverance have saved TeX Live from
many disasters and bugs. Although I have been in contact with Staszek over
the TeX Live mailing lists since some years, I met him in person for the
first time on my first ever BachoTeX, the EuroBachoTeX 2007. His
friendliness, openness to all new things, his inquisitiveness, all took a
great place in my heart.
” (Thanks to Paul Wise)

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency with less cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how to handle this situation? Take things one step at a time when scaling and you may find horizontal scaling isn’t the right choice, after all.

For example, assume you have a tech news website where you did an early-look review of an upcoming—and highly-anticipated—smartphone launch, which went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. For example, if your website is hosted on a traditional Linux with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s get more details on the current scenario and dig out more:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

How is the website handling sessions?

  • Do you have any compliance requests—like the Payment Card Industry Data Security Standard (PCI DSS compliance) —if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your loading balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic, generated by activity on the blog post, so let’s reduce server load by moving image and video to some third -party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides a nice mirror replication for databases. Oracle has its own Oracle plug for replication and AWS RDS provide up to five read replicas, which can span across the region and even the Amazon database Amazon Aurora can have 15 read replicas with Amazon Aurora autoscaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless database  to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, AWS RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to mirror instance means replication data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs and try to move non-critical GET services to read replica and reduce the load on the master database. In this case, loading comments associated with a blog can be fetched from a read replica—as it can handle some delay—in case there is any issue with asynchronous reflection.

Step 3: Reduce write requests. This can be achieved by introducing queue to process the asynchronous message. Amazon Simple Queue Service (Amazon SQS) is a highly-scalable queue, which can handle any kind of work-message load. You can process data, like rating and review; or calculate Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with Amazon CloudWatch, as the trigger.  For on-premises workloads, you can use SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS  to fan out your message processing in parallel for different purposes like adding a watermark in an image, generating a thumbnail, etc.

Step 4: Introduce a more robust caching engine. You can use Amazon Elastic Cache for Memcached or Redis to reduce write requests. Memcached and Redis have different use cases so if you can afford to lose and recover your cache from your database, use Memcached. If you are looking for more robust data persistence and complex data structure, use Redis. In AWS, these are managed services, which means AWS takes care of the workload for you and you can also deploy them in your on-premises instances or use a hybrid approach.

Step 5: Scale your server. If there are still issues, it’s time to scale your server.  For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, use cases like database vertical scaling may be a better choice until you are good with sharding; or use Amazon Aurora Serverless for variable workloads. It will be wise to use Auto Scaling to manage your workload effectively for horizontal scaling. Also, to achieve that, you need to persist the session. Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route53 to distribute your workload between AWS Regions. CloudFront helps you to scale and distribute static content via an S3 bucket and DynamoDB, maintaining your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with multi-AZ standby provides high availability and read replica helps achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS even if your workload in on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing your on-premises environment replica in the public cloud like AWS Cloud, and using Amazon Route53 DNS Service and Elastic Load Balancing to route traffic between on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments to help you scale your cloud environment quickly, whenever required, and reduce it further by applying Amazon auto-scaling and placing a threshold on your on-premises traffic using Route 53.

Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown

Post Syndicated from Eben Upton original https://www.raspberrypi.org/blog/why-raspberry-pi-isnt-vulnerable-to-spectre-or-meltdown/

Over the last couple of days, there has been a lot of discussion about a pair of security vulnerabilities nicknamed Spectre and Meltdown. These affect all modern Intel processors, and (in the case of Spectre) many AMD processors and ARM cores. Spectre allows an attacker to bypass software checks to read data from arbitrary locations in the current address space; Meltdown allows an attacker to read arbitrary data from the operating system kernel’s address space (which should normally be inaccessible to user programs).

Both vulnerabilities exploit performance features (caching and speculative execution) common to many modern processors to leak data via a so-called side-channel attack. Happily, the Raspberry Pi isn’t susceptible to these vulnerabilities, because of the particular ARM cores that we use.

To help us understand why, here’s a little primer on some concepts in modern processor design. We’ll illustrate these concepts using simple programs in Python syntax like this one:

t = a+b
u = c+d
v = e+f
w = v+g
x = h+i
y = j+k

While the processor in your computer doesn’t execute Python directly, the statements here are simple enough that they roughly correspond to a single machine instruction. We’re going to gloss over some details (notably pipelining and register renaming) which are very important to processor designers, but which aren’t necessary to understand how Spectre and Meltdown work.

For a comprehensive description of processor design, and other aspects of modern computer architecture, you can’t do better than Hennessy and Patterson’s classic Computer Architecture: A Quantitative Approach.

What is a scalar processor?

The simplest sort of modern processor executes one instruction per cycle; we call this a scalar processor. Our example above will execute in six cycles on a scalar processor.

Examples of scalar processors include the Intel 486 and the ARM1176 core used in Raspberry Pi 1 and Raspberry Pi Zero.

What is a superscalar processor?

The obvious way to make a scalar processor (or indeed any processor) run faster is to increase its clock speed. However, we soon reach limits of how fast the logic gates inside the processor can be made to run; processor designers therefore quickly began to look for ways to do several things at once.

An in-order superscalar processor examines the incoming stream of instructions and tries execute more than one at once, in one of several “pipes”, subject to dependencies between the instructions. Dependencies are important: you might think that a two-way superscalar processor could just pair up (or dual-issue) the six instructions in our example like this:

t, u = a+b, c+d
v, w = e+f, v+g
x, y = h+i, j+k

But this doesn’t make sense: we have to compute v before we can compute w, so the third and fourth instructions can’t be executed at the same time. Our two-way superscalar processor won’t be able to find anything to pair with the third instruction, so our example will execute in four cycles:

t, u = a+b, c+d
v    = e+f                   # second pipe does nothing here
w, x = v+g, h+i
y    = j+k

Examples of superscalar processors include the Intel Pentium, and the ARM Cortex-A7 and Cortex-A53 cores used in Raspberry Pi 2 and Raspberry Pi 3 respectively. Raspberry Pi 3 has only a 33% higher clock speed than Raspberry Pi 2, but has roughly double the performance: the extra performance is partly a result of Cortex-A53’s ability to dual-issue a broader range of instructions than Cortex-A7.

What is an out-of-order processor?

Going back to our example, we can see that, although we have a dependency between v and w, we have other independent instructions later in the program that we could potentially have used to fill the empty pipe during the second cycle. An out-of-order superscalar processor has the ability to shuffle the order of incoming instructions (again subject to dependencies) in order to keep its pipelines busy.

An out-of-order processor might effectively swap the definitions of w and x in our example like this:

t = a+b
u = c+d
v = e+f
x = h+i
w = v+g
y = j+k

allowing it to execute in three cycles:

t, u = a+b, c+d
v, x = e+f, h+i
w, y = v+g, j+k

Examples of out-of-order processors include the Intel Pentium 2 (and most subsequent Intel and AMD x86 processors), and many recent ARM cores, including Cortex-A9, -A15, -A17, and -A57.

What is speculation?

Reordering sequential instructions is a powerful way to recover more instruction-level parallelism, but as processors become wider (able to triple- or quadruple-issue instructions) it becomes harder to keep all those pipes busy. Modern processors have therefore grown the ability to speculate. Speculative execution lets us issue instructions which might turn out not to be required (because they are branched over): this keeps a pipe busy, and if it turns out that the instruction isn’t executed, we can just throw the result away.

To demonstrate the benefits of speculation, let’s look at another example:

t = a+b
u = t+c
v = u+d
if v:
   w = e+f
   x = w+g
   y = x+h

Now we have dependencies from t to u to v, and from w to x to y, so a two-way out-of-order processor without speculation won’t ever be able to fill its second pipe. It spends three cycles computing t, u, and v, after which it knows whether the body of the if statement will execute, in which case it then spends three cycles computing w, x, and y. Assuming the if (a branch instruction) takes one cycle, our example takes either four cycles (if v turns out to be zero) or seven cycles (if v is non-zero).

Speculation effectively shuffles the program like this:

t = a+b
u = t+c
v = u+d
w_ = e+f
x_ = w_+g
y_ = x_+h
if v:
   w, x, y = w_, x_, y_

so we now have additional instruction level parallelism to keep our pipes busy:

t, w_ = a+b, e+f
u, x_ = t+c, w_+g
v, y_ = u+d, x_+h
if v:
   w, x, y = w_, x_, y_

Cycle counting becomes less well defined in speculative out-of-order processors, but the branch and conditional update of w, x, and y are (approximately) free, so our example executes in (approximately) three cycles.

What is a cache?

In the good old days*, the speed of processors was well matched with the speed of memory access. My BBC Micro, with its 2MHz 6502, could execute an instruction roughly every 2µs (microseconds), and had a memory cycle time of 0.25µs. Over the ensuing 35 years, processors have become very much faster, but memory only modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main memory.

At first glance, this sounds like a disaster: every time we access memory, we’ll end up waiting for 100ns to get the result back. In this case, this example:

a = mem[0]
b = mem[1]

would take 200ns.

In practice, programs tend to access memory in relatively predictable ways, exhibiting both temporal locality (if I access a location, I’m likely to access it again soon) and spatial locality (if I access a location, I’m likely to access a nearby location soon). Caching takes advantage of these properties to reduce the average cost of access to memory.

A cache is a small on-chip memory, close to the processor, which stores copies of the contents of recently used locations (and their neighbours), so that they are quickly available on subsequent accesses. With caching, the example above will execute in a little over 100ns:

a = mem[0]    # 100ns delay, copies mem[0:15] into cache
b = mem[1]    # mem[1] is in the cache

From the point of view of Spectre and Meltdown, the important point is that if you can time how long a memory access takes, you can determine whether the address you accessed was in the cache (short time) or not (long time).

What is a side channel?

From Wikipedia:

“… a side-channel attack is any attack based on information gained from the physical implementation of a cryptosystem, rather than brute force or theoretical weaknesses in the algorithms (compare cryptanalysis). For example, timing information, power consumption, electromagnetic leaks or even sound can provide an extra source of information, which can be exploited to break the system.”

Spectre and Meltdown are side-channel attacks which deduce the contents of a memory location which should not normally be accessible by using timing to observe whether another location is present in the cache.

Putting it all together

Now let’s look at how speculation and caching combine to permit the Meltdown attack. Consider the following example, which is a user program that sometimes reads from an illegal (kernel) address:

t = a+b
u = t+c
v = u+d
if v:
   w = kern_mem[address]   # if we get here crash
   x = w&0x100
   y = user_mem[x]

Now our out-of-order two-way superscalar processor shuffles the program like this:

t, w_ = a+b, kern_mem[address]
u, x_ = t+c, w_&0x100
v, y_ = u+d, user_mem[x_]

if v:
   # crash
   w, x, y = w_, x_, y_      # we never get here

Even though the processor always speculatively reads from the kernel address, it must defer the resulting fault until it knows that v was non-zero. On the face of it, this feels safe because either:

  • v is zero, so the result of the illegal read isn’t committed to w
  • v is non-zero, so the program crashes before the read is committed to w

However, suppose we flush our cache before executing the code, and arrange a, b, c, and d so that v is zero. Now, the speculative load in the third cycle:

v, y_ = u+d, user_mem[x_]

will read from either address 0x000 or address 0x100 depending on the eighth bit of the result of the illegal read. Because v is zero, the results of the speculative instructions will be discarded, and execution will continue. If we time a subsequent access to one of those addresses, we can determine which address is in the cache. Congratulations: you’ve just read a single bit from the kernel’s address space!

The real Meltdown exploit is more complex than this, but the principle is the same. Spectre uses a similar approach to subvert software array bounds checks.

Conclusion

Modern processors go to great lengths to preserve the abstraction that they are in-order scalar machines that access memory directly, while in fact using a host of techniques including caching, instruction reordering, and speculation to deliver much higher performance than a simple processor could hope to achieve. Meltdown and Spectre are examples of what happens when we reason about security in the context of that abstraction, and then encounter minor discrepancies between the abstraction and reality.

The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in Raspberry Pi render us immune to attacks of the sort.

* days may not be that old, or that good

The post Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown appeared first on Raspberry Pi.

DAST vs SAST – Dynamic Application Security Testing vs Static

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/12/dast-vs-sast-dynamic-application-security-testing-vs-static/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

DAST vs SAST – Dynamic Application Security Testing vs Static

In security testing, much like most things technical there are two very contrary methods, Dynamic Application Security Testing or DAST and Static Application Security Testing or SAST.

Dynamic testing relying on a black-box external approach, attacking the application in it’s running state as a regular malicious attacker would.

Static testing is more white-box looking at the source-code of the application for potential flaws.

Personally, I don’t see them as ‘vs’ each other, but more like they compliment each other – it’s easy to have SAST tests as part of your CI/CD pipeline with tools like Code Climate.

Read the rest of DAST vs SAST – Dynamic Application Security Testing vs Static now! Only available at Darknet.