Tag Archives: restoration

The Practical Effects of GDPR at Backblaze

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/the-practical-effects-of-gdpr-at-backblaze/


GDPR day, May 25, 2018, is nearly here. On that day, will your inbox explode with update notices, opt-in agreements, and offers from lawyers searching for GDPR violators? Perhaps all the companies on earth that are not GDPR ready will just dissolve into dust. More likely, there will be some changes, but business as usual will continue and we’ll all be more aware of data privacy. Let’s go with the last one.

What’s Different With GDPR at Backblaze

The biggest difference you’ll notice is a completely updated Privacy Policy. Last week we sent out a service email announcing the new Privacy Policy. Some people asked what was different. Basically, everything: about 95% of the agreement was rewritten. We added the provisions required by GDPR, and hopefully did a better job of specifying the data we collect from you, why we collect it, and what we are going to do with it.

As a reminder, at Backblaze your data falls into two categories. The first type of data is the data you store with us — stored data. These are the files and objects you upload and store, and as needed, restore. We do not share this data. We do not process this data, except as requested by you to store and restore the data. We do not analyze this data looking for keywords, tags, images, etc. No one outside of Backblaze has access to this data unless you explicitly share it by providing that person access to one or more files.

The second type of data is your account data. Some of your account data is considered personal data. This is the information we collect from you to provide our Personal Backup, Business Backup, and B2 Cloud Storage services. Examples include your email address, which provides access to your account, or the name of your computer, which lets us organize your files the way they are arranged on your computer and makes restoration easier. We have written a number of Help Articles covering the different ways this information is collected and processed. In addition, these help articles outline the various “rights” granted via GDPR. We will continue to add help articles over the coming weeks to make it easy to work with us to understand and exercise your rights.

What’s New With GDPR at Backblaze

The most obvious addition is the Data Processing Addendum (DPA). This covers how we protect the data you store with us, i.e. stored data. As noted above, we don’t do anything with your data, except store it and keep it safe until you need it. Now we have a separate document saying that.

It is important to note that the new Data Processing Addendum is now incorporated by reference into our Terms of Service, which everyone agrees to when they sign up for any of our services. Now all of our customers have a shiny new Data Processing Addendum to go along with the updated Privacy Policy. We promise they are not long or complicated, and we encourage you to read them. If you have any questions, stop by the GDPR help section on our website.

Patience, Please

Every company we have dealt with over the last few months is working hard to comply with GDPR. It has been a tough road, whether you tried to do it yourself or, like Backblaze, hired an EU-based law firm for advice. Over the coming weeks and months, as you reach out to discover and assert your rights, please have a little patience. We are all going through a steep learning curve as GDPR gets put into practice. Along the way there are certain to be some growing pains — give us a chance, we all want to get it right.

Regardless, at Backblaze we’ve been diligently protecting our customers’ data for over 11 years and nothing that will happen on May 25th will change that.


Amazon Aurora Backtrack – Turn Back Time

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/

We’ve all been there! You need to make a quick, seemingly simple fix to an important production database. You compose the query, give it a once-over, and let it run. Seconds later you realize that you forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and you interrupt the query, but the damage has been done. You take a deep breath, whistle through your teeth, and wish that reality came with an Undo option. Now what?

New Amazon Aurora Backtrack
Today I would like to tell you about the new backtrack feature for Amazon Aurora. This is as close as we can come, given present-day technology, to an Undo option for reality.

This feature can be enabled at launch time for all newly launched Aurora database clusters. To enable it, you simply specify how far back in time you might want to rewind, and then use the database as usual (the setting lives on the Configure advanced settings page of the console).
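If you create clusters programmatically rather than through the console, the same setting can be supplied at creation time. Here is a minimal sketch using boto3; the cluster name, credentials, and 24-hour window are placeholder values, and the engine name should match whichever MySQL-compatible Aurora engine you are launching.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Create a MySQL-compatible Aurora cluster with backtrack enabled.
# BacktrackWindow is the target window in seconds (24 hours here) and can
# only be set when the cluster is created or restored.
rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",   # placeholder name
    Engine="aurora-mysql",                     # or "aurora", as appropriate
    MasterUsername="admin",                    # placeholder credentials
    MasterUserPassword="change-me-please",
    BacktrackWindow=24 * 60 * 60,
)
```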

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

After that regrettable moment when all seems lost, you simply pause your application, open up the Aurora Console, select the cluster, and click Backtrack DB cluster.

Then you select Backtrack, choose the point in time just before your epic fail, and click Backtrack DB cluster.

Then you wait for the rewind to take place, unpause your application, and proceed as if nothing had happened. When you initiate a backtrack, Aurora will pause the database, close any open connections, drop uncommitted writes, and wait for the backtrack to complete. Then it will resume normal operation and begin to accept requests. The instance state will be backtracking while the rewind is underway.

The console will let you know when the backtrack is complete.

If it turns out that you went back a bit too far, you can backtrack to a later time. Other Aurora features such as cloning, backups, and restores continue to work on an instance that has been configured for backtrack.

I’m sure you can think of some creative and non-obvious use cases for this cool new feature. For example, you could use it to restore a test database after running a test that makes changes to the database. You can initiate the restoration from the API or the CLI, making it easy to integrate into your existing test framework.
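As a rough sketch of what that integration might look like with boto3 (the cluster identifier and rewind target are placeholders), a test harness could rewind the cluster after each run:

```python
import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client("rds", region_name="us-east-1")

# Rewind to a point just before the test run started; the timestamp must
# fall within the backtrack window configured for the cluster.
target_time = datetime.now(timezone.utc) - timedelta(minutes=15)

rds.backtrack_db_cluster(
    DBClusterIdentifier="my-test-cluster",   # placeholder name
    BacktrackTo=target_time,
    UseEarliestTimeOnPointInTimeUnavailable=True,
)
```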

Things to Know
This option applies to newly created MySQL-compatible Aurora database clusters and to MySQL-compatible clusters that have been restored from a backup. You must opt in when you create or restore a cluster; you cannot enable it for a running cluster.

This feature is available now in all AWS Regions where Amazon Aurora runs, and you can start using it today.

Jeff;

Best Practices for Running Apache Cassandra on Amazon EC2

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/

Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.

Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, they let you deploy Cassandra and scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the strategy most suitable for your use case.

In this post, we outline three Cassandra deployment options and provide guidance on best practices for your use case in the following areas:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.

Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.

AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.

Cassandra resource overview

Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.

Cluster
Cassandra: A single Cassandra deployment. This typically consists of multiple physical locations, keyspaces, and physical servers.
AWS: A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.

Datacenter
Cassandra: A group of nodes configured as a single replication group.
AWS: A logical deployment construct in AWS. A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack
Cassandra: A collection of servers. A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.
AWS: A single Availability Zone.

Server/node
Cassandra: A physical or virtual machine running Cassandra software.
AWS: An EC2 instance.

Token
Cassandra: Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes, with each node responsible for one or more ranges of the data. Each node is assigned a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.
AWS: Managed within Cassandra.

Virtual node (vnode)
Cassandra: Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter.
AWS: Managed within Cassandra.

Replication factor
Cassandra: The total number of replicas across the cluster.
AWS: Managed within Cassandra.

Deployment considerations

One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.

We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.
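As an illustration of what one of those functions might look like, here is a hedged sketch of a Lambda handler that snapshots the EBS volumes attached to the instances of one ring; the tag key, event payload, and the assumption that data lives on EBS are all placeholders for your own conventions.

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    """On-demand backup of a Cassandra ring: snapshot every EBS volume
    attached to instances tagged with the ring name.

    Assumes instances carry a 'cassandra:ring' tag; adjust the filter to
    match your own tagging scheme.
    """
    ring = event.get("ring", "ring-1")   # hypothetical event payload
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:cassandra:ring", "Values": [ring]}]
    )["Reservations"]

    snapshot_ids = []
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                if "Ebs" not in mapping:
                    continue  # skip instance-store mappings
                snap = ec2.create_snapshot(
                    VolumeId=mapping["Ebs"]["VolumeId"],
                    Description=f"{ring} backup of {instance['InstanceId']}",
                )
                snapshot_ids.append(snap["SnapshotId"])
    return {"snapshots": snapshot_ids}
```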

You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.

Deployment patterns

In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.

  • Single AWS Region, 3 Availability Zones
  • Active-active, multi-Region
  • Active-standby, multi-Region

Single Region, 3 Availability Zones

In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.

To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).

This pattern is suitable when the application is deployed in one Region, or when deployments must be constrained to a single Region because of data privacy or other legal requirements.

Pros:

●     Highly available, can sustain failure of one Availability Zone.

●     Simple deployment.

Cons:

●     Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.

Active-active, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.

Pros:

●     No data loss during failover.

●     Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.

●     Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.

Cons:

●     High operational overhead.

●     The second Region effectively doubles the cost.

Active-standby, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster require a low recovery point objective (RPO) and a low recovery time objective (RTO).

Pros:

●     No data loss during failover.

●     Highly available, can sustain failure or partitioning of one whole Region.

Cons:

●     High operational overhead.

●     High latency for writes for eventual consistency.

●     The second Region effectively doubles the cost.

Storage options

In on-premises deployments, Cassandra uses local disks to store data. There are two storage options for EC2 instances:

  • Instance store (ephemeral storage)
  • Amazon EBS

Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.

The choice of instance type is generally driven by the type of storage:

  • If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
  • If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
  • Burstable instance types (T2) don’t offer good performance for Cassandra deployments.

Instance store

Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPS) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS on I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.

In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.

As an example, for a cluster with 100 nodes, the loss of 1 node means a 3.33% loss of capacity (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node means a 33% loss of capacity (with a replication factor of 3).

IOPS (translates to higher query performance)
Ephemeral storage: Up to 3.3M on I3.
Amazon EBS: 80K per instance; 10K per gp2 volume; 32K per io1 volume.
Comments: This results in higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first, then scaling vertically to mitigate specific issues. Note: the 3.3M IOPS figure is observed with 100% random reads with a 4-KB block size on Amazon Linux.

AWS instance types
Ephemeral storage: I3.
Amazon EBS: Compute optimized, C5.
Comments: Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.

Backup/recovery
Ephemeral storage: Custom.
Amazon EBS: Basic building blocks are available from AWS.
Comments: Amazon EBS offers a distinct advantage here. It is a small engineering effort to establish a backup/restore strategy: a) in case of an instance failure, the EBS volumes from the failing instance are attached to a new instance; b) in case of an EBS volume failure, the data is restored by creating a new EBS volume from the last snapshot.

Amazon EBS

EBS volumes offer higher resiliency, and IOPS can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in a RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.

The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.

Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. Not only can it withstand node failures, but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.

Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.

EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and attached to a new instance. This is faster than creating a new volume and copying all the data to it.

Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.

EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.

Networking

If your cluster is expected to receive high read/write traffic, select an instance type that offers 10 Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10 Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.

Cassandra generates a universally unique identifier (UUID) for each node, based on the IP address of the instance. This UUID is used for distributing vnodes on the ring.

In an AWS deployment, an IP address is assigned automatically when an EC2 instance is created, so a replacement instance receives a new IP address. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.

To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.
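As a rough sketch of that swap with boto3 (the interface, attachment, and instance IDs are placeholders, and production automation would also handle errors and tagging), the sequence looks roughly like this:

```python
import boto3

ec2 = boto3.client("ec2")

ENI_ID = "eni-0123456789abcdef0"                    # placeholder secondary ENI
OLD_ATTACHMENT_ID = "eni-attach-0123456789abcdef0"  # placeholder attachment
NEW_INSTANCE_ID = "i-0123456789abcdef0"             # placeholder replacement node

# Detach the secondary network interface from the node being replaced...
ec2.detach_network_interface(AttachmentId=OLD_ATTACHMENT_ID, Force=False)
ec2.get_waiter("network_interface_available").wait(NetworkInterfaceIds=[ENI_ID])

# ...then attach it to the new instance as the secondary interface, so the
# node keeps its fixed IP address and the ring does not need rebalancing.
ec2.attach_network_interface(
    NetworkInterfaceId=ENI_ID,
    InstanceId=NEW_INSTANCE_ID,
    DeviceIndex=1,
)
```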

If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.

High availability and resiliency

Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.

Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.

Maintenance

In this section, we look at ways to ensure that your Cassandra cluster is healthy:

  • Scaling
  • Upgrades
  • Backup and restore

Scaling

Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.

Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.

Upgrades

All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.

In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.

Backup and restore

Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.
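As a rough illustration of that flow (the bucket name, snapshot tag, and data path are assumptions, and real backup scripts would add retention and error handling), a per-node script might take a Cassandra snapshot and ship it to S3:

```python
import subprocess
from pathlib import Path

import boto3

BUCKET = "my-cassandra-backups"              # placeholder bucket name
DATA_DIR = Path("/var/lib/cassandra/data")   # default Cassandra data path
SNAPSHOT_TAG = "nightly"                     # placeholder snapshot tag

# Take a point-in-time snapshot on this node (hard links, so it is fast).
subprocess.run(["nodetool", "snapshot", "-t", SNAPSHOT_TAG], check=True)

# Copy every file in the snapshot directories to S3; repeat on each node
# in the cluster for a complete backup.
s3 = boto3.client("s3")
for sstable in DATA_DIR.glob(f"*/*/snapshots/{SNAPSHOT_TAG}/*"):
    if sstable.is_file():
        key = f"{SNAPSHOT_TAG}/{sstable.relative_to(DATA_DIR)}"
        s3.upload_file(str(sstable), BUCKET, key)
```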

For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.

Security

We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.
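For example (the Availability Zone, volume size, and KMS key alias below are placeholders), creating an encrypted data volume with boto3 looks roughly like this:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a KMS-encrypted gp2 data volume for a Cassandra node; attach it
# to the instance and create the filesystem as you normally would.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",       # placeholder AZ
    Size=1000,                           # GiB, placeholder size
    VolumeType="gp2",
    Encrypted=True,
    KmsKeyId="alias/cassandra-data",     # placeholder CMK alias
)
print(volume["VolumeId"])
```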

Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.

Encryption in transit

Cassandra uses Transport Layer Security (TLS) for client and internode communications.

Authentication

The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.
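For instance, a Python client built on the DataStax driver might combine client-to-node TLS with the default password authenticator roughly as follows; the contact points, credentials, certificate path, and keyspace are placeholders, and the exact options depend on how your cluster's certificates are issued.

```python
import ssl

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# TLS context for client-to-node encryption, verified against your own CA.
ssl_context = ssl.create_default_context(cafile="/etc/cassandra/ca.pem")
ssl_context.check_hostname = False  # enable if node certs carry hostnames/SANs

# Credentials for PasswordAuthenticator; swap in a Kerberos or LDAP
# provider if you plug in a different authenticator.
auth = PlainTextAuthProvider(username="app_user", password="change-me")

cluster = Cluster(
    contact_points=["10.0.1.10", "10.0.2.10"],   # placeholder node IPs
    ssl_context=ssl_context,
    auth_provider=auth,
)
session = cluster.connect("my_keyspace")          # placeholder keyspace
```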

Authorization

The authorizer that’s plugged in by default is org.apache.cassandra.auth.AllowAllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.

Conclusion

In this post, we discussed several patterns for running Cassandra in the AWS Cloud and described how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Analyze Your Data on Amazon DynamoDB with Apache Spark and Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight.


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies, such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.

 

 

 

Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.

 

 

 

Connect Veeam to the B2 Cloud: Episode 2 — Using StarWind VTL

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/hybrid-cloud-example-veem-vtl-cloud/

Connect Veeam to the B2 Cloud

View all posts in the Veeam series.

In the first post in this series, we discussed how to connect Veeam to the B2 cloud using Synology. In this post, we continue our Veeam/B2 series with a discussion of how to back up Veeam to the Backblaze B2 Cloud using StarWind VTL.

StarWind provides “VTL” (Virtual Tape Library) technology that enables users to back up their “VMs” (virtual machines) from Veeam to on-premises or cloud storage. StarWind does this using standard “LTO” (Linear Tape-Open) protocols. This appeals to organizations that have LTO in place, since it allows adoption of more scalable, cost-efficient cloud storage without having to update the internal backup infrastructure.

Why An Additional Backup in the Cloud?

A common backup strategy, known as 3-2-1, dictates keeping a minimum of three copies of active data: two copies stored locally and one copy in another location.

Relying solely on on-site redundancy does not guarantee data protection after a catastrophic or temporary loss of service affecting the primary data center. To reach maximum data security, an on-premises private cloud backup combined with an off-site public cloud backup, known as hybrid cloud, provides the best combination of security and rapid recovery when required.

Why Consider a Hybrid Cloud Solution?

The Hybrid Cloud Provides Superior Disaster Recovery and Business Continuity

Having a backup strategy that combines on-premises storage with public cloud storage in a single or multi-cloud configuration is becoming the solution of choice for organizations that wish to eliminate dependence on vulnerable on-premises storage. It also provides reliable and rapidly deployed recovery when needed.

If an organization requires restoration of service as quickly as possible after an outage or disaster, it needs to have a backup that isn’t dependent on the same network. That means a backup stored in the cloud that can be restored to another location or cloud-based compute service and put into service immediately after an outage.

Hybrid Cloud Example: VTL and the Cloud

Some organizations will already have made a significant investment in software and hardware that supports LTO protocols. Specifically, they are using Veeam to back up their VMs onto physical tape. Using StarWind to act as a VTL with Veeam enables users to save time and money by connecting their on-premises Veeam Backup & Replication archives to Backblaze B2 Cloud Storage.

Why Veeam, StarWind VTL, and Backblaze B2?

What are the primary reasons that an organization would want to adopt Veeam + StarWind VTL + B2 as a hybrid cloud backup solution?

  1. You are already invested in Veeam along with LTO software and hardware.

Using Veeam plus StarWind VTL with already-existing LTO infrastructure enables organizations to quickly and cost-effectively benefit from cloud storage.

  2. You require rapid and reliable recovery of service should anything disrupt your primary data center.

Having a backup in the cloud with B2 provides an economical primary or secondary cloud storage solution and enables fast restoration to a current or alternate location, as well as providing the option to quickly bring online a cloud-based compute service, thereby minimizing any loss of service and ensuring business continuity. Backblaze B2 is an ideal solution for backing up Veeam’s backup repository due to its combination of low cost and high availability compared to other cloud solutions such as Microsoft Azure or Amazon AWS.

Using Veeam, StarWind VTL, and Backblaze B2 cloud storage is a superior alternative to tape as B2 offers better economics, instant access, and faster recovery.

 

Connect Veeam to the Backblaze B2 Cloud using StarWind VTL (graphic courtesy of StarWind)

View all posts in the Veeam series.


B2 Cloud Storage Roundup

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/b2-cloud-storage-roundup/

B2 Integrations
Over the past several months, B2 Cloud Storage has continued to grow like we planted magic beans. During that time we have added a B2 Java SDK, and certified integrations with GoodSync, Arq, Panic, UpdraftPlus, Morro Data, QNAP, Archiware, Restic, and more. In addition, B2 customers like Panna Cooking, Sermon Audio, and Fellowship Church are happy they chose B2 as their cloud storage provider. If any of that sounds interesting, read on.

The B2 Java SDK

While the Backblaze B2 API is well documented and straightforward to implement, we were asked by a few of our Integration Partners if we had an SDK they could use. So we developed one as an open-source project on GitHub, where we hope interested parties will not only use our Java SDK, but make it better for everyone else.

There are different reasons one might use the Java SDK, but a couple of areas where the SDK can simplify the coding process are:

Expiring Authorization — B2 requires that the authorization for a given account be reissued once a day when using the API. If the authorization expires while you are in the middle of transferring files or some other B2 activity (bucket list, etc.), the SDK can be used to detect and then update the authorization on the fly. Your B2-related activities will continue without incident and without your having to capture and code your own exception case.

Error Handling — There are different types of error codes B2 will return, from expired authorization tokens to malformed requests to command time-outs. The SDK can dramatically simplify the coding needed to capture and account for the various things that can happen.
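To make the expiring-authorization case concrete, here is a minimal Python sketch against the raw B2 API (not the Java SDK itself) that re-authorizes when B2 reports an expired token; the account ID and application key are placeholders.

```python
import requests

ACCOUNT_ID = "your-account-id"            # placeholder credentials
APPLICATION_KEY = "your-application-key"

API_BASE = None
AUTH_TOKEN = None

def authorize():
    """Exchange the account ID and application key for a 24-hour auth token."""
    global API_BASE, AUTH_TOKEN
    resp = requests.get(
        "https://api.backblazeb2.com/b2api/v1/b2_authorize_account",
        auth=(ACCOUNT_ID, APPLICATION_KEY),
    )
    resp.raise_for_status()
    data = resp.json()
    API_BASE, AUTH_TOKEN = data["apiUrl"], data["authorizationToken"]

def b2_call(endpoint, payload):
    """Call a B2 API endpoint, re-authorizing once if the token has expired."""
    if AUTH_TOKEN is None:
        authorize()
    for _ in range(2):
        resp = requests.post(
            f"{API_BASE}/b2api/v1/{endpoint}",
            headers={"Authorization": AUTH_TOKEN},
            json=payload,
        )
        if resp.status_code == 401 and resp.json().get("code") == "expired_auth_token":
            authorize()   # token lapsed mid-session; get a fresh one and retry
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("B2 authorization kept expiring")

# Example: list the buckets in the account.
# buckets = b2_call("b2_list_buckets", {"accountId": ACCOUNT_ID})
```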

While Backblaze has created the Java SDK, developers in the GitHub community have also created other SDKs for B2, for example, for PHP (https://github.com/cwhite92/b2-sdk-php) and Go (https://github.com/kurin/blazer). Let us know in the comments about other SDKs you’d like to see, or perhaps start your own GitHub project. We will publish any updates in our next B2 roundup.

What You Can Do with Affordable and Available Cloud Storage

You’re probably aware that B2 is up to 75% less expensive than other similar cloud storage services like Amazon S3 and Microsoft Azure. Businesses and organizations are finding that projects that previously weren’t economically feasible with other Cloud Storage services are now not only possible, but a reality with B2. Here are a few recent examples:

SermonAudio wanted their media files to be readily available, but didn’t want to build and manage their own internal storage farm. Until B2, cloud storage was just too expensive to use. Now they use B2 to store their audio and video files, and also as the primary source of downloads and streaming requests from their subscribers.

Fellowship Church wanted to escape from the ever-increasing amount of time they were spending saving their data to their LTO-based system. Using B2 saved countless hours of personnel time versus LTO, fit easily into their video processing workflow, and provided instant access at any time to their media library.

Panna Cooking replaced their closet full of archive hard drives with a cost-efficient hybrid-storage solution combining 45Drives and Backblaze B2 Cloud Storage. Archived media files that used to take hours to locate are now readily available regardless of whether they reside in local storage or in the B2 Cloud.

B2 Integrations

Leading companies in backup, archive, and sync continue to add B2 Cloud Storage as a storage destination for their customers. These companies realize that by offering B2 as an option, they can dramatically lower the total cost of ownership for their customers — and that’s always a good thing.

If your favorite application is not integrated to B2, you can do something about it. One integration partner told us they received over 200 customer requests for a B2 integration. The partner got the message and the integration is currently in beta test.

Below are some of the partner integrations completed in the past few months. You can check the B2 Partner Integrations page for a complete list.

Archiware — Both P5 Archive and P5 Backup can now store data in the B2 Cloud making your offsite media files readily available while keeping your off-site storage costs predictable and affordable.

Arq — Combine Arq and B2 for amazingly affordable backup of external drives, network drives, NAS devices, Windows PCs, Windows Servers, and Macs to the cloud.

GoodSync — Automatically synchronize and back up all your photos, music, email, and other important files between all your desktops, laptops, servers, external drives, and sync, or back up to B2 Cloud Storage for off-site storage.

QNAP — QNAP Hybrid Backup Sync consolidates backup, restoration, and synchronization functions into a single QTS application to easily transfer your data to local, remote, and cloud storage.

Morro Data — Their CloudNAS solution stores files in the cloud, caches them locally as needed, and syncs files globally among other CloudNAS systems in an organization.

Restic – Restic is a fast, secure, multi-platform command line backup program. Files are uploaded to a B2 bucket as de-duplicated, encrypted chunks. Each backup is a snapshot of only the data that has changed, making restores of a specific date or time easy.

Transmit 5 by Panic — Transmit 5, the gold standard for macOS file transfer apps, now supports B2. Upload, download, and manage files on tons of servers with an easy, familiar, and powerful UI.

UpdraftPlus — WordPress developers and admins can now use the UpdraftPlus Premium WordPress plugin to affordably back up their data to the B2 Cloud.

Getting Started with B2 Cloud Storage

If you’re using B2 today, thank you. If you’d like to try B2, but don’t know where to start, here’s a guide to getting started with the B2 Web Interface — no programming or scripting is required. You get 10 gigabytes of free storage and 1 gigabyte a day in free downloads. Give it a try.


Digitising film reels with Pi Film Capture

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/digitising-reels-pi-film-capture/

Joe Herman’s Pi Film Capture project combines old projectors and a stepper motor with a Raspberry Pi and a Raspberry Pi Camera Module, to transform his grandfather’s 8- and 16-mm home movies into glorious digital films.

We chatted to him about his Pi Film Capture build at Maker Faire New York 2016:

Video: Film to Digital Conversion at Maker Faire New York 2016, uploaded by Raspberry Pi on 2017-08-25.

What inspired Pi Film Capture?

Joe’s grandfather, Leo Willmott, loved recording home movies of his family of eight children and their grandchildren. He passed away when Joe was five, but in 2013 Joe found a way to connect with his legacy: while moving house, a family member uncovered a box of more than a hundred of Leo’s film reels. These covered decades of family history, and some dated back as far as 1939.


Kodachrome film reels of the type Leo used

This provided an unexpected opportunity for Leo’s family to restore some of their shared history. Joe immediately made plans to digitise the material, knowing that the members of his extensive family tree would provide an eager audience.

Building Pi Film Capture

After a failed attempt with a DSLR camera, Joe realised he couldn’t simply re-film the movies — instead, he would have to capture each frame individually. He combined a Raspberry Pi with an old Super 8 projector, and set about rigging up something to do just that.

He went through numerous stages of prototyping, and his final hardware setup works very well. A NEMA 17 stepper motor  moves the film reel forward in the projector. A magnetic reed switch triggers the Camera Module each time the reel moves on to the next frame. Joe hacked the Camera Module so that it has a different focal distance, and he also added a magnifying lens. Moreover, he realised it would be useful to have a diffuser to ‘smooth’ some of the faults in the aged film reel material. To do this, he mounted “a bit of translucent white plastic from an old ceiling fixture” parallel with the film.


Joe’s 16-mm projector, with embedded Raspberry Pi hardware
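Joe's actual capture software is more involved (multiple exposures, networking to a processing machine), but a minimal sketch of the trigger-and-capture loop described above might look like the following; the GPIO pin, output path, and resolution are assumptions for illustration, not Joe's settings.

```python
import time

import RPi.GPIO as GPIO
from picamera import PiCamera

REED_PIN = 17   # hypothetical GPIO pin wired to the magnetic reed switch

GPIO.setmode(GPIO.BCM)
GPIO.setup(REED_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

camera = PiCamera(resolution=(1640, 1232))   # example resolution
camera.start_preview()
time.sleep(2)   # let exposure and gain settle before the first frame

frame = 0
try:
    while True:
        # The magnet on the projector closes the reed switch once per frame.
        GPIO.wait_for_edge(REED_PIN, GPIO.FALLING)
        camera.capture(f"/home/pi/frames/frame_{frame:05d}.jpg")
        frame += 1
finally:
    camera.close()
    GPIO.cleanup()
```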

Software solutions

In addition to capturing every single frame (sometimes with multiple exposure settings), Joe found that he needed intensive post-processing to restore some of the films. He settled on sending the images from the Pi to a more powerful Linux machine. To process the raw data, he had to write Python scripts that tie together several open-source software packages. For example, to deal with the varying quality of the film reels more easily, Joe implemented a GUI (written with the help of PyQt), which he uses to change the capture parameters. This was a demanding job, as he was relatively new to using these tools.


The top half of Joe’s GUI, because the whole thing is really long and really thin and would have looked weird on the blog…

If a frame is particularly damaged, Joe can capture multiple instances of the image at different settings. These are then merged to achieve a good-quality image using OpenCV functionality. Joe uses FFmpeg to stitch the captured images back together into a film. Some of his grandfather’s reels were badly degraded, but luckily Joe found scripts written by other people to perform advanced digital restoration of film with AviSynth. He provides code he has written for the project on his GitHub account.
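As a rough sketch of those two steps (the file names and frame rate are assumptions; Joe's own scripts are on his GitHub account), exposure fusion and stitching might look something like this:

```python
import subprocess

import cv2

# Merge several exposures of the same damaged frame into one image using
# Mertens exposure fusion, which needs no camera response calibration.
exposures = [cv2.imread(f"frame_00042_exp{i}.jpg") for i in range(3)]
merged = cv2.createMergeMertens().process(exposures)
cv2.imwrite("frame_00042.png", (merged * 255).clip(0, 255).astype("uint8"))

# Stitch the processed frames back into a film with FFmpeg; old 8 mm home
# movies typically ran at roughly 16-18 frames per second.
subprocess.run(
    ["ffmpeg", "-framerate", "18", "-i", "frame_%05d.png",
     "-c:v", "libx264", "-pix_fmt", "yuv420p", "restored_reel.mp4"],
    check=True,
)
```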

For an account of the project in his own words, check out Joe’s guest post on the IEEE Spectrum website. He also described some of the issues he encountered, and how he resolved them, in The MagPi.

What does Pi Film Capture deliver?

Joe provides videos related to Pi Film Capture on two sites: on his YouTube channel, you’ll find videos in which he has documented the build process of his digitising project. Final results of the project live on Joe’s Vimeo channel, where so far he has uploaded 55 digitised home videos.

Video: m093a: Tom Herman Wedding, Detroit 8/10/63. Shot on 8 mm by Leo Willmott, captured and restored by Joe Herman.

We’re beyond pleased that our tech is part of this amazing project, helping to reconnect the entire Herman/Willmott clan with their past. And it was great to be able to catch up with Joe, and talk about his build at Maker Faire last year!

Maker Faire New York 2017

We’ll be at Maker Faire New York again on the 23-24 September, and we can’t wait to see the amazing makes the Raspberry Pi community will be presenting there!

Are you going to be at MFNY to show off your awesome Pi-powered project? Tweet us, so we can meet up, check it out and share your achievements!
