Tag Archives: streaming

How Netflix brings safer and faster streaming experience to the living room on crowded networks…

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/how-netflix-brings-safer-and-faster-streaming-experience-to-the-living-room-on-crowded-networks-78b8de7f758c

How Netflix brings safer and faster streaming experience to the living room on crowded networks using TLS 1.3

By Sekwon Choi

At Netflix, we are obsessed with the best streaming experiences. We want playback to start instantly and to never stop unexpectedly in any network environment. We are also committed to protecting users’ privacy and service security without sacrificing any part of the playback experience.

To achieve that, we are efficiently using ABR (adaptive bitrate streaming) for a better playback experience, DRM (Digital Right Management) to protect our service and TLS (Transport Layer Security) to protect customer privacy and to create a safer streaming experience.

Netflix on consumer electronics devices such as TVs, set-top boxes and streaming sticks was until recently using TLS 1.2 for streaming traffic. Now we support TLS 1.3 for safer and faster experiences.

What is TLS?

For two parties to communicate securely, a secure channel is necessary. This needs to have the following three properties.

  • Authentication: Identity of the communicating party is verified.
  • Confidentiality: Data sent over the channel is only visible to the endpoints.
  • Integrity: Data sent over the channel cannot be modified by attackers without detection.

The TLS protocol is designed to provide a secure channel between two peers by providing tools and methods to achieve the above properties.

TLS 1.3

TLS 1.3 is the latest version of the Transport Layer Security protocol. It is simpler, more secure and more efficient than its predecessor.

Perfect Forward Secrecy

One thing we believe is very important at Netflix is providing PFS (Perfect Forward Secrecy).

PFS is a feature of the key exchange algorithm that assures that session keys will not be compromised, even if the server’s private key is compromised. By generating new keys for each session, PFS protects past sessions against the future compromise of secret keys.

TLS 1.2 supports key exchange algorithms with PFS, but it also allows key exchange algorithms that do not support PFS. Even with the previous version of TLS 1.2, Netflix has always selected a key exchange algorithm that provides PFS such as ECDHE (Elliptic Curve Diffie Hellman Ephemeral). TLS 1.3, however, enforces this concept even more by removing all the key exchange algorithms that do not provide PFS, such as static RSA.

Authenticated Encryption

For encryption, TLS 1.3 removes all weak ciphers and uses only Authenticated Encryption with Associated Data (AEAD). This assures the confidentiality, integrity, and authenticity of the data. We use AES Galois/Counter Mode, as it also provides good performance and high throughput.

Secure Handshake

While the above changes are important, the most important change in TLS 1.3 is perhaps its redesign of the handshake protocol.

The TLS 1.2 handshake was not designed to protect the integrity of the entire handshake. It protected only the part of the handshake after the cipher suite negotiation and this opened up the possibility of downgrade attacks which may allow the attackers to force the use of insecure cipher suites.

With TLS 1.3, the server signs the entire handshake including the cipher suite negotiation and thus prevents the attacker from downgrading the cipher suite.

Also in TLS 1.2, extensions were sent in the clear in the ServerHello. Now with TLS 1.3, even extensions are encrypted and all handshake messages after ServerHello are now encrypted.

Reduced Handshake

TLS 1.2 supports numerous key exchange algorithms, cipher suites and digital signatures, including weak and vulnerable ones. Therefore, it requires more messages to perform a handshake and two network round trips.

In contrast, the handshake in TLS 1.3 now requires only one round trip, with a simplified design and with all weak and vulnerable algorithms removed.

In addition, it has a new feature called 0-RTT, or TLS early data, for the resumed handshake. This allows an application to include application data with its initial handshake message, instead of having to wait until the handshake completes.

At Netflix, by the efficient resumption of the TLS session and careful use of 0-RTT for the streaming data, we can reduce the play delay.

A/B Testing Result

We were pretty confident that TLS 1.3 would bring us better security from the analysis of its protocol composition, but we did not know how it would perform in the context of streaming.

Since TLS 1.3’s performance-related feature is the 0-RTT mode with the resumed handshake, our hypothesis is that TLS 1.3 would reduce play delay, as we are no longer required to wait for the handshake to finish and we can instead issue the HTTP request for media data and receive the HTTP response for media data earlier.

To see the actual performance of TLS 1.3 in the field, we performed an experiment with

  • User accounts: half-million user accounts per cell.
  • Device type: mid-performance device with Quad ARM core @ 1.7GHz.
  • Control cell: TLS 1.2
  • Treatment cell: TLS 1.3

Play Delay

Play Delay is defined by how long it takes for playback to start. Below are the results of the play delay measured in the experiment. The results imply that on slower or congested networks, which can be represented by the quantiles of at least 0.75, TLS 1.3 achieves the largest gains, with improvements across all network conditions.

Below is the time series median play delay graph for this mid-performance device in the field. It also shows that playback starts earlier with TLS 1.3.

Media Rebuffer

At Netflix, we define a media rebuffer as a non-network originated rebuffer. It typically occurs when media data is not processed quickly enough by the device due to the high load on the CPU. Comparing the control cell with TLS 1.2, the experiment cell with TLS 1.3 showed about a 7.4% improvement in media rebuffers. This result implies that using TLS 1.3 with 0-RTT is more efficient and can reduce the CPU load.

Conclusion

From the security analysis, we are confident that TLS 1.3 improves communication security over TLS 1.2. From the field test, we are confident that TLS 1.3 provides us a better streaming experience.

At the time of writing this article, the Internet is experiencing higher than usual traffic and congestion. We believe saving even small amounts of data and round trips can be meaningful and even better if it also provides a more secure and efficient streaming experience.

Therefore, we have started deploying TLS 1.3 on newer consumer electronics devices and we are expecting even more devices to be deployed with TLS 1.3 capability in the near future.


How Netflix brings safer and faster streaming experience to the living room on crowded networks… was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Serverless Stream-Based Processing for Real-Time Insights

Post Syndicated from Justin Pirtle original https://aws.amazon.com/blogs/architecture/serverless-stream-based-processing-for-real-time-insights/

Building on our previous posts regarding messaging patterns and queue-based processing, we now explore stream-based processing and how it helps you achieve low-latency, near real-time data processing in your applications. AWS offers two managed services for streaming, Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK).

What is streaming data?

At AWS, we define streaming data as data that is emitted at high volume in a continuous, incremental manner with the goal of low-latency processing. Whereas traditional batch-oriented business intelligence would offer insights in retrospect after months, days, or hours have passed, stream-based processing can offer actionable insights in real time. Stream-based processing is commonly used to respond to clickstream events, rapidly ingest various types of logs, and extract, transform, and load (ETL) data in real-time into data lakes and data warehouses.

Amazon Kinesis is the AWS service that makes it easy to collect, process, and analyze such real-time, streaming data with four different capabilities:

For this blog post, we focus on Kinesis Data Streams and Kinesis Data Firehose, since both of these services are foundational for streaming, ingestion, buffering, and processing in your streaming data pipeline.

Kinesis Data Streams

Amazon Kinesis Data Streams is a massively scalable service that can continuously capture gigabytes of data per second from hundreds of thousands of sources. Like many distributed systems, Kinesis Data Streams achieves this level of scalability by partitioning or sharding your data where records are simultaneously written to and read from different shards in parallel. All Kinesis Data Streams require allocation of at least one shard and you choose how many shards you want to allocate to a given stream.

When writing to a shard in a Kinesis Data Stream, each shard supports ingestion of up to 1 MB of data per second or 1,000 records written per second. When reading from a shard, each shard supports output of 2 MB of data per second. You choose an initial number of shards to allocate for your Kinesis Data Stream, then can update your shard allocation over time. Increasing your shard allocation enables your application to easily scale from thousands of records to millions of records written per second.

Producing streaming data

Streaming data producers are processes that put records onto a Kinesis stream by calling the putRecord API to write a single record or putRecords API to write multiple records in a single invocation. Common approaches for producing messages including direct use of AWS tools, including:

  • AWS SDK, which simplifies authentication and other semantics of invoking AWS service APIs
  • Amazon Kinesis Agent, which enables local file/log monitoring and rotation sending in real time
  • Amazon Kinesis Producer Library, which simplifies aggregating records into larger payloads to improve throughput.

Additionally, several AWS services natively integrate with Amazon Kinesis as a data producer:

There are also several third-party services that offer native integration as data producers, including:

Regardless of the producer service/tool of choice, all data producers put records onto a stream by providing a partition key, stream name, and the data itself, which altogether must not exceed 1 MB in size. The partition key provided is used to determine which shard the data should be written to on the stream. Amazon Kinesis Data Streams offers ordering guarantees and maintain message ordering within a given shard in a stream using sequence numbers to track the unique position of each message sent.

Consuming streaming data

Once records are written to a Kinesis Data Stream, they are buffered in their respective shards for consumption. Unlike queue-based processing, the records are buffered until the data retention period set on the stream elapses, enabling one or more consumers to replay all in the messages in the shards of the stream. If your application must deliver your records to a data lake, data warehouse, Elasticsearch Service cluster, or Splunk, Kinesis Data Firehose can natively deliver your records to the following without needing to write any custom code:

  • Amazon S3
  • Amazon Redshift
  • Amazon Elasticsearch Service
  • Splunk

You simply indicate the desired delivery destination and configuration regarding how to batch and deliver the messages. Kinesis Data Firehose can also use your configured S3 desired object naming, Amazon Redshift table name, Amazon Elasticsearch index name, and more.

For custom processing or destinations outside of the Amazon Kinesis Data Firehose supported services above, you will need to write and execute custom code to consume data from the stream. Though you can use the Kinesis Client Library (KCL) to run your own custom processing application on persistent virtual machines or container instances, AWS Lambda offers serverless computing with native event source integration with Amazon Kinesis Data Streams. AWS Lambda as a stream consumer takes care of the operational overhead of reading shards, maintaining record order, check pointing as records are processed, and parallelizing processing.

Serverless stream processing with AWS Lambda

When configured with a Kinesis Stream as its event source, AWS Lambda continuously polls every shard in your stream at no extra charge and only invokes your Lambda code if and when there are messages in the stream. It additionally scales up the number of concurrent executions to parallelize reading all shards of a stream at the same time (and can have multiple executions reading the same shard simultaneously for a higher parallelization factor, if desired). AWS Lambda automatically checkpoints which records were successfully processed and handles retries and any failures automatically according to your desired configuration.

Best of all, there is no additional cost of the Lambda service handling all of these operational needs for you. You only pay for compute time when your function is invoked and messages are available on the stream for processing. You’re able to focus on processing your data with your business logic directly in your code since your records are sent as an array to your Lambda code. There is no additional code to author/manage regarding checkpointing, shard splits/merges, or other complexities.

Conclusion

In this blog, we defined streaming data and explored the Amazon Kinesis service and its various capabilities. We then reviewed the various options available for producing and consuming real-time streaming data with Amazon Kinesis, including using AWS Lambda for serverless streaming data processing. Please refer to the following resources for further learning on AWS streaming data processing:

A Disk-Backed ArrayList

Post Syndicated from Bozho original https://techblog.bozho.net/a-disk-backed-arraylist/

It sometimes happens that your list can become too big to fit in memory and you have to do something in order to avoid running out of memory.

The proper way to do that is streaming – instead of fitting everything in memory, you should stream data from the source and discard the entries that are already processed.

However, there are cases when code that’s outside of your control requires a List and you can’t use streaming. These cases are rather rare but in case you hit them, you have to find a workaround. One is to re-implement the code to work with streaming, but depending on the way the library is written, it may not be possible. So the other option is to use a disk-backed list – one that works as a list, but underneath stores and loads elements from disk.

Searching for existing solutions results in several 3+ years old repos like this one and this one and this one.

And then there’s MapDB, which is great and supported. It’s mostly about maps, but it does support a List as well, as shown here.

And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I’ve done it here – DiskBackedArrayList.java. It doesn’t support many things (not all methods are overridden to throw an exception, but they should). But most importantly, it doesn’t support random adding and random getting, and also toArray(). It’s purely “fill the collection” and then “iterate the collection”. It relies on ObjectOutputStream which is not terribly efficient, but is simple to use. Note that I’ve allowed a short in-memory prependList in case small amounts of data need to be prepended to the list.

The list gets filled in memory until a specified threshold and then gets flushed to disk, clearing the memory which starts getting filled again. This too can be more efficient – with background flushing in another thread that doesn’t interfere with adding elements to the list, but optimizations complicate things and in this case the total running time was not an issue. Most importantly, the iterator() method is overridden to return a custom iterator that first streams the prepended list, then reads everything from disk and finally iterates over the latest batch which is still in memory. And finally, the clear() method should be called in the end in order to close the underlying stream. An output stream could be opened and closed on each flush, but ObjectOutputStream can’t be used in append mode due to some implementation specific about writing headers first.

So basically we hide the streaming approach underneath a List interface – it’s still streaming elements and discarding them when not needed. Ideally this should be done at the source of the data (e.g. a database, message queue, etc.) rather than using the disk as overflow space, but there are cases where using the disk is fine. This implementation is a starting point, as it’s not tested in production, but illustrates that you can adapt existing classes to use different data access patterns if needed.

The post A Disk-Backed ArrayList appeared first on Bozho's tech blog.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PTGet Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PTEpisode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute

June 25, 2018 | 01:00 PM – 01:45 PM PTAccelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PTEnsuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers
June 25, 2018 | 09:00 AM – 09:45 AM PTRunning Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PTOracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.
DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PTSet Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PTDe-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PTLaunch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new

 

AWS Environments

June 21, 2018 | 11:00 AM – 11:45 AM PTLeading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PTEnabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PTFireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT

June 27, 2018 | 11:00 AM – 11:45 AM PTAWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PTIntegrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PTBuilding Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PTOptimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile
June 25, 2018 | 11:00 AM – 11:45 AM PTDrive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PTUnderstanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PTUsing Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PTProductionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PTDeep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PTChanging the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PTBig Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Project Floofball and more: Pi pet stuff

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/project-floofball-pi-pet-stuff/

It’s a public holiday here today (yes, again). So, while we indulge in the traditional pastime of barbecuing stuff (ourselves, mainly), here’s a little trove of Pi projects that cater for our various furry friends.

Project Floofball

Nicole Horward created Project Floofball for her hamster, Harold. It’s an IoT hamster wheel that uses a Raspberry Pi and a magnetic door sensor to log how far Harold runs.

Project Floofball: an IoT hamster wheel

An IoT Hamsterwheel using a Raspberry Pi and a magnetic door sensor, to see how far my hamster runs.

You can follow Harold’s runs in real time on his ThingSpeak channel, and you’ll find photos of the build on imgur. Nicole’s Python code, as well as her template for the laser-cut enclosure that houses the wiring and LCD display, are available on the hamster wheel’s GitHub repo.

A live-streaming pet feeder

JaganK3 used to work long hours that meant he couldn’t be there to feed his dog on time. He found that he couldn’t buy an automated feeder in India without paying a lot to import one, so he made one himself. It uses a Raspberry Pi to control a motor that turns a dispensing valve in a hopper full of dry food, giving his dog a portion of food at set times.

A transparent cylindrical hopper of dry dog food, with a motor that can turn a dispensing valve at the lower end. The motor is connected to a Raspberry Pi in a plastic case. Hopper, motor, Pi, and wiring are all mounted on a board on the wall.

He also added a web cam for live video streaming, because he could. Find out more in JaganK3’s Instructable for his pet feeder.

Shark laser cat toy

Sam Storino, meanwhile, is using a Raspberry Pi to control a laser-pointer cat toy with a goshdarned SHARK (which is kind of what I’d expect from the guy who made the steampunk-looking cat feeder a few weeks ago). The idea is to keep his cats interested and active within the confines of a compact city apartment.

Raspberry Pi Automatic Cat Laser Pointer Toy

Post with 52 votes and 7004 views. Tagged with cat, shark, lasers, austin powers, raspberry pi; Shared by JeorgeLeatherly. Raspberry Pi Automatic Cat Laser Pointer Toy

If I were a cat, I would definitely be entirely happy with this. Find out more on Sam’s website.

And there’s more

Michel Parreno has written a series of articles to help you monitor and feed your pet with Raspberry Pi.

All of these makers are generous in acknowledging the tutorials and build logs that helped them with their projects. It’s lovely to see the Raspberry Pi and maker community working like this, and I bet their projects will inspire others too.

Now, if you’ll excuse me. I’m late for a barbecue.

The post Project Floofball and more: Pi pet stuff appeared first on Raspberry Pi.

Replacing macOS Server with Synology NAS

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/replacing-macos-server-with-synology-nas/

Synology NAS boxes backed up to the cloud

Businesses and organizations that rely on macOS server for essential office and data services are facing some decisions about the future of their IT services.

Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:

macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.

The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.

Apple assures users who have already configured any of the listed services that they will be able to use them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements to the features it is removing. These alternative services are all FOSS (Free and Open-Source Software).

As difficult as this could be for organizations that use macOS server, this is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May, 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster, but the entire category of SMB office and business servers, which has been undergoing a gradual change in recent years.

Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.

What Comes After macOS Server?

Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.

Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.

The Times They Are A-Changin’

For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.

In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (Saas, PaaS, IaaS) options.

IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.

Synology NAS as a macOS Server Alternative

One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.

Synology NAS box

Synology NAS

Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux as it comes from BSD Unix.

Synology DiskStation Manager Package Center screenshot

Synology DiskStation Manager Package Center

According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.

Business continuity and backup are simplified by the flexible storage capacity of the NAS. Synology has built-in backup to Backblaze B2 Cloud Storage with Synology’s Cloud Sync, as well as a choice of a number of other B2-compatible applications, such as Cloudberry, Comet, and Arq.

Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).

Are You Sticking with macOS Server or Moving to Another Platform?

If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.

The post Replacing macOS Server with Synology NAS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:

 

 

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.

 

 

 

Stream to Twitch with the push of a button

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/tinkernut-twitch-streaming/

Stream your video gaming exploits to the internet at the touch of a button with the Twitch-O-Matic. Everyone else is doing it, so you should too.

Twitch-O-Matic: Raspberry Pi Twitch Streaming Device – Weekend Hacker #1804

Some gaming consoles make it easy to stream to Twitch, some gaming consoles don’t (come on, Nintendo). So for those that don’t, I’ve made this beta version of the “Twitch-O-Matic”. No it doesn’t chop onions or fold your laundry, but what it DOES do is stream anything with HDMI output to your Twitch channel with the simple push of a button!

eSports and online game streaming

Interest in eSports has skyrocketed over the last few years, with viewership numbers in the hundreds of millions, sponsorship deals increasing in value and prestige, and tournament prize funds reaching millions of dollars. So it’s no wonder that more and more gamers are starting to stream live to online platforms in order to boost their fanbase and try to cash in on this growing industry.

Streaming to Twitch

Launched in 2011, Twitch.tv is an online live-streaming platform with a primary focus on video gaming. Users can create accounts to contribute their comments and content to the site, as well as watching live-streamed gaming competitions and broadcasts. With a staggering fifteen million daily users, Twitch is accessible via smartphone and gaming console apps, smart TVs, computers, and tablets. But if you want to stream to Twitch, you may find yourself using third-party software in order to do so. And with more buttons to click and more wires to plug in for older, app-less consoles, streaming can get confusing.

Enter Tinkernut.

Side note: we ❤ Tinkernut

We’ve featured Tinkernut a few times on the Raspberry Pi blog – his tutorials are clear, his projects are interesting and useful, and his live-streamed comment videos for every build are a nice touch to sharing homebrew builds on the internet.

Tinkernut Raspberry Pi Zero W Twitch-O-Matic

So, yes, we love him. [This is true. Alex never shuts up about him. – Ed.] And since he has over 500K subscribers on YouTube, we’re obviously not the only ones. We wave our Tinkernut flags with pride.

Twitch-O-Matic

With a Raspberry Pi Zero W, an HDMI to CSI adapter, and a case to fit it all in, Tinkernut’s Twitch-O-Matic allows easy connection to the Twitch streaming service. You’ll also need a button – the bigger, the better in our opinion, though Tinkernut has opted for the Adafruit 16mm Illuminated Pushbutton for his build, and not the 100mm Massive Arcade Button that, sadly, we still haven’t found a reason to use yet.

Adafruit massive button

“I’m sorry, Dave…”

For added frills and pizzazz, Tinketnut has also incorporated Adafruit’s White LED Backlight Module into the case, though you don’t have to do so unless you’re feeling super fancy.

The setup

The Raspberry Pi Zero W is connected to the HDMI to CSI adapter via the camera connector, in the same way you’d attach the camera ribbon. Tinkernut uses a standard Raspbian image on an 8GB SD card, with SSH enabled for remote access from his laptop. He uses the simple command Raspivid to test the HDMI connection by recording ten seconds of video footage from his console.

Tinkernut Raspberry Pi Zero W Twitch-O-Matic

One lead is all you need

Once you have the Pi receiving video from your console, you can connect to Twitch using your Twitch stream key, which you can find by logging in to your account at Twitch.tv. Tinkernut’s tutorial gives you all the commands you need to stream from your Pi.

The frills

To up the aesthetic impact of your project, adding buttons and backlights is fairly straightforward.

Tinkernut Raspberry Pi Zero W Twitch-O-Matic

Pretty LED frills

To run the stream command, Tinketnut uses a button: press once to start the stream, press again to stop. Pressing the button also turns on the LED backlight, so it’s obvious when streaming is in progress.

The tutorial

For the full code and 3D-printable case STL file, head to Tinketnut’s hackster.io project page. And if you’re already using a Raspberry Pi for Twitch streaming, share your build setup with us. Cheers!

The post Stream to Twitch with the push of a button appeared first on Raspberry Pi.

Performing Unit Testing in an AWS CodeStar Project

Post Syndicated from Jerry Mathen Jacob original https://aws.amazon.com/blogs/devops/performing-unit-testing-in-an-aws-codestar-project/

In this blog post, I will show how you can perform unit testing as a part of your AWS CodeStar project. AWS CodeStar helps you quickly develop, build, and deploy applications on AWS. With AWS CodeStar, you can set up your continuous delivery (CD) toolchain and manage your software development from one place.

Because unit testing tests individual units of application code, it is helpful for quickly identifying and isolating issues. As a part of an automated CI/CD process, it can also be used to prevent bad code from being deployed into production.

Many of the AWS CodeStar project templates come preconfigured with a unit testing framework so that you can start deploying your code with more confidence. The unit testing is configured to run in the provided build stage so that, if the unit tests do not pass, the code is not deployed. For a list of AWS CodeStar project templates that include unit testing, see AWS CodeStar Project Templates in the AWS CodeStar User Guide.

The scenario

As a big fan of superhero movies, I decided to list my favorites and ask my friends to vote on theirs by using a WebService endpoint I created. The example I use is a Python web service running on AWS Lambda with AWS CodeCommit as the code repository. CodeCommit is a fully managed source control system that hosts Git repositories and works with all Git-based tools.

Here’s how you can create the WebService endpoint:

Sign in to the AWS CodeStar console. Choose Start a project, which will take you to the list of project templates.

create project

For code edits I will choose AWS Cloud9, which is a cloud-based integrated development environment (IDE) that you use to write, run, and debug code.

choose cloud9

Here are the other tasks required by my scenario:

  • Create a database table where the votes can be stored and retrieved as needed.
  • Update the logic in the Lambda function that was created for posting and getting the votes.
  • Update the unit tests (of course!) to verify that the logic works as expected.

For a database table, I’ve chosen Amazon DynamoDB, which offers a fast and flexible NoSQL database.

Getting set up on AWS Cloud9

From the AWS CodeStar console, go to the AWS Cloud9 console, which should take you to your project code. I will open up a terminal at the top-level folder under which I will set up my environment and required libraries.

Use the following command to set the PYTHONPATH environment variable on the terminal.

export PYTHONPATH=/home/ec2-user/environment/vote-your-movie

You should now be able to use the following command to execute the unit tests in your project.

python -m unittest discover vote-your-movie/tests

cloud9 setup

Start coding

Now that you have set up your local environment and have a copy of your code, add a DynamoDB table to the project by defining it through a template file. Open template.yml, which is the Serverless Application Model (SAM) template file. This template extends AWS CloudFormation to provide a simplified way of defining the Amazon API Gateway APIs, AWS Lambda functions, and Amazon DynamoDB tables required by your serverless application.

AWSTemplateFormatVersion: 2010-09-09
Transform:
- AWS::Serverless-2016-10-31
- AWS::CodeStar

Parameters:
  ProjectId:
    Type: String
    Description: CodeStar projectId used to associate new resources to team members

Resources:
  # The DB table to store the votes.
  MovieVoteTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        # Name of the "Candidate" is the partition key of the table.
        Name: Candidate
        Type: String
  # Creating a new lambda function for retrieving and storing votes.
  MovieVoteLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: python3.6
      Environment:
        # Setting environment variables for your lambda function.
        Variables:
          TABLE_NAME: !Ref "MovieVoteTable"
          TABLE_REGION: !Ref "AWS::Region"
      Role:
        Fn::ImportValue:
          !Join ['-', [!Ref 'ProjectId', !Ref 'AWS::Region', 'LambdaTrustRole']]
      Events:
        GetEvent:
          Type: Api
          Properties:
            Path: /
            Method: get
        PostEvent:
          Type: Api
          Properties:
            Path: /
            Method: post

We’ll use Python’s boto3 library to connect to AWS services. And we’ll use Python’s mock library to mock AWS service calls for our unit tests.
Use the following command to install these libraries:

pip install --upgrade boto3 mock -t .

install dependencies

Add these libraries to the buildspec.yml, which is the YAML file that is required for CodeBuild to execute.

version: 0.2

phases:
  install:
    commands:

      # Upgrade AWS CLI to the latest version
      - pip install --upgrade awscli boto3 mock

  pre_build:
    commands:

      # Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
      - python -m unittest discover tests

  build:
    commands:

      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml

artifacts:
  type: zip
  files:
    - template-export.yml

Open the index.py where we can write the simple voting logic for our Lambda function.

import json
import datetime
import boto3
import os

table_name = os.environ['TABLE_NAME']
table_region = os.environ['TABLE_REGION']

VOTES_TABLE = boto3.resource('dynamodb', region_name=table_region).Table(table_name)
CANDIDATES = {"A": "Black Panther", "B": "Captain America: Civil War", "C": "Guardians of the Galaxy", "D": "Thor: Ragnarok"}

def handler(event, context):
    if event['httpMethod'] == 'GET':
        resp = VOTES_TABLE.scan()
        return {'statusCode': 200,
                'body': json.dumps({item['Candidate']: int(item['Votes']) for item in resp['Items']}),
                'headers': {'Content-Type': 'application/json'}}

    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400,
                    'body': 'Invalid input! Expecting a JSON.',
                    'headers': {'Content-Type': 'application/json'}}
        if 'candidate' not in body:
            return {'statusCode': 400,
                    'body': 'Missing "candidate" in request.',
                    'headers': {'Content-Type': 'application/json'}}
        if body['candidate'] not in CANDIDATES.keys():
            return {'statusCode': 400,
                    'body': 'You must vote for one of the following candidates - {}.'.format(get_allowed_candidates()),
                    'headers': {'Content-Type': 'application/json'}}

        resp = VOTES_TABLE.update_item(
            Key={'Candidate': CANDIDATES.get(body['candidate'])},
            UpdateExpression='ADD Votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'statusCode': 200,
                'body': "{} now has {} votes".format(CANDIDATES.get(body['candidate']), resp['Attributes']['Votes']),
                'headers': {'Content-Type': 'application/json'}}

def get_allowed_candidates():
    l = []
    for key in CANDIDATES:
        l.append("'{}' for '{}'".format(key, CANDIDATES.get(key)))
    return ", ".join(l)

What our code basically does is take in the HTTPS request call as an event. If it is an HTTP GET request, it gets the votes result from the table. If it is an HTTP POST request, it sets a vote for the candidate of choice. We also validate the inputs in the POST request to filter out requests that seem malicious. That way, only valid calls are stored in the table.

In the example code provided, we use a CANDIDATES variable to store our candidates, but you can store the candidates in a JSON file and use Python’s json library instead.

Let’s update the tests now. Under the tests folder, open the test_handler.py and modify it to verify the logic.

import os
# Some mock environment variables that would be used by the mock for DynamoDB
os.environ['TABLE_NAME'] = "MockHelloWorldTable"
os.environ['TABLE_REGION'] = "us-east-1"

# The library containing our logic.
import index

# Boto3's core library
import botocore
# For handling JSON.
import json
# Unit test library
import unittest
## Getting StringIO based on your setup.
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
## Python mock library
from mock import patch, call
from decimal import Decimal

@patch('botocore.client.BaseClient._make_api_call')
class TestCandidateVotes(unittest.TestCase):

    ## Test the HTTP GET request flow. 
    ## We expect to get back a successful response with results of votes from the table (mocked).
    def test_get_votes(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'GET'}
        # The mocked values in our DynamoDB table.
        items_in_db = [{'Candidate': 'Black Panther', 'Votes': Decimal('3')},
                        {'Candidate': 'Captain America: Civil War', 'Votes': Decimal('8')},
                        {'Candidate': 'Guardians of the Galaxy', 'Votes': Decimal('8')},
                        {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('1')}
                    ]
        # The mocked DynamoDB response.
        expected_ddb_response = {'Items': items_in_db}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB Scan function during execution with these parameters.
        expected_calls = [call('Scan', {'TableName': os.environ['TABLE_NAME']})]

        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        result_body = json.loads(result.get('body'))
        # Verifying that the results match to that from the table.
        assert len(result_body) == len(items_in_db)
        for i in range(len(result_body)):
            assert result_body.get(items_in_db[i].get("Candidate")) == int(items_in_db[i].get("Votes"))

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for a selected candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_valid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"D\"}"}
        # The mocked response in our DynamoDB table.
        expected_ddb_response = {'Attributes': {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('2')}}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB UpdateItem function during execution with these parameters.
        expected_calls = [call('UpdateItem', {
                                                'TableName': os.environ['TABLE_NAME'], 
                                                'Key': {'Candidate': 'Thor: Ragnarok'},
                                                'UpdateExpression': 'ADD Votes :incr',
                                                'ExpressionAttributeValues': {':incr': 1},
                                                'ReturnValues': 'ALL_NEW'
                                            })]
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        assert result.get('body') == "{} now has {} votes".format(
            expected_ddb_response['Attributes']['Candidate'], 
            expected_ddb_response['Attributes']['Votes'])

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for an non-existant candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_invalid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        # The valid IDs for the candidates are A, B, C, and D
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"E\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'You must vote for one of the following candidates - {}.'.format(index.get_allowed_candidates())

    ## Test the HTTP POST request flow that places a vote for a selected candidate but associated with an invalid key in the POST body.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_invalid_data_vote(self, boto_mock):
        # Input event to our method to test.
        # "name" is not the expected input key.
        expected_event = {'httpMethod': 'POST', 'body': "{\"name\": \"D\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Missing "candidate" in request.'

    ## Test the HTTP POST request flow that places a vote for a selected candidate but not as a JSON string which the body of the request expects.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_malformed_json_vote(self, boto_mock):
        # Input event to our method to test.
        # "body" receives a string rather than a JSON string.
        expected_event = {'httpMethod': 'POST', 'body': "Thor: Ragnarok"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Invalid input! Expecting a JSON.'

if __name__ == '__main__':
    unittest.main()

I am keeping the code samples well commented so that it’s clear what each unit test accomplishes. It tests the success conditions and the failure paths that are handled in the logic.

In my unit tests I use the patch decorator (@patch) in the mock library. @patch helps mock the function you want to call (in this case, the botocore library’s _make_api_call function in the BaseClient class).
Before we commit our changes, let’s run the tests locally. On the terminal, run the tests again. If all the unit tests pass, you should expect to see a result like this:

You:~/environment $ python -m unittest discover vote-your-movie/tests
.....
----------------------------------------------------------------------
Ran 5 tests in 0.003s

OK
You:~/environment $

Upload to AWS

Now that the tests have passed, it’s time to commit and push the code to source repository!

Add your changes

From the terminal, go to the project’s folder and use the following command to verify the changes you are about to push.

git status

To add the modified files only, use the following command:

git add -u

Commit your changes

To commit the changes (with a message), use the following command:

git commit -m "Logic and tests for the voting webservice."

Push your changes to AWS CodeCommit

To push your committed changes to CodeCommit, use the following command:

git push

In the AWS CodeStar console, you can see your changes flowing through the pipeline and being deployed. There are also links in the AWS CodeStar console that take you to this project’s build runs so you can see your tests running on AWS CodeBuild. The latest link under the Build Runs table takes you to the logs.

unit tests at codebuild

After the deployment is complete, AWS CodeStar should now display the AWS Lambda function and DynamoDB table created and synced with this project. The Project link in the AWS CodeStar project’s navigation bar displays the AWS resources linked to this project.

codestar resources

Because this is a new database table, there should be no data in it. So, let’s put in some votes. You can download Postman to test your application endpoint for POST and GET calls. The endpoint you want to test is the URL displayed under Application endpoints in the AWS CodeStar console.

Now let’s open Postman and look at the results. Let’s create some votes through POST requests. Based on this example, a valid vote has a value of A, B, C, or D.
Here’s what a successful POST request looks like:

POST success

Here’s what it looks like if I use some value other than A, B, C, or D:

 

POST Fail

Now I am going to use a GET request to fetch the results of the votes from the database.

GET success

And that’s it! You have now created a simple voting web service using AWS Lambda, Amazon API Gateway, and DynamoDB and used unit tests to verify your logic so that you ship good code.
Happy coding!

Security of Cloud HSMBackups

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/architecture/security-of-cloud-hsmbackups/

Today, our customers use AWS CloudHSM to meet corporate, contractual and regulatory compliance requirements for data security by using dedicated Hardware Security Module (HSM) instances within the AWS cloud. CloudHSM delivers all the benefits of traditional HSMs including secure generation, storage, and management of cryptographic keys used for data encryption that are controlled and accessible only by you.

As a managed service, it automates time-consuming administrative tasks such as hardware provisioning, software patching, high availability, backups and scaling for your sensitive and regulated workloads in a cost-effective manner. Backup and restore functionality is the core building block enabling scalability, reliability and high availability in CloudHSM.

You should consider using AWS CloudHSM if you require:

  • Keys stored in dedicated, third-party validated hardware security modules under your exclusive control
  • FIPS 140-2 compliance
  • Integration with applications using PKCS#11, Java JCE, or Microsoft CNG interfaces
  • High-performance in-VPC cryptographic acceleration (bulk crypto)
  • Financial applications subject to PCI regulations
  • Healthcare applications subject to HIPAA regulations
  • Streaming video solutions subject to contractual DRM requirements

We recently released a whitepaper, “Security of CloudHSM Backups” that provides in-depth information on how backups are protected in all three phases of the CloudHSM backup lifecycle process: Creation, Archive, and Restore.

About the Author

Balaji Iyer is a senior consultant in the Professional Services team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly-scalable distributed systems, operational security, large scale migrations, and leading strategic AWS initiatives.

Real-Time Hotspot Detection in Amazon Kinesis Analytics

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/real-time-hotspot-detection-in-amazon-kinesis-analytics/

Today we’re releasing a new machine learning feature in Amazon Kinesis Data Analytics for detecting “hotspots” in your streaming data. We launched Kinesis Data Analytics in August of 2016 and we’ve continued to add features since. As you may already know, Kinesis Data Analytics is a fully managed real-time processing engine for streaming data that lets you write SQL queries to derive meaning from your data and output the results to Kinesis Data Firehose, Kinesis Data Streams, or even an AWS Lambda function. The new HOTSPOT function adds to the existing machine learning capabilities in Kinesis that allow customers to leverage unsupervised streaming based machine learning algorithms. Customers don’t need to be experts in data science or machine learning to take advantage of these capabilities.

Hotspots

The HOTSPOTS function is a new Kinesis Data Analytics SQL function you can use to idenitfy relatively dense regions in your data without having to explicity build and train complicated machine learning models. You can identify subsections of your data that need immediate attention and take action programatically by streaming the hotspots out to a Kinesis Data stream, to a Firehose delivery stream, or by invoking a AWS Lambda function.

There are a ton of really cool scenarios where this could make your operations easier. Imagine a ride-share program or autonomous vehicle fleet communicating spatiotemporal data about traffic jams and congestion, or a datacenter where a number of servers start to overheat indicating an HVAC issue. HOTSPOTS is not limited to spatiotemporal data and you could apply it across many problem domains.

The function follows some simple syntax and accepts the DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT data types.

The HOTSPOT function takes a cursor as input and returns a JSON string describing the hotspot. This will be easier to understand with an example.

Using Kinesis Data Analytics to Detect Hotspots

Let’s take a simple data set from NY Taxi and Limousine Commission that tracks yellow cab pickup and dropoff locations. Most of this data is already on S3 and publicly accessible at s3://nyc-tlc/. We will create a small python script to load our Kinesis Data Stream with Taxi records which will feed our Kinesis Data Analytics. Finally we’ll output all of this to a Kinesis Data Firehose connected to an Amazon Elasticsearch Service cluster for visualization with Kibana. I know from living in New York for 5 years that we’ll probably find a hotspot or two in this data.

First, we’ll create an input Kinesis stream and start sending our NYC Taxi Ride data into it. I just wrote a quick python script to read from one of the CSV files and used boto3 to push the records into Kinesis. You can put the record in whatever way works for you.

 

import csv
import json
import boto3
def chunkit(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

kinesis = boto3.client("kinesis")
with open("taxidata2.csv") as f:
    reader = csv.DictReader(f)
    records = chunkit([{"PartitionKey": "taxis", "Data": json.dumps(row)} for row in reader], 500)
    for chunk in records:
        kinesis.put_records(StreamName="TaxiData", Records=chunk)

Next, we’ll create the Kinesis Data Analytics application and add our input stream with our taxi data as the source.

Next we’ll automatically detect the schema.

Now we’ll create a quick SQL Script to detect our hotspots and add that to the Real Time Analytics section of our application.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "pickup_longitude" DOUBLE,
    "pickup_latitude" DOUBLE,
    HOTSPOTS_RESULT VARCHAR(10000)
); 
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT "pickup_longitude", "pickup_latitude", "HOTSPOTS_RESULT" FROM
        TABLE(HOTSPOTS(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            1000,
            0.013,
            20
        )
    );


Our HOTSPOTS function takes an input stream, a window size, scan radius, and a minimum number of points to count as a hotspot. The values for these are application dependent but you can tinker with them in the console easily until you get the results you want. There are more details about the parameters themselves in the documentation. The HOTSPOTS_RESULT returns some useful JSON that would let us plot bounding boxes around our hotspots:

{
  "hotspots": [
    {
      "density": "elided",
      "minValues": [40.7915039, -74.0077401],
      "maxValues": [40.7915041, -74.0078001]
    }
  ]
}

 

When we have our desired results we can save the script and connect our application to our Amazon Elastic Search Service Firehose Delivery Stream. We can run an intermediate lambda function in the firehose to transform our record into a format more suitable for geographic work. Then we can update our mapping in Elasticsearch to index the hotspot objects as Geo-Shapes.

Finally, we can connect to Kibana and visualize the results.

Looks like Manhattan is pretty busy!

Available Now
This feature is available now in all existing regions with Kinesis Data Analytics. I think this is a really interesting new feature of Kinesis Data Analytics that can bring immediate value to many applications. Let us know what you build with it on Twitter or in the comments!

Randall

GStreamer 1.14 released

Post Syndicated from ris original https://lwn.net/Articles/749765/rss

The GStreamer team has announced
a major feature release of the GStreamer cross-platform multimedia
framework. Highlights include WebRTC support, experimental support for the
next-gen royalty-free AV1 video codec, support for the Secure Reliable
Transport (SRT) video streaming protocol, and much more. The release notes
contain more details.

2018-03-17 малък видео setup

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3381

Събирам (засега основно в главата си) setup за видео streaming и запис в hackerspace-овете в България. Изискванията са:

– минимална инвестиция в нов хардуер;
– (сравнително) лесно за използване (предполагам, че хората там са поне донякъде технически грамотни);
– възможност за stream-ване на текущите платформи, и може би и в тяхната си страница;
– запис/архивиране;
– поносимо качество.

Целта на setup-а е да се справи с най-простия тип събитие, което е един лектор с презентация.

Компонентите са следните:

– запис на звука – може да е от въздуха, но по-добре една брошка на лектора, + запис на залата по някакъв начин, за въпроси и т.н.;
– усилване на звука – дори в малка зала е добре да се усили звука от лектора и да се пусне на едни колони, най-малкото има feedback дали си е пуснал микрофона;
– видео запис – да се запише видеото от презентацията и може би самия лектор как говори. Това има варианта с камера, която снима лектора и екрана, или screen capture, директно от лаптопа му (или някой по-сложен setup, за който вероятно няма смисъл да пиша);
– streaming – да се извадят аудио/видео сигнала в/у някакъв протокол и да се stream-нат до някоя услуга;
– restreaming – услугата да го разпрати навсякъде и може би да го запише.

Вариантите за компоненти/setup-и в главата ми са следните:

– ffmpeg команда, която stream-ва екрана + звук от звуковата карта, в която има един свестен микрофон – това го имаме в няколко варианта, тествани и работещи (за windows и linux), трябва да ги качим някъде. Това е най-бързия начин, почти не иска допълнителен хардуер (освен един микрофон, щото тия на лаптопите за нищо не стават). Микрофонът може да е например някоя bluetooth/usb слушалка, или просто от слушалки с микрофон, да е близо до главата на лектора. Може да е от стандартните брошки, които се използват по различни събития, аз имам една китайска цифрова, дето в общи линии ме радва и е около 200-и-нещо лева от aliexpress;

– проста малка камера, която може да записва видео от екрана и звук, която може да бълва и по IP някакси. Това в общи линии са gopro-та (ако се намери как да им се пъхне звук) и още някакви подобни камери, които нямат особено добро качество (особено на звука, та задължително трябва външен микрофон), но на хората и се намират.

– проста камера, която обаче не може да бълва по IP, и има HDMI изход. Това е от нещата, които на хората им се намират по някакви причини, и в тая категория са половината DSLR-и и фотоапарати (които не прегряват след дълга (2-часова) употреба), gopro-та и нормален клас камери. Това се комбинира с устройство, което може да capture-ва HDMI и да го stream-ва, където засега опцията е един китайски device.

– streaming service – човек може да ползва youtube, моя streaming, или ако се мрази, facebook. Много места би трябвало да могат да си пуснат нещо просто при тях (например един nginx с модула за rtmp), да stream-ват до него, то да записва, и от него да restream-ват на други места и да дават някакъв лесен начин на хората ги гледат (с едно video.js/hls.js, както последно направихме за openfest).

Та, за момента основните неща, които издирвам са:

– евтини и работещи микрофони;
– евтини работещи камери с hdmi изход (или с ethernet порт, тва с wifi-то е боза), които да са switchable м/у 50hz и 60hz;
– hdmi capture вариант.

Приемам идеи, и ще гледам да сглобя едно такова за initLab.

Our Newest AWS Community Heroes (Spring 2018 Edition)

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/our-newest-aws-community-heroes-spring-2018-edition/

The AWS Community Heroes program helps shine a spotlight on some of the innovative work being done by rockstar AWS developers around the globe. Marrying cloud expertise with a passion for community building and education, these Heroes share their time and knowledge across social media and in-person events. Heroes also actively help drive content at Meetups, workshops, and conferences.

This March, we have five Heroes that we’re happy to welcome to our network of cloud innovators:

Peter Sbarski

Peter Sbarski is VP of Engineering at A Cloud Guru and the organizer of Serverlessconf, the world’s first conference dedicated entirely to serverless architectures and technologies. His work at A Cloud Guru allows him to work with, talk and write about serverless architectures, cloud computing, and AWS. He has written a book called Serverless Architectures on AWS and is currently collaborating on another book called Serverless Design Patterns with Tim Wagner and Yochay Kiriaty.

Peter is always happy to talk about cloud computing and AWS, and can be found at conferences and meetups throughout the year. He helps to organize Serverless Meetups in Melbourne and Sydney in Australia, and is always keen to share his experience working on interesting and innovative cloud projects.

Peter’s passions include serverless technologies, event-driven programming, back end architecture, microservices, and orchestration of systems. Peter holds a PhD in Computer Science from Monash University, Australia and can be followed on Twitter, LinkedIn, Medium, and GitHub.

 

 

 

Michael Wittig

Michael Wittig is co-founder of widdix, a consulting company focused on cloud architecture, DevOps, and software development on AWS. widdix maintains several AWS related open source projects, most notably a collection of production-ready CloudFormation templates. In 2016, widdix released marbot: a Slack bot supporting your DevOps team to detect and solve incidents on AWS.

In close collaboration with his brother Andreas Wittig, the Wittig brothers are actively creating AWS related content. Their book Amazon Web Services in Action (Manning) introduces AWS with a strong focus on automation. Andreas and Michael run the blog cloudonaut.io where they share their knowledge about AWS with the community. The Wittig brothers also published a bunch of video courses with O’Reilly, Manning, Pluralsight, and A Cloud Guru. You can also find them speaking at conferences and user groups in Europe. Both brothers are co-organizing the AWS user group in Stuttgart.

 

 

 

 

Fernando Hönig

Fernando is an experienced Infrastructure Solutions Leader, holding 5 AWS Certifications, with extensive IT Architecture and Management experience in a variety of market sectors. Working as a Cloud Architect Consultant in United Kingdom since 2014, Fernando built an online community for Hispanic speakers worldwide.

Fernando founded a LinkedIn Group, a Slack Community and a YouTube channel all of them named “AWS en Español”, and started to run a monthly webinar via YouTube streaming where different leaders discuss aspects and challenges around AWS Cloud.

During the last 18 months he’s been helping to run and coach AWS User Group leaders across LATAM and Spain, and 10 new User Groups were founded during this time.

Feel free to follow Fernando on Twitter, connect with him on LinkedIn, or join the ever-growing Hispanic Community via Slack, LinkedIn or YouTube.

 

 

 

Anders Bjørnestad

Anders is a consultant and cloud evangelist at Webstep AS in Norway. He finished his degree in Computer Science at the Norwegian Institute of Technology at about the same time the Internet emerged as a public service. Since then he has been an IT consultant and a passionate advocate of knowledge-sharing.

He architected and implemented his first customer solution on AWS back in 2010, and is essential in building Webstep’s core cloud team. Anders applies his broad expert knowledge across all layers of the organizational stack. He engages with developers on technology and architectures and with top management where he advises about cloud strategies and new business models.

Anders enjoys helping people increase their understanding of AWS and cloud in general, and holds several AWS certifications. He co-founded and co-organizes the AWS User Groups in the largest cities in Norway (Oslo, Bergen, Trondheim and Stavanger), and also uses any opportunity to engage in events related to AWS and cloud wherever he is.

You can follow him on Twitter or connect with him on LinkedIn.

To learn more about the AWS Community Heroes Program and how to get involved with your local AWS community, click here.

 

 

 

 

 

 

 

 

Backblaze Cuts B2 Download Price In Half

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/backblaze-b2-drops-download-price-in-half/

Backblaze B2 downloads now cost 50% less
Backblaze is pleased to announce that, effective immediately, we are reducing the price of Backblaze B2 Cloud Storage downloads by 50%. This means that B2 download pricing drops from $0.02 to $0.01 per GB. As always, the first gigabyte of data downloaded each day remains free.

If some of this sounds familiar, that’s because a little under a year ago, we dropped our download price from $0.05 to $0.02. While that move solidified our position as the affordability leader in the high performance cloud storage space, we continue to innovate on our platform and are excited to provide this additional value to our customers.

This price reduction applies immediately to all existing and new customers. In keeping with Backblaze’s overall approach to providing services, there are no tiers or minimums. It’s automatic and it starts today.

Why Is Backblaze Lowering What Is Already The Industry’s Lowest Price?

Because it makes cloud storage more useful for more people.

When we decided to use Backblaze B2 as our cloud storage service, their download pricing at the time enabled us to offer our broadcasters unlimited audio uploads so they can upload past decades of preaching to our extensive library for streaming and downloading. With Backblaze cutting the bandwidth prices 50% to just one penny a gigabyte, we are excited about offering much higher quality video. — Ian Wagner, Senior Developer, Sermon Audio

Since our founding in 2007, Backblaze’s mission has been to make storing data astonishingly easy and affordable. We have a well documented, relentless pursuit of lowering storage costs — it starts with our storage pods and runs through everything we do. Today, we have over 500 petabytes of customer data stored. B2’s storage pricing already being 14 that of Amazon’s S3 has certainly helped us get there. Today’s pricing reduction puts our download pricing 15 that of S3. The “affordable” part of our story is well established.

I’d like to take a moment to discuss the “easy” part. Our industry has historically done a poor job of putting ourselves in our customers’ shoes. When customers are faced with the decision of where to put their data, price is certainly a factor. But it’s not just the price of storage that customers must consider. There’s a cost to download your data. The business need for providers to charge for this is reasonable — downloading data requires bandwidth, and bandwidth costs money. We discussed that in a prior post on the Cost of Cloud Storage.

But there’s a difference between the costs of bandwidth and what the industry is charging today. There’s a joke that some of the storage clouds are competing to become “Hotel California” — you can check out anytime you want, but your data can never leave.1 Services that make it expensive to restore data or place time lag impediments to data access are reducing the usefulness of your data. Customers should not have to wonder if they can afford to access their own data.

When replacing LTO with StarWind VTL and cloud storage, our customers had only one concern left: the possible cost of data retrieval. Backblaze just wiped this concern out of the way by lowering that cost to just one penny per gig. — Max Kolomyeytsev, Director of Product Management, StarWind

Many businesses have not yet been able to back up their data to the cloud because of the costs. Many of those companies are forced to continue backing up to tape. That tape is an inefficient means for data storage is clear. Solution providers like StarWind VTL specialize in helping businesses move off of antiquated tape libraries. However, as Max Kolomyeytsev, Director of Product Management at StarWind points out, “When replacing LTO with StarWind VTL and cloud storage our customers had only one concern left: the possible cost of data retrieval. Backblaze just wiped this concern out of the way by lowering that cost to just one penny per gig.”

Customers that have already adopted the cloud often are forced to make difficult tradeoffs between data they want to access and the cost associated with that access. Surrendering the use of your own data defeats many of the benefits that “the cloud” brings in the first place. Because of B2’s download price, Ian Wagner, a Senior Developer at Sermon Audio, is able to lower his costs and expand his product offering. “When we decided to use Backblaze B2 as our cloud storage service, their download pricing at the time enabled us to offer our broadcasters unlimited audio uploads so they can upload past decades of preaching to our extensive library for streaming and downloading. With Backblaze cutting the bandwidth prices 50% to just one penny a gigabyte, we are excited about offering much higher quality video.”

Better Download Pricing Also Helps Third Party Applications Deliver Customer Solutions

Many organizations use third party applications or devices to help manage their workflows. Those applications are the hub for customers getting their data to where it needs to go. Leaders in verticals like Media Asset Management, Server & NAS Backup, and Enterprise Storage have already chosen to integrate with B2.

With Backblaze lowering their download price to an amazing one penny a gigabyte, our CloudNAS is even a better fit for photographers, videographers and business owners who need to have their files at their fingertips, with an easy, reliable, low cost way to use Backblaze for unlimited primary storage and active archive. — Paul Tian, CEO, Morro Data

For Paul Tian, founder of Ready NAS and CEO of Morro Data, reasonable download pricing also helps his company better serve its customers. “With Backblaze lowering their download price to an amazing one penny a gigabyte, our CloudNAS is even a better fit for photographers, videographers and business owners who need to have their files at their fingertips, with an easy, reliable, low cost way to use Backblaze for unlimited primary storage and active archive.”

If you use an application that hasn’t yet integrated with B2, please ask your provider to add B2 Cloud Storage and mention the application in the comments below.

 

How Do the Major Cloud Storage Providers Compare on Pricing?

Not only is Backblaze B2 storage 14 the price of Amazon S3, Google Cloud, or Azure, but our download pricing is now 15 their price as well.

Pricing TierBackblaze B2Amazon S3Microsoft AzureGoogle Cloud
First 1 TB$0.01$0.09$0.09$0.12
Next 9 TB$0.01$0.09$0.09$0.11
Next 40 TB$0.01$0.085$0.09$0.08
Next 100 TB$0.01$0.07$0.07$0.08
Next 350 TB+$0.01$0.05$0.05$0.08

Using the chart above, let’s compute a few examples of download costs…

DataBackblaze B2Amazon S3Microsoft AzureGoogle Cloud
1 terabyte$10$90$90$120
10 terabytes$100$900$900$1,200
50 terabytes$500$4,300$4,500$4,310
500 terabytes$5,000$28,800$29,000$40,310
Not only is Backblaze B2 pricing dramatically lower cost, it’s also simple — one price for any amount of data downloaded to anywhere. In comparison, to compute the cost of downloading 500 TB of data with S3 you start with the following formula:
(($0.09 * 10) + ($0.085 * 40) + ($0.07 * 100) + ($0.05 * 350)) * 1,000
Want to see this comparison for the amount of data you manage?
Use our cloud storage calculator.

Customers Want to Avoid Vendor Lock In

Halving the price of downloads is a crazy move — the kind of crazy our customers will be excited about. When using our Transmit 5 app on the Mac to upload their data to B2 Cloud Storage, our users can sleep soundly knowing they’ll be getting a truly affordable price when they need to restore that data. Cool beans, Backblaze. — Cabel Sasser, Co-Founder, Panic

As the cloud storage industry grows, customers are increasingly concerned with getting locked in to one vendor. No business wants to be fully dependent on one vendor for anything. In addition, customers want multiple copies of their data to mitigate against a vendor outage or other issues.

Many vendors offer the ability for customers to replicate data across “regions.” This enables customers to store data in two physical locations of the customer’s choosing. Of course, customers pay for storing both copies of the data and for the data transfer between regions.

At 1¢ per GB, transferring data out of Backblaze is more affordable than transferring data between most other vendor regions. For example, if a customer is storing data in Amazon S3’s Northern California region (US West) and wants to replicate data to S3 in Northern Virginia (US East), she will pay 2¢ per GB to simply move the data.

However, if that same customer wanted to replicate data from Backblaze B2 to S3 in Northern Virginia, she would pay 1¢ per GB to move the data. She can achieve her replication strategy while also mitigating against vendor risk — all while cutting the bandwidth bill by 50%. Of course, this is also before factoring the savings on her storage bill as B2 storage is 14 of the price of S3.

How Is Backblaze Doing This?

Simple. We just changed our pricing table and updated our website.

The longer answer is that the cost of bandwidth is a function of a few factors, including how it’s being used and the volume of usage. With another year of data for B2, over a decade of experience in the cloud storage industry, and data growth exceeding 100 PB per quarter, we know we can sustainably offer this pricing to our customers; we also know how better download pricing can make our customers and partners more effective in their work. So it is an easy call to make.

Our pricing is simple. Storage is $0.005/GB/Month, Download costs are $0.01/GB. There are no tiers or minimums and you can get started any time you wish.

Our desire is to provide a great service at a fair price. We’re proud to be the affordability leader in the Cloud Storage space and hope you’ll give us the opportunity to show you what B2 Cloud Storage can enable for you.

Enjoy the service and I’d love to hear what this price reduction does for you in the comments below…or, if you are attending NAB this year, come by to visit and tell us in person!


1 For those readers who don’t get the Eagles reference there, please click here…I promise you won’t regret the next 7 minutes of your life.

The post Backblaze Cuts B2 Download Price In Half appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

2018-03-09 чл. 13

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3379

(Даже вестниците вече писаха за това, крайно време е да довърша тоя post)

Та, има “нова” директива на ЕК, за унифициране на законите за авторските права из европейския съюз.
(нова – на 10на години е, ама чак сега се опитва да влезе, и има някакви бегли шансове да се вкара по времето на нашето председателство)

В нея има количество глупости, но частта, с която имам занимание е чл. 13, който в общи линии казва “internet/hosting доставчиците са отговорни за съдържанието, което се качва при тях и трябва да го филтрират”. Не знам за вас, но за мен цялото нещо е твърде много deja vu, в последните 20на години все някой се опитва да пробута нещо подобно, в някакъв вид. Това без значение, че няма как да се направи работеща система, без значение, че всъщност е инфраструктура за цензура (но по-мощна от тази, която има в закона за хазарта) и че в крайна сметка няма да доведе до подобрение за когото и да е, освен може би хората, които продават “филтриращи” системи.

Един от хубавите доводи за това, че такава система няма как да стане идва от wikipedia – има две правителства, които са им поискали да филтрират съдържание, Иран и Китай. Не знам за колко са добре технологично иранците, но ми се струва, че китайците ако можеха, вече щяха да са направили работеща такава система и нямаше да занимават wikipedia. Нещата, които аз съм виждал, са неефективни и с достатъчно грешки (и false positives, и false negatives), че да са неизползваеми за среден размер сайт, понеже така и така някой трябва да преглежда проблемите.
(Вероятно не съм единствения, дето поради идиотщини на youtube постоянно има отворени няколко copyright dispute-а. В един момент човек решава да си ползва собствен streaming и да не го занимават с такива неща…)

Та, има една инициатива по темата това да не влезе, article13.bg, към които съм се включил. Подкрепя ги фондация “Отворени проекти”, initLab и всякакви други добри хора. Вижте, подкрепете, споменете на всички, дето ще ги засегне (всички хостинги, например…).