
Backblaze’s Must See List for NAB 2019

Post Syndicated from Skip Levens original https://www.backblaze.com/blog/what-not-to-miss-nab2019/

Collage of logos from Backblaze B2 cloud storage partners

With NAB 2019 only days away, the Backblaze team is excited to launch into the world’s largest event for creatives, and our biggest booth yet!

Must See — Backblaze Booth

This year we’ll be celebrating some of the phenomenal creative work by our customers, including American Public Television, Crisp Video, Falcons’ Digital Creative, WunderVu, and many more.

We’ll have workflow experts standing by to chat with you about your workflow frustrations, and how Backblaze B2 Cloud Storage can be the key to unlocking efficiency and solving storage challenges throughout your entire workflow: From Action! To Archive. With B2, you can focus on creating and managing content, not managing storage.

Create: Bring Your Story to Life

Stop by our booth and we can show you how you can protect your content from ingest through work-in-process by syncing seamlessly to the cloud. We can also detail how you can improve team collaboration and increase content reuse by organizing your content with one of our MAM integrations.

Distribute: Share Your Story With the World

Our experts can show you how B2 can help you scale your content library instantly and indefinitely, and avoid the hassle and expense of on-premises storage. We can demonstrate how everything in your content library can be served directly from your B2 account or through our content delivery partners like Cloudflare.

Preserve: Make Sure Your Story Lives Forever

Want to see the math behind the first cloud storage that’s more affordable than LTO? We can step through the numbers. We can also show you how B2 will keep your archived content accessible, anytime, and anywhere, through a web browser, API calls, or one of our integrated applications listed below.

Must See — Workflow Integrations You Can Count On

Our fantastic workflow partners are a critical part of your creative workflow backed by Backblaze — and there’s a lot of partner news to catch up on!

Drop by our booth to pick up a handy map to help you find Backblaze partners on the show floor, including:

Backup and Archive Workflow Integrations

Archiware P5, booth SL15416
SyncBackPro, Wynn Salon — J

File Transfer Acceleration, Data Wrangling, Data Movement

FileCatalyst, booth SL12116
Hedge, booth SL14805

Asset and Collaboration Managers

axle ai, booth SL15116
Cantemo iconik, booth SL6021
Cantemo (Portal), booth SL6021
CatDV, booth SL5421
Cubix (Ortana Media Group), booth SL5922
eMAM, booth SL10224

Workflow Storage

Facilis, booth SL6321
GB Labs, booth SL5324
ProMAX, booth SL6313
Scale Logic, booth SL11109
Tiger Technology, booth SL8505
QNAP, booth SL15716
Seagate, booth SL8511
StorageDNA, booth SL11810

Must See — Backblaze Events during NAB

Monday morning we’re delivering a presentation in the Scale Logic Knowledge Zone, and Tuesday night of NAB we’re honored to help sponsor the all-new Faster Together event that replaces the long-standing Las Vegas Creative User Supermeet event.

We’ll be raffling off a Hover2 4K drone powered by AI to help you get that perfect drone shot for your next creative film! So after the NAB show wraps up on Tuesday, head over to the Rio main ballroom for a night of mingling with creatives and amazing talks by some of the top editors, colorists, and VFX artists in the industry.

ProVideoTech and Backblaze at Scale Logic Knowledge Zone
Monday April 8 at 11 AM
Scale Logic Knowledge Zone, NAB booth SL11109
On Monday of NAB, Backblaze and ProVideoTech (PVT) will deliver a live presentation for NAB attendees on how to build hybrid-cloud workflows with Cantemo and Backblaze.
Scale Logic Media Management Knowledge Zone

Backblaze at The Faster Together Stage
Tuesday, April 9
Rio Las Vegas Hotel and Casino
Doors open at 4:30 PM, stage presentations begin at 7:00 PM
Reserve Tickets for the Faster Together event

If you haven’t already, sign up to reserve a meeting time with the Backblaze team and add us to your Map My Show NAB plan. We’ll see you there!

NAB 2019 is just a few days away. Schedule a meeting with our cloud storage experts to learn how B2 Cloud Storage can streamline your workflow today!

The post Backblaze’s Must See List for NAB 2019 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Migrating Your Legacy Archive to Future-Ready Architecture

Post Syndicated from Janet Lafleur original https://www.backblaze.com/blog/ortana-cubix-core-media-archive/

This is one in a series of posts on professional media management leading up to NAB 2019 in Las Vegas, April 8 to 11.
–Editor

Guest blog post by James Gibson, Founder & CEO of Ortana Media Group

Businesses want to migrate away from their current archive solution for a wide range of reasons. These range from managing risk and concerns over legacy hardware to media degradation and format support. Many businesses also find themselves stuck with closed-format solutions built on legacy middleware with escalating support costs. We at Ortana have helped many clients overcome this common problem through smart and effective use of the many storage solutions available on the market today. As founder and CEO of Ortana, I want to share some of our collective experience on this topic and explain how we have found success for our clients.

First, we often forget how quickly the storage landscape changes. Let’s take a typical case.

It’s Christmas 2008, and a CTO has just finalised the order for a new enterprise-grade hierarchical storage management (HSM) system with an LTO-4 tape robot. Beyoncé’s “Single Ladies” is playing on the radio, GPS on phones is just starting to roll out, and there is this new way of deploying mobile apps called the Apple™ App Store! The system comes from a well-established, reputable company and provides peace of mind and scalability. What more can you ask for? The CTO goes home for the festive season, job well done, and hopes Santa brings him one of the new Android phones that have just launched.

Ten years on, the world is very different, and, in line with Moore’s law, the pace of technological change shows no sign of slowing. That growing archive has remained on the same hardware, controlled by the same HSM, and has gone through one or two expensive LTO format changes. “These migrations had to happen,” the CTO concedes, because the hardware supplier was dropping support for the older LTO formats. The whole content library had to be restored and archived back to the new tapes. New LTO formats also required new versions of the HSM. These often included new features, such as codec support, intelligent repacking, and reporting, but the fundamentals of the system remained: closed format, restricted accessibility, and high cost. Worse still, annual support costs are increasing while new feature development has ground to a halt. Sure, the archive still works, but for how much longer?

Decisions, Decisions, So Many Migration Decisions

As businesses make the painful decision to migrate their legacy archive, the choices of what, where, and how can become overwhelming. The storage landscape today looks completely different from when closed-format solutions went live, and this change alone offers businesses significant opportunities. Businesses can combine the right storage solutions with seamless architecture and let lights-out orchestration drive the entire process. Their storage can then react to the needs of the business rather than constrain it. Ortana has purposefully made Cubix (our asset management, automation, and orchestration platform) as storage agnostic as possible by integrating a range of on-premises and cloud-based solutions. We have also built an orchestration engine that is fully abstracted from this integration layer. The end result is that workflow changes can be made in seconds without affecting the storage.
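
The minimal Python sketch below makes the idea of an orchestration engine abstracted from its storage integrations more concrete. It is not Cubix’s design. The `StorageTarget` interface, the `LocalDiskTarget` stand-in and the `Orchestrator` class are all hypothetical, and `LocalDiskTarget` uses local folders in place of real cloud, LTFS or nearline backends. Workflow policies refer to targets only by name, so changing a workflow means editing a list rather than touching storage code. Restores fall back to the next target in the policy when a copy is missing.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class StorageTarget(ABC):
    """Hypothetical integration-layer interface that every backend implements."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskTarget(StorageTarget):
    """Stand-in backend that stores objects as files under a root folder."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()  # FileNotFoundError if absent


class Orchestrator:
    """Workflow logic that refers to storage only by target name."""

    def __init__(self, targets: dict[str, StorageTarget]) -> None:
        self.targets = targets

    def archive(self, key: str, data: bytes, policy: list[str]) -> None:
        # A policy is an ordered list of target names; changing a workflow
        # means editing this list, not the storage integrations.
        for name in policy:
            self.targets[name].put(key, data)

    def restore(self, key: str, policy: list[str]) -> bytes:
        # Try targets in policy order, e.g. primary cloud first, then DR copy.
        errors = []
        for name in policy:
            try:
                return self.targets[name].get(key)
            except OSError as exc:
                errors.append(f"{name}: {exc}")
        raise LookupError(f"{key} not found on any target: {errors}")
```

A real backend for B2 or an LTFS-managed library would implement the same two methods, and the orchestration code would stay unchanged.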

screenshot of Cubix workflow
Cubix’s orchestration platform includes a Taskflow engine for creating customized workflow paths

As our example CTO would say (no doubt shaking their head), a company’s main priority is to not-be-here-again. The key is to store media in an open format that is not bound to any one vendor but still meets the needs of the business, both today and tomorrow. Online cloud storage such as Backblaze B2 has now become more cost-effective than LTO for storing content, and its cost is only set to fall further. Internet bandwidth is now ample and ubiquitous, and together with that lower cost, this makes cloud storage an obvious primary storage target. Cloud storage is entirely agnostic to the format and codec of the content you store. It is also aligned with MPAA best practices and easy to integrate into any on-premises or cloud-based workflow. As a result, it removes many of the issues faced by the closed-format HSMs deployed in so many facilities today. It also begins to change the discussion about main vs. DR storage, since the storage is no longer housed in a facility within the business.

Cloud Storage Opens Up New Capabilities

Sometimes people worry that cloud storage will be too slow. Where this is true, it is almost always due to a poor cloud implementation. B2 is online storage, so time-to-first-byte is almost immediate. Cold storage services such as Amazon Glacier are different: the first byte arrives anywhere from minutes later for expedited (and more expensive) retrievals to several hours later, and up to twelve hours for bulk retrievals. Anything that replaces an LTO solution needs to match or beat the capacity and speed of the incumbent. Good workflow design can ensure that restores are done as promptly as possible and delivered directly to where the media is needed.

But what about those nasty egress costs? People can get caught off guard when egress is not budgeted for correctly, or when their workflow does not make good use of simple solutions such as proxies. Whether your archive is on LTO or in the cloud, proxies are critical to keeping accessibility up and costs and restore times down. By default, when we deploy Cubix for clients, we generate a frame-accurate proxy for video content, often devalued with burnt-in timecode (BITC), logos, and overlays. Proxies are generated using open-source transcoders, so they are incredibly cost-effective to create, and they are often only a fraction of the size of the source files. These proxies can also be stored and served directly from B2. We then use them throughout all our portals so that users can search, find, and view content. This avoids the time and cost of restoring the high-resolution master files. A restore of the full-resolution masters is submitted only once the exact content required has been found.
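
The sketch below illustrates how cheaply such a proxy can be made with an open-source transcoder. It builds an ffmpeg command line for a small H.264 proxy with burnt-in timecode. The resolution, quality settings, starting timecode and overlay position are assumptions chosen for illustration, not Cubix’s actual presets. Some ffmpeg builds also need a `fontfile=` option for the `drawtext` filter.

```python
from __future__ import annotations

from pathlib import Path


def proxy_command(source: Path, dest: Path, fps: int = 25, height: int = 540) -> list[str]:
    """Build (but do not run) an ffmpeg command for a lightweight H.264 proxy
    with burnt-in timecode. All settings are illustrative assumptions."""
    burn_in = (
        # drawtext needs a starting timecode plus its frame rate; the colons
        # inside the value are escaped for ffmpeg's filtergraph parser.
        f"drawtext=timecode='00\\:00\\:00\\:00':rate={fps}"
        ":fontcolor=white:fontsize=24:box=1:boxcolor=black@0.6"
        ":x=(w-tw)/2:y=h-(2*lh)"
    )
    return [
        "ffmpeg", "-i", str(source),
        # Scale to the proxy height (width follows the aspect ratio, kept even),
        # then burn in the timecode.
        "-vf", f"scale=-2:{height},{burn_in}",
        "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
        "-g", str(fps),  # a keyframe every second eases scrubbing
        "-c:a", "aac", "-b:a", "96k",
        str(dest),
    ]
```

To run the command, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string means the escaped colons reach ffmpeg exactly as written.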

Multiple Copies Stored at Multiple Locations by Multiple Providers

Moving content to the cloud doesn’t remove the risk of working with a single provider, however. No matter how good or big the provider is, it’s always wise to ensure an active disaster recovery solution is present within your workflows. This last-resort copy does not need all the capabilities of the primary storage and can even have more punitive restore costs and times. But it should be possible to enable it in moments, and it should be part of the orchestration engine rather than a manual process.

On-premises archive solutions can still be deployed without being caught by the issues discussed earlier. This applies where a business needs to de-risk that single provider, or where a workflow regularly has to restore 30-40% of the original content (because proxies do not meet its needs). Firstly, LTO now offers portability benefits through LTFS, an easy-to-use open format whose specification and implementation, critically, are in the public domain. As a result, many vendors support it, which helps ensure long-term support for on-premises storage. Ortana’s Cubix platform supports many HSMs that write content in native LTFS format, which any standalone drive from any vendor supporting LTFS can read.

Also, with 12 TB hard drives now standard in the marketplace, nearline storage has become a strong contender for content when combined with intelligent storage tiering to the cloud or LTO. Cubix can fully automate this process, especially when complemented by hardware such as GB Labs’ wide range of storage solutions. An intelligent MAM and orchestration platform like Cubix can drive this mix of cloud, nearline and LTO, managing content as efficiently as possible on a per-workflow basis. The result blurs the lines between primary storage, DR, and last-resort copies.
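
Per-workflow tiering of this kind can be expressed as a small rule table. In this hypothetical Python sketch, every asset keeps an online cloud copy and recently used assets also stay on nearline disk. Some workflows additionally require a last-resort LTO copy. The workflow names and day thresholds are invented for illustration and are not Cubix settings.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    workflow: str        # e.g. "news" or "sports" (hypothetical labels)
    last_accessed: date


# Hypothetical per-workflow rules: how long to keep a nearline copy after the
# last access, and whether a last-resort LTO copy is required.
RULES = {
    "news": {"nearline_days": 14, "lto_copy": False},
    "sports": {"nearline_days": 90, "lto_copy": True},
}
DEFAULT_RULE = {"nearline_days": 30, "lto_copy": True}


def placements(asset: Asset, today: date) -> list[str]:
    """Return the storage tiers that should hold a copy of this asset."""
    rule = RULES.get(asset.workflow, DEFAULT_RULE)
    tiers = ["cloud"]                      # online copy for every asset
    if (today - asset.last_accessed).days <= rule["nearline_days"]:
        tiers.insert(0, "nearline")        # recently used: keep it local too
    if rule["lto_copy"]:
        tiers.append("lto")                # offline last-resort copy
    return tiers
```

An orchestration engine could re-evaluate this function on a schedule and move or delete copies whenever the answer changes.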

Streamlining the Migration Process

Once you have agreed on your storage mix and put it in place, your next, fraught task is getting your existing library onto the new solution without impacting the business’s access to it. Some HSM vendors suggest swapping your LTO tapes by physically removing them from one library and inserting them into another. Ortana knows that libraries are often the linchpin of the organisation. Any downtime has a significant negative impact that can fill media managers with dread, especially since these one-shot, one-direction migrations can easily go wrong. Moreover, simply moving tapes does not preserve any editorial metadata or meet many of the objectives around making content more available. Cubix not only manages the media and the entire transformation process but also retains the editorial metadata from the existing archive.

screenshot of Cubix search results
During the migration process, content can be indexed via AI-powered speech-to-text and image recognition

Given the high speeds that LTO delivers, combined with the scalability of Cubix, even the largest libraries can be migrated in short timescales with zero downtime on the archive. While content is being migrated to the defined mix of storage targets, Cubix can perform several tasks on it to further augment the metadata. These range from basics such as proxy and waveform generation to AI-based image detection and speech-to-text. These processes further reduce the time staff spend looking for content. They also refine search so that only the content actually required is restored, which translates directly into reduced restore times and egress costs.
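
Migrations like this rely on parallel, verified copying, which can be sketched in Python as follows. This is an illustration under simple assumptions, not Ortana’s implementation:

- Files have already been restored from tape into a staging folder.
- Local folders stand in for the cloud and LTFS targets.
- The worker count equals the number of tape drives dedicated to the migration.

A file counts as migrated only once every copy matches a SHA-256 checksum of the source.

```python
from __future__ import annotations

import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media never loads fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def migrate_one(src: Path, targets: list[Path]) -> dict:
    """Copy one restored file to every target and verify each copy."""
    expected = sha256_of(src)
    for target_root in targets:
        target_root.mkdir(parents=True, exist_ok=True)
        dest = target_root / src.name
        shutil.copy2(src, dest)
        if sha256_of(dest) != expected:
            raise OSError(f"checksum mismatch for {dest}")
    return {"file": src.name, "sha256": expected, "copies": len(targets)}


def migrate_all(files: list[Path], targets: list[Path], drives: int) -> list[dict]:
    """Run migrations in parallel, one worker per dedicated drive (an assumption)."""
    with ThreadPoolExecutor(max_workers=drives) as pool:
        # pool.map yields results in input order; a failure re-raises here.
        return list(pool.map(lambda f: migrate_one(f, targets), files))
```

The returned records, including each checksum, are the kind of data worth storing alongside the editorial metadata. Later restores can then be verified against them.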

A Real-World Customer Example

Many of the above concerns and considerations led a large broadcaster to Ortana for a large-scale migration project. The broadcaster produces in-house news and post-production with multi-channel linear playout and video-on-demand (VoD). Their existing archive held 3 PB of media across two generations of LTO tape, managed by Oracle™ DIVArchive & DIVADirector. They were concerned about ongoing support for DIVA. They wanted to migrate all tape- and disk-based content to a new HSM quickly, making full use of the dedicated drive resources available.

Their primary goal was to fully migrate all editorial metadata into Cubix, including all ancillary files (subtitles, scripts, etc.). They also wanted to index all media using AI-powered content discovery, cutting search times for the news, promos, and sports departments. They wanted to replace the legacy Windows Media Video (WMV) proxies with new full-HD H.264 frame-accurate proxies and to give the business secure, group-based access to the content. Finally, they wanted all the benefits of cloud storage while keeping costs to a minimum.

With Ortana’s Cubix Core, the broadcaster was able to safely migrate their DIVArchive to two storage platforms: LTFS with a Quantum HSM system, and Backblaze B2 cloud storage. During the migration, their content was indexed via AI-powered image recognition (Google Vision) and speech-to-text (Speechmatics). The Cubix UI replaced the existing archive as the media portal for both internal and external stakeholders.

The new solution has vastly reduced the timescales for content processing across all departments and has led to a direct reduction in staff costs. Researchers report a 50-70% reduction in time spent searching for content, and the archive shows a 40% reduction in restore requests. With the content held in two distinct geographical locations, they’ve entirely removed the business risk of keeping their archive with a single vendor in a single location. Most importantly, their archived content is more active than ever, and they can be sure it will stay alive for the future.

How exactly did Ortana help them do it? Join our webinar Evading Extinction: Migrating Legacy Archives on Thursday, March 28, 2019. We’ll detail all the steps we took in the process and include a live demo of Cubix. We’ll show you how straightforward and painless the archive migration can be with the right strategy, the right tools, and the right storage.

— James Gibson, Founder & CEO, Ortana Media Group

•  •  •

Backblaze will be exhibiting at NAB 2019 in Las Vegas on April 8-11, 2019. Schedule a meeting with our cloud storage experts to learn how B2 Cloud Storage can streamline your workflow today!

The post Migrating Your Legacy Archive to Future-Ready Architecture appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A Workflow Playbook for Migrating Your Media Assets to a MAM

Post Syndicated from Skip Levens original https://www.backblaze.com/blog/workflow-playbook-migrating-your-media-assets-to-a-mam/

This post was originally published in 2019 and has been updated with the latest information on media asset management.

As a media professional, you’ve probably come across some…let’s say, “creative” file naming conventions in your day. While it’s hilarious, “Episode 6: OH YEAH BABY THIS IS THE RIGHT ONE LOL.mp4” isn’t going to be the easiest thing to find years later when you’re searching through hundreds of files for…the right one.

Whether you make videos, images, or music, the more you produce, the more difficult those assets become to manage, organize, find, and protect. Managing files by carefully placing them in specific folders and implementing more logical naming conventions can only get you so far. At some point, as the scale of your business grows, you’ll find your current way of organizing and searching for assets can’t keep up.

Getting your assets into a media asset management (MAM) system will make your library much easier to navigate. You’ll be able to quickly search for the exact media you need for a new project. Your team will be more efficient and organized, and you will be able to deliver your finished content faster.

In this post, we’ll explain some asset management basics and introduce five key plays you can put into practice to get the most out of your assets, including how to move them into an asset management system or migrate from an older system to a new one. Read on to learn more.

Interested in learning more? Get the complete guide to optimizing media workflows at the link below.

➔ Download Our Media Workflows E-book

Assets and Metadata

Before you start building a playbook to get the most from your creative assets, let’s review a few key concepts.

Asset: A rich media file with intrinsic metadata.

An asset is simply a file that is the result of your creative operation. Most often, it is a rich media file like an image or a video. Typically, these files are captured or created in a raw state. Your creative team then adds value to each raw asset by editing it together with other assets into a finished story, which in turn becomes another asset to manage.

Metadata: Information about a file, either embedded within the file itself or associated with the file by another system, typically a MAM application.

Any given file carries information about itself that your laptop or workstation’s operating system can understand. Some of this information seems obvious, like the name of the file, how much storage space it occupies, when it was first created, and when it was last modified. Each of these attributes can help you find one particular file among thousands using only the tools in your OS’s file manager.
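
A script can read the same basic attributes that the file manager shows, using only Python’s standard library. This is a sketch, and `basic_file_metadata` is a hypothetical helper. True creation time is platform-specific (for example, `st_birthtime` on macOS), so the helper reports only the name, size and last-modified time.

```python
from datetime import datetime, timezone
from pathlib import Path


def basic_file_metadata(path: Path) -> dict:
    """Return the basic attributes an OS file manager displays for a file."""
    st = path.stat()
    return {
        "name": path.name,
        "size_bytes": st.st_size,
        # Last-modified time as an ISO 8601 string in UTC.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }
```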

File Metadata

There’s usually another level of metadata in media files that is less obvious but potentially enormously useful: metadata written into the file when a camera or film scanner creates it, or when a program outputs it.

Results of a file inspected by an operating system's file manager
An example of metadata embedded in a rich media file.

For example, this image taken in Backblaze’s data center carries all kinds of interesting information. When I inspect the file with Get Info in the macOS Finder, I can see the image’s dimensions and when it was taken. I can also see exactly what kind of camera took the picture and which lens settings were used.

As you can see, this metadata could be very useful if you want to find all images taken on that day, or even images taken with the same camera, focal length, f-stop, or exposure.

When a File and Folder System Can’t Keep Up

Going through files one at a time to find the one you need is incredibly inefficient. Yet that’s how things still work in many creative environments—an ad hoc system of folders plus the memory of whoever’s been with the team longest. Files are often kept on the same storage used for production or even on an external hard drive.

Teams quickly outgrow that system when they find themselves juggling multiple hard drives or they run out of space on production storage. Worst of all, assets kept on a single hard drive are vulnerable to disk damage or to being accidentally copied or overwritten.

Why Your Assets Need to be Managed

To meet this challenge, creative teams have often turned to MAMs. A MAM automatically extracts all of the assets’ inherent metadata, helps move files to protected storage, and makes them instantly available to MAM users. In a way, these MAMs become a private media search engine where any file attribute can be a search query to instantly uncover the needed files in even the largest media asset libraries.

Beyond that, asset management systems are rapidly becoming highly effective collaboration and workflow tools. For example, tagging a series of files as “Field Interviews: April 2019” or flagging an edited piece of content as “HOLD: do not show customer” can be very useful indeed.

The Inner Workings of a Media Asset Manager

When you add files into an asset management system, the application inspects each file, extracting every available bit of information about the file, noting the file’s location on storage, and often creating a smaller stand-in or proxy version of the file that is easier to present to users.

To keep track of this information, asset manager applications employ a database and keep information about your files in it. This way, when you’re searching for a particular set of files among your entire asset library, you can simply make a query of your asset manager’s database in an instant rather than rifling through your entire asset library storage system. The application takes the results of that database query and retrieves the files you need.
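
The minimal Python sketch below uses SQLite from the standard library as the catalog database. The table layout, sample records and `b2://` location strings are hypothetical, and a real MAM’s schema is far richer. The point is that the search runs against the catalog. Only the matching records’ locations are then needed to fetch files from storage.

```python
import sqlite3


def build_catalog(records: list) -> sqlite3.Connection:
    """Create an in-memory asset catalog from extracted metadata records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets (name TEXT PRIMARY KEY, location TEXT, "
        "camera TEXT, captured TEXT, proxy TEXT)"
    )
    conn.execute("CREATE INDEX idx_camera_date ON assets (camera, captured)")
    conn.executemany(
        "INSERT INTO assets VALUES (:name, :location, :camera, :captured, :proxy)",
        records,
    )
    return conn


def find_assets(conn: sqlite3.Connection, camera: str, day: str) -> list:
    """Search the catalog, not the storage; returns (name, location) pairs."""
    return conn.execute(
        "SELECT name, location FROM assets "
        "WHERE camera = ? AND captured LIKE ? ORDER BY name",
        (camera, day + "%"),  # captured holds ISO timestamps like 2019-03-14T10:02
    ).fetchall()
```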

A MAM Case Study: Complex Networks

Complex Networks used a Facilis TerraBlock storage device. As acquisitions added new groups to their team, they started running out of space. Whenever the local shared storage filled up, they would pull assets off it to give everybody enough room to continue working.

They found media asset management provider iconik and immediately recognized its advantages. They moved all of their assets there and, with Backblaze’s integration with iconik, backed them all up to Backblaze B2 Cloud Storage. They’re now free to focus on what they do best—making culture-defining content—rather than spending time searching for assets.

The Asset Migration Playbook

Whether you need to move from a file and folder based system to a new asset manager, or have been using an older system and want to move to a new one without losing all of the metadata that you have painstakingly developed, a sound playbook for migrating your assets can help guide you. Below we’ll explain five plays you can use to approach your asset management journey:

  1. Protecting Assets Saved in a Folder Hierarchy Without an Asset Management System.
  2. Moving Assets Saved in a Folder Hierarchy into Your Asset Management System and Archiving in Cloud Storage.
  3. Getting a Lot of Assets on Local Storage into Your Asset Management System and Backing Up to Cloud Storage.
  4. Moving from One Asset Manager System to a New One Without Losing Metadata.
  5. Moving Quickly from a MAM on Local Storage to a Cloud-based System.

Play 1: Protecting Assets Saved in a Folder Hierarchy Without an Asset Management System

In this scenario, your assets are in a set of files and folders, and you aren’t ready to implement your asset management system yet.

The first consideration is the safety of the assets. Files on a single hard drive are vulnerable. If you are not ready to choose an asset manager, your first priority should be to get those files into a secure cloud storage service like Backblaze B2.

Check out our post, “How Backup and Archive Are Different for Professional Media Workflows,” for a detailed guide on backing up and archiving your assets and best practices for doing so.

Then, when you have chosen an asset management system, you can simply point the system at your cloud-based asset storage to extract the metadata out of the files and populate the asset information in your asset manager.

The TL/DR Version:

  1. Get assets archived or moved to cloud storage.
  2. Choose your asset management system.
  3. Ingest assets directly from your cloud storage.

Play 2: Moving Assets Saved in a Folder Hierarchy Into Your Asset Management System and Archiving in Cloud Storage

In this scenario, you’ve chosen your asset management system, and need to get your local assets in files and folders ingested and protected in the most efficient way possible.

You’ll ingest all of your files into your asset manager from local storage, then back them up to cloud storage. Once your asset manager has been configured with your cloud storage credentials, it can automatically move a copy of local files to the cloud for you. Later, when you have confirmed that the file has been copied to the cloud, you can safely delete the local copy.

The TL/DR Version:

  1. Ingest assets from local storage directly into your asset manager system.
  2. From within your asset manager system, archive a copy of files to your cloud storage.
  3. Once the files are safely archived, delete the local copy.
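
The “confirm, then delete” step is worth automating carefully. This Python sketch deletes a local file only if the checksum reported for the cloud copy matches a SHA-1 computed locally. `remote_sha1` is a hypothetical hook that you would back with file information from your storage service or MAM. B2 records a SHA-1 for uploaded files. For large files uploaded in parts, however, a whole-file SHA-1 may be present only if the uploading tool supplied one, so check how your tools handle that case.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-1 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def delete_if_archived(path: Path,
                       remote_sha1: Callable[[str], Optional[str]]) -> bool:
    """Delete the local copy only if the cloud copy's checksum matches.

    remote_sha1 is a hypothetical hook: given a file name, it returns the
    SHA-1 recorded for the uploaded copy, or None if no copy exists yet.
    """
    reported = remote_sha1(path.name)
    if reported is None or reported.lower() != sha1_of(path):
        return False  # not verifiably archived: keep the local file
    path.unlink()
    return True
```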

Play 3: Getting a Lot of Assets on Local Storage into Your Asset Management System and Backing Up to Cloud Storage

If you have a lot of content (more than, say, 20 TB), you will want to use a rapid ingest service such as the Backblaze Fireball system. You copy the files to the Backblaze Fireball, and Backblaze puts them directly into your asset management bucket. The asset manager is then updated with the files’ new locations in your Backblaze B2 account.

This can be a manual process, or can be done with scripting to make the process faster.

You can read about one such migration using this play here:
“iconik and Backblaze: The Cloud Production Solution You’ve Always Wanted.”
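
A rough estimate of upload time shows why shipping a device can make sense. This Python helper estimates how many days it takes to push a given amount of data over a network link. The 80% default link efficiency is an assumption, and real-world throughput varies widely.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to upload size_tb decimal terabytes over a link_gbps
    gigabit-per-second connection, of which only `efficiency` is usable."""
    bits = size_tb * 1e12 * 8                        # TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # usable bits per second
    return seconds / 86_400                          # seconds per day
```

For example, 20 TB over a fully used 1 Gbit/s link takes about 1.9 days, or about 2.3 days at 80% efficiency. Those figures assume the link is not needed for anything else during the upload.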

The TL/DR Version:

  1. Ingest assets from local storage directly into your asset manager system.
  2. Archive your local assets to Fireball (up to 90TB at a time).
  3. Once the files have been uploaded by Backblaze, relink the new location of the cloud copy in your asset management system.
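
Each Fireball holds up to 90 TB, so larger libraries have to be split into shipments. This Python sketch groups folders (for example, one per project) into batches using a first-fit decreasing heuristic. The folder names and sizes are hypothetical, and the result is a reasonable plan rather than a guaranteed-optimal one.

```python
from __future__ import annotations


def plan_batches(folders: dict[str, float], capacity_tb: float = 90.0) -> list[list[str]]:
    """Group folders (name -> size in TB) into shipments no larger than
    capacity_tb, using the first-fit decreasing heuristic."""
    batches: list[list[str]] = []
    used: list[float] = []
    # Place the largest folders first; each goes into the first batch with room.
    for name, size in sorted(folders.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity_tb:
            raise ValueError(f"{name} ({size} TB) does not fit on one device")
        for i, total in enumerate(used):
            if total + size <= capacity_tb:
                batches[i].append(name)
                used[i] += size
                break
        else:  # no existing batch had room: start a new shipment
            batches.append([name])
            used.append(size)
    return batches
```

With four projects of 60, 50, 30, and 25 TB, it returns two batches: projA with projC (90 TB) and projB with projD (75 TB).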

You can read more about the Backblaze Fireball on our website.

Play 4: Moving from One Asset Manager System to a New One Without Losing Metadata

In this scenario, you have an existing asset management system and need to move to a new one as efficiently as possible. You want to take advantage of your new system’s features and safeguard your assets in cloud storage without impacting your existing production.

Some asset management systems let you export the database contents in a format that a new system can import. Some older systems lack that feature and will require a database expert to extract the metadata manually. Either way, expect to map the fields from the old system to the fields in the new system.

Making a copy of your old database is a must. Don’t work on the primary copy, and be sure to test on small groups of files as you migrate from the older system to the new one. You need to ensure that the metadata is correct in the new system. Pay special attention to whether each file’s actual location is mapped properly. It’s wise to keep the old system up and running for a while before completely phasing it out.
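
Most of the scripting effort in this play goes into field mapping and spot checks. The Python sketch below assumes that the old system can export CSV. The column names and the `FIELD_MAP` table are hypothetical. The sketch does three things:

- It translates each row into the new system’s field names.
- It reports export columns that have no mapping, so nothing is silently dropped.
- It checks a sample of records for file locations that cannot be found.

```python
from __future__ import annotations

import csv
import io
from typing import Callable

# Hypothetical mapping from the old system's export columns to the new
# system's field names; every real migration needs its own table.
FIELD_MAP = {
    "Clip Name": "title",
    "Tape ID": "source_tape",
    "Keywords": "tags",
    "Path": "file_location",
}


def map_records(export_csv: str) -> tuple[list[dict], set[str]]:
    """Translate exported rows and report columns that have no mapping."""
    reader = csv.DictReader(io.StringIO(export_csv))
    unmapped = set(reader.fieldnames or []) - set(FIELD_MAP)
    records = [
        {FIELD_MAP[col]: value for col, value in row.items() if col in FIELD_MAP}
        for row in reader
    ]
    return records, unmapped


def missing_files(records: list[dict], exists: Callable[[str], bool]) -> list[str]:
    """Spot-check a sample: return titles whose mapped location is not found."""
    return [r["title"] for r in records if not exists(r["file_location"])]
```

In practice, `exists` could be `os.path.exists` for local paths or a lookup against a bucket listing for cloud locations.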

The TL/DR Version:

  1. Export the database from the old system.
  2. Import the records into the new system.
  3. Ensure that the metadata is correct in the new system and file locations are working properly.
  4. Make archive copies of your files to cloud storage.
  5. Once the new system has been running through a few production cycles, it’s safe to power down the old system.

Play 5: Moving Quickly from a MAM on Local Storage to a Cloud-based System

In this variation of Play 4, you can move content to object storage with a rapid ingest service like Backblaze Fireball at the same time that you migrate to a cloud-based system. This step will benefit from scripting to create records in your new system with all of your metadata, then relink with the actual file location in your cloud storage all in one pass.

You should test that your asset management system can recognize a file already in the system without creating a duplicate copy of the file. This is done differently by each asset management system.

The TL/DR Version:

  1. Export the database from the old system.
  2. Import the records into the new system while creating placeholder records with the metadata only.
  3. Archive your local assets to the Backblaze Fireball (up to 90TB at a time).
  4. Once the files have been uploaded by Backblaze, relink the cloud based location to the asset record.
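
The single pass described in this play can be sketched in Python. Everything here is hypothetical:

- the record layout;
- the use of a checksum to recognize a file the new system already holds;
- the `uploaded` mapping from checksum to cloud location, which you might build from a bucket listing after the bulk upload.

Real asset managers detect duplicates in their own ways, so adapt the matching rule to yours.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AssetRecord:
    asset_id: str
    metadata: dict
    checksum: str
    location: str | None = None  # None means a placeholder with no file yet


def import_and_relink(old_rows: list[dict], uploaded: dict[str, str],
                      catalog: dict[str, AssetRecord]) -> dict[str, int]:
    """One pass over the old export: create metadata-only placeholders and
    attach the cloud location of any file whose checksum matches."""
    stats = {"created": 0, "linked": 0, "skipped_duplicate": 0}
    by_checksum = {rec.checksum: rec for rec in catalog.values()}
    for row in old_rows:
        if row["checksum"] in by_checksum:
            stats["skipped_duplicate"] += 1  # already known: don't duplicate
            continue
        rec = AssetRecord(row["id"], row["metadata"], row["checksum"])
        catalog[rec.asset_id] = rec
        by_checksum[rec.checksum] = rec
        stats["created"] += 1
        if rec.checksum in uploaded:         # file already in cloud storage
            rec.location = uploaded[rec.checksum]
            stats["linked"] += 1
    return stats
```

Records left with `location` set to `None` show exactly which files still need to be uploaded or relinked.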

Bonus Play: Using Cloud Storage to Scale a Media-Heavy Workload

Photographer Gavin Wade was dissatisfied with digital image delivery systems in 2014, so he set out to create a better platform for photographers worldwide: CloudSpot. Rapid growth caused storage costs for the 120 million assets under management to snowball with the company’s legacy provider, Amazon S3.

CloudSpot moved its 700 TB to Backblaze in six days with no service disruption, reducing broader operating costs by 50% and data transfer costs by 90%.

Wrapping Up

Every creative environment is different, but they all need the same things: to find assets fast, to organize content in ways that boost productivity, and to rest easy knowing that content is safe.

With these plays, you can take that step and be ready for any future production challenges and opportunities.

If you’re interested in learning more, download our e-book on optimizing media workflows.

The post A Workflow Playbook for Migrating Your Media Assets to a MAM appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How Backup and Archive Are Different for Professional Media Workflows

Post Syndicated from Janet Lafleur original https://www.backblaze.com/blog/backup-vs-archive-professional-media-production/

When to back up and when to archive? It’s a simple question, and the simple answer is that it depends on the function of the data you’re looking to archive or back up. For media teams, a solid understanding of how your data functions, how often you need it, and how fast you need it is required in order to implement the right tools.

In this post, we’ll explain the difference between backing up and archiving for media teams, and we’ll walk through a real-world application from UCSC Silicon Valley.

Backup vs. Archive: A Refresher

We explored the broader topic of backing up vs. archiving in our “What’s the Diff: Backup vs. Archive” post. In short, you should use a backup if you intend to keep the data available in case of loss. If you make a copy for regulatory compliance, or to move older, less-used data off to cheaper storage, you should use an archive. Simple, right? Not always, if you’re talking about image, video, and other media files. Read on to learn more.

Backup vs. Archive for Professional Media Workflows

Definitions of backup and archive that apply to general business use cases don’t always apply to professional media workflows. Video and image files differ from typical business data in a number of ways, and that profoundly impacts how they’re protected and preserved throughout their lifecycle.

When backing up media files, there are key differences between which files get backed up and how they get backed up. When archiving media files, there are key differences between when files get archived and why they’re archived. The main differences between business files and media workflow files include:

  • Size: Media files are much larger, and the production process generates many more intermediate files.
  • Archive use case: Media teams archive to save space on their on-premises production storage, while most businesses archive to meet compliance requirements.
  • Archive timing: Media teams frequently archive source files immediately upon ingest, in addition to final cuts, whereas business use cases typically need to archive only final versions.

We’ll explain each of these elements in more detail below.

Large Media File Sizes Slow Down Backups

The most obvious difference is that media files are BIG. Most business documents are under 30MB in size, yet even a single second of video could be larger than 30MB depending on the resolution and frame rate. In a typical business use case, a company might plan to back up files overnight, say for incremental backups, or over a weekend for a full backup. But backing up large media files might exceed those windows. And you can’t expect deduplication to shorten backup times or reduce backup sizes, either. Video and images don’t dedupe well.
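A quick back-of-the-envelope calculation shows why. The Python sketch below assumes uncompressed 8-bit RGB frames (3 bytes per pixel) and a hypothetical 1 Gbit/s link with no protocol overhead. The production codecs most teams actually shoot and edit in compress this considerably, so treat the numbers as an upper bound.

```python
def uncompressed_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Raw data rate of uncompressed video frames."""
    return width * height * bytes_per_pixel * fps

def hours_to_transfer(total_bytes, link_bits_per_second):
    """Time to move total_bytes over a link, ignoring protocol overhead."""
    return total_bytes * 8 / link_bits_per_second / 3600

# Assumption: 1080p at 30 fps, 8-bit RGB, no compression.
hd = uncompressed_bytes_per_second(1920, 1080, 3, 30)
print(hd)  # 186624000 bytes, roughly 187 MB for one second of footage

# Hypothetical job: one hour of that footage over a 1 Gbit/s link.
print(round(hours_to_transfer(hd * 3600, 1_000_000_000), 1))  # 1.5 hours
```

At these rates, even a modest amount of new footage can take longer to back up than it took to shoot, which is how nightly backup windows get blown.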

Furthermore, the editing process generates a flurry of intermediate or temporary files in the active content creation workspace that don’t need to be backed up because they can be easily regenerated from source files.

The best backup solutions for media allow you to specify exactly which directories and file types you want backed up, so that you’re taking time for and paying for only what you need.
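Here is a minimal Python sketch of that kind of selective policy. The folder names and exclusion patterns are hypothetical examples, not settings from any particular backup product.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical policy: top-level folders to protect, and regenerable
# intermediate files (caches, previews, temp renders) to skip.
INCLUDE_DIRS = ("Projects", "Source")
EXCLUDE_PATTERNS = ("*.tmp", "*/Render Cache/*", "*/Previews/*")

def should_back_up(path: str) -> bool:
    parts = PurePosixPath(path).parts
    if not parts or parts[0] not in INCLUDE_DIRS:
        return False
    # fnmatchcase's "*" also matches "/", so patterns can span folders.
    return not any(fnmatchcase(path, pattern) for pattern in EXCLUDE_PATTERNS)

files = [
    "Projects/promo/edit.prproj",
    "Projects/promo/Render Cache/frame0001.tmp",
    "Source/A001/clip.mov",
    "Scratch/junk.mov",
]
print([f for f in files if should_back_up(f)])
# ['Projects/promo/edit.prproj', 'Source/A001/clip.mov']
```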

Archiving to Save Space on Production Storage

Media teams tend to use archiving to reduce production storage costs, whereas businesses are much more likely to use archives for compliance purposes. High-resolution video editing, for example, requires expensive, high-performance storage to deliver multiple streams of content to multiple users simultaneously without dropping frames. Since high-resolution files are so large, this expensive storage resource fills up quickly. Once a project is complete, most media teams prefer to clear space for the next project. Archiving completed projects and infrequently used assets can keep production storage capacities under control.
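A simple archive policy along those lines might look like the following Python sketch. The project_complete flag and the 90-day idle threshold are hypothetical; in practice that information would come from your MAM or project tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ProjectFile:
    path: str
    project_complete: bool   # hypothetical flag from a project tracker or MAM
    last_accessed: datetime

def archive_candidates(files, now, idle_days=90):
    """Files from completed projects, plus files untouched for idle_days."""
    cutoff = now - timedelta(days=idle_days)
    return [f.path for f in files if f.project_complete or f.last_accessed < cutoff]

now = datetime(2019, 6, 1)
files = [
    ProjectFile("promo/final.mov", True, now - timedelta(days=3)),
    ProjectFile("library/logo_anim.mov", False, now - timedelta(days=200)),
    ProjectFile("current/rough_cut.mov", False, now - timedelta(days=2)),
]
print(archive_candidates(files, now))  # ['promo/final.mov', 'library/logo_anim.mov']
```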

Media asset managers (MAMs) can simplify the archive, retrieval, and distribution process. Assets can be archived directly through the MAM’s user interface, and after archiving, thumbnails or proxies remain visible to users. Archived content remains fully searchable by its metadata and can also be retrieved directly through the MAM interface. For more information on MAMs, read “What’s the Diff: DAM vs. MAM.”

Media teams can manage budgets effectively by strategically archiving select media files to less expensive storage. When archiving is done properly, that content remains readily accessible should it be needed for redistribution, repurposing, and monetization.

Permanently Secure Source Files and Raw Footage on Ingest

A less obvious way that media workflows differ from business workflows is that video files are fixed content that is not actually altered during the editing process. Instead, editing suites record the changes to be made to the original and apply them only when rendering the final cut and formatting it for delivery. Since these source files are often irreplaceable, many facilities save a copy to secondary storage as soon as the files are ingested into the workflow. This copy serves as a backup to the file on local storage during the editing process. Later, when the local copy is no longer actively being used, it can be safely deleted knowing it’s secured in the archive. I mean backup. Wait, which is it?
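The Python sketch below is a deliberately simplified model of that non-destructive approach. Frames are represented by strings, and a hypothetical Cut record stands in for a real project file. Rendering reads from the source clips but never writes to them, which is why a copy made at ingest stays valid for the life of the project.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cut:
    source: str     # which source clip to pull from
    in_frame: int   # first frame used (inclusive)
    out_frame: int  # end of the range (exclusive)

def render(timeline, sources):
    """Apply the edit list only at render time; source frames are never modified."""
    output = []
    for cut in timeline:
        # Slicing copies the frames, leaving the source list untouched.
        output.extend(sources[cut.source][cut.in_frame:cut.out_frame])
    return output

sources = {"A001": ["a0", "a1", "a2", "a3"], "B002": ["b0", "b1", "b2"]}
timeline = [Cut("B002", 1, 3), Cut("A001", 0, 2)]
print(render(timeline, sources))  # ['b1', 'b2', 'a0', 'a1']
```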

Whether you call it archive or backup, make a copy of source files in a storage location that lives forever and is accessible for repurposing throughout your workflow.
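One way to make that ingest copy trustworthy is to verify it with a checksum. In this minimal Python sketch, secondary_dir is a stand-in for a mounted archive volume or a folder your cloud sync tool watches. That is an assumption; uploading directly to cloud storage would go through your sync tool or the provider's API instead. The returned SHA-256 digest could be stored with the asset's metadata for later verification.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in chunks so large media files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def secure_on_ingest(source: Path, secondary_dir: Path) -> str:
    """Copy a newly ingested file to secondary storage and verify the copy."""
    secondary_dir.mkdir(parents=True, exist_ok=True)
    target = secondary_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    original, copied = sha256_of(source), sha256_of(target)
    if original != copied:
        raise OSError(f"checksum mismatch for {source.name}")
    return original
```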

To see how all this works in the real world, here’s how UCSC Silicon Valley designed a new solution that integrates backup, archive, and asset management with Backblaze B2 Cloud Storage so that their media is protected, preserved, and organized at every step of their workflow.

Still from UC Scout AP Psychology video.

How UCSC Silicon Valley Secured Their Workflow’s Data

UCSC Silicon Valley built a greenfield video production workflow to support UC Scout, a University of California online learning program that gives high school students access to the advanced courses they need to be eligible and competitive for college. Three teams of editors, producers, graphic designers, and animation artists (a total of 22 creative professionals) needed to share files and collaborate effectively, and Digital Asset Manager Sara Brylowski was tasked with building and managing their workflow.

Brylowski and her team had specific requirements:

  • For backup, they needed to protect active files on their media server with an automated backup solution that allowed accidentally deleted files to be easily restored.
  • To manage storage capacity more effectively on their media server, they wanted to archive completed videos and other assets that they didn’t expect to need immediately.
  • To organize content, they needed an asset manager with seamless archive capabilities, including fast self-service archive retrieval.

They wanted the reliability and simplicity of the cloud to store both their backup and archive data. “We had no interest in using LTO tape for backup or archive. Tape would ultimately require more work and the media would degrade. We wanted something more hands off and reliable,” Brylowski explained.

They chose Backblaze B2 Cloud Storage along with a Facilis media storage system and CatDV media asset management software.

The solution delivered results quickly. Production team members could fully focus on creating content without concern for storage challenges. Retrievals and restores, as needed, became a breeze. Meanwhile, UCSC IT staff were freed from wrestling gnarly video data. And the whole setup helped Brylowski bring UC Scout’s off-premises storage costs under control as she plans for significant content growth ahead.

“With our new workflow, we can manage our content within its life cycle and at the same time, have reliable backup storage for the items we know we’re going to want in the future. That’s allowed us to concentrate on creating videos, not managing storage.”
—Sara Brylowski, UCSC Silicon Valley

To find out exactly how Brylowski and her team solved their challenges and more, read the full case study on UC Scout at UCSC Silicon Valley and learn how their new workflow enables them to concentrate on creating videos, not managing storage.

Looking for storage to fit your backup or archive workflows? Backblaze B2 Cloud Storage is simple to use, always active, and workflow friendly.

The post How Backup and Archive Are Different for Professional Media Workflows appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: DAM vs. MAM

Post Syndicated from Janet Lafleur original https://www.backblaze.com/blog/whats-the-diff-dam-vs-mam/


On the surface, outlining the difference between digital asset management (DAM) and media asset management (MAM) might seem like splitting hairs. After all, you’re working with digital media, so what’s the difference between focusing on the “digital” vs. focusing on the “media?”

There are plenty of reasons these two terms are often used interchangeably—both exist to give organizations a central repository of digital assets from video and images to text documents. They both help manage those assets from initial raw source files, to finished production, to archive. And they both make managing and collaborating on those files much simpler for larger teams.

So, what’s the difference? Put it this way: Not all DAM systems are MAM systems, but all MAM systems are DAM systems.

In essence, MAM is just DAM that offers more capability when it comes to video. While DAM can manage video files, it’s more of a general-purpose tool. There are a lot of nuances that get glossed over in the simplified answer, so it’s worth taking a closer look at the differences between them.

What to Expect From Any Asset Manager

Explaining the difference between a DAM system and a MAM system requires a basic understanding of what an asset manager is, so before we begin, a brief primer. The first thing you need to understand is that any given asset a team might want to work with—a video clip, a document, an image—is usually presented by the asset manager as a single item to the user. Behind the scenes, however, it is composed of three elements:

  • The master source file.
  • A thumbnail or proxy that’s displayed.
  • Metadata about the object itself.
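A minimal Python sketch of that three-part structure, using a hypothetical Asset class. Real asset managers store these pieces in their own databases and formats.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """One item as the user sees it, backed by three separate pieces."""
    master_path: str                              # full-resolution source file
    proxy_path: str                               # thumbnail or proxy shown in the UI
    metadata: dict = field(default_factory=dict)  # descriptive attributes

    def display_name(self) -> str:
        # Show a title from metadata, falling back to the master's file name.
        return self.metadata.get("title", self.master_path.rsplit("/", 1)[-1])

clip = Asset("masters/A001_C003.mov", "proxies/A001_C003_proxy.mp4",
             {"title": "Harbor at dawn", "camera": "A"})
print(clip.display_name())  # Harbor at dawn
```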

And unlike typical files stored on your own computer, asset management files carry metadata that is far more robust than a simple “date modified” or “file size.” It’s a broader set of attributes, including details about the actual content of the file, which we’ll explain in more detail later on. So, with all of that said, here are the basics of what an asset manager should offer to teams:

  • Collaboration: Members of content creation teams should all have direct access to assets in the asset management system from their own workstations.
  • Access control: Access to specific assets or groups of assets should be allowed or restricted based on the user’s rights and permission settings. These permissions let you isolate certain files for use by a particular department, or allow external clients to view files without making changes.
  • Browse: Assets should be easily identifiable by more than their file name, such as thumbnails or proxies for videos, and browsable in the asset manager’s graphical interface.
  • Metadata search: Assets should be searchable by the attributes used to describe them in the file’s metadata. Metadata assignment capabilities should be flexible and extensible over time.
  • Preview: For larger or archived assets, a preview or quick review capability should be provided, such as playing video proxies or mouse-over zoom for thumbnails.
  • Versions: Based on permissions, team members should be able to add new versions of existing assets or add new assets so that material can be easily repurposed for future projects.
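Two of those capabilities, access control and metadata search, combine naturally. This hypothetical Python sketch models permissions as group names and metadata as tags, returning only assets the user is allowed to see that carry every requested tag. Real systems use richer permission models and search indexes.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class LibraryItem:
    name: str
    allowed_groups: Set[str]                     # hypothetical permission model
    tags: Set[str] = field(default_factory=set)  # descriptive metadata

def search(items: List[LibraryItem], user_groups: Set[str],
           required_tags: Set[str]) -> List[str]:
    """Names of items the user may see that carry every required tag."""
    return [
        item.name
        for item in items
        if item.allowed_groups & user_groups   # access control: any shared group
        and required_tags <= item.tags         # metadata search: all tags present
    ]

library = [
    LibraryItem("promo.mov", {"marketing"}, {"car", "driving"}),
    LibraryItem("client_cut.mov", {"agency"}, {"car"}),
    LibraryItem("bts.mov", {"marketing"}, {"crew"}),
]
print(search(library, {"marketing"}, {"car"}))  # ['promo.mov']
```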

Why Metadata Matters So Much

Metadata matters because it is essentially the biggest difference between organizing content in an asset manager and just chucking it in a folder somewhere. Sure, there are ways to organize files without metadata, but they usually result in letter salad file names like 20190118-gbudman-broll-01-lv-0001.mp4, which strings together a shoot date, subject, camera number, clip number, and who knows what else. Structured file naming might be a “good enough for government work” fix, but it doesn’t scale easily to larger teams of contributors and creators. And metadata isn’t used only to search for assets; it can also be fed into other workflow applications integrated with the asset manager.

If you’re working with images and video (which you probably are if you’re using an asset manager), then metadata is vital. Unlike text-based documents, images and video can’t be searched for keywords, but metadata can describe in detail what’s in the image or video. In the example below, we see a video of a BMW M635CSi which has been tagged with metadata like “car,” “vehicle,” and “driving” to help it be more easily searchable. If you look further down in the metadata, you’ll see where tags have been added to describe elements at precise moments or ranges of time in the video, known as timecodes. That way, someone searching for a particular moment within the video will be able to home in on the exact segment they need with a simple search of the asset manager.
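To make the point concrete, this Python sketch tries to recover metadata from that file name. The field layout (including what “lv” means) is a guess, and that is exactly the problem: the meaning lives in people’s heads, and any file that strays from the convention fails to parse.

```python
import re
from datetime import date

# Hypothetical convention: date-subject-kind-camera-operator-clip.ext
PATTERN = re.compile(
    r"(?P<date>\d{8})-(?P<subject>[a-z]+)-(?P<kind>[a-z]+)-"
    r"(?P<camera>\d{2})-(?P<operator>[a-z]+)-(?P<clip>\d{4})\.(?P<ext>\w+)$"
)

def name_to_metadata(filename: str) -> dict:
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    fields = match.groupdict()
    raw = fields.pop("date")
    fields["shoot_date"] = date(int(raw[:4]), int(raw[4:6]), int(raw[6:]))
    return fields

print(name_to_metadata("20190118-gbudman-broll-01-lv-0001.mp4"))
```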

iconik MAM.
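Timecode-based tags like these boil down to simple range lookups. This Python sketch converts non-drop-frame HH:MM:SS:FF timecode to frame counts and returns the ranges that match a keyword. The tag list is hypothetical, and drop-frame timecode (as used at 29.97 fps) would need an extra correction that isn't handled here.

```python
def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

def find_segments(tags, keyword, fps):
    """Return (start_frame, end_frame) ranges whose tag matches keyword."""
    return [
        (timecode_to_frames(start, fps), timecode_to_frames(end, fps))
        for start, end, tag in tags
        if tag == keyword
    ]

# Hypothetical time-ranged tags for one clip at 24 fps.
tags = [
    ("00:00:00:00", "00:00:12:00", "driving"),
    ("00:00:12:00", "00:00:15:12", "close-up"),
    ("00:00:40:00", "00:00:52:00", "driving"),
]
print(find_segments(tags, "driving", fps=24))  # [(0, 288), (960, 1248)]
```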

Workflow Integration and Archive Support

Whether you’re using a DAM system or a MAM system, the more robust its feature set, the more efficient it will make your workflow. These features simplify every step of the process, including editorial review, automated metadata extraction (e.g., transcription or facial recognition), multilingual support, automated transcode, and much more. This is where asset management solutions diverge the most and show their customization for a particular type of workflow or industry.

Maybe you need all of these flashy features for your unique set of needs, maybe you don’t. But you should know that over time, any content library is going to grow to the point where at the bare minimum, you’re going to need storage management features, starting with archiving.

Archiving completed projects and infrequently used assets can conserve disk space on your server by moving them to less expensive storage, such as cloud storage or digital tape. Images and video are infamous for hogging storage, and that reputation has only grown as rising resolutions make these files balloon in size. Regular archiving can keep costs down and keep you from having to upgrade your expensive storage server every year.

Refresher: What’s the Difference Between Archive and Backup for Media Teams?

Archiving saves space by moving large files out of the asset management system and into a separate archive, but how exactly is that different from the backups you’re already (hopefully) creating? As we’ve outlined before, a backup exists to aid in recovery of files in the event of hardware failure or data corruption, while archiving is a way to better manage file storage and create long-term file retention.

Ideally, you should be doing both, as they serve far different purposes.

While features vary widely between asset managers, integrated automatic archiving might be one of the most important to look for. Asset managers with this feature let you access archived files from the graphical interface just like any other file in the system. After archiving, the thumbnails or proxies of the archived assets continue to appear as before, with a visual indication that they have been archived (like a graphic callout on the thumbnail, similar to the badge that tells you that you have new email). Users can retrieve the asset as before, albeit with a delay that depends on the archive storage and network connection chosen.

A good asset manager will offer multiple choices for archive storage—from cloud storage, to LTO tape, to inexpensive disk—and from different vendors. An excellent one will let you automatically make multiple copies to different archive storage for added data protection.
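Here is a hypothetical Python model of that archive step. The master is recorded on several archive targets before the production copy is released, while the proxy stays in place so the asset remains visible. The URI schemes are made up for illustration, and the actual copy-and-verify step is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ArchivableAsset:
    name: str
    proxy: str                        # thumbnail or proxy shown in the UI
    master_location: Optional[str]    # None once the production copy is freed
    archive_copies: List[str] = field(default_factory=list)

    @property
    def archived(self) -> bool:
        # This is what would drive the "archived" badge on the thumbnail.
        return bool(self.archive_copies)

def archive(asset: ArchivableAsset, targets: List[str]) -> None:
    """Record a copy on every target, then release the production-storage copy."""
    if not targets:
        raise ValueError("refusing to archive without at least one target")
    for target in targets:
        # A real system would copy and verify the file here; this only records it.
        asset.archive_copies.append(f"{target}/{asset.name}")
    asset.master_location = None
    # The proxy is untouched, so the asset stays browsable and searchable.

clip = ArchivableAsset("A001.mov", "proxies/A001.jpg", "san://projects/A001.mov")
archive(clip, ["cloud://media-archive", "lto://tape-0042"])
print(clip.archived, clip.archive_copies)
# True ['cloud://media-archive/A001.mov', 'lto://tape-0042/A001.mov']
```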

Hybrid Cloud Workflows for Media Teams

If you’re reading this, you’ve probably been looking into asset management solutions for a large team, often working remotely. That means you have a highly complicated workflow that dominates your day-to-day life, and you might have questions well outside the scope of what separates DAM from MAM.

You can read up here on the various ways a hybrid cloud workflow might benefit you, regardless of what kind of asset manager you choose.

What Is MAM?

With all of that said, we can now answer the question you came here asking: What is the difference between DAM and MAM?

While they have much in common, the crucial difference is that MAM systems are designed from the ground up for video production. There is some crossover—DAM systems can generally manage video assets, and MAM systems can manage images and documents—but MAM systems offer more tools for video production and are geared towards the particular needs of a video workflow. That means metadata creation and management, application integrations, and workflow orchestration are all video-oriented.

Both, for example, will be able to track a photo or video from the metadata created the moment that content is captured, e.g., data about the camera, the settings, and the few notes the photographer or videographer adds afterward. But a MAM system will allow you to add more detailed metadata to make that photo or video more easily searchable. Nearly all MAM systems offer some type of manual logging to create timecode-based metadata. MAM systems built for live broadcast events like sports provide shortcut buttons for key events, such as a face-off or slap shot in a hockey game.

More advanced systems offer additional tools for automated metadata extraction. For example, some will use facial recognition to automatically identify actors or public figures.

You can even add metadata that shows where that asset has been used, how many times it has been used, and what sorts of edits have been made to it. There’s no end to what you can describe and categorize with metadata. Defining it for a content library of any reasonable size can be a major undertaking.

MAM Systems Integrate Video Production Applications

Another huge difference between a DAM system and a MAM system, particularly for those working with video, is that a MAM system will integrate tools built specifically for video production. These widely ranging integrated applications include ingest tools, video editing suites, visual effects, graphics tools, transcode, quality assurance, file transport, specific distribution systems, and much more.

Modern MAM solutions integrate cloud storage throughout the workflow, and not just for archive, but also for creating content through proxy editing. Proxy editing gives editors a greater amount of flexibility by letting them work on a lower-resolution copy of the video stored locally. When the final cut is rendered, those edits will be applied to the full-resolution version stored in the cloud.
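Timing edits generally carry over from proxy to original unchanged, provided the proxy keeps the original’s frame rate and timecode (an assumption here). Spatial edits such as crops, however, have to be scaled. This Python sketch shows that scaling for a hypothetical Crop record; editing applications handle this conform step internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Crop:
    x: int
    y: int
    width: int
    height: int

def conform_crop(crop: Crop, proxy_size: tuple, full_size: tuple) -> Crop:
    """Scale a crop drawn on the proxy to the full-resolution frame."""
    sx = full_size[0] / proxy_size[0]
    sy = full_size[1] / proxy_size[1]
    return Crop(round(crop.x * sx), round(crop.y * sy),
                round(crop.width * sx), round(crop.height * sy))

# Crop made on a 960x540 proxy, applied to the 3840x2160 original.
print(conform_crop(Crop(100, 50, 480, 270), (960, 540), (3840, 2160)))
# Crop(x=400, y=200, width=1920, height=1080)
```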

MAM Systems May Be Tailored for Specific Industry Niches and Workflows

To sum up, the longer explanation for DAM vs. MAM is that MAM focuses on video production, with better MAM systems offering all the integrations needed for complex video workflows. And because specific niches within the industry have wildly different needs and workflows, you’ll find MAM systems that are tailored specifically for sports, film, news, and more. The size of the organization or team matters, too. To stay within budget, a small postproduction house might want to choose a more affordable MAM system that lacks some of the more advanced features they wouldn’t need anyway.

This wide variety of needs is a large part of the reason there are so many MAM systems on the market, and why choosing one can be a daunting task with a long evaluation process. Despite the length of that process, it’s actually fairly common for a group to migrate from one asset manager to another as their needs shift.

Pro tip: Working with a trusted system integrator that serves your industry niche can save you a lot of heartache and money in the long run.

It’s worth noting that, for legacy reasons, sometimes what’s marketed as a DAM system will have all the video capabilities you’d expect from a MAM system. So, don’t let the name throw you off. Whether it’s billed as MAM or DAM, look for a solution that fits your workflow with the features and integrated tools you need today, while also providing the flexibility you need as your business changes in the future.

If you’re interested in learning how you can make your cloud-based workflow more efficient (and you should be) check out our comprehensive e-book outlining how to optimize your workflow.

The post What’s the Diff: DAM vs. MAM appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/


Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real time, including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent

June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.

Compute

June 25, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PT – Ensuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers

June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS, including how to set up masters, networking, security, and add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PT – Oracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.

DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PT – Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid

June 18, 2018 | 09:00 AM – 09:45 AM PT – De-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the setup of best-practice baselines when setting up new AWS environments.

June 21, 2018 | 11:00 AM – 11:45 AM PT – Leading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.

IoT

June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PT – Integrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PT – Building Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PT – Optimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile

June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PT – Understanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PT – Using Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PT – Productionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connect your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Build your own weather station with our new guide!

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/build-your-own-weather-station/

One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”


Tadaaaa! The BYO weather station fully assembled.

Our Oracle Weather Station

In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.


The original Raspberry Pi Oracle Weather Station HAT

We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with huge differences in climate, and they’ve even recorded the effects of a solar eclipse.

Our new BYO weather station guide

We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!


Fun with meteorological experiments!

Our guide suggests using many of the sensors from the Oracle Weather Station kit, so you can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.

Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.
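As a rough idea of what taking and storing readings can look like, here is a minimal Python sketch that averages several samples and writes the result to a database. It uses SQLite from the standard library as an assumption, and read_temperature is a hypothetical placeholder for whichever sensor library you use; the guide’s own sensor code and database choice may differ.

```python
import sqlite3
import statistics
from datetime import datetime, timezone

def read_temperature():
    """Hypothetical stand-in for a real sensor read; swap in your sensor's library."""
    return 21.4

def averaged_reading(read, samples=5):
    # Averaging several quick samples smooths out sensor noise.
    return statistics.mean(read() for _ in range(samples))

def record(conn, temperature):
    conn.execute(
        "INSERT INTO readings (taken_at, temperature_c) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), temperature),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # use a file path on the Pi to keep the data
conn.execute("CREATE TABLE readings (taken_at TEXT, temperature_c REAL)")
record(conn, averaged_reading(read_temperature))
print(conn.execute("SELECT COUNT(*), AVG(temperature_c) FROM readings").fetchone())
```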

Build Your Own Raspberry Pi weather station on a breadboard

There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.

Who should try this build

We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straightforward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.

Build Your Own Raspberry Pi weather station – components

The sensors and components we’re suggesting balance cost, accuracy, and ease of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.

You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.

Prototyping HAT for Raspberry Pi weather station sensors

For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.

Our plans for the guide

Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!

*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If you have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.

The post Build your own weather station with our new guide! appeared first on Raspberry Pi.

Amazon SageMaker Updates – Tokyo Region, CloudFormation, Chainer, and GreenGrass ML

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/sagemaker-tokyo-summit-2018/

Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and TensorFlow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.

Amazon SageMaker Chainer Estimator


Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. Many developers enjoy the Chainer scheme since it allows them to write their networks with native Python constructs and tools.
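To make “Define-by-Run” concrete, here is a toy sketch in plain Python. It is not Chainer’s API: the `Var` class and `forward` function are hypothetical. Each arithmetic operation records its inputs at the moment it executes, so the graph is exactly what the forward code did, including ordinary loops and `if` statements, and backpropagation walks that recorded graph.

```python
class Var:
    """A scalar that remembers how it was computed (a toy define-by-run node)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically order the recorded graph, then push gradients backwards.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad


def forward(x, w, depth):
    # Native Python control flow decides the topology while the data flows through.
    h = x
    for _ in range(depth):
        h = h * w
    if h.value > 10:
        h = h + h
    return h


x, w = Var(2.0), Var(3.0)
y = forward(x, w, depth=2)  # y = 2 * x * w * w
y.backward()
print(y.value, x.grad, w.grad)  # 36.0 18.0 24.0
```

Calling `forward` with `depth=1` on the same inputs builds a smaller graph with no doubling step, which is the point: there is no separately declared network to keep in sync with the data.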

Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier, since you can likely take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet, users have to implement a train function with a particular signature. With Chainer, your scripts can be a little more portable, as you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in an if __name__ == '__main__': guard and invoke it locally or on SageMaker.


import argparse
import os

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # Hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories. SageMaker sets these SM_* variables;
    # .get() keeps the script runnable locally, where you pass the paths as flags instead.
    parser.add_argument('--output-data-dir', type=str, default=os.environ.get('SM_OUTPUT_DATA_DIR'))
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.

Then, we can run that script locally or use the SageMaker Python SDK to launch it on GPU instances in SageMaker. The hyperparameters are passed to the script as command-line arguments, and the environment variables above are populated automatically. When we call fit, each input channel we pass is exposed through an SM_CHANNEL_* environment variable.


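from sagemaker import get_execution_role
from sagemaker.chainer.estimator import Chainer

# IAM role that SageMaker assumes; get_execution_role() works inside a SageMaker notebook.
role = get_execution_role()

# Hypothetical S3 locations for the two input channels.
train_input = 's3://my-bucket/chainer/train'
test_input = 's3://my-bucket/chainer/test'

# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    role=role,
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator; the 'train' and 'test' keys become SM_CHANNEL_TRAIN and SM_CHANNEL_TEST
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1
)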
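The hand-off described above, where the hyperparameters dict becomes command-line flags and each fit() channel becomes an SM_CHANNEL_* variable, can be simulated locally to check that a script parses what it will receive. This is a sketch of the mechanism’s shape under that assumption, not the SDK’s or container’s code; `to_cli_args`, `simulate_channels`, `parse`, and the /tmp paths are hypothetical.

```python
import argparse
import os


def to_cli_args(hyperparameters):
    """Turn a hyperparameters dict into '--key value' flags (hypothetical helper)."""
    args = []
    for key, value in hyperparameters.items():
        args.extend([f"--{key}", str(value)])
    return args


def simulate_channels(channels):
    """Expose each input channel as an SM_CHANNEL_<NAME> environment variable."""
    for name, path in channels.items():
        os.environ[f"SM_CHANNEL_{name.upper()}"] = path


def parse(argv):
    # Same flags as the training script; defaults are read when parse() runs.
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    args, _ = parser.parse_known_args(argv)
    return args


if __name__ == '__main__':
    simulate_channels({'train': '/tmp/data/train', 'test': '/tmp/data/test'})
    args = parse(to_cli_args({'epochs': 20, 'batch-size': 128}))
    print(args.epochs, args.batch_size, args.train, args.test)
```

Note that argparse turns `--batch-size` into the attribute `batch_size`, so hyphenated hyperparameter names are fine on the estimator side.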

Now, instead of bringing your own Docker container for training and hosting with Chainer, you can just maintain your script. The full source of sagemaker-chainer-containers is available on GitHub. One of my favorite features of the new container is built-in ChainerMN for easy multi-node distribution of your Chainer training jobs.

There’s a lot more documentation and information available in both the README and the example notebooks.

AWS Greengrass ML with Chainer

AWS Greengrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson TX2, and Raspberry Pi. So, now Greengrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker and then easily deploy them to any Greengrass-enabled device using Greengrass ML.

JAWS UG

I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.

Randall

New – Pay-per-Session Pricing for Amazon QuickSight, Another Region, and Lots More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-pay-per-session-pricing-for-amazon-quicksight-another-region-and-lots-more/

Amazon QuickSight is a fully managed cloud business intelligence system that gives you Fast & Easy to Use Business Analytics for Big Data. QuickSight makes business analytics available to organizations of all shapes and sizes, with the ability to access data that is stored in your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, flat files in S3, and (via connectors) data stored in on-premises MySQL, PostgreSQL, and SQL Server databases. QuickSight scales to accommodate tens, hundreds, or thousands of users per organization.

Today we are launching a new, session-based pricing option for QuickSight, along with additional region support and other important new features. Let’s take a look at each one:

Pay-per-Session Pricing
Our customers are making great use of QuickSight and take full advantage of the power it gives them to connect to data sources, create reports, and explore visualizations.

However, not everyone in an organization needs or wants such powerful authoring capabilities. Having access to curated data in dashboards and being able to interact with the data by drilling down, filtering, or slicing-and-dicing is more than adequate for their needs. Subscribing them to a monthly or annual plan can be seen as an unwarranted expense, so many of these casual users end up without access to interactive data or BI.

In order to allow customers to provide all of their users with interactive dashboards and reports, the Enterprise Edition of Amazon QuickSight now allows Reader access to dashboards on a Pay-per-Session basis. QuickSight users are now classified as Admins, Authors, or Readers, with distinct capabilities and prices:

Authors have access to the full power of QuickSight; they can establish database connections, upload new data, create ad hoc visualizations, and publish dashboards, all for $9 per month (Standard Edition) or $18 per month (Enterprise Edition).

Readers can view dashboards, slice and dice data using drill downs, filters and on-screen controls, and download data in CSV format, all within the secure QuickSight environment. Readers pay $0.30 for 30 minutes of access, with a monthly maximum of $5 per reader.

Admins have all authoring capabilities, and can manage users and purchase SPICE capacity in the account. The QuickSight admin now has the ability to set the desired option (Author or Reader) when they invite members of their organization to use QuickSight. They can extend Reader invites to their entire user base without incurring any up-front or monthly costs, paying only for the actual usage.
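The Enterprise Edition prices above combine a flat per-author fee with a capped per-session reader fee. A small calculator makes the effect of the cap visible. It assumes each reader session is billed as one 30-minute block and leaves admin fees and annual plans out. The reader mix in the usage line is hypothetical.

```python
# Enterprise Edition prices quoted in this post, in US cents to avoid float rounding.
AUTHOR_MONTHLY_CENTS = 1800      # $18 per author per month
READER_SESSION_CENTS = 30        # $0.30 per 30-minute reader session
READER_MONTHLY_CAP_CENTS = 500   # $5 monthly maximum per reader


def reader_cost_cents(sessions):
    """Monthly charge for one reader: pay per session, never more than the cap."""
    return min(sessions * READER_SESSION_CENTS, READER_MONTHLY_CAP_CENTS)


def monthly_bill_cents(authors, sessions_per_reader):
    """Authors pay a flat fee; each reader pays only for actual usage, up to the cap."""
    readers = sum(reader_cost_cents(s) for s in sessions_per_reader)
    return authors * AUTHOR_MONTHLY_CENTS + readers


if __name__ == "__main__":
    # Two authors; one idle reader, one occasional reader, one heavy reader.
    total = monthly_bill_cents(authors=2, sessions_per_reader=[0, 3, 40])
    print(f"${total / 100:.2f}")  # $41.90
```

The idle reader costs nothing, and the heavy reader’s 40 sessions ($12.00 uncapped) are billed at the $5 maximum.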

To learn more, visit the QuickSight Pricing page.

A New Region
QuickSight is now available in the Asia Pacific (Tokyo) Region.

The UI is in English, with a localized version in the works.

Hourly Data Refresh
Enterprise Edition SPICE data sets can now be set to refresh as frequently as every hour. In the past, each data set could be refreshed up to 5 times a day. To learn more, read Refreshing Imported Data.

Access to Data in Private VPCs
This feature was launched in preview form late last year, and is now available in production form to users of the Enterprise Edition. As I noted at the time, you can use it to implement secure, private communication with data sources that do not have public connectivity, including on-premises data in Teradata or SQL Server, accessed over an AWS Direct Connect link. To learn more, read Working with AWS VPC.

Parameters with On-Screen Controls
QuickSight dashboards can now include parameters that are set using on-screen dropdown, text box, numeric slider, or date picker controls. The default value for each parameter can be set based on the user name (QuickSight calls this a dynamic default). You could, for example, set an appropriate default based on each user’s office location, department, or sales territory.

To learn more, read about Parameters in QuickSight.

URL Actions for Linked Dashboards
You can now connect your QuickSight dashboards to external applications by defining URL actions on visuals. The actions can include parameters, and become available in the Details menu for the visual.

You can use this feature to link QuickSight dashboards to third-party applications (e.g. Salesforce) or to your own internal applications. Read Custom URL Actions to learn how to use this feature.

Dashboard Sharing
You can now share QuickSight dashboards across every user in an account.

Larger SPICE Tables
The per-data set limit for SPICE tables has been raised from 10 GB to 25 GB.

Upgrade to Enterprise Edition
The QuickSight administrator can now upgrade an account from Standard Edition to Enterprise Edition with a click. This enables provisioning of Readers with pay-per-session pricing, private VPC access, row-level security for dashboards and data sets, and hourly refresh of data sets. Enterprise Edition pricing applies after the upgrade.

Available Now
Everything I listed above is available now and you can start using it today!

You can try QuickSight for 60 days at no charge, and you can also attend our June 20th Webinar.

Jeff;

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.
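To illustrate what “navigating highly connected datasets” means, the sketch below models a tiny property graph in plain Python and runs a two-hop “people who like what you like also like” traversal. It is not Neptune client code: the people, games, and `likes` edge label are hypothetical, and the docstring shows roughly how the same walk reads as a Gremlin traversal.

```python
from collections import defaultdict

# Hypothetical "likes" edges from people to games.
LIKES = [
    ("alice", "chess"), ("alice", "go"),
    ("bob", "go"), ("bob", "shogi"),
    ("carol", "chess"), ("carol", "checkers"),
]

out_edges = defaultdict(set)  # person -> games they like
in_edges = defaultdict(set)   # game -> people who like it
for person, game in LIKES:
    out_edges[person].add(game)
    in_edges[game].add(person)


def recommend(person):
    """Games liked by people who share at least one liked game with `person`.

    Roughly the Gremlin traversal
        g.V('alice').out('likes').in('likes').out('likes').dedup()
    minus the games `person` already likes.
    """
    mine = out_edges[person]
    peers = {p for game in mine for p in in_edges[game]} - {person}
    return sorted({g for p in peers for g in out_edges[p]} - mine)


print(recommend("alice"))  # ['checkers', 'shogi']
```

Each hop is a lookup along stored edges rather than a join, which is the kind of query a purpose-built graph engine is optimized to run over billions of relationships.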

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and choosing Create cluster. Of course, you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune-compatible CSV files for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.
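The kind of conversion the tools repo performs can be sketched with the standard library. This is not that tool’s code. It assumes node and edge labels live in GraphML `<data>` elements whose key is declared with `attr.name="label"`, ignores every other property, and generates edge IDs when the GraphML omits them. The `~id`, `~label`, `~from`, and `~to` columns are the system headers of Neptune’s Gremlin CSV bulk-load format.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def graphml_to_neptune_csv(graphml_text, default_label="default"):
    """Return (vertices_csv, edges_csv) strings in Neptune's Gremlin CSV layout."""
    root = ET.fromstring(graphml_text)
    # Map key ids (e.g. "d0") to their attr.name so the label key can be found.
    key_names = {k.get("id"): k.get("attr.name") for k in root.iterfind("g:key", NS)}

    def label_of(element):
        for data in element.iterfind("g:data", NS):
            if key_names.get(data.get("key")) == "label":
                return data.text or default_label
        return default_label

    graph = root.find("g:graph", NS)
    vertices, edges = io.StringIO(), io.StringIO()
    vw = csv.writer(vertices, lineterminator="\n")
    ew = csv.writer(edges, lineterminator="\n")
    vw.writerow(["~id", "~label"])
    ew.writerow(["~id", "~from", "~to", "~label"])
    for node in graph.iterfind("g:node", NS):
        vw.writerow([node.get("id"), label_of(node)])
    for i, edge in enumerate(graph.iterfind("g:edge", NS)):
        edge_id = edge.get("id") or f"e{i}"  # GraphML edge ids are optional
        ew.writerow([edge_id, edge.get("source"), edge.get("target"), label_of(edge)])
    return vertices.getvalue(), edges.getvalue()


SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="all" attr.name="label" attr.type="string"/>
  <graph id="G" edgedefault="directed">
    <node id="alice"><data key="d0">person</data></node>
    <node id="chess"><data key="d0">game</data></node>
    <edge source="alice" target="chess"><data key="d0">likes</data></edge>
  </graph>
</graphml>"""

if __name__ == "__main__":
    vertices_csv, edges_csv = graphml_to_neptune_csv(SAMPLE)
    print(vertices_csv)
    print(edges_csv)
```

The two strings would be written to separate files and uploaded to S3 before starting a bulk load.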

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data-rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on Twitter to provide any feedback!

Randall

Monitoring your Amazon SNS message filtering activity with Amazon CloudWatch

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/monitoring-your-amazon-sns-message-filtering-activity-with-amazon-cloudwatch/

This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.

Amazon SNS message filtering provides a set of string and numeric matching operators that allow each subscription to receive only the messages of interest. Hence, SNS message filtering can simplify your pub/sub messaging architecture by offloading the message filtering logic from your subscriber systems, as well as the message routing logic from your publisher systems.

After you set the subscription attribute that defines a filter policy, the subscribing endpoint receives only the messages that carry attributes matching this filter policy. Other messages published to the topic are filtered out for this subscription. The native integration between SNS and Amazon CloudWatch provides visibility into the number of messages delivered, as well as the number of messages filtered out.
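As a sketch of the two sides of filtering with boto3: the subscriber’s filter policy is stored as the `FilterPolicy` subscription attribute (a JSON string), and the publisher attaches message attributes for the policy to match. The policy, attribute names, and ARNs here are hypothetical. The request builders are pure functions so they can be inspected without AWS credentials.

```python
import json

# Hypothetical policy: deliver only orders from example_corp priced at $100 or more.
FILTER_POLICY = {
    "store": ["example_corp"],
    "price_usd": [{"numeric": [">=", 100]}],
}


def filter_policy_request(subscription_arn, policy):
    """Keyword arguments for SNS SetSubscriptionAttributes."""
    return {
        "SubscriptionArn": subscription_arn,
        "AttributeName": "FilterPolicy",
        "AttributeValue": json.dumps(policy),
    }


def publish_request(topic_arn, message, store, price_usd):
    """Keyword arguments for SNS Publish, carrying the attributes the policy checks."""
    return {
        "TopicArn": topic_arn,
        "Message": message,
        "MessageAttributes": {
            "store": {"DataType": "String", "StringValue": store},
            # Number attributes are still sent as strings.
            "price_usd": {"DataType": "Number", "StringValue": str(price_usd)},
        },
    }


def apply_filter_and_publish(subscription_arn, topic_arn):
    import boto3  # requires AWS credentials; the ARNs you pass are your own
    sns = boto3.client("sns")
    sns.set_subscription_attributes(**filter_policy_request(subscription_arn, FILTER_POLICY))
    sns.publish(**publish_request(topic_arn, "order placed", "example_corp", 150))
```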

CloudWatch metrics are captured automatically for you. To get started with SNS message filtering, see Filtering Messages with Amazon SNS.

Message Filtering Metrics

The following six CloudWatch metrics are relevant to understanding your SNS message filtering activity:

  • NumberOfMessagesPublished – Inbound traffic to SNS. This metric tracks all the messages that have been published to the topic.
  • NumberOfNotificationsDelivered – Outbound traffic from SNS. This metric tracks all the messages that have been successfully delivered to endpoints subscribed to the topic. A delivery takes place either when the incoming message attributes match a subscription filter policy, or when the subscription has no filter policy at all, which results in a catch-all behavior.
  • NumberOfNotificationsFilteredOut – This metric tracks all the messages that were filtered out because they carried attributes that didn’t match the subscription filter policy.
  • NumberOfNotificationsFilteredOut-NoMessageAttributes – This metric tracks all the messages that were filtered out because they didn’t carry any attributes at all and, consequently, didn’t match the subscription filter policy.
  • NumberOfNotificationsFilteredOut-InvalidAttributes – This metric keeps track of messages that were filtered out because they carried invalid or malformed attributes and, thus, didn’t match the subscription filter policy.
  • NumberOfNotificationsFailed – This last metric tracks all the messages that failed to be delivered to subscribing endpoints, regardless of whether a filter policy had been set for the endpoint. This metric is emitted after the message delivery retry policy is exhausted, and SNS stops attempting to deliver the message. At that moment, the subscribing endpoint is likely no longer reachable. For example, the subscribing SQS queue or Lambda function has been deleted by its owner. You may want to closely monitor this metric to address message delivery issues quickly.
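To reason about which of these metrics a given message will move, the sketch below approximates the matching rules locally: every key in the policy must be present, and its value must match at least one condition listed for that key. It is a simplified model, not SNS’s implementation. Only exact string matches and numeric comparisons are handled, invalid attributes and delivery failures are not modeled, and a "delivered" result assumes delivery then succeeds.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def condition_matches(condition, value):
    if isinstance(condition, dict) and "numeric" in condition:
        terms = condition["numeric"]
        try:
            number = float(value)
        except (TypeError, ValueError):
            return False
        # Pairs such as [">=", 100] or [">", 0, "<=", 150] must all hold.
        return all(OPS[op](number, bound)
                   for op, bound in zip(terms[0::2], terms[1::2]))
    return condition == value  # exact string match


def classify(policy, attributes):
    """Name the metric a message would count toward for one subscription."""
    if policy is None:
        return "NumberOfNotificationsDelivered"  # no filter policy: catch-all
    if not attributes:
        return "NumberOfNotificationsFilteredOut-NoMessageAttributes"
    for key, conditions in policy.items():
        if key not in attributes or not any(
                condition_matches(c, attributes[key]) for c in conditions):
            return "NumberOfNotificationsFilteredOut"
    return "NumberOfNotificationsDelivered"
```

For example, with `{"store": ["example_corp"], "price_usd": [{"numeric": [">=", 100]}]}`, a message carrying `store=example_corp` and `price_usd=150` is delivered, the same message with `price_usd=50` is filtered out, and a message with no attributes lands in the NoMessageAttributes metric.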

Message filtering graphs

Through the AWS Management Console, you can compose graphs to display your SNS message filtering activity. The graph shows the number of messages published, delivered, and filtered out within the timeframe you specify (1h, 3h, 12h, 1d, 3d, 1w, or custom).

SNS message filtering for CloudWatch Metrics

To compose an SNS message filtering graph with CloudWatch:

  1. Open the CloudWatch console.
  2. Choose Metrics, SNS, All Metrics, and Topic Metrics.
  3. Select all metrics to add to the graph, such as:
    • NumberOfMessagesPublished
    • NumberOfNotificationsDelivered
    • NumberOfNotificationsFilteredOut
  4. Choose Graphed metrics.
  5. In the Statistic column, switch from Average to Sum.
  6. Title your graph with a descriptive name, such as “SNS Message Filtering”.
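The same numbers can be pulled programmatically with CloudWatch’s GetMetricStatistics through boto3, asking for the Sum statistic just as the console steps switch from Average to Sum. A minimal sketch, assuming a hypothetical topic name, a three-hour window, and five-minute periods; the request builder is separated so it can be checked without credentials.

```python
from datetime import datetime, timedelta, timezone

FILTERING_METRICS = [
    "NumberOfMessagesPublished",
    "NumberOfNotificationsDelivered",
    "NumberOfNotificationsFilteredOut",
]


def metric_request(topic_name, metric_name, hours=3, period_seconds=300, now=None):
    """Keyword arguments for CloudWatch GetMetricStatistics, summed per period."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SNS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "TopicName", "Value": topic_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Sum"],  # the console equivalent of switching Average to Sum
    }


def filtering_totals(topic_name, hours=3):
    """Total of each filtering metric over the window, keyed by metric name."""
    import boto3  # requires AWS credentials
    cloudwatch = boto3.client("cloudwatch")
    totals = {}
    for name in FILTERING_METRICS:
        stats = cloudwatch.get_metric_statistics(**metric_request(topic_name, name, hours))
        totals[name] = sum(point["Sum"] for point in stats["Datapoints"])
    return totals
```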

After you have your graph set up, you may want to copy the graph link for bookmarking, emailing, or sharing with co-workers. You may also want to add your graph to a CloudWatch dashboard for easy access in the future. Both actions are available to you on the Actions menu, which is found above the graph.

Summary

SNS message filtering defines how SNS topics behave in terms of message delivery. By using CloudWatch metrics, you gain visibility into the number of messages published, delivered, and filtered out. This enables you to validate the operation of filter policies and more easily troubleshoot during development phases.

SNS message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). CloudWatch metrics for SNS message filtering are available now, in all AWS Regions.

For information about pricing, see the CloudWatch pricing page.

For more information, see:

A set of Git security releases

Post Syndicated from corbet original https://lwn.net/Articles/755935/rss

Git versions v2.17.1, v2.13.7, v2.14.4, v2.15.2 and v2.16.4 have all been
released with fixes to a couple of security issues. The nastier of the two
(CVE-2018-11235) enables arbitrary code execution controlled by a hostile
repository. See this
Microsoft blog entry
for more details — after updating.

Measuring the throughput for Amazon MQ using the JMS Benchmark

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/measuring-the-throughput-for-amazon-mq-using-the-jms-benchmark/

This post is courtesy of Alan Protasio, Software Development Engineer, Amazon Web Services

Just like compute and storage, messaging is a fundamental building block of enterprise applications. Message brokers (aka “message-oriented middleware”) enable different software systems, often written in different languages, on different platforms, running in different locations, to communicate and exchange information. Mission-critical applications, such as CRM and ERP, rely on message brokers to work.

A common performance consideration for customers deploying a message broker in a production environment is the throughput of the system, measured as messages per second. This is important to know so that application environments (hosts, threads, memory, etc.) can be configured correctly.

In this post, we demonstrate how to measure the throughput for Amazon MQ, a new managed message broker service for ActiveMQ, using JMS Benchmark. It should take 15–20 minutes to set up the environment and an hour to run the benchmark. We also provide some tips on how to configure Amazon MQ for optimal throughput.

Benchmarking throughput for Amazon MQ

ActiveMQ can be used for a number of use cases, ranging from simple fire-and-forget tasks (that is, asynchronous processing) and low-latency request-reply patterns to buffering requests before they are persisted to a database.

The throughput of Amazon MQ is largely dependent on the use case. For example, if you have non-critical workloads such as gathering click events for a non-business-critical portal, you can use ActiveMQ in a non-persistent mode and get extremely high throughput with Amazon MQ.

On the flip side, if you have a critical workload where durability is extremely important (meaning that you can’t lose a message), then you are bound by the I/O capacity of your underlying persistence store. We recommend using mq.m4.large for the best results. The mq.t2.micro instance type is intended for product evaluation. Performance is limited, due to the lower memory and burstable CPU performance.

Tip: To improve your throughput with Amazon MQ, make sure that you have consumers processing messages as fast as (or faster than) your producers are pushing messages.
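The reasoning behind this tip is simple arithmetic: when producers outpace consumers, the backlog grows by the difference every second, and that backlog is what fills broker memory and the persistence store. A minimal fluid model, assuming steady per-second rates (the rates in the usage lines are hypothetical):

```python
def backlog_after(seconds, producer_rate, consumer_rate, initial=0):
    """Messages waiting in the queue after `seconds`, given steady per-second rates.

    The backlog changes by (producer_rate - consumer_rate) each second and never
    drops below zero, because consumers cannot drain messages that don't exist.
    """
    backlog = initial
    for _ in range(seconds):
        backlog = max(0, backlog + producer_rate - consumer_rate)
    return backlog


print(backlog_after(60, producer_rate=1200, consumer_rate=1000))                # 12000
print(backlog_after(60, producer_rate=1000, consumer_rate=1200, initial=5000))  # 0
```

A 200 msg/s shortfall leaves 12,000 messages queued after one minute, while consumers with 200 msg/s of headroom drain a 5,000-message backlog in 25 seconds.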

Because it’s impossible to talk about how the broker (ActiveMQ) behaves for each and every use case, we walk through how to set up your own benchmark for Amazon MQ using our favorite open-source benchmarking tool: JMS Benchmark. We are fans of the JMS Benchmark suite because it’s easy to set up and deploy, and comes with a built-in visualizer of the results.

Non-Persistent Scenarios – Queue latency as you scale producer throughput

JMS Benchmark nonpersistent scenarios

Getting started

At the time of publication, you can create an mq.m4.large single-instance broker for testing for $0.30 per hour (US pricing).

This walkthrough covers the following tasks:

  1. Create and configure the broker.
  2. Create an EC2 instance to run your benchmark.
  3. Configure the security groups.
  4. Run the benchmark.

Step 1 – Create and configure the broker
Create and configure the broker using Tutorial: Creating and Configuring an Amazon MQ Broker.

Step 2 – Create an EC2 instance to run your benchmark
Launch the EC2 instance using Step 1: Launch an Instance. We recommend choosing the m5.large instance type.

Step 3 – Configure the security groups
Make sure that all the security groups are correctly configured to let the traffic flow between the EC2 instance and your broker.

  1. Sign in to the Amazon MQ console.
  2. From the broker list, choose the name of your broker (for example, MyBroker).
  3. In the Details section, under Security and network, choose the name of your security group or choose the expand icon.
  4. From the security group list, choose your security group.
  5. At the bottom of the page, choose Inbound, Edit.
  6. In the Edit inbound rules dialog box, add a rule to allow traffic between your instance and the broker:
    • Choose Add Rule.
    • For Type, choose Custom TCP.
    • For Port Range, type the ActiveMQ SSL port (61617).
    • For Source, leave Custom selected and then type the security group of your EC2 instance.
    • Choose Save.
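If you prefer to script this rule, the console steps above correspond to EC2’s AuthorizeSecurityGroupIngress call. A sketch with boto3, assuming you already know both security group IDs (the IDs you pass are placeholders); using a source security group rather than a CIDR range mirrors the “type the security group of your EC2 instance” step.

```python
ACTIVEMQ_SSL_PORT = 61617  # the ActiveMQ SSL port used throughout this walkthrough


def ingress_request(broker_sg_id, ec2_sg_id, port=ACTIVEMQ_SSL_PORT):
    """Allow TCP `port` into the broker's group only from the EC2 instance's group."""
    return {
        "GroupId": broker_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": ec2_sg_id}],
        }],
    }


def open_broker_port(broker_sg_id, ec2_sg_id):
    import boto3  # requires AWS credentials
    boto3.client("ec2").authorize_security_group_ingress(
        **ingress_request(broker_sg_id, ec2_sg_id))
```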

Your broker can now accept the connection from your EC2 instance.
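Before starting an hour-long benchmark, it can save time to confirm from the EC2 instance that the security group rule works and TLS negotiates on port 61617. A minimal standard-library sketch; `broker_reachable` is a hypothetical helper, and `host` is the same hostname you use for {activemq-endpoint} in Step 4.

```python
import socket
import ssl


def broker_reachable(host, port=61617, timeout=5.0):
    """Return True if a TCP connection and TLS handshake with the broker succeed."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers DNS failures, refused/timed-out connections, ssl.SSLError
        return False
```

A `False` result usually points back at the inbound rule, or at a broker that is not yet in the Running state.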

Step 4 – Run the benchmark
Connect to your EC2 instance using SSH and run the following commands:

$ cd ~
$ curl -L https://github.com/alanprot/jms-benchmark/archive/master.zip -o master.zip
$ unzip master.zip
$ cd jms-benchmark-master
$ chmod a+x bin/*
$ env \
  SERVER_SETUP=false \
  SERVER_ADDRESS={activemq-endpoint} \
  ACTIVEMQ_TRANSPORT=ssl \
  ACTIVEMQ_PORT=61617 \
  ACTIVEMQ_USERNAME={activemq-user} \
  ACTIVEMQ_PASSWORD={activemq-password} \
  ./bin/benchmark-activemq

After the benchmark finishes, you can find the results in the ~/reports directory. As you may notice, the performance of ActiveMQ varies based on the number of consumers, producers, destinations, and message size.

Amazon MQ architecture

The last bit that’s important to know so that you can better understand the results of the benchmark is how Amazon MQ is architected.

Amazon MQ is architected to be highly available (HA) and durable. For HA, we recommend using the multi-AZ option. After a message is sent to Amazon MQ in persistent mode, the message is written to the highly durable message store that replicates the data across multiple nodes in multiple Availability Zones. Because of this replication, for some use cases you may see a reduction in throughput as you migrate to Amazon MQ. Customers have told us they appreciate the benefits of message replication as it helps protect durability even in the face of the loss of an Availability Zone.

Conclusion

We hope this gives you an idea of how Amazon MQ performs. We encourage you to run tests to simulate your own use cases.

To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon in the top right corner.

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.
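If you script your setup in Python, the patch operation from the CLI command above can be sent through boto3’s `update_rest_api`. A sketch, assuming credentials for the account that owns the API and the us-east-1 Region used in this walkthrough; the returned endpoint types should match the `"REGIONAL"` entry in the CLI response.

```python
# The same JSON Patch operation the CLI command sends.
EDGE_TO_REGIONAL = [{
    "op": "replace",
    "path": "/endpointConfiguration/types/EDGE",
    "value": "REGIONAL",
}]


def make_regional(rest_api_id, region="us-east-1"):
    """Switch an edge-optimized REST API to a regional endpoint; return its endpoint types."""
    import boto3  # requires AWS credentials
    client = boto3.client("apigateway", region_name=region)
    response = client.update_rest_api(restApiId=rest_api_id,
                                      patchOperations=EDGE_TO_REGIONAL)
    return response["endpointConfiguration"]["types"]
```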

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Outputs tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]
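The browser check can also be scripted, which is handy to rerun after each change. A standard-library sketch; the helper names are hypothetical, and “looks like the PetStore” here only means a JSON list of objects with `id`, `type`, and `price` fields, as in the output above.

```python
import json
import urllib.request


def pets_url(distribution_domain, stage="prod"):
    """Build the test URL, e.g. https://<distribution>.cloudfront.net/prod/pets."""
    return f"https://{distribution_domain}/{stage}/pets"


def looks_like_petstore(body):
    """True if the body is a JSON list of pets that each carry id, type, and price."""
    pets = json.loads(body)
    return isinstance(pets, list) and all(
        isinstance(pet, dict) and pet.keys() >= {"id", "type", "price"} for pet in pets)


def check_distribution(distribution_domain, stage="prod"):
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(pets_url(distribution_domain, stage), timeout=10) as resp:
        return resp.status == 200 and looks_like_petstore(resp.read())
```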

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes to finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.
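The console edit above can also be done with boto3: read the distribution config together with its ETag, set `WebACLId`, and send the whole config back with `IfMatch` so the update fails if someone else changed the distribution in between. A sketch, assuming you pass the ID of the AWSWAFSecurityAutomations web ACL and your distribution ID.

```python
import copy


def with_web_acl(distribution_config, web_acl_id):
    """Return a copy of a CloudFront DistributionConfig with the web ACL attached."""
    updated = copy.deepcopy(distribution_config)
    updated["WebACLId"] = web_acl_id
    return updated


def attach_web_acl(distribution_id, web_acl_id):
    import boto3  # requires AWS credentials
    cloudfront = boto3.client("cloudfront")
    current = cloudfront.get_distribution_config(Id=distribution_id)
    cloudfront.update_distribution(
        Id=distribution_id,
        IfMatch=current["ETag"],  # optimistic locking on the config version
        DistributionConfig=with_web_acl(current["DistributionConfig"], web_acl_id),
    )
```

As with the console, the change still has to propagate before the distribution reports Deployed.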

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
1.6.0-12

At the time of publication, Artillery is on version 1.6.0-12.

One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it blocks any requester (IP address) that sends more than 2000 requests within 5 minutes. Try this out.
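To make the rule's behavior concrete, here is a simplified Python model of a rate-based rule that tracks each client IP address over a trailing 5-minute window. It only illustrates the semantics described above under that assumption; it is not how AWS WAF is implemented internally, and the class name RateBasedRule and its allow method are hypothetical.

```python
import time
from collections import defaultdict, deque


class RateBasedRule:
    """Simplified model of a rate-based rule: block an IP while it has sent
    more than `limit` requests within the trailing `window_seconds`."""

    def __init__(self, limit=2000, window_seconds=300):
        self.limit = limit
        self.window_seconds = window_seconds
        # IP address -> timestamps of its requests inside the current window
        self.requests = defaultdict(deque)

    def allow(self, ip, now=None):
        """Record one request from `ip` and return True if it is allowed."""
        now = time.time() if now is None else now
        window = self.requests[ip]
        # Drop timestamps that have aged out of the trailing window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        # In this model, every request (allowed or blocked) counts toward the rate.
        window.append(now)
        return len(window) <= self.limit
```

With the defaults, `RateBasedRule().allow(ip)` starts returning False on the 2001st request an IP sends inside a 5-minute window, and returns True again once enough of its earlier requests have aged out of the window.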

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

The output looks correct. But what happens if you exceed 2000 requests within 5 minutes?

Run the following Artillery command:

$ artillery quick -n 2000 --count 10 https://{distribution-name}.cloudfront.net/prod/pets

This command starts 10 virtual users (--count 10) that each send 2000 requests (-n 2000) to your API, 20,000 requests in total, well above the rule's limit. For brevity, I am not posting the Artillery output here.

After Artillery finishes, run the cURL request again and see what happens:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the block list once its request rate falls back below the limit.
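If you would rather reproduce the test without installing Artillery, the following Python sketch mirrors the `artillery quick -n 2000 --count 10` run: 10 concurrent workers each send 2000 sequential GET requests, and the script tallies the HTTP status codes. CloudFront returns requests blocked by AWS WAF as HTTP 403 (Forbidden), which is the error page shown above. The function names are hypothetical, and the `send` parameter exists only so that the request function can be swapped out for testing.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url):
    """Send one GET request and return the HTTP status code."""
    try:
        with urlopen(url) as response:
            return response.status
    except HTTPError as error:
        # 4xx/5xx responses, such as a WAF block (403), are raised as HTTPError.
        return error.code


def load_test(url, users=10, requests_per_user=2000, send=fetch_status):
    """Run `users` concurrent workers, each sending `requests_per_user`
    sequential requests, and count the status codes returned."""
    def run_user(_):
        return [send(url) for _ in range(requests_per_user)]

    totals = Counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        for statuses in pool.map(run_user, range(users)):
            totals.update(statuses)
    return totals
```

Calling `load_test("https://{distribution-name}.cloudfront.net/prod/pets")` with your distribution name returns a Counter of status codes; once the rate-based rule kicks in, most responses should be counted under 403.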

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to protect your API against a range of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

Use Slack ChatOps to Deploy Your Code – How to Integrate Your Pipeline in AWS CodePipeline with Your Slack Channel

Post Syndicated from Rumi Olsen original https://aws.amazon.com/blogs/devops/use-slack-chatops-to-deploy-your-code-how-to-integrate-your-pipeline-in-aws-codepipeline-with-your-slack-channel/

Slack is widely used by DevOps and development teams to communicate status. Typically, when a build has been tested and is ready to be promoted to a staging environment, a QA engineer or DevOps engineer kicks off the deployment. Using Slack in a ChatOps collaboration model, the promotion can be done in a single click from a Slack channel. And because the promotion happens through a Slack channel, the whole development team knows what’s happening without checking email.

In this blog post, I will show you how to integrate AWS services with a Slack application. I use an interactive message button and an incoming webhook to promote a stage with a single click.

To follow along with the steps in this post, you’ll need a pipeline in AWS CodePipeline. If you don’t have a pipeline, the fastest way to create one for this use case is to use AWS CodeStar. Go to the AWS CodeStar console and select the Static Website template. AWS CodeStar creates a pipeline with an AWS CodeCommit repository and an AWS CodeDeploy deployment for you. After the pipeline is created, add a manual approval stage. The approval handler function later in this post assumes that the stage is named Approval and the approval action is named ApprovalOrDeny, so use those names or update the code to match.

You’ll also need to build a Slack app with incoming webhooks and interactive components, write two Lambda functions, and create an API Gateway API and an SNS topic.

When I make a change and merge a new feature into the master branch in AWS CodeCommit, the check-in kicks off my CI/CD pipeline in AWS CodePipeline. When CodePipeline reaches the approval stage, it sends a notification to Amazon SNS, which triggers an AWS Lambda function (ApprovalRequester).

ApprovalRequester posts a prompt to the Slack channel asking whether to promote the build. When I click Yes to approve the build promotion, the approval result is sent to CodePipeline through API Gateway and a second Lambda function (ApprovalHandler). The pipeline then continues and deploys the build to the next environment.

Create a Slack app

For App Name, type a name for your app. For Development Slack Workspace, choose the name of your workspace. In my case, the workspace is AWS ChatOps.

After the Slack application has been created, you will see the Basic Information page, where you can create incoming webhooks and enable interactive components.

To add incoming webhooks:

  1. Under Add features and functionality, choose Incoming Webhooks. Turn the feature on by switching the toggle from Off to On.
  2. Now that the feature is turned on, choose Add New Webhook to Workspace. In the process of creating the webhook, Slack lets you choose the channel where messages will be posted.
  3. After the webhook has been created, you’ll see its URL. You will use this URL when you create the Lambda function.
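Before wiring up Lambda, you can check the webhook by posting a plain-text message to it. This Python sketch uses only the standard library; WEBHOOK_URL is a placeholder for the URL Slack shows you, and the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the webhook URL Slack generated for you.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def build_webhook_request(url, text):
    """Build a POST request that sends a plain-text message to an incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def post_test_message(url, text="Hello from the ChatOps setup"):
    """Send the message and return Slack's response body as text."""
    with urlopen(build_webhook_request(url, text)) as response:
        return response.read().decode("utf-8")
```

Call `post_test_message(WEBHOOK_URL)` with your real URL; the message should appear in the channel you chose when creating the webhook.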

Write the Lambda function for approval requests

This Lambda function is invoked by the SNS notification. It sends a message containing interactive buttons to the incoming webhook you created earlier. The following sample code sends that request. SLACK_WEBHOOK_URL and SLACK_CHANNEL are environment variables that hold the webhook URL you created and the Slack channel where you want the interactive message to appear.

# This function is invoked via SNS when the CodePipeline manual approval action starts.
# It takes the details from the approval notification and sends an interactive message to Slack that allows users to approve or cancel the deployment.

import os
import json
import logging

from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

# This is passed as a plain-text environment variable for ease of demonstration.
# Consider encrypting the value with KMS or using an encrypted parameter in Parameter Store for production deployments.
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']
SLACK_CHANNEL = os.environ['SLACK_CHANNEL']

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Received event: %s", json.dumps(event, indent=2))
    # The SNS message body is a JSON string describing the pending approval.
    message = event["Records"][0]["Sns"]["Message"]

    data = json.loads(message)
    token = data["approval"]["token"]
    codepipeline_name = data["approval"]["pipelineName"]

    slack_message = {
        "channel": SLACK_CHANNEL,
        "text": "Would you like to promote the build to production?",
        "attachments": [
            {
                "text": "Yes to deploy your build to production",
                "fallback": "You are unable to promote a build",
                "callback_id": "wopr_game",
                "color": "#3AA3E3",
                "attachment_type": "default",
                "actions": [
                    {
                        "name": "deployment",
                        "text": "Yes",
                        "style": "danger",
                        "type": "button",
                        # The approval token and pipeline name travel with the button so the handler can respond.
                        "value": json.dumps({"approve": True, "codePipelineToken": token, "codePipelineName": codepipeline_name}),
                        "confirm": {
                            "title": "Are you sure?",
                            "text": "This will deploy the build to production",
                            "ok_text": "Yes",
                            "dismiss_text": "No"
                        }
                    },
                    {
                        "name": "deployment",
                        "text": "No",
                        "type": "button",
                        "value": json.dumps({"approve": False, "codePipelineToken": token, "codePipelineName": codepipeline_name})
                    }
                ]
            }
        ]
    }

    # Post the message to the incoming webhook as JSON.
    req = Request(SLACK_WEBHOOK_URL,
                  data=json.dumps(slack_message).encode('utf-8'),
                  headers={'Content-Type': 'application/json'})

    try:
        response = urlopen(req)
        response.read()
        logger.info("Message posted to %s", SLACK_CHANNEL)
    except HTTPError as e:
        logger.error("Request failed: %d %s", e.code, e.reason)
    except URLError as e:
        logger.error("Server connection failed: %s", e.reason)

    return None
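To exercise ApprovalRequester locally, you need an event shaped like the SNS notification it parses. The sketch below builds a minimal event containing only the fields the handler reads (`approval.token` and `approval.pipelineName` inside the JSON `Message` string). Real CodePipeline notifications carry more fields, and the helper names, pipeline name, and token values here are made up for illustration.

```python
import json


def make_sns_event(pipeline_name, token):
    """Build a minimal SNS-style event holding a pipeline approval notification.
    Only the fields that lambda_handler reads are included."""
    message = {"approval": {"pipelineName": pipeline_name, "token": token}}
    return {"Records": [{"Sns": {"Message": json.dumps(message)}}]}


def extract_approval(event):
    """Mirror the handler's parsing: unwrap the SNS record, then decode the
    JSON message string."""
    data = json.loads(event["Records"][0]["Sns"]["Message"])
    return data["approval"]["token"], data["approval"]["pipelineName"]


def button_value(approve, token, pipeline_name):
    """Encode a button value the same way the Yes/No buttons do."""
    return json.dumps({"approve": approve,
                       "codePipelineToken": token,
                       "codePipelineName": pipeline_name})
```

With SLACK_WEBHOOK_URL and SLACK_CHANNEL set, passing `make_sns_event("my-pipeline", "abc-123")` to `lambda_handler(event, None)` should post a test prompt to your channel.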

Create an SNS topic

Create a topic and then create a subscription that invokes the ApprovalRequester Lambda function. Configure the manual approval action in the pipeline to send a message to this SNS topic when an approval is required. When the pipeline reaches the approval stage, it publishes a notification to the topic, and SNS delivers it to all subscribed endpoints. Here the only endpoint is the Lambda function, so SNS invokes ApprovalRequester. For information about how to create an SNS topic, see Create a Topic in the Amazon SNS Developer Guide.
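For reference, a manual approval stage that notifies the topic can be described as a CodePipeline stage declaration. The following Python sketch builds one as a dictionary, using the stage and action names that the approval handler hard-codes (Approval and ApprovalOrDeny). The helper name and topic ARN are hypothetical; you could add the result to the `stages` list of your pipeline structure (for example, the output of `aws codepipeline get-pipeline`) before calling `update-pipeline`, or simply make the equivalent settings in the console.

```python
import json


def manual_approval_stage(topic_arn, stage_name="Approval", action_name="ApprovalOrDeny"):
    """Return a stage declaration for a manual approval action that notifies an SNS topic."""
    return {
        "name": stage_name,
        "actions": [
            {
                "name": action_name,
                "actionTypeId": {
                    "category": "Approval",
                    "owner": "AWS",
                    "provider": "Manual",
                    "version": "1",
                },
                # CodePipeline publishes the approval notification to this topic.
                "configuration": {"NotificationArn": topic_arn},
                "runOrder": 1,
            }
        ],
    }


# Hypothetical topic ARN for illustration.
stage = manual_approval_stage("arn:aws:sns:us-east-1:123456789012:approval-requests")
print(json.dumps(stage, indent=2))
```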

Write the Lambda function for handling the interactive message button

This Lambda function is invoked by API Gateway. It receives the result of the interactive message button, that is, whether the build promotion was approved, and reports that result to CodePipeline. If the promotion was approved, the pipeline promotes the build to the next environment. If it was rejected, the pipeline stops and does not move to the next stage.

The Lambda function code might look like the following. SLACK_VERIFICATION_TOKEN is the environment variable that contains your Slack verification token. You can find the token on your app's Basic Information page in Slack: scroll down to App Credentials, where the Verification Token is listed.

# This function is triggered via API Gateway when a user acts on the Slack interactive message sent by approval_requester.py.

from urllib.parse import parse_qs
import json
import os
import boto3

SLACK_VERIFICATION_TOKEN = os.environ['SLACK_VERIFICATION_TOKEN']

# Triggered by API Gateway (Lambda proxy integration).
# It records the approval result for a particular CodePipeline pipeline.
def lambda_handler(event, context):
    # Slack posts a form-encoded body with a single "payload" field containing JSON.
    body = parse_qs(event['body'])
    payload = json.loads(body['payload'][0])

    # Validate Slack token
    if SLACK_VERIFICATION_TOKEN == payload['token']:
        put_approval_result(json.loads(payload['actions'][0]['value']))

        # This will replace the interactive message with a simple text response.
        # You can implement a more complex message update if you would like.
        return {
            "isBase64Encoded": False,
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"text": "The approval has been processed"})
        }
    else:
        return {
            "isBase64Encoded": False,
            "statusCode": 403,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "This request does not include a valid verification token."})
        }


def put_approval_result(action_details):
    codepipeline_status = "Approved" if action_details["approve"] else "Rejected"
    codepipeline_name = action_details["codePipelineName"]
    token = action_details["codePipelineToken"]

    # stageName and actionName must match the manual approval stage and action in your pipeline.
    client = boto3.client('codepipeline')
    response_approval = client.put_approval_result(
        pipelineName=codepipeline_name,
        stageName='Approval',
        actionName='ApprovalOrDeny',
        result={'summary': '', 'status': codepipeline_status},
        token=token)
    print(response_approval)
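To test the parsing and token check without Slack, you can build a request body shaped like the one the handler expects: a form-encoded `payload` field whose JSON contains `token` and an `actions` list. The sketch below assumes only those fields (real Slack payloads contain more), and the helper names are hypothetical. It swaps in `hmac.compare_digest` for the handler's `==` comparison, a constant-time check that is a common hardening step for secret comparisons.

```python
import hmac
import json
from urllib.parse import parse_qs, urlencode


def make_slack_body(verification_token, action_value):
    """Build a form-encoded body with a single 'payload' field containing JSON."""
    payload = {
        "token": verification_token,
        "actions": [{"name": "deployment", "value": action_value}],
    }
    return urlencode({"payload": json.dumps(payload)})


def parse_slack_body(body, expected_token):
    """Decode the body the way lambda_handler does and return the button value,
    or None if the verification token does not match."""
    payload = json.loads(parse_qs(body)["payload"][0])
    # compare_digest avoids leaking timing information during the comparison.
    if not hmac.compare_digest(payload["token"], expected_token):
        return None
    return json.loads(payload["actions"][0]["value"])
```

Passing `{"body": make_slack_body(...)}` to the handler's `lambda_handler` would exercise the same parsing path, although that also calls CodePipeline.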

Create the API Gateway API

  1. In the Amazon API Gateway console, create a resource called InteractiveMessageHandler.
  2. Create a POST method.
    • For Integration type, choose Lambda Function.
    • Select Use Lambda Proxy integration.
    • From Lambda Region, choose a region.
    • In Lambda Function, type the name of the ApprovalHandler function you wrote in the previous step.
  3. Deploy to a stage.

For more information, see Getting Started with Amazon API Gateway in the Amazon API Gateway Developer Guide.

Now go back to your Slack application and enable interactive components.

To enable interactive components for the interactive message (Yes) button:

  1. Under Features, choose Interactive Components.
  2. Choose Enable Interactive Components.
  3. For the request URL, enter the invoke URL of your API Gateway stage followed by the path of the resource you created. Slack calls this URL when someone clicks a button in the message.

Now that all the pieces are in place, run the solution by checking in a code change to your CodeCommit repository. CodePipeline releases the change, and when it reaches the approval stage, a prompt appears in your Slack channel asking whether you want to promote the build to your staging or production environment. Choose Yes, and then verify that your change was deployed to the environment.

Conclusion

That is it! You have now created a Slack ChatOps solution using AWS CodeCommit, AWS CodePipeline, AWS Lambda, Amazon API Gateway, and Amazon Simple Notification Service.

Now that you know how to integrate Slack with CodePipeline, you can use the same method to interact with other AWS services through API Gateway and Lambda. You can also use Slack slash commands to initiate an action from a Slack channel, rather than responding to a prompt as demonstrated in this post.

HackSpace magazine 7: Internet of Everything

Post Syndicated from Andrew Gregory original https://www.raspberrypi.org/blog/hackspace-magazine-7-internet-of-everything/

We’re usually averse to buzzwords at HackSpace magazine, but not this month: in issue 7, we’re taking a deep dive into the Internet of Things.

Internet of Things (IoT)

To many people, IoT is a shady term used by companies to sell you something you already own, but this time with WiFi; to us, it’s a way to make our builds smarter, more useful, and more connected. In HackSpace magazine #7, you can join us on a tour of the boards that power IoT projects, marvel at the ways in which other makers are using IoT, and get started with your first IoT project!

Awesome projects

DIY retro computing: this issue, we’re taking our collective hat off to Spencer Owen. He stuck his home-brew computer on Tindie thinking he might make a bit of beer money — now he’s paying the mortgage with his making skills and inviting others to build modules for his machine. And if that tickles your fancy, why not take a crack at our Z80 tutorial? Get out your breadboard, assemble your jumper wires, and prepare to build a real-life computer!

Shameless patriotism: combine LEGO, Arduino, and the car of choice for 1960s gold bullion thieves, and you’ve got yourself a groovy weekend project. We proudly present one man’s epic quest to add LED lights (controllable via a smartphone!) to his daughter’s LEGO Mini Cooper.

Makerspaces

Patriotism intensifies: for the last 200-odd years, the Black Country has been a hotbed of making. Urban Hax, based in Walsall, is the latest makerspace to show off its riches in the coveted Space of the Month pages. Every space has its own way of doing things, but not every space has a portrait of Rob Halford on the wall. All hail!

Diversity: advice on diversity often boils down to ‘Be nice to people’, which might feel more vague than actionable. This is where we come in to help: it is truly worth making the effort to give people of all backgrounds access to your makerspace, so we take a look at why it’s nice to be nice, and at the ways in which one makerspace has put niceness into practice — with great results.

And there’s more!

We also show you how to easily calculate the size and radius of laser-cut gears, use a bank of LEDs to etch PCBs in your own mini factory, and use chemistry to mess with your lunch menu.

All this plus much, much more waits for you in HackSpace magazine issue 7!

Get your copy of HackSpace magazine

If you like the sound of that, you can find HackSpace magazine in WHSmith, Tesco, Sainsbury’s, and independent newsagents in the UK. If you live in the US, check out your local Barnes & Noble, Fry’s, or Micro Center next week. We’re also shipping to stores in Australia, Hong Kong, Canada, Singapore, Belgium, and Brazil, so be sure to ask your local newsagent whether they’ll be getting HackSpace magazine.

And if you can’t get to the shops, fear not: you can subscribe from £4 an issue from our online shop. And if you’d rather try before you buy, you can always download the free PDF. Happy reading, and happy making!

The post HackSpace magazine 7: Internet of Everything appeared first on Raspberry Pi.