Tag Archives: storage

Migrating .NET Classic Applications to Amazon ECS Using Windows Containers

Post Syndicated from Sundar Narasiman original https://aws.amazon.com/blogs/compute/migrating-net-classic-applications-to-amazon-ecs-using-windows-containers/

This post contributed by Sundar Narasiman, Arun Kannan, and Thomas Fuller.

AWS recently announced the general availability of Windows container management for Amazon Elastic Container Service (Amazon ECS). Docker containers and Amazon ECS make it easy to run and scale applications on a virtual machine by abstracting the complex cluster management and setup needed.

Classic .NET applications are developed with .NET Framework 4.7.1 or older and can run only on a Windows platform. These include Windows Communication Foundation (WCF) services, ASP.NET Web Forms, and ASP.NET MVC web apps or web APIs.

Why classic ASP.NET?

ASP.NET MVC 4.6 and older versions of ASP.NET occupy a significant footprint in the enterprise web application space. As enterprises move towards microservices for new or existing applications, containers are one of the stepping stones for migrating from monolithic to microservices architectures. Additionally, support for Windows containers in Windows 10 and Windows Server 2016, along with Visual Studio tooling for Docker, simplifies the containerization of ASP.NET MVC apps.

Getting started

In this post, you pick an ASP.NET 4.6.2 MVC application and get step-by-step instructions for migrating to ECS using Windows containers. The detailed steps, AWS CloudFormation template, Microsoft Visual Studio solution, ECS service definition, and ECS task definition are available in the aws-ecs-windows-aspnet GitHub repository.

To help you get started running Windows containers, here is the reference architecture for Windows containers on GitHub: ecs-refarch-cloudformation-windows. This reference architecture is a layered CloudFormation stack, in that a parent stack calls other stacks to create the environment. The CloudFormation YAML templates in this reference architecture were used as the basis for the single JSON CloudFormation stack used in the migration steps below.

Steps for Migration

The code and templates to implement this migration can be found on GitHub: https://github.com/aws-samples/aws-ecs-windows-aspnet.

  1. Your development environment needs to have the latest version and updates for Visual Studio 2017, Windows 10, and Docker for Windows Stable.
  2. Next, containerize the ASP.NET application and test it locally. The size of Windows container application images is generally larger compared to Linux containers. This is because the base image of the Windows container itself is large in size, typically greater than 9 GB.
  3. After the application is containerized, push the container image to Amazon Elastic Container Registry (Amazon ECR). Images stored in ECR are compressed to improve pull times and reduce storage costs. In this case, ECR compresses the image to around 1 GB, a reduction of roughly 90%. (A short sketch of how to verify the compressed size appears after this list.)
  4. Create a CloudFormation stack using the template in the ‘CloudFormation template’ folder. This creates an ECS service, a task definition (referencing the containerized ASP.NET application), and the other related components mentioned in the ECS reference architecture for Windows containers.
  5. After the stack is created, verify the successful creation of the ECS service, ECS instances, running tasks (with the threshold mentioned in the task definition), and the Application Load Balancer’s successful health check against running containers.
  6. Navigate to the Application Load Balancer URL and see the successful rendering of the containerized ASP.NET MVC app in the browser.
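To make step 3 concrete, here is a minimal sketch of how you might confirm the compressed image size that ECR reports after the push. It assumes boto3 is configured with credentials, and the repository name (aspnet-mvc-app), tag, and region are hypothetical — adjust them for your environment.

```python
import boto3

# Hypothetical repository and tag -- substitute the values you pushed in step 3.
ecr = boto3.client("ecr", region_name="us-east-1")

response = ecr.describe_images(
    repositoryName="aspnet-mvc-app",
    imageIds=[{"imageTag": "latest"}],
)

for image in response["imageDetails"]:
    size_gb = image["imageSizeInBytes"] / (1024 ** 3)
    print(f"{image.get('imageTags')}: ~{size_gb:.2f} GB compressed in ECR")
```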

Key Notes

  • Generally, Windows container images occupy a large amount of space (on the order of a few gigabytes).
  • Not all of the task definition parameters available for Linux containers are available for Windows containers. For more information, see Windows Task Definitions.
  • An Application Load Balancer can be configured to route requests to one or more ports on each container instance in a cluster. Dynamic port mapping allows you to have multiple tasks from a single service on the same container instance (see the sketch after this list).
  • IAM roles for Windows tasks require extra configuration. For more information, see Windows IAM Roles for Tasks. For this post, configuration was handled by the CloudFormation template.
  • The ECS container agent log file can be accessed for troubleshooting Windows containers: C:\ProgramData\Amazon\ECS\log\ecs-agent.log
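As an illustration of the dynamic port mapping note above, here is a hedged boto3 sketch of registering a Windows task definition whose container asks for an ephemeral host port. The family name, image URI, and CPU/memory sizes are hypothetical, and the CloudFormation template in the repository already handles this for you, so treat this only as a reading aid.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Hypothetical names and sizes -- adjust for your application.
ecs.register_task_definition(
    family="aspnet-mvc-windows",
    containerDefinitions=[
        {
            "name": "aspnet-mvc",
            "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/aspnet-mvc-app:latest",
            "cpu": 512,
            "memory": 1024,
            "essential": True,
            # hostPort 0 tells ECS to pick an ephemeral host port at launch,
            # which is what allows several tasks from one service to share a
            # container instance behind the Application Load Balancer.
            "portMappings": [{"containerPort": 80, "hostPort": 0, "protocol": "tcp"}],
        }
    ],
)
```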

Summary

In this post, you migrated an ASP.NET MVC application to ECS using Windows containers.

The logical next step is to automate the activities for migration to ECS and build a fully automated continuous integration/continuous deployment (CI/CD) pipeline for Windows containers. This can be orchestrated by leveraging services such as AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and Amazon ECS. You can learn more about how this is done in the Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS post.

If you have questions or suggestions, please comment below.

Cloud Babble: The Jargon of Cloud Storage

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/what-is-cloud-computing/

Cloud Babble

One of the things we in the technology business are good at is coming up with names, phrases, euphemisms, and acronyms for the stuff that we create. The Cloud Storage market is no different, and we’d like to help by illuminating some of the cloud storage related terms that you might come across. We know this is just a start, so please feel free to add in your favorites in the comments section below and we’ll update this post accordingly.

Clouds

The cloud is really just a collection of purpose-built servers. In a public cloud the servers are shared between multiple unrelated tenants. In a private cloud, the servers are dedicated to a single tenant or sometimes a group of related tenants. A public cloud is off-site, while a private cloud can be on-site or off-site – or on-prem or off-prem, if you prefer.

Both Sides Now: Hybrid Clouds

Speaking of on-prem and off-prem, there are Hybrid Clouds or Hybrid Data Clouds depending on what you need. Both are based on the idea that you extend your local resources (typically on-prem) to the cloud (typically off-prem) as needed. This extension is controlled by software that decides, based on rules you define, what needs to be done where.

A Hybrid Data Cloud is specific to data. For example, you can set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.

A Hybrid Cloud is similar to a Hybrid Data Cloud except it also extends compute. For example, at the end of the quarter, you can spin up order processing application instances off-prem as needed to add to your on-prem capacity. Of course, determining where the transactional data used and created by these applications resides can be an interesting systems design challenge.

Clouds in my Coffee: Fog

Typically, public and private clouds live in large buildings called data centers. Full of servers, networking equipment, and clean air, data centers need lots of power, lots of networking bandwidth, and lots of space. This often limits where data centers are located. The further away you are from a data center, the longer it generally takes to get your data to and from there. This is known as latency. That’s where “Fog” comes in.

Fog is often referred to as clouds close to the ground. Fog, in our cloud world, is basically having a “little” data center near you. This can make data storage and even cloud based processing faster for everyone nearby. Data, and less so processing, can be transferred to/from the Fog to the Cloud when time is less of a factor. Data could also be aggregated in the Fog and sent to the Cloud. For example, your electric meter could report its minute-by-minute status to the Fog for diagnostic purposes. Then once a day the aggregated data could be sent to the power company’s Cloud for billing purposes.

Another term used in place of Fog is Edge, as in computing at the Edge. In either case, a given cloud (data center) usually has multiple Edges (little data centers) connected to it. The connection between the Edge and the Cloud is sometimes known as the middle-mile. The network in the middle-mile can be less robust than that required to support a stand-alone data center. For example, the middle-mile can use 1 Gbps lines, versus a data center, which would require multiple 10 Gbps lines.

Heavy Clouds No Rain: Data

We’re all aware that we are creating, processing, and storing data faster than ever before. All of this data is stored in either a structured or more likely an unstructured way. Databases and data warehouses are structured ways to store data, but a vast amount of data is unstructured – meaning the schema and data access requirements are not known until the data is queried. A large pool of unstructured data in a flat architecture can be referred to as a Data Lake.

A Data Lake is often created so we can perform some type of “big data” analysis. In an oversimplified example, let’s extend the lake metaphor a bit and ask the question: “how many fish are in our lake?” To get an answer, we take a sufficient sample of our lake’s water (data), count the number of fish we find, and extrapolate based on the size of the lake to get an answer within a given confidence interval.
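To put rough numbers on that, here is a toy sketch of the extrapolation; the volumes and fish count are made up purely to illustrate the arithmetic.

```python
# Toy numbers for the lake metaphor -- purely illustrative.
lake_volume_liters = 5_000_000    # the whole lake (the full data set)
sample_volume_liters = 1_000      # the water we actually examined
fish_in_sample = 3                # fish counted in the sample

estimated_fish = fish_in_sample * (lake_volume_liters / sample_volume_liters)
print(f"Estimated fish in the lake: about {estimated_fish:,.0f}")  # about 15,000
```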

A Data Lake is usually found in the cloud, an excellent place to store large amounts of non-transactional data. Watch out as this can lead to our data having too much Data Gravity or being locked in the Hotel California. This could also create a Data Silo, thereby making a potential data Lift-and-Shift impossible. Let me explain:

  • Data Gravity — Generally, the more data you collect in one spot, the harder it is to move. When you store data in a public cloud, you have to pay egress and/or network charges to download the data to another public cloud or even to your own on-premise systems. Some public cloud vendors charge a lot more than others, meaning that depending on your public cloud provider, your data could financially have a lot more gravity than you expected.
  • Hotel California — This is like Data Gravity but to a lesser scale. Your data is in the Hotel California if, to paraphrase, “your data can check out any time you want, but it can never leave.” If the cost of downloading your data is limiting the things you want to do with that data, then your data is in the Hotel California. Data is generally most valuable when used, and with cloud storage that can include archived data. This assumes of course that the archived data is readily available, and affordable, to download. When considering a cloud storage project always figure in the cost of using your own data.
  • Data Silo — Over the years, businesses have suffered from organizational silos as information is not shared between different groups, but instead needs to travel up to the top of the silo before it can be transferred to another silo. If your data is “trapped” in a given cloud by the cost it takes to share such data, then you may have a Data Silo, and that’s exactly opposite of what the cloud should do.
  • Lift-and-Shift — This term is used to define the movement of data or applications from one data center to another or from on-prem to off-prem systems. The move generally occurs all at once and once everything is moved, systems are operational and data is available at the new location with few, if any, changes. If your data has too much gravity or is locked in a hotel, a data lift-and-shift may break the bank.

I Can See Clearly Now

Hopefully, the cloudy terms we’ve covered are, well, less cloudy. As we mentioned in the beginning, our compilation is just a start, so please feel free to add your favorite cloud term in the comments section below and we’ll update this post with your contributions. Keep your entries “clean,” and please no words or phrases that are really adverts for your company. Thanks.

The post Cloud Babble: The Jargon of Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Kim Dotcom Loses Megaupload Domain Names, Gets “Destroyed” Gaming Chair Back

Post Syndicated from Ernesto original https://torrentfreak.com/kim-dotcom-loses-megaupload-domain-names-gets-destroyed-gaming-chair-back-180117/

Following the 2012 raid on Megaupload and Kim Dotcom, U.S. and New Zealand authorities seized millions of dollars in cash and other property, located around the world.

Claiming the assets were obtained through copyright and money laundering crimes, the U.S. government launched separate civil cases in which it asked the court to forfeit bank accounts, servers, domain names, and other seized possessions of the Megaupload defendants.

One of these cases was lost after the U.S. branded Dotcom and his colleagues as “fugitives”. The defense team appealed the ruling, but lost again, and a subsequent petition at the Supreme Court was denied.

Following this lost battle, the U.S. also moved to conclude a separate civil forfeiture case, which was still pending at a federal court in Virginia.

The assets listed in this case are several bank accounts, including several at PayPal, as well as 60 servers Megaupload bought at Leaseweb. The most symbolic assets, however, are the seized domain names, including Megaupload.com, Megaporn.com and Megavideo.com.

Mega’s domains

This week a U.S. federal court decided that all claims of Kim Dotcom, his former colleague Mathias Ortmann, and several Megaupload-related companies should be stricken. A default was entered against them on Tuesday.

The same fugitive disentitlement argument was used in this case. This essentially means that someone who’s considered to be a fugitive from justice is not allowed to get relief from the judicial system he or she evades.

“Claimants Kim Dotcom and Mathias Ortmann have deliberately avoided prosecution by declining to enter or reenter the United States,” Judge Liam O’Grady writes in his order to strike the claims.

“Because Claimant Kim Dotcom, who is himself a fugitive under Section 2466, is the Corporate Claimants’ controlling shareholder and, in particular, because he signed the claims on behalf of the corporations, a presumption of disentitlement applies to the corporations as well.”

As a result, the domain names, which once served 50 million users per day, are now lost to the US Government. The court records list 18 domains in total, which were registered through GoDaddy, DotRegistrar, and Fabulous.

Given the legal history, the domains and other assets are likely lost for good. However, Megaupload defense lawyer Ira Rothken is not giving up yet.

“We are still evaluating the legal options in a climate where Kim Dotcom is being labeled a fugitive in a US criminal copyright case even though he has never been to the US, is merely asserting his US-NZ extradition treaty rights, and the NZ High Court has ruled that he and his co-defendants did not commit criminal copyright infringement under NZ law,” Rothken tells TorrentFreak.

There might be a possibility that assets located outside the US could be saved. Foreign courts are more open to defense arguments, it seems, as a Hong Kong court previously ordered the US to return several assets belonging to Kim Dotcom.

The Hong Kong case also brought some good news this week. At least, something that was supposed to be positive. On Twitter, Dotcom writes that two containers with seized assets were returned, but in a “rotten and destroyed” state.

“A shipment of 2 large containers just arrived in New Zealand. This is how all my stuff looks now. Rotten & destroyed. Photo: My favorite gaming chair,” Dotcom wrote.

According to Dotcom, the US Government asked him to pay for ‘climate controlled’ storage for more than half a decade to protect the seized goods. However, judging from the look of the chair and the state of some other belongings, something clearly went wrong.

Rotten & destroyed

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Early Challenges: Managing Cash Flow

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/managing-cash-flow/

Cash flow projection charts

This post by Backblaze’s CEO and co-founder Gleb Budman is the eighth in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants
  7. The Decision on Transparency
  8. Early Challenges: Managing Cash Flow

Use the Join button above to receive notification of new posts in this series.

Running out of cash is one of the quickest ways for a startup to go out of business. When you are starting a company the question of where to get cash is usually the top priority, but managing cash flow is critical for every stage in the lifecycle of a company. As a primarily bootstrapped but capital-intensive business, managing cash flow at Backblaze was and still is a key element of our success and requires continued focus. Let’s look at what we learned over the years.

Raising Your Initial Funding

When starting a tech business in Silicon Valley, the default assumption is that you will immediately try to raise venture funding. There are certainly many advantages to raising funding — not the least of which is that you don’t need to be cash-flow positive since you have cash in the bank and the expectation is that you will have a “burn rate,” i.e. you’ll be spending more than you make.

Note: While you’re not expected to be cash-flow positive, that doesn’t mean you don’t have to worry about cash. Cash-flow management will determine your burn rate. Whether you can get to cash-flow breakeven or need to raise another round of funding is a direct byproduct of your cash flow management.

Also, raising funding takes time (most successful fundraising cycles take 3-6 months start-to-finish), and time at a startup is in short supply. Constantly trying to raise funding can take away from product development and pursuing growth opportunities. If you’re not successful in raising funding, you then have to either shut down or find an alternate method of funding the business.

Sources of Funding

Depending on the stage of the company, type of company, and other factors, you may have access to different sources of funding. Let’s list a number of them:

Customers

Sales — the best kind of funding. It is non-dilutive, doesn’t have to be paid back, and is a direct metric of the success of your company.

Pre-Sales — some customers may be willing to pay you for a product in beta, a test, or pre-pay for a product they’ll receive when finished. Pre-Sales income also is great because it shares the characteristics of cash from sales, but you get the cash early. It also can be a good sign that the product you’re building fills a market need. We started charging for Backblaze computer backup while it was still in private beta, which allowed us to not only collect cash from customers, but also test the billing experience and users’ real desire for the service.

Services — if you’re a service company and customers are paying you for your services, great. You are effectively limited by the number of billable hours available in a day, but as demand grows you can add more employees to increase the total number of billable hours.

Note: If you’re a product company and customers are paying you to consult, that can provide much needed cash, and could provide feedback toward the right product. However, it can also distract from your core business, send you down a path where you’re building a product for a single customer, and addict you to a path that prevents you from building a scalable business.

Investors

Yourself — you likely are putting your time into the business, and deferring salary in the process. You may also put your own cash into the business either as an investment or a loan.

Angels — angels are ideal as early investors since they are used to investing in businesses with little to no traction. AngelList is a good place to find them, though finding people you’re connected with through someone that knows you well is best.

Crowdfunding — a component of the JOBS Act permitted entrepreneurs to raise money from nearly anyone since May 2016. The SEC imposes limits on both investors and the companies. This article goes into some depth on the options and sites available.

VCs — VCs are ideal for companies that need to raise at least a few million dollars and intend to build a business that will be worth over $1 billion.

Debt

Friends & Family — F&F are often the first people to give you money because they are investing in you. It’s great to have some early supporters, but it also can be risky to take money from people who aren’t used to the risks. The key advice here is to only take money from people who won’t mind losing it. If someone is talking about using their children’s college funds or borrowing from their 401k, say ‘no thank you’ — even if they’re sure they want to loan you money.

Bank Loans — a variety of loan types exist, but most either require the company to have been operational for a couple years, be able to borrow against money the company has or is making, or be able to get a personal guarantee from the founders whereby their own credit is on the line. Fundera provides a good overview of loan options and can help secure some, but most will not be an option for a brand new startup.

Grants

Government — in some areas there is the potential for government grants to facilitate research. The SBIR program facilitates some such grants.

At Backblaze, we used a number of these options:

• Investors/Yourself
We loaned a cumulative total of a couple hundred thousand dollars to the company and invested our time by going without a salary for a year and a half.
• Customers/Pre-Sales
We started selling the Backblaze service while it was still in beta.
• Customers/Sales
We launched v1.0 and kept selling.
• Investors/Angels
After a year and a half, we raised $370k from 11 angels. All of them were either people whom we knew personally or were a strong recommendation from a mutual friend.
• Debt/Loans
After a couple years we were able to get equipment leases whereby the Storage Pods and hard drives were used as collateral to secure the lease on them.
• Investors/VCs
After five years we raised $5m from TMT Investments to add to the balance sheet and invest in growth.

The variety and quantity of sources we used is by no means uncommon.

GAAP vs. Cash

Most companies start tracking financials based on cash, and as they scale they switch to GAAP (Generally Accepted Accounting Principles). Cash is easier to track — we got paid $XXXX and spent $YYY — and as often mentioned, is required for the business to stay alive. GAAP has more subtlety and complexity, but provides a clearer picture of how the business is really doing. Backblaze was on a ‘cash’ system for the first few years, then switched to GAAP. For this post, I’m going to focus on things that help cash flow, not GAAP profitability.

Stages of Cash Flow Management

All-spend

In a pure service business (e.g. a sole-proprietor law firm), you may have no expenses other than your time, so this stage doesn’t exist. However, in a product business there is a period of time where you are building the product and have nothing to sell. You have zero cash coming in, but have cash going out. Your cash flow is completely negative and you need funds to cover that.

Sales-generating

Starting to see cash come in from customers is thrilling. I initially had our system set up to email me with every $5 payment we received. You’re making sales, but not covering expenses.

Ramen-profitable

But it takes a lot of $5 payments to pay for servers and salaries, so for a while expenses are likely to outstrip sales. Getting to ramen-profitable is a critical stage where sales cover the business expenses and are “paying enough for the founders to eat ramen.” This extends the runway for a business, but is not completely sustainable, since presumably the founders can’t (or won’t) live forever on a subsistence salary.

Business-profitable

This is the ultimate stage whereby the business is truly profitable, including paying everyone market-rate salaries. A business at this stage is self-sustaining. (Of course, market shifts and plenty of other challenges can kill the business, but cash-flow issues alone will not.)

Note, I’m using the word ‘profitable’ here to mean this is still on a cash-basis.

Backblaze was in the all-spend stage for just over a year, during which time we built the service and hadn’t yet made the service available to customers. Backblaze was in the sales-generating stage for nearly another year before the company was barely ramen-profitable where sales were covering the company expenses and paying the founders minimum wage. (I say ‘barely’ since minimum wage in the SF Bay Area is arguably never subsistence.) It took almost three more years before the company was business-profitable, paying everyone including the founders market-rate.

Cash Flow Forecasting

When raising funding it’s helpful to think of milestones reached. You don’t necessarily need enough cash on day one to last for the next 100 years of the company. Some good milestones to consider are how much cash you need to prove there is a market need, prove you can build a product to meet that need, or get to ramen-profitable.

Two things to consider:

1) Unit Economics (COGS)

If your product is 100% software, this may not be relevant. Once software is built it costs effectively nothing to deliver the product to one customer or one million customers. However, in most businesses there is some incremental cost to provide the product. If you’re selling a hardware device, perhaps you sell it for $100 but it costs you $50 to make it. This is called “COGS” (Cost of Goods Sold).

Many products rely on cloud services where the costs scale with growth. That model works great, but it’s still important to understand what the costs are for the cloud service you use per unit of product you sell.

Support is often done by the founders early-on in a business, but that is another real cost to factor in and estimate on a per-user basis. Taking all of the per unit costs combined, you may charge $10/month/user for your service, but if it costs you $7/month/user in cloud services, you’re only netting $3/month/user.

2) Operating Expenses (OpEx)

These are expenses that don’t scale with the number of product units you sell. Typically this includes research & development, sales & marketing, and general & administrative expenses. Presumably there is a certain level of these functions required to build the product, market it, sell it, and run the organization. You can choose to invest or cut back on these, but you’ll still make the same amount per product unit.

Incremental Net Profit Per Unit

If you’ve calculated your COGS and your unit economics are “upside down,” where the amount you charge is less than what it costs you to provide your service, it’s worth thinking hard about how that’s going to change over time. If it will not change, there is no scale that will make the business work. Presuming you do make money on each unit of product you sell — what is sometimes referred to as “Contribution Margin” — consider how many of those product units you need to sell to cover your operating expenses as described above.

Calculating Your Profit

The math on getting to ramen-profitable is simple:

(Number of Product Units Sold x Contribution Margin) - Operating Expenses = Profit

If your operating expenses include subsistence salaries for the founders and profit > $0, you’re ramen-profitable.
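Here is that math as a small sketch, reusing the per-user numbers from the COGS example above ($10 charged, $7 of cloud costs per user) with a hypothetical operating-expense figure; the numbers are illustrative, not Backblaze’s.

```python
def monthly_profit(units_sold, price_per_unit, cogs_per_unit, operating_expenses):
    """Cash-basis profit: (units sold x contribution margin) - operating expenses."""
    contribution_margin = price_per_unit - cogs_per_unit
    return units_sold * contribution_margin - operating_expenses

# 3,500 users paying $10/month, $7/month/user of COGS, and $9,000/month of
# operating expenses (including subsistence founder salaries) -- all hypothetical.
profit = monthly_profit(units_sold=3_500, price_per_unit=10,
                        cogs_per_unit=7, operating_expenses=9_000)
print(profit)  # 1500 -> profit > $0, so this business is ramen-profitable
```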

Improving Cash Flow

Having access to sources of cash, whether from selling to customers or other methods, is excellent. But needing less cash gives you more choices and allows you to either dilute less, owe less, or invest more.

There are two ways to improve cash flow:

1) Collect More Cash

The best way to collect more cash is to provide more value to your customers and as a result have them pay you more. Additional features/products/services can allow this. However, you can also collect more cash by changing how you charge for your product. If you have a subscription, changing from charging monthly to yearly dramatically improves your cash flow. If you have a product that customers use up, selling a year’s supply instead of selling them one-by-one can help.

2) Spend Less Cash

Reducing COGS is a fantastic way to spend less cash in a scalable way. If you can do this without harming the product or customer experience, you win. There are a myriad of ways to also reduce operating expenses, including taking sub-market salaries, using your home instead of renting office space, staying focused on your core product, etc.

Ultimately, collecting more and spending less cash dramatically simplifies the process of getting to ramen-profitable and later to business-profitable.

Be Careful (Why GAAP Matters)

A word of caution: while running out of cash will put you out of business immediately, overextending yourself will likely put you out of business not much later. GAAP shows how a business is really doing; cash doesn’t. If you only focus on cash, it is possible to commit yourself to both delivering products and repaying loans in the future in an unsustainable fashion. If you’re taking out loans, watch the total balance and monthly payments you’re committing to. If you’re asking customers for pre-payment, make sure you believe you can deliver on what they’ve paid for.

Summary

There are numerous challenges to building a business, and ensuring you have enough cash is amongst the most important. Having the cash to keep going lets you keep working on all of the other challenges. The frameworks above were critical for maintaining Backblaze’s cash flow and cash balance. Hopefully you can take some of the lessons we learned and apply them to your business. Let us know what works for you in the comments below.

The post Early Challenges: Managing Cash Flow appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

LSFMM 2018 call for proposals

Post Syndicated from corbet original https://lwn.net/Articles/744400/rss

The 2018 Linux Storage, Filesystem, and Memory-Management Summit will be held April 23-25 in Park City, Utah. The call for proposals has just gone out with a tight deadline: they need to be received by January 31. LSF/MM is an invitation-only technical workshop to map out improvements to the Linux storage, filesystem and memory management subsystems that will make their way into the mainline kernel within the coming years.

US Govt Brands Torrent, Streaming & Cyberlocker Sites As Notorious Markets

Post Syndicated from Andy original https://torrentfreak.com/us-govt-brands-torrent-streaming-cyberlocker-sites-as-notorious-markets-180115/

In its annual “Out-of-Cycle Review of Notorious Markets” the office of the United States Trade Representative (USTR) has listed a long list of websites said to be involved in online piracy.

The list is compiled with high-level input from various trade groups, including the MPAA and RIAA who both submitted their recommendations (1,2) during early October last year.

With the word “allegedly” used more than two dozen times in the report, the US government notes that its report does not constitute cast-iron proof of illegal activity. However, it urges the countries from where the so-called “notorious markets” operate to take action where they can, while putting owners and facilitators on notice that their activities are under the spotlight.

“A goal of the List is to motivate appropriate action by owners, operators, and service providers in the private sector of these and similar markets, as well as governments, to reduce piracy and counterfeiting,” the report reads.

“USTR highlights the following marketplaces because they exemplify global counterfeiting and piracy concerns and because the scale of infringing activity in these marketplaces can cause significant harm to U.S. intellectual property (IP) owners, consumers, legitimate online platforms, and the economy.”

The report begins with a page titled “Issue Focus: Illicit Streaming Devices”. Unsurprisingly, particularly given their place in dozens of headlines last year, the segment focuses on the set-top box phenomenon. The piece doesn’t list any apps or software tools as such but highlights the general position, claiming a cost to the US entertainment industry of $4-5 billion a year.

Torrent Sites

In common with previous years, the USTR goes on to list several of the world’s top torrent sites, but due to changes in circumstances, others have been delisted. ExtraTorrent, which shut down in May 2017, is one such example.

As the world’s most famous torrent site, The Pirate Bay gets a prominent mention, with the USTR noting that the site is of “symbolic importance as one of the longest-running and most vocal torrent sites.” The USTR underlines the site’s resilience by noting its hydra-like form while revealing an apparent secret concerning its hosting arrangements.

“The Pirate Bay has allegedly had more than a dozen domains hosted in various countries around the world, applies a reverse proxy service, and uses a hosting provider in Vietnam to evade further enforcement action,” the USTR notes.

Other torrent sites singled out for criticism include RARBG, which was nominated for the listing by the movie industry. According to the USTR, the site is hosted in Bosnia and Herzegovina and has changed hosting services to prevent shutdowns in recent years.

1337x.to and the meta-search engine Torrentz2 are also given a prime mention, with the USTR noting that they are “two of the most popular torrent sites that allegedly infringe U.S. content industry’s copyrights.” Russia’s RuTracker is also targeted for criticism, with the government noting that it’s now one of the most popular torrent sites in the world.

Streaming & Cyberlockers

While torrent sites are still important, the USTR reserves considerable space in its report for streaming portals and cyberlocker-type services.

4Shared.com, a file-hosting site that has been targeted by tens of millions of copyright notices, is reportedly no longer able to use major US payment providers. Nevertheless, the British Virgin Islands company still collects significant sums from premium accounts, advertising, and offshore payment processors, the USTR notes.

Cyberlocker Rapidgator gets another prominent mention in 2017, with the USTR noting that the Russian-hosted platform generates millions of dollars every year through premium memberships while employing rewards and affiliate schemes.

Due to its increasing popularity as a hosting and streaming operation, Openload.co (Romania) is now a big target for the USTR. “The site is used frequently in combination with add-ons in illicit streaming devices. In November 2017, users visited Openload.co a staggering 270 million times,” the USTR writes.

Owned by a Swiss company and hosted in the Netherlands, the popular site Uploaded is also criticized by the US alongside France’s 1Fichier.com, which allegedly hosts pirate games while being largely unresponsive to takedown notices. Dopefile.pk, a Pakistan-based storage outfit, is also highlighted.

On the video streaming front, it’s perhaps no surprise that the USTR focuses on sites like FMovies (Sweden), GoStream (Vietnam), Movie4K.tv (Russia) and PrimeWire. An organization collectively known as the MovShare group which encompasses Nowvideo.sx, WholeCloud.net, NowDownload.cd, MeWatchSeries.to and WatchSeries.ac, among others, is also listed.

Unauthorized music / research papers

While most of the above are either focused on video or feature it as part of their repertoire, other sites are listed for their attention to music. Convert2MP3.net is named as one of the most popular stream-ripping sites in the world and is highlighted due to the prevalence of YouTube-downloader sites and the 2017 demise of YouTube-MP3.

“Convert2MP3.net does not appear to have permission from YouTube or other sites and does not have permission from right holders for a wide variety of music represented by major U.S. labels,” the USTR notes.

Given the amount of attention the site has received in 2017 as ‘The Pirate Bay of Research’, Libgen.io and Sci-Hub.io (not to mention the endless proxy and mirror sites that facilitate access) are given a detailed mention in this year’s report.

“Together these sites make it possible to download — all without permission and without remunerating authors, publishers or researchers — millions of copyrighted books by commercial publishers and university presses; scientific, technical and medical journal articles; and publications of technological standards,” the USTR writes.

Service providers

But it’s not only sites that are being put under pressure. Following a growing list of nominations in previous years, Swiss service provider Private Layer is again singled out as a rogue player in the market for hosting 1337x.to and Torrentz2.eu, among others.

“While the exact configuration of websites changes from year to year, this is the fourth consecutive year that the List has stressed the significant international trade impact of Private Layer’s hosting services and the allegedly infringing sites it hosts,” the USTR notes.

“Other listed and nominated sites may also be hosted by Private Layer but are using reverse proxy services to obfuscate the true host from the public and from law enforcement.”

The USTR notes Switzerland’s efforts to close a legal loophole that restricts enforcement and looks forward to a positive outcome when the draft amendment is considered by parliament.

Perhaps a little surprisingly given its recent anti-piracy efforts and overtures to the US, Russia’s leading social network VK.com again gets a place on the new list. The USTR recognizes VK’s efforts but insists that more needs to be done.

Social networking and e-commerce

“In 2016, VK reached licensing agreements with major record companies, took steps to limit third-party applications dedicated to downloading infringing content from the site, and experimented with content recognition technologies,” the USTR writes.

“Despite these positive signals, VK reportedly continues to be a hub of infringing activity and the U.S. motion picture industry reports that they find thousands of infringing files on the site each month.”

Finally, in addition to traditional pirate sites, the US also lists online marketplaces that allegedly fail to meet appropriate standards. Re-added to the list in 2016 after a brief hiatus in 2015, China’s Alibaba is listed again in 2017. The development provoked an angry response from the company.

Describing his company as a “scapegoat”, Alibaba Group President Michael Evans said that his platform had achieved a 25% drop in takedown requests and has even been removing infringing listings before they make it online.

“In light of all this, it’s clear that no matter how much action we take and progress we make, the USTR is not actually interested in seeing tangible results,” Evans said in a statement.

The full list of sites in the Notorious Markets Report 2017 (pdf) can be found below.

– 1fichier.com – (cyberlocker)
– 4shared.com – (cyberlocker)
– convert2mp3.net – (stream-ripper)
– Dhgate.com (e-commerce)
– Dopefile.pl – (cyberlocker)
– Firestorm-servers.com (pirate gaming service)
– Fmovies.is, Fmovies.se, Fmovies.to – (streaming)
– Gostream.is, Gomovies.to, 123movieshd.to (streaming)
– Indiamart.com (e-commerce)
– Kinogo.club, kinogo.co (streaming host, platform)
– Libgen.io, sci-hub.io, libgen.pw, sci-hub.cc, sci-hub.bz, libgen.info, lib.rus.ec, bookfi.org, bookzz.org, booker.org, booksc.org, book4you.org, bookos-z1.org, booksee.org, b-ok.org (research downloads)
– Movshare Group – Nowvideo.sx, wholecloud.net, auroravid.to, bitvid.sx, nowdownload.ch, cloudtime.to, mewatchseries.to, watchseries.ac (streaming)
– Movie4k.tv (streaming)
– MP3VA.com (music)
– Openload.co (cyberlocker / streaming)
– 1337x.to (torrent site)
– Primewire.ag (streaming)
– Torrentz2, Torrentz2.me, Torrentz2.is (torrent site)
– Rarbg.to (torrent site)
– Rebel (domain company)
– Repelis.tv (movie and TV linking)
– RuTracker.org (torrent site)
– Rapidgator.net (cyberlocker)
– Taobao.com (e-commerce)
– The Pirate Bay (torrent site)
– TVPlus, TVBrowser, Kuaikan (streaming apps and addons, China)
– Uploaded.net (cyberlocker)
– VK.com (social networking)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

ISP: We’re Cooperating With Police Following Pirate IPTV Raid

Post Syndicated from Andy original https://torrentfreak.com/isp-were-cooperating-with-police-following-pirate-iptv-raid-180113/

This week, police forces around Europe took action against what is believed to be one of the world’s largest pirate IPTV networks.

The investigation, launched a year ago and coordinated by Europol, came to a head on Tuesday when police carried out raids in Cyprus, Bulgaria, Greece, and the Netherlands. A fresh announcement from the crime-fighting group reveals the scale of the operation.

It was led by the Cypriot Police – Intellectual Property Crime Unit, with the support of the Cybercrime Division of the Greek Police, the Dutch Fiscal Investigative and Intelligence Service (FIOD), the Cybercrime Unit of the Bulgarian Police, Europol’s Intellectual Property Crime Coordinated Coalition (IPC³), and supported by members of the Audiovisual Anti-Piracy Alliance (AAPA).

In Cyprus, Bulgaria and Greece, 17 house searches were carried out. Three individuals aged 43, 44, and 53 were arrested in Cyprus and one was arrested in Bulgaria.

All stand accused of being involved in an international operation to illegally broadcast around 1,200 channels of pirated content to an estimated 500,000 subscribers. Some of the channels offered were illegally sourced from Sky UK, Bein Sports, Sky Italia, and Sky DE. On Thursday, the three individuals in Cyprus were remanded in custody for seven days.

“The servers used to distribute the channels were shut down, and IP addresses hosted by a Dutch company were also deactivated thanks to the cooperation of the authorities of The Netherlands,” Europol reports.

“In Bulgaria, 84 servers and 70 satellite receivers were seized, with decoders, computers and accounting documents.”

TorrentFreak was previously able to establish that Megabyte-Internet Ltd, an ISP located in the small Bulgarian town Petrich, was targeted by police. The provider went down on Tuesday but returned towards the end of the week. Responding to our earlier inquiries, the company told us more about the situation.

“We are an ISP provider located in Petrich, Bulgaria. We are selling services to around 1,500 end-clients in the Petrich area and surrounding villages,” a spokesperson explained.

“Another part of our business is internet services like dedicated unmanaged servers, hosting, email servers, storage services, and VPNs etc.”

The spokesperson added that some of Megabyte’s equipment is located at Telepoint, Bulgaria’s biggest datacenter, with connectivity to Petrich. During the raid the police seized the company’s hardware to check for evidence of illegal activity.

“We were informed by the police that some of our clients in Petrich and Sofia were using our service for illegal streaming and actions,” the company said.

“Of course, we were not able to know this because our services are unmanaged and root access [to servers] is given to our clients. For this reason any client and anyone that uses our services are responsible for their own actions.”

TorrentFreak asked many more questions, including how many police attended, what type and volume of hardware was seized, and whether anyone was arrested or taken for questioning. But, apart from noting that the police were friendly, the company declined to give us any additional information, revealing that it was not permitted to do so at this stage.

What is clear, however, is that Megabyte-Internet is offering its full cooperation to the authorities. The company says that it cannot be held responsible for the actions of its clients so their details will be handed over as part of the investigation.

“So now we will give to the police any details about these clients because we hold their full details by law. [The police] will find [out about] all the illegal actions from them,” the company concludes, adding that it’s fully operational once more and working with clients.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Wanted: Junior Support Technician

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-junior-support-technician/

Backblaze is growing and as we grow we want to make sure that our customers are very well taken care of! One of the departments that grows along with our customer base is the support department, which is located in our San Mateo, California headquarters. Want to jump start your career? Take a look below and if this sounds like you, apply to join our team!

Responsibilities:

  • Answer questions in the support queue that are commonly covered in the FAQ.
  • Ability to install and uninstall programs on Mac and PC.
  • Clear communication via email.
  • Learn and expand your knowledge base to become a Tech Support Agent.
  • Learn how to navigate the Zendesk support tool and create helpful macros and work flow views.
  • Create receipts for users that ask for them, via template.
  • Respond to any tickets that you get a reply to.
  • Ask questions to facilitate learning.
  • Obtain skills and knowledge to move into a Tier 2 position.

Requirements:

  • Excellent communication, time management, problem solving, and organizational skills.
  • Ability to learn quickly.
  • Position based in San Mateo, California.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Strong desire to work for a small, fast-paced company.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Comfortable with well-behaved pets in the office.

Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

If This Sounds Like You:
Send an email to jobscontact@backblaze.com with:

  1. The position title in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Wanted: Junior Support Technician appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

[$] Opening up the GnuBee open NAS system

Post Syndicated from corbet original https://lwn.net/Articles/743609/rss

GnuBee is the brand name for a line of open hardware boards designed to provide Linux-based network-attached storage. Given the success of the crowdfunding campaigns for the first two products, the GB-PC1 and GB-PC2 (which support 2.5 and 3.5 inch drives respectively), there appears to be a market for these devices. Given that Linux is quite good at attaching storage to a network, it seems likely they will perform their core function more than adequately. My initial focus when exploring my GB-PC1 is not the performance but the openness: just how open is it really? The best analogy I can come up with is that of a door with rusty hinges: it can be opened, but doing so requires determination.

Announcing our new beta for the AWS Certified Security – Specialty exam

Post Syndicated from Janna Pellegrino original https://aws.amazon.com/blogs/architecture/announcing-our-new-beta-for-the-aws-certified-security-specialty-exam/

Take the AWS Certified Security – Specialty beta exam for the chance to be among the first to hold this new AWS Certification. This beta exam allows experienced cloud security professionals to demonstrate and validate their expertise. Register today – this beta exam will only be available from January 15 to March 2!

About the exam

This beta exam validates that the successful candidate can effectively demonstrate knowledge of how to secure the AWS platform. The exam covers incident response, logging and monitoring, infrastructure security, identity and access management, and data protection.

The exam validates:

  • Familiarity with regional- and country-specific security and compliance regulations and meta issues that these regulations embody.
  • An understanding of specialized data classifications and AWS data protection mechanisms.
  • An understanding of data encryption methods and AWS mechanisms to implement them.
  • An understanding of secure Internet protocols and AWS mechanisms to implement them.
  • A working knowledge of AWS security services and features of services to provide a secure production environment.
  • Competency gained from two or more years of production deployment experience using AWS security services and features.
  • Ability to make tradeoff decisions with regard to cost, security, and deployment complexity given a set of application requirements.
  • An understanding of security operations and risk.

Learn more and register >>

Who is eligible

The beta is open to anyone who currently holds an Associate or Cloud Practitioner certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.

How to prepare

We have training and other resources to help you prepare for the beta exam:

AWS Security Fundamentals (Digital | 3 Hours)
This course introduces you to fundamental cloud computing and AWS security concepts, including AWS access control and management, governance, logging, and encryption methods. It also covers security-related compliance protocols and risk management strategies, as well as procedures related to auditing your AWS security infrastructure.

Security Operations on AWS (Classroom | 3 Days)
This course demonstrates how to efficiently use AWS security services to stay secure and compliant in the AWS Cloud. The course focuses on the AWS-recommended security best practices that you can implement to enhance the security of your data and systems in the cloud. The course highlights the security features of AWS key services including compute, storage, networking, and database services.

Online resources for Cloud Security and Compliance

Review documentation, whitepapers, and articles & tutorials related to cloud security and compliance.

Learn more and register >>

Please contact us if you have questions about exam registration.

Good luck!

Connect Veeam to the B2 Cloud: Episode 1 — Using Synology

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-up-veeam-cloud-connect-synology-b2/

Veeam Cloud Connect to Backblaze B2

Veeam is well-known for its easy-to-use software for backing up virtual machines from VMware and Microsoft.

Users of Veeam and Backblaze B2 Cloud Storage have asked for a way to back up a Veeam repository to B2. Backblaze’s B2 is an ideal solution for backing up Veeam’s backup repository due to B2’s combination of low cost and high availability compared to other cloud solutions such as Microsoft Azure.

This is the first in a series of posts on the topic of backing up Veeam to B2. Future posts will cover other methods.

In this post we provide a step-by-step tutorial on how to configure a Synology NAS as a Veeam backup repository, and in turn use Synology’s CloudSync software to back up that repository to the B2 Cloud.

Our guest contributor, Rhys Hammond, is well qualified to author this tutorial. Rhys is a Senior System Engineer for Data#3 in Australia specializing in Veeam and VMware solutions. He is a VMware vExpert and a member of the Veeam Vanguard program.

Rhys’s tutorial is outlined as follows:

Veeam and Backblaze B2 — Introduction

Introduction

Background on B2 and Veeam, and a discussion of various ways to back up a Veeam backup repository to the cloud.

Phase 1 — Create the Backblaze B2 Bucket

How to create the B2 Bucket that will be the destination for mirroring our Veeam backup repository.

Phase 2 — Install and Configure Synology CloudSync

Get CloudSync ready to perform the backup to B2.

Phase 3 — Configure Veeam Backup Repository

Create a new Veeam backup repository in preparation for upload to B2.

Phase 4 — Create the Veeam Backup Job

Configure the Veeam backup job, with two possible scenarios, primary target and secondary backup target.

Phase 5 — Testing and Tuning

Making sure it all works.

Summary

Some thoughts on the process, other options, and tips.

You can read the full tutorial on Rhys’s website by following the link below. To be sure to receive notice of future posts in this series on Veeam, use the Join button at the top of the page.

Beta Testers Needed: Veeam/Starwind/B2

If you back up Veeam using Starwind VTL, we have a BETA program for you. Help us with the Starwind VTL to Backblaze B2 integration Beta and test whether you can automatically back up Veeam to Backblaze B2 via Starwind VTL. Motivated beta testers can email starwind@backblaze.com for details and how to get started.

The post Connect Veeam to the B2 Cloud: Episode 1 — Using Synology appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Validate Your IT Security Expertise with the New AWS Certified Security – Specialty Beta Exam

Post Syndicated from Sara Snedeker original https://aws.amazon.com/blogs/security/validate-your-it-security-expertise-with-the-new-aws-certified-security-specialty-beta-exam/

AWS Training and Certification image

If you are an experienced cloud security professional, you can demonstrate and validate your expertise with the new AWS Certified Security – Specialty beta exam. This exam allows you to demonstrate your knowledge of incident response, logging and monitoring, infrastructure security, identity and access management, and data protection. Register today – this beta exam will be available only from January 15 to March 2, 2018.

By taking this exam, you can validate your:

  • Familiarity with region-specific and country-specific security and compliance regulations and meta issues that these regulations include.
  • Understanding of data encryption methods and secure internet protocols, and the AWS mechanisms to implement them.
  • Working knowledge of AWS security services to provide a secure production environment.
  • Ability to make trade-off decisions with regard to cost, security, and deployment complexity when given a set of application requirements.

See the full list of security knowledge you can validate by taking this beta exam.

Who is eligible?

The beta exam is open to anyone who currently holds an AWS Associate or Cloud Practitioner certification. We recommend candidates have five years of IT security experience designing and implementing security solutions, and at least two years of hands-on experience securing AWS workloads.

How to prepare

You can take the following courses and use AWS cloud security resources and compliance resources to prepare for this exam.

AWS Security Fundamentals (digital, 3 hours)
This digital course introduces you to fundamental cloud computing and AWS security concepts, including AWS access control and management, governance, logging, and encryption methods. It also covers security-related compliance protocols and risk management strategies, as well as procedures related to auditing your AWS security infrastructure.

Security Operations on AWS (classroom, 3 days)
This instructor-led course demonstrates how to efficiently use AWS security services to help stay secure and compliant in the AWS Cloud. The course focuses on the AWS-recommended security best practices that you can implement to enhance the security of your AWS resources. The course highlights the security features of AWS compute, storage, networking, and database services.

If you have questions about this new beta exam, contact us.

Good luck with the exam!

– Sara

Graphite 1.1: Teaching an Old Dog New Tricks

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/01/11/graphite-1.1-teaching-an-old-dog-new-tricks/

The Road to Graphite 1.1

I started working on Graphite just over a year ago, when @obfuscurity asked me to help out with some issues blocking the Graphite 1.0 release. Little did I know that a year later, that would have resulted in 262 commits (and counting), and that with the help of the other Graphite maintainers (especially @deniszh, @iksaif & @cbowman0) we would have added a huge amount of new functionality to Graphite.

There are a huge number of new additions and updates in this release. In this post I’ll give a tour of some of the highlights, including tag support, syntax and function updates, custom function plugins, and Python 3.x support.

Tagging!

The single biggest feature in this release is the addition of tag support, which brings the ability to describe metrics in a much richer way and to write more flexible and expressive queries.

Traditionally series in Graphite are identified using a hierarchical naming scheme based on dot-separated segments called nodes. This works very well and is simple to map into a hierarchical structure like the whisper filesystem tree, but it means that the user has to know what each segment represents, and makes it very difficult to modify or extend the naming scheme since everything is based on the positions of the segments within the hierarchy.

The tagging system gives users the ability to encode information about the series in a collection of tag=value pairs which are used together with the series name to uniquely identify each series, and the ability to query series by specifying tag-based matching expressions rather than constructing glob-style selectors based on the positions of specific segments within the hierarchy. This is broadly similar to the system used by Prometheus and makes it possible to use Graphite as a long-term storage backend for metrics gathered by Prometheus with full tag support.

When using tags, series names are specified using the new tagged carbon format: name;tag1=value1;tag2=value2. This format is backward compatible with most existing carbon tooling, and makes it easy to adapt existing tools to produce tagged metrics simply by changing the metric names. The OpenMetrics format is also supported for ingestion, and is normalized into the standard Graphite format internally.
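
For example, an existing tool that already speaks carbon's plaintext protocol only needs to change the metric name it sends in order to produce tagged series. A minimal Python sketch (the metric name, tags, and the assumption that carbon is listening on localhost:2003 are purely illustrative):

import socket
import time

# Tagged series name in the new carbon format: name;tag1=value1;tag2=value2
metric = 'disk.used;datacenter=dc1;rack=a1;server=web01'
value = 42.0
timestamp = int(time.time())

# The plaintext protocol itself is unchanged: "<name> <value> <timestamp>\n"
line = '%s %s %d\n' % (metric, value, timestamp)

sock = socket.create_connection(('localhost', 2003))
sock.sendall(line.encode('utf-8'))
sock.close()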

At its core, the tagging system is implemented as a tag database (TagDB) alongside the metrics that allows them to be efficiently queried by individual tag values rather than having to traverse the metrics tree looking for series that match the specified query. Internally the tag index is stored in one of a number of pluggable tag databases, currently supported options are the internal graphite-web database, redis, or an external system that implements the Graphite tagging HTTP API. Carbon automatically keeps the index up to date with any tagged series seen.

The new seriesByTag function is used to query the TagDB and will return a list of all the series that match the expressions passed to it. seriesByTag supports both exact and regular expression matches, and can be used anywhere you would previously have specified a metric name or glob expression.

There are new dedicated functions for grouping and aliasing series by tag (groupByTags and aliasByTags), and you can also use tags interchangeably with node numbers in the standard Graphite functions like aliasByNode, groupByNodes, asPercent, mapSeries, etc.
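
As a rough illustration (the series names and tags here are made up), queries against tagged series might look like this:

seriesByTag('name=disk.used', 'datacenter=dc1')
seriesByTag('name=disk.used', 'server=~web.*')
groupByTags(seriesByTag('name=disk.used'), 'sum', 'datacenter')
aliasByTags(seriesByTag('name=disk.used', 'datacenter=dc1'), 'server')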

Piping Syntax & Function Updates

One of the huge strengths of the Graphite render API is the ability to chain together multiple functions to process data, but until now (unless you were using a tool like Grafana) writing chained queries could be painful as each function had to be wrapped around the previous one. With this release it is now possible to “pipe” the output of one processing function into the next, and to combine piped and nested functions.

For example:

alias(movingAverage(scaleToSeconds(sumSeries(stats_global.production.counters.api.requests.*.count),60),30),'api.avg')

Can now be written as:

sumSeries(stats_global.production.counters.api.requests.*.count)|scaleToSeconds(60)|movingAverage(30)|alias('api.avg')

OR

stats_global.production.counters.api.requests.*.count|sumSeries()|scaleToSeconds(60)|movingAverage(30)|alias('api.avg')

Another source of frustration with the old function API was the inconsistent implementation of aggregations, with different functions being used in different parts of the API, and some functions simply not being available. In 1.1 all functions that perform aggregation (whether across series or across time intervals) now support a consistent set of aggregations; average, median, sum, min, max, diff, stddev, count, range, multiply and last. This is part of a new approach to implementing functions that emphasises using shared building blocks to ensure consistency across the API and solve the problem of a particular function not working with the aggregation needed for a given task.

To that end a number of new functions have been added that each provide the same functionality as an entire family of “old” functions; aggregate, aggregateWithWildcards, movingWindow, filterSeries, highest, lowest and sortBy.

Each of these functions accepts an aggregation method parameter, for example aggregate(some.metric.*, 'sum') implements the same functionality as sumSeries(some.metric.*).

It can also be used with different aggregation methods to replace averageSeries, stddevSeries, multiplySeries, diffSeries, rangeOfSeries, minSeries, maxSeries and countSeries. All those functions are now implemented as aliases for aggregate, and it supports the previously-missing median and last aggregations.

The same is true for the other functions, and the summarize, smartSummarize, groupByNode, groupByNodes and the new groupByTags functions now all support the standard set of aggregations. Gone are the days of wishing that sortByMedian or highestRange were available!
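
For instance (sketching the new call style only; the function documentation has the full signatures), the wish-list items above can now be written as:

sortBy(some.metric.*, 'median')
highest(some.metric.*, 5, 'range')
summarize(some.metric.*, '1hour', 'median')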

For more information on the functions available check the function documentation.

Custom Functions

No matter how many functions are available there are always going to be specific use-cases where a custom function can perform analysis that wouldn’t otherwise be possible, or provide a convenient alias for a complicated function chain or specific set of parameters.

In Graphite 1.1 we added support for easily adding one-off custom functions, as well as for creating and sharing plugins that can provide one or more functions.

Each function plugin is packaged as a simple python module, and will be automatically loaded by Graphite when placed into the functions/custom folder.

An example of a simple function plugin that translates the name of every series passed to it into UPPERCASE:

from graphite.functions.params import Param, ParamTypes

def toUpperCase(requestContext, seriesList):
  """Custom function that changes series names to UPPERCASE"""
  for series in seriesList:
    series.name = series.name.upper()
  return seriesList

toUpperCase.group = 'Custom'
toUpperCase.params = [
  Param('seriesList', ParamTypes.seriesList, required=True),
]

SeriesFunctions = {
  'upper': toUpperCase,
}
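
Because the SeriesFunctions mapping registers the function under the name upper, it can then be used in render targets like any built-in function, including with the new piping syntax:

upper(some.metric.*)
some.metric.*|upper()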

Once installed, the function is not only available for use within Graphite, but is also exposed via the new Function API, which allows the function definition and documentation to be automatically loaded by tools like Grafana. This means that users will be able to select and use the new function in exactly the same way as the internal functions.

More information on writing and using custom functions is available in the documentation.

Clustering Updates

One of the biggest changes from the 0.9 to 1.0 releases was the overhaul of the clustering code, and with 1.1.1 that process has been taken even further to optimize performance when using Graphite in a clustered deployment. In the past it was common for a request to require the frontend node to make multiple requests to the backend nodes to identify matching series and to fetch data, and the code for handling remote vs local series was overly complicated. In 1.1.1 we took a new approach where all render data requests pass through the same path internally, and multiple backend nodes are handled individually rather than grouped together into a single finder. This has greatly simplified the codebase, making it much easier to understand and reason about, while allowing much more flexibility in design of the finders. After these changes, render requests can now be answered with a single internal request to each backend node, and all requests for both remote and local data are executed in parallel.

To maintain the ability of graphite to scale out horizontally, the tagging system works seamlessly within a clustered environment, with each node responsible for the series stored on that node. Calls to load tagged series via seriesByTag are fanned out to the backend nodes and results are merged on the query node just like they are for non-tagged series.

Python 3 & Django 1.11 Support

Graphite 1.1 finally brings support for Python 3.x; both graphite-web and carbon are now tested against Python 2.7, 3.4, 3.5, 3.6 and PyPy. Django releases 1.8 through 1.11 are also supported. The work involved in sorting out the compatibility issues between Python 2.x and 3.x was quite involved, but it is a huge step forward for the long-term support of the project! With the new Django 2.x series supporting only Python 3.x, we will need to evaluate our long-term support for Python 2.x, but the Django 1.11 series is supported through 2020 so there is time to consider the options there.

Watch This Space

Efforts are underway to add support for the new functionality across the ecosystem of tools that work with Graphite, adding collectd tagging support, prometheus remote read & write with tags (and native Prometheus remote read/write support in Graphite) and last but not least Graphite tag support in Grafana.

We’re excited about the possibilities that the new capabilities in 1.1.x open up, and can’t wait to see how the community puts them to work.

Download the 1.1.1 release and check out the release notes here.

AWS IoT, Greengrass, and Machine Learning for Connected Vehicles at CES

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-greengrass-and-machine-learning-for-connected-vehicles-at-ces/

Last week I attended a talk given by Bryan Mistele, president of Seattle-based INRIX. Bryan’s talk provided a glimpse into the future of transportation, centering around four principal attributes, often abbreviated as ACES:

Autonomous – Cars and trucks are gaining the ability to scan and to make sense of their environments and to navigate without human input.

Connected – Vehicles of all types have the ability to take advantage of bidirectional connections (either full-time or intermittent) to other cars and to cloud-based resources. They can upload road and performance data, communicate with each other to run in packs, and take advantage of traffic and weather data.

Electric – Continued development of battery and motor technology will make electric vehicles more convenient, cost-effective, and environmentally friendly.

Shared – Ride-sharing services will change usage from an ownership model to an as-a-service model (sound familiar?).

Individually and in combination, these emerging attributes mean that the cars and trucks we will see and use in the decade to come will be markedly different than those of the past.

On the Road with AWS
AWS customers are already using our AWS IoT, edge computing, Amazon Machine Learning, and Alexa products to bring this future to life – vehicle manufacturers, their tier 1 suppliers, and AutoTech startups all use AWS for their ACES initiatives. AWS Greengrass is playing an important role here, attracting design wins and helping our customers to add processing power and machine learning inferencing at the edge.

AWS customer Aptiv (formerly Delphi) talked about their Automated Mobility on Demand (AMoD) smart vehicle architecture in an AWS re:Invent session. Aptiv’s AMoD platform will use Greengrass and microservices to drive the onboard user experience, along with edge processing, monitoring, and control. Here’s an overview:

Another customer, Denso of Japan (one of the world’s largest suppliers of auto components and software) is using Greengrass and AWS IoT to support their vision of Mobility as a Service (MaaS). Here’s a video:

AWS at CES
The AWS team will be out in force at CES in Las Vegas and would love to talk to you. They’ll be running demos that show how AWS can help to bring innovation and personalization to connected and autonomous vehicles.

Personalized In-Vehicle Experience – This demo shows how AWS AI and Machine Learning can be used to create a highly personalized and branded in-vehicle experience. It makes use of Amazon Lex, Polly, and Amazon Rekognition, but the design is flexible and can be used with other services as well. The demo encompasses driver registration, login and startup (including facial recognition), voice assistance for contextual guidance, personalized e-commerce, and vehicle control. Here’s the architecture for the voice assistance:

Connected Vehicle Solution – This demo shows how a connected vehicle can combine local and cloud intelligence, using edge computing and machine learning at the edge. It handles intermittent connections and uses AWS DeepLens to train a model that responds to distracted drivers. Here’s the overall architecture, as described in our Connected Vehicle Solution:

Digital Content Delivery – This demo will show how a customer uses a web-based 3D configurator to build and personalize their vehicle. It will also show high resolution (4K) 3D image and an optional immersive AR/VR experience, both designed for use within a dealership.

Autonomous Driving – This demo will showcase the AWS services that can be used to build autonomous vehicles. There’s a 1/16th scale model vehicle powered and driven by Greengrass and an overview of a new AWS Autonomous Toolkit. As part of the demo, attendees drive the car, training a model via Amazon SageMaker for subsequent on-board inferencing, powered by Greengrass ML Inferencing.

To speak to one of my colleagues or to set up a time to see the demos, check out the Visit AWS at CES 2018 page.

Some Resources
If you are interested in this topic and want to learn more, the AWS for Automotive page is a great starting point, with discussions on connected vehicles & mobility, autonomous vehicle development, and digital customer engagement.

When you are ready to start building a connected vehicle, the AWS Connected Vehicle Solution contains a reference architecture that combines local computing, sophisticated event rules, and cloud-based data processing and storage. You can use this solution to accelerate your own connected vehicle projects.

Jeff;

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable, low-cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/GB/month. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into everything from iOS apps to Docker containers. Some customers integrate directly with the API using the programming language of their choice; others want to solve a specific problem using ready-made software already integrated with B2.

At the same time, our computer backup product is deepening its integration into enterprise IT systems. We are commonly asked how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to set up complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today; diverse technologies that customers are using day in and out. And more importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • How to set up VM servers for lab environments, both on-prem and using cloud services.
  • Create an automatically “resetting” demo environment for the sales team.
  • Set up Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • Learn the basics of OAuth and web single sign-on (SSO).
  • Archive video workflows from camera to media asset management systems.
  • How to upload/download files from JavaScript by enabling CORS.
  • How to install and monitor online backup installations using RMM tools, like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, and effective Google searching to fix problems excites you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.

Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre- or post-sales engineer.

The right extra credit:

There are literally hundreds of previous experiences you can have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Program arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.

What’s it like working with the Sales team?
    The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customer’s experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

    We are honest to each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.

    If this all sounds like you:

    1. Send an email to [email protected] with the position in the subject line.
    2. Tell us a bit about your Sales Engineering experience.
    3. Include your resume.

    The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Datacenter Technician

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-datacenter-technician/

    As we shoot way past 400 petabytes of data under management, we need some help scaling up our datacenters! We’re on the lookout for datacenter technicians who can help us. This role is located in the Sacramento, California area. If you want to join a dynamic team that helps keep our more than 90,000 hard drives spinning, this might be the job for you!

    Responsibilities

    • Work as Backblaze’s physical presence in Sacramento area datacenter(s).
    • Help maintain physical infrastructure including racking equipment, replacing hard drives and other system components.
    • Repair and troubleshoot defective equipment with minimal supervision.
    • Support datacenter’s 24×7 staff to install new equipment, handle after hours emergencies and other tasks.
    • Help manage onsite inventory of hard drives, cables, rails and other spare parts.
    • RMA defective components.
    • Setup, test and activate new equipment via the Linux command line.
    • Help train new Datacenter Technicians as needed.
    • Help with projects to install new systems and services as time allows.
    • Follow and improve Datacenter best practices and documentation.
    • Maintain a clean and well organized work environment.
    • On-call responsibilities require being within an hour of SunGard’s Rancho Cordova/Roseville facility and occasional trips onsite 24×7 to resolve issues that can’t be handled remotely.
    • Work days may include Saturday and/or Sunday (e.g. working Tuesday – Saturday).

    Requirements

    • Excellent communication, time management, problem solving and organizational skills.
    • Ability to learn quickly.
    • Ability to lift/move 50-75 lbs and work down near the floor on a daily basis.
    • Position based near Sacramento, California and may require periodic visits to the corporate office in San Mateo.
    • May require travel to other Datacenters to provide coverage and/or to assist
      with new site set-up.

    Backblaze Employees Have:

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Strong desire to work for a small, fast-paced company.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Comfortable with well-behaved pets in the office.
    • This position is located near Sacramento, California.

    Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    If This Sounds Like You:
    Send an email to [email protected] with:

    1. Datacenter Tech in the subject line
    2. Your resume attached
    3. An overview of your relevant experience

    The post Wanted: Datacenter Technician appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    12 B2 Power Tips for Experts and Developers

    Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/advanced-cloud-storage-tips/

    B2 Tips for Pros
    If you’ve been using B2 Cloud Storage for a while, you probably think you know all that you can do with it. But do you?

    We’ve put together a list of blazing power tips for experts and developers that will take you to the next level. Take a look below.

    If you’re new to B2, we have a list of power tips for you, too.
    Visit 12 Power Tips for New B2 Users.

    1    Manage File Versions

    Use Lifecycle Rules on a Bucket to set how many days to keep files that are no longer the current version. This is a great way to manage the amount of space your B2 account is using.
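
    A lifecycle rule is just a small JSON object attached to a bucket. As a rough illustration (the file name prefix and day count are placeholders; check the B2 Lifecycle Rules documentation for the authoritative field names), a rule that deletes old versions 30 days after they stop being the current version might look like:

    {
      "fileNamePrefix": "",
      "daysFromUploadingToHiding": null,
      "daysFromHidingToDeleting": 30
    }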

    2    Easily Stay on Top of Your B2 Account Limits

    Set usage caps and get text/email alerts for your B2 account when you approach limits that you define.

    3    Bring on Your Big Files

    You can upload files as large as 10TB to B2.

    4    You Can Use FedEx to Get Your Data into B2

    If you have over 20TB of data, you can use Backblaze’s Fireball hard disk array to load large volumes of data directly into your B2 account. We ship a Fireball to you and you ship it back.

    5    You Have Command-Line Control of All B2 Functions

    You have complete control over B2 using our command line tool that is available for Macintosh, Windows, and Linux.

    6    You Can Use Your Own Domain Name To Front a Public B2 Bucket

    You can create a vanity URL for your B2 account.

    7    See What’s Happening in Your Account with Graphical Reports

    You can view graphical reports summarizing your B2 usage — transactions, downloads, averages, data stored — in your B2 account dashboard.

    8    Create a B2 SDK

    You can build your own B2 SDK for JVM-based or JVM-compatible languages using our B2 Java SDK on Github.

    9    B2’s API is Easy to Use

    B2’s API is similar to, but simpler than Amazon’s S3 API, making it super easy for developers to integrate with B2 Cloud Storage.

    10    View Code Examples To Get Your B2 Project Started

    The B2 API is well documented and has code examples for cURL, Java, Python, Swift, Ruby, C#, and PHP. For example, here’s how to create a B2 Bucket.
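
    As a quick taste of what those examples cover, here is a minimal Python sketch that authorizes an account and creates a bucket with the native B2 API using the requests library (the account ID, application key, and bucket name are placeholders; see the b2_authorize_account and b2_create_bucket documentation for the authoritative details):

    import requests

    ACCOUNT_ID = 'YOUR_ACCOUNT_ID'            # placeholder
    APPLICATION_KEY = 'YOUR_APPLICATION_KEY'  # placeholder

    # Step 1: authorize, which returns the API URL and an authorization token
    auth = requests.get(
        'https://api.backblazeb2.com/b2api/v1/b2_authorize_account',
        auth=(ACCOUNT_ID, APPLICATION_KEY)
    ).json()

    # Step 2: create a private bucket using the returned apiUrl and token
    bucket = requests.post(
        auth['apiUrl'] + '/b2api/v1/b2_create_bucket',
        headers={'Authorization': auth['authorizationToken']},
        json={
            'accountId': auth['accountId'],
            'bucketName': 'my-example-bucket',  # placeholder; bucket names are globally unique
            'bucketType': 'allPrivate'
        }
    ).json()

    print(bucket)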

    11    Developers can set the B2 part size as low as 5 MB

    When working with large files, the minimum file part size can be set as low as 5MB or as high as 5GB. This gives developers the ability to maximize the throughput of B2 data uploads and downloads. See Large Files and Downloading for more developer tips.

    12    Your App or Device Can Work with B2, as well

    Your B2 integration can be listed on Backblaze’s website. Visit Submit an Integration to get started.

    Want to Learn More About B2?

    You can find more information on B2 on our website and in our help pages.

    The post 12 B2 Power Tips for Experts and Developers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Fixed Assets Accountant

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-fixed-assets-accountant/

    As Backblaze continues to grow, we’re expanding our accounting team! We’re looking for a seasoned Fixed Asset Accountant to help us with fixed assets and equipment leases.

    Job Duties:

    • Maintain and review fixed assets.
    • Record fixed asset acquisitions and dispositions.
    • Review and update the detailed schedule of fixed assets and accumulated depreciation.
    • Calculate depreciation for all fixed assets.
    • Investigate the potential obsolescence of fixed assets.
    • Coordinate with the Operations team on data center asset dispositions.
    • Conduct periodic physical inventory counts of fixed assets. Work with Operations team on cycle counts.
    • Reconcile the balance in the fixed asset subsidiary ledger to the summary-level account in the general ledger.
    • Track company expenditures for fixed assets in comparison to the capital budget and management authorizations.
    • Prepare audit schedules relating to fixed assets, and assist the auditors in their inquiries.
    • Recommend to management any updates to accounting policies related to fixed assets.
    • Manage equipment leases.
    • Engage and negotiate acquisition of new equipment lease lines.
    • Overall control of original lease documentation and maintenance of master lease files.
    • Facilitate and track routing and execution of various lease-related agreements, documents, and forms.
    • Establish and maintain proper controls to track expirations, renewal options, and all other critical dates.
    • Perform other duties and special projects as assigned.

    Qualifications:

    • 5-6 years relevant accounting experience.
    • Knowledge of inventory and cycle counting preferred.
    • Quickbooks, Excel, Word experience desired.
    • Organized, with excellent attention to detail, meticulous, quick-learner.
    • Good interpersonal skills and a team player.
    • Flexibility and ability to adapt and wear different hats.

    Backblaze Employees Have:

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Strong desire to work for a small, fast-paced company.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Comfortable with well-behaved pets in the office.

    This position is located in San Mateo, California. Regular attendance in the office is expected. Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    If This Sounds Like You:
    Send an email to [email protected] with:

    1. Fixed Asset Accountant in the subject line
    2. Your resume attached
    3. An overview of your relevant experience

    The post Wanted: Fixed Assets Accountant appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    AWS Online Tech Talks – January 2018

    Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-january-2018/

    Happy New Year! Kick off 2018 right by expanding your AWS knowledge with a great batch of new Tech Talks. We’re covering some of the biggest launches from re:Invent including Amazon Neptune, Amazon Rekognition Video, AWS Fargate, AWS Cloud9, Amazon Kinesis Video Streams, AWS PrivateLink, AWS Single Sign-On and more!

    January 2018 – Schedule

    Noted below are the upcoming scheduled live, online technical sessions being held during the month of January. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.

    Webinars featured this month are:

    Monday January 22

    Analytics & Big Data
    11:00 AM – 11:45 AM PT Analyze your Data Lake, Fast @ Any Scale  Lvl 300

    Database
    01:00 PM – 01:45 PM PT Deep Dive on Amazon Neptune Lvl 200

    Tuesday, January 23

    Artificial Intelligence
    9:00 AM – 09:45 AM PT  How to get the most out of Amazon Rekognition Video, a deep learning based video analysis service Lvl 300

    Containers

    11:00 AM – 11:45 AM PT Introducing AWS Fargate Lvl 200

    Serverless
    01:00 PM – 02:00 PM PT Overview of Serverless Application Deployment Patterns Lvl 400

    Wednesday, January 24

    DevOps
    09:00 AM – 09:45 AM PT Introducing AWS Cloud9  Lvl 200

    Analytics & Big Data
    11:00 AM – 11:45 AM PT Deep Dive: Amazon Kinesis Video Streams Lvl 300

    Database
    01:00 PM – 01:45 PM PT Introducing Amazon Aurora with PostgreSQL Compatibility Lvl 200

    Thursday, January 25

    Artificial Intelligence
    09:00 AM – 09:45 AM PT Introducing Amazon SageMaker Lvl 200

    Mobile
    11:00 AM – 11:45 AM PT Ionic and React Hybrid Web/Native Mobile Applications with Mobile Hub Lvl 200

    IoT
    01:00 PM – 01:45 PM PT Connected Product Development: Secure Cloud & Local Connectivity for Microcontroller-based Devices Lvl 200

    Monday, January 29

    Enterprise
    11:00 AM – 11:45 AM PT Enterprise Solutions Best Practices 100 Achieving Business Value with AWS Lvl 100

    Compute
    01:00 PM – 01:45 PM PT Introduction to Amazon Lightsail Lvl 200

    Tuesday, January 30

    Security, Identity & Compliance
    09:00 AM – 09:45 AM PT Introducing Managed Rules for AWS WAF Lvl 200

    Storage
    11:00 AM – 11:45 AM PT  Improving Backup & DR – AWS Storage Gateway Lvl 300

    Compute
    01:00 PM – 01:45 PM PT  Introducing the New Simplified Access Model for EC2 Spot Instances Lvl 200

    Wednesday, January 31

    Networking
    09:00 AM – 09:45 AM PT  Deep Dive on AWS PrivateLink Lvl 300

    Enterprise
    11:00 AM – 11:45 AM PT Preparing Your Team for a Cloud Transformation Lvl 200

    Compute
    01:00 PM – 01:45 PM PT  The Nitro Project: Next-Generation EC2 Infrastructure Lvl 300

    Thursday, February 1

    Security, Identity & Compliance
    09:00 AM – 09:45 AM PT  Deep Dive on AWS Single Sign-On Lvl 300

    Storage
    11:00 AM – 11:45 AM PT How to Build a Data Lake in Amazon S3 & Amazon Glacier Lvl 300

    Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

    Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

    A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

    With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

    In this post, we describe how to combine data in Aurora with data in Amazon Redshift. Here’s an overview of the solution:

    • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
    • Save data in an Amazon S3 bucket.
    • Query data using Amazon Redshift Spectrum.

    We use the following services: Amazon Aurora, AWS Lambda, Amazon Kinesis Data Firehose, Amazon S3, Amazon Redshift (with Redshift Spectrum), and Amazon QuickSight.

    Serverless architecture for capturing and analyzing Aurora data changes

    Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

    In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

    By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

    The following diagram shows the flow of data as it occurs in this tutorial:

    The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

    Creating an Aurora database

    First, create a database by following these steps in the Amazon RDS console:

    1. Sign in to the AWS Management Console, and open the Amazon RDS console.
    2. Choose Launch a DB instance, and choose Next.
    3. For Engine, choose Amazon Aurora.
    4. Choose a DB instance class. This example uses a small instance class, since this is not a production database.
    5. In Multi-AZ deployment, choose No.
    6. Configure DB instance identifier, Master username, and Master password.
    7. Launch the DB instance.

    After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

    The following screenshot shows the MySQL Workbench configuration:

    Next, create a table in the database by running the following SQL statement:

    Create Table
    CREATE TABLE Sales (
    InvoiceID int NOT NULL AUTO_INCREMENT,
    ItemID int NOT NULL,
    Category varchar(255),
    Price double(10,2), 
    Quantity int not NULL,
    OrderDate timestamp,
    DestinationState varchar(2),
    ShippingType varchar(255),
    Referral varchar(255),
    PRIMARY KEY (InvoiceID)
    )

    You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the connection variables (AURORA_CNAME, DBUSER, DBPASSWORD, and DB) are replaced with appropriate values.

    #!/usr/bin/python
    import MySQLdb
    import random
    import datetime
    
    db = MySQLdb.connect(host="AURORA_CNAME",
                         user="DBUSER",
                         passwd="DBPASSWORD",
                         db="DB")
    
    states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
    "IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
    "NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA",
    "WA","WV","WI","WY")
    
    shipping_types = ("Free", "3-Day", "2-Day")
    
    product_categories = ("Garden", "Kitchen", "Office", "Household")
    referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")
    
    for i in range(0,10):
        item_id = random.randint(1,100)
        state = states[random.randint(0,len(states)-1)]
        shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
        product_category = product_categories[random.randint(0,len(product_categories)-1)]
        quantity = random.randint(1,4)
        referral = referrals[random.randint(0,len(referrals)-1)]
        price = random.randint(1,100)
        # Cap the day at 28 so every generated month/day combination is a valid date
        order_date = datetime.date(2016,random.randint(1,12),random.randint(1,28)).isoformat()
    
        data_order = (item_id, product_category, price, quantity, order_date, state,
        shipping_type, referral)
    
        add_order = ("INSERT INTO Sales "
                       "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                       ShippingType, Referral) "
                       "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
    
        cursor = db.cursor()
        cursor.execute(add_order, data_order)
    
        db.commit()
    
    cursor.close()
    db.close() 

    The following screenshot shows how the table appears with the sample data:

    Sending data from Amazon Aurora to Amazon S3

    There are two methods available to send data from Amazon Aurora to Amazon S3:

    • Using a Lambda function
    • Using SELECT INTO OUTFILE S3

    To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

    Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

    Creating a Kinesis data delivery stream

    The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

    To create a delivery stream:

    1. Open the Kinesis Data Firehose console
    2. Choose Create delivery stream.
    3. For Delivery stream name, type AuroraChangesToS3.
    4. For Source, choose Direct PUT.
    5. For Record transformation, choose Disabled.
    6. For Destination, choose Amazon S3.
    7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
    8. Enter a prefix if needed, and choose Next.
    9. For Data compression, choose GZIP.
    10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
    11. Review all the details on the screen, and choose Create delivery stream when you’re finished.

     

    Creating a Lambda function

    Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

    To create the Lambda function:

    1. Open the AWS Lambda console.
    2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
    3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
    4. Choose Author from scratch.
    5. Give your function a name and select Python 3.6 for Runtime.
    6. Choose an existing role or create a new one; the role needs permission to call firehose:PutRecord.
    7. Choose Next on the trigger selection screen.
    8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
    9. Choose File -> Save in the code editor and then choose Save.
    import boto3
    import json
    
    # Kinesis Data Firehose client and the name of the delivery stream created earlier
    firehose = boto3.client('firehose')
    stream_name = 'AuroraChangesToS3'
    
    
    def Kinesis_publish_message(event, context):
        # Build a CSV line from the fields forwarded by the Aurora trigger
        firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
        event['Category'], event['Price'], event['Quantity'],
        event['OrderDate'], event['DestinationState'], event['ShippingType'], 
        event['Referral']))
        
        # Wrap the payload in the record format expected by put_record
        firehose_data = {'Data': str(firehose_data)}
        print(firehose_data)
        
        # Send the record to the delivery stream; Firehose buffers it and writes to S3
        firehose.put_record(DeliveryStreamName=stream_name,
        Record=firehose_data)

    Note the Amazon Resource Name (ARN) of this Lambda function.
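
    Before wiring up the database trigger, you can sanity-check the function from the Lambda console with a test event shaped like the payload the stored procedure will send. The values below are purely illustrative:

    {
      "ItemID": "42",
      "Category": "Garden",
      "Price": "19.99",
      "Quantity": "2",
      "OrderDate": "2016-05-11 00:00:00",
      "DestinationState": "CA",
      "ShippingType": "2-Day",
      "Referral": "Online Ad"
    }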

    Giving Aurora permissions to invoke a Lambda function

    To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

    Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

    Creating a stored procedure and a trigger in Amazon Aurora

    Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

    DROP PROCEDURE IF EXISTS CDC_TO_FIREHOSE;
    DELIMITER ;;
    CREATE PROCEDURE CDC_TO_FIREHOSE (IN ItemID VARCHAR(255), 
    									IN Category varchar(255), 
    									IN Price double(10,2),
                                        IN Quantity int(11),
                                        IN OrderDate timestamp,
                                        IN DestinationState varchar(2),
                                        IN ShippingType varchar(255),
                                        IN Referral  varchar(255)) LANGUAGE SQL 
    BEGIN
      CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
         CONCAT('{ "ItemID" : "', ItemID, 
                '", "Category" : "', Category,
                '", "Price" : "', Price,
                '", "Quantity" : "', Quantity, 
                '", "OrderDate" : "', OrderDate, 
                '", "DestinationState" : "', DestinationState, 
                '", "ShippingType" : "', ShippingType, 
                '", "Referral" : "', Referral, '"}')
         );
    END
    ;;
    DELIMITER ;

    Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

    DROP TRIGGER IF EXISTS TR_Sales_CDC;
     
    DELIMITER ;;
    CREATE TRIGGER TR_Sales_CDC
      AFTER INSERT ON Sales
      FOR EACH ROW
    BEGIN
      SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
      , New.DestinationState, New.ShippingType, New.Referral
      INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
      , @DestinationState, @ShippingType, @Referral;
      CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
      , @DestinationState, @ShippingType, @Referral);
    END
    ;;
    DELIMITER ;

    If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

    Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.
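
    One simple way to check is to list the delivery stream’s output objects with the AWS SDK. A small sketch using boto3 (the bucket name and prefix are placeholders for whatever you configured in the delivery stream):

    import boto3

    s3 = boto3.client('s3')

    # Kinesis Data Firehose writes objects under the configured prefix, organized by date
    response = s3.list_objects_v2(Bucket='YOUR_BUCKET', Prefix='YOUR_PREFIX/')

    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'])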

    Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

    Querying data in Amazon Redshift

    In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

    Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

    Redshift Spectrum supports open, common data formats, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other formats and compression methods in the works.

    First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

    Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

    In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

    Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

    Next, connect to the Amazon Redshift cluster, and create an external schema and database:

    create external schema if not exists spectrum_schema
    from data catalog 
    database 'spectrum_db' 
    region 'us-east-1'
    IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
    create external database if not exists;

    Don’t forget to replace the IAM role in the statement.

    Then create an external table within the database:

     CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
      ItemID int,
      Category varchar,
      Price DOUBLE PRECISION,
      Quantity int,
      OrderDate TIMESTAMP,
      DestinationState varchar,
      ShippingType varchar,
      Referral varchar)
    ROW FORMAT DELIMITED
          FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    LOCATION 's3://{BUCKET_NAME}/CDC/'

    Query the table, and it should contain data. This is a fact table.

    select top 10 * from spectrum_schema.ecommerce_sales

     

    Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

    CREATE TABLE date_dimension (
      d_datekey           integer       not null sortkey,
      d_dayofmonth        integer       not null,
      d_monthnum          integer       not null,
      d_dayofweek                varchar(10)   not null,
      d_prettydate        date       not null,
      d_quarter           integer       not null,
      d_half              integer       not null,
      d_year              integer       not null,
      d_season            varchar(10)   not null,
      d_fiscalyear        integer       not null)
    diststyle all;

    Populate the table with data:

    copy date_dimension from 's3://reparmar-lab/2016dates' 
    iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
    DELIMITER ','
    dateformat 'auto';

    The date dimension table should look like the following:

    Querying data in local and external tables using Amazon Redshift

    Now that you have the fact and dimension table populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by season, you can run the following:

    select sum(quantity*price) as total_sales, date_dimension.d_season
    from spectrum_schema.ecommerce_sales 
    join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
    group by date_dimension.d_season

    You get the following results:

    Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

    With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

    Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.
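
    For example, if you wanted to convert the CSV objects that Kinesis Data Firehose delivers into Parquet before querying them heavily, a small offline conversion job could look like the following sketch (it assumes pandas and pyarrow are installed; the column names match the Sales table and the file name is a placeholder):

    import pandas as pd

    columns = ['ItemID', 'Category', 'Price', 'Quantity',
               'OrderDate', 'DestinationState', 'ShippingType', 'Referral']

    # Read one of the CSV objects delivered by Firehose (no header row is written)
    df = pd.read_csv('sales-batch.csv', names=columns)

    # Write it back out as columnar, Snappy-compressed Parquet
    df.to_parquet('sales-batch.parquet', engine='pyarrow', compression='snappy')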

    Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

    Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

    After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

    Enter the database connection details, validate the connection, and create the data source.

    Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

    Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

    On the next screen, choose Edit.

    In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

    Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Date formatted as month on the y-axis.

    After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

    Final notes

    Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

    In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE statement.

    Keep the following in mind:

    • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
    • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
    • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

    In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

    For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

    If you have questions or suggestions, please comment below.


    Additional Reading

    If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum.


    About the Authors

    Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.