CohnReznick Automates Claim Validation Workflow Using AWS AI Services

Post Syndicated from Rajeswari Malladi original https://aws.amazon.com/blogs/architecture/cohnreznick-automates-claim-validation-workflow-using-aws-ai-services/

This post was co-written by Winn Oo and Brendan Byam of CohnReznick, and by Rajeswari Malladi and Shanthan Kesharaju.

CohnReznick is a leading advisory, assurance, and tax firm serving clients around the world. CohnReznick’s government and public sector practice provides claims audit and verification services for state agencies. This process begins with recipients submitting documentation as proof of their claim expenses. The supporting documentation often contains hundreds of filled-out or scanned (sometimes handwritten) PDFs, MS Word files, Excel spreadsheets, and/or pictures, along with a summary form outlining each of the claimed expenses.

Prior to automation with AWS artificial intelligence (AI) services, CohnReznick’s data extraction and validation process was performed manually. Audit professionals had to extract each data point from the submitted documentation, select a population sample for testing, and manually search the documentation for any pages or page sections that validated the information submitted. Validated data points and proof of evidence pages were then packaged into a single document and submitted for claim expense reimbursement.

In this blog post, we’ll show you how CohnReznick implemented Amazon Textract, Amazon Comprehend (with a custom machine learning classification model), and Amazon Augmented AI (Amazon A2I). With this solution, CohnReznick automated nearly 40% of the total claim verification process, with a focus on data extraction and package creation. This resulted in estimated cost savings of $500k per year for each project and process.

Automating document processing workflow

Figure 1 shows the newly automated process. Submitted documentation is processed by Amazon Textract, which extracts text from the documents. This text is then submitted to Amazon Comprehend, which employs a custom classification model to classify the documents as claim summaries or evidence documents. All data points are collected from the Amazon Textract output of the claim summary documents. These data points are then validated against the evidence documents.

Finally, a population sample of the extracted data points is selected for testing. Rather than auditors manually searching for specific information in the documentation, the automated process conducts the data search, extracts the validated evidence pages from submitted documentation, and generates the audited package, which can then be submitted for reimbursement.

Architecture diagram

Figure 1. Architecture diagram

Components in the solution

At a high level, the end-to-end process starts with auditors using a proprietary web application to submit the documentation received for each case to the document processing workflow. The workflow includes three stages, as described in the following sections.

Text extraction

First, the process extracts the text from the submitted documents using the following steps:

  1. For each case, the CohnReznick proprietary web application uploads the documents to the Amazon Simple Storage Service (Amazon S3) upload bucket. Each file has a unique name, and the files have metadata that associates them with the parent case.
  2. An uploaded documents Amazon Simple Queue Service (Amazon SQS) queue is configured to receive notifications for all new objects added to the upload bucket. For every new document added to the upload bucket, Amazon S3 sends a notification to the uploaded documents queue.
  3. The text extraction AWS Lambda function runs every 5 minutes to poll the uploaded documents queue for new messages.
  4. For each message in the uploaded documents queue, the text extraction function submits an Amazon Textract job to process the document asynchronously. This continues until it reaches a predefined maximum allowed limit of concurrent jobs for that AWS account. Concurrency control is implemented by handling LimitExceededException on the StartDocumentAnalysis API call (see the sketch after this list).
  5. After Amazon Textract finishes processing a document, it sends a completion notification to a completed jobs Amazon Simple Notification Service (Amazon SNS) topic.
  6. A process job results Lambda function is subscribed to the completed jobs topic and receives a notification for every completed message sent to the completed jobs topic.
  7. The process job results function then fetches document extraction results from Amazon Textract.
  8. The process job results function stores the document extraction results in the Amazon Textract output bucket.
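
Steps 3 and 4 above could be implemented in a Lambda function along the lines of the following Python sketch. It is illustrative only: the queue URL, topic and role ARNs, and the parse_s3_event helper are placeholders rather than details from CohnReznick's implementation, but the LimitExceededException handling mirrors the concurrency control described in step 4.

import json

import boto3

textract = boto3.client("textract")
sqs = boto3.client("sqs")

# Placeholder resource identifiers (not from the original post).
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/uploaded-documents"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:textract-completed-jobs"
SNS_ROLE_ARN = "arn:aws:iam::123456789012:role/TextractSnsPublishRole"

def parse_s3_event(body):
    """Extract bucket and key from an S3 event notification message body."""
    record = json.loads(body)["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

def handler(event, context):
    """Poll the uploaded documents queue and start asynchronous Textract jobs."""
    while True:
        response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            return
        for message in messages:
            bucket, key = parse_s3_event(message["Body"])
            try:
                textract.start_document_analysis(
                    DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
                    FeatureTypes=["TABLES", "FORMS"],
                    NotificationChannel={
                        "SNSTopicArn": SNS_TOPIC_ARN,
                        "RoleArn": SNS_ROLE_ARN,
                    },
                )
            except textract.exceptions.LimitExceededException:
                # Concurrency cap reached; leave remaining messages on the
                # queue so the next scheduled run picks them up.
                return
            # Job accepted; remove the message from the queue.
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]
            )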

Document classification

Next, the process classifies the documents. The submitted claim documents can include up to seven supporting document types, which need to be classified into their respective categories. Most documents are classified automatically; any documents classified with a low confidence score are sent to a human review workflow.

Classification model creation 

The custom classification feature of Amazon Comprehend is used to build a custom model to classify documents into the seven different document types as required by the business process. The model is trained by providing sample data in CSV format. Amazon Comprehend uses multiple algorithms in the training process and picks the model that delivers the highest accuracy for the training data.
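
The following is a minimal sketch, using the AWS SDK for Python, of how such a custom classifier could be created from a labeled CSV file. The classifier name, role ARN, and S3 URI are placeholder assumptions, not values from CohnReznick's project.

import boto3

comprehend = boto3.client("comprehend")

response = comprehend.create_document_classifier(
    DocumentClassifierName="claim-document-classifier",  # placeholder name
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccessRole",
    InputDataConfig={
        # CSV with one "LABEL,document text" row per training example.
        "S3Uri": "s3://example-training-bucket/claim-docs/training.csv"
    },
    LanguageCode="en",
)
print(response["DocumentClassifierArn"])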

Classification model invocation and processing

Automated document classification uses the trained model and consists of the following steps:

  1. The business logic in the process job results Lambda function determines when text extraction is complete for all documents in a case. It then calls the StartDocumentClassificationJob operation on the custom classifier model to start classifying the unlabeled documents (a sketch follows this list).
  2. The document classification results from the custom classifier are returned as a single output.tar.gz file in the comprehend results S3 bucket.
  3. At this point, the check confidence scores Lambda function is invoked, which processes the classification results.
  4. The check confidence scores function reviews the confidence scores of classified documents. The results for documents with high confidence scores are saved to the classification results table in Amazon DynamoDB.
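
A minimal sketch of step 1, again assuming placeholder ARNs, bucket names, and job names:

import boto3

comprehend = boto3.client("comprehend")

comprehend.start_document_classification_job(
    JobName="case-1234-classification",  # placeholder job name
    DocumentClassifierArn=(
        "arn:aws:comprehend:us-east-1:123456789012:"
        "document-classifier/claim-document-classifier"
    ),
    InputDataConfig={
        "S3Uri": "s3://example-textract-output-bucket/case-1234/",
        "InputFormat": "ONE_DOC_PER_FILE",
    },
    OutputDataConfig={
        "S3Uri": "s3://example-comprehend-results-bucket/case-1234/"
    },
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccessRole",
)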

Human review

Documents that receive low confidence scores from the automated classification are classified through human review with the following steps:

  1. The check confidence scores Lambda function invokes human review with Amazon Augmented AI for documents with low confidence scores. Amazon A2I is a ready-to-use workflow service for human review of machine learning predictions.
  2. The check confidence scores Lambda function creates a human review task for each document with a low confidence score (see the sketch after this list). Humans assigned to the classification jobs log in to the human review portal and either approve the classification done by the model or reclassify the text with the right labels.
  3. The results from human review are placed in the A2I results bucket.
  4. The update results Lambda function is invoked to process results from the human review.
  5. Finally, the update results function writes the human review document classification results to the classification results table in DynamoDB.
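
Steps 1 and 2 could look roughly like the following sketch. The confidence threshold, flow definition ARN, and input payload shape are assumptions for illustration; the actual values depend on the human review workflow configured in Amazon A2I.

import json

import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 0.80  # assumed cut-off, not from the original post

def review_if_low_confidence(document_id, predicted_label, confidence):
    """Start an A2I human loop when the model's confidence is too low."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return None
    return a2i.start_human_loop(
        HumanLoopName=f"classify-{document_id}",
        FlowDefinitionArn=(
            "arn:aws:sagemaker:us-east-1:123456789012:"
            "flow-definition/claim-doc-review"
        ),
        HumanLoopInput={
            "InputContent": json.dumps(
                {
                    "documentId": document_id,
                    "predictedLabel": predicted_label,
                    "confidence": confidence,
                }
            )
        },
    )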

Additional processes

Documents workflow status capturing

The Lambda functions throughout the workflow update the status of their processing and the document/case details in the workflow status table in DynamoDB. The auditor who submitted the case documents can track the workflow status of their case using the data in the workflow status table.

Search and package creation

When the processing is complete for a case, auditors perform the final review and submit the generated packet for downstream processing.

  1. The web application uses the AWS SDK for Java to read the document extraction results from the Textract output S3 bucket and the classification results from the classification results table in DynamoDB. This data is used for the search and package creation process.

Purge data process

After the package creation is complete, the auditor can purge all data in the workflow.

  1. Using the AWS SDK, the data is purged from the S3 buckets and DynamoDB tables.

Conclusion

As seen in this blog post, Amazon Textract, Amazon Comprehend, and Amazon A2I for human review work together with Amazon S3, DynamoDB, and Lambda. These services have helped CohnReznick automate nearly 40% of their total claim verification process, with a focus on data extraction and package creation.

You can achieve similar efficiencies and increase scalability by automating your business processes. Get started today by reading additional user stories and using the resources on automated document processing.

Security updates for Friday

Post Syndicated from jake original https://lwn.net/Articles/864158/rss

Security updates have been issued by Arch Linux (chromium, curl, impacket, jdk11-openjdk, jre-openjdk, jre-openjdk-headless, jre11-openjdk-headless, kernel, lib32-curl, lib32-libcurl-compat, lib32-libcurl-gnutls, libcurl-compat, libcurl-gnutls, libpano13, linux-hardened, linux-lts, linux-zen, nvidia-utils, opera, systemd, and virtualbox), CentOS (java-11-openjdk and kernel), Debian (lemonldap-ng), Fedora (curl and podman), Gentoo (icedtea-web and velocity), openSUSE (bluez, go1.15, go1.16, kernel, thunderbird, transfig, and wireshark), Oracle (java-1.8.0-openjdk, java-11-openjdk, kernel, and kernel-container), SUSE (bluez, curl, kernel, qemu, thunderbird, transfig, and wireshark), and Ubuntu (curl).

Commercial Location Data Used to Out Priest

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/commercial-location-data-used-to-out-priest.html

A Catholic priest was outed through commercially available surveillance data. Vice has a good analysis:

The news starkly demonstrates not only the inherent power of location data, but how the chance to wield that power has trickled down from corporations and intelligence agencies to essentially any sort of disgruntled, unscrupulous, or dangerous individual. A growing market of data brokers that collect and sell data from countless apps has made it so that anyone with a bit of cash and effort can figure out which phone in a so-called anonymized dataset belongs to a target, and abuse that information.

There is a whole industry devoted to re-identifying anonymized data. This was something that Snowden showed that the NSA could do. Now it’s available to everyone.

AWS’s Egregious Egress

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/aws-egregious-egress/


When web hosting services first emerged in the mid-1990s, you paid for everything on a separate meter: bandwidth, storage, CPU, and memory. Over time, customers grew to hate the nickel-and-dime nature of these fees. The market evolved to a fixed-fee model. Then came Amazon Web Services.

AWS was a huge step forward in terms of flexibility and scalability, but a massive step backward in terms of pricing. Nowhere is that more apparent than with their data transfer (bandwidth) pricing. If you look at the (ironically named) AWS Simple Monthly Calculator you can calculate the price they charge for bandwidth for their typical customer. The price varies by region, which shouldn’t surprise you because the cost of transit is dramatically different in different parts of the world.

Charging for Stocks, Paying for Flows

AWS charges customers based on the amount of data delivered — 1 terabyte (TB) per month, for example. To visualize that, imagine data is water. AWS fills a bucket full of water and then charges you based on how much water is in the bucket. This is known as charging based on “stocks.”

On the other hand, AWS pays for bandwidth based on the capacity of their network. The base unit of wholesale bandwidth is priced as one Megabit per second per month (1 Mbps). Typically, a provider like AWS will pay a monthly fee for bandwidth based on the number of Mbps that their network uses at its peak capacity. So, extending the analogy, AWS doesn’t pay for the amount of water that ends up in their customers’ buckets, but rather for capacity based on the diameter of the “hose” that is used to fill them. This is known as paying for “flows.”

Translating Flows to Stocks

You can translate between flow and stock pricing by knowing that a 1 Mbps connection (think of it as the "hose") can transfer 0.3285 TB (328GB) if utilized to its fullest capacity over the course of a month (think of it as running the "hose" at full capacity to fill the "bucket" for a month).1 AWS obviously has more than 1 Mbps of capacity — they can certainly transfer more than 0.3285 TB per month — but you can use this as the base unit of their bandwidth costs, and compare it against what they charge a customer to deliver 1 Terabyte (1TB), in order to figure out the AWS bandwidth markup.

One more subtlety to be as accurate as possible. Wholesale bandwidth is also billed at the 95th percentile. That effectively cuts off the peak hour or so of use every day. That means a 1 Mbps connection running at 100% can actually likely transfer closer to 0.3458 TB (346GB) per month.
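
If you want to check the arithmetic yourself, here is a small Python sketch of the conversion (the 95th-percentile figure quoted above is not derived here; see footnote 1 for the underlying formula):

SECONDS_PER_MONTH = 60 * 60 * 730  # 730 hours in an average month
BITS_PER_MBPS = 1_000_000

def tb_per_month(mbps, utilization=1.0):
    """Terabytes transferred by an `mbps` flow at the given utilization."""
    bits = mbps * BITS_PER_MBPS * SECONDS_PER_MONTH * utilization
    return bits / 8 / 1e12  # bits -> bytes -> terabytes

print(f"{tb_per_month(1):.4f} TB/month at 100% utilization")    # ~0.3285
print(f"{tb_per_month(1, 0.20):.4f} TB/month at 20% utilization")  # ~0.0657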

Two more factors are important: utilization and regional costs. AWS can’t run all their connections at 100% utilization 24×7 for a month. Instead, they’ll have some average utilization per transit connection in any month. It’s reasonable to estimate that they likely run at between 20% and 40% average utilization. That would be a typical average utilization range for the industry. The higher their utilization, the more efficient they are, the lower their costs, and the higher their effective customer markup will be.

To be conservative, we’ve assumed that AWS’s average utilization is the bottom of that range (20%), but you can download the raw data and adjust the assumptions however you think makes sense.

We have a good sense of the wholesale prices of bandwidth in different regions around the world based on what Cloudflare sees in the market when we buy bandwidth ourselves. We’d imagine AWS gets at least as good of pricing as we do. We’ve included a rough estimate of these prices in the calculation, rounding up on the wholesale price wherever there was a question (which makes AWS look better).

Massive Markups

Based on these assumptions, here’s our best estimate of AWS’s effective markup for egress bandwidth on a per-region basis.

[Chart: estimated AWS egress markup by region]

Don’t rest easy, South Korea with your merely 357% markup. The general rule of thumb appears to be that the older a market is, the more Amazon wrings from its customers in egregious egress markups — and the Seoul availability zone is only a bit over four years old. Winter, unfortunately, inevitably seems to come to AWS customers.

AWS Stands Alone In Not Passing On Savings to Customers

Remember, this is for the transit bandwidth that AWS is paying for. For the bandwidth that they exchange with a network like Cloudflare, where they are directly connected (settlement-free peered) over a private network interface (PNI), there are no meaningful incremental costs and their effective margins are nearly infinite. Add in the effect of rebates Amazon collects from colocation providers who charge cross connect fees to customers, and the effective markup is likely even higher.

Some other cloud providers take into account that their costs are lower when passing over peering connections. Both Microsoft Azure and Google Cloud will substantially discount egress charges for their mutual Cloudflare customers. Members of the Bandwidth Alliance — Alibaba, Automattic, Backblaze, Cherry Servers, Dataspace, DNS Networks, DreamHost, HEFICED, Kingsoft Cloud, Liquid Web, Scaleway, Tencent, Vapor, Vultr, Wasabi, and Zenlayer — waive bandwidth charges for mutual Cloudflare customers.


At this point, the majority of hosting providers in the industry either substantially discount or entirely waive egress fees when sending traffic from their network to a peer like Cloudflare. AWS is the notable exception in the industry. It’s worth noting that we invited AWS to be a part of the Bandwidth Alliance, and they politely declined.

It seems like a no-brainer that if we’re not paying for the bandwidth costs, and the hosting provider isn’t paying for the bandwidth costs, customers shouldn’t be charged for the bandwidth costs at the same rate as if the traffic was being sent over the public Internet. Unfortunately, Amazon’s supposed obsession over doing the right thing for customers doesn’t extend to egress charges.

Artificially Held High

Amazon’s mission statement is: “We strive to offer our customers the lowest possible prices, the best available selection, and the utmost convenience.” And yet, when it comes to egress, their prices are far from the lowest possible.

During the last ten years, industry wholesale transit prices have fallen an average of 23% annually. Compounded over that time, wholesale bandwidth is 93% less expensive than 10 years ago. However, AWS’s egress fees over that same period have fallen by only 25%.

And, since 2018, the egress fees AWS charges in North America and Europe have not dropped a penny even as wholesale prices in those markets over the same time period have fallen by more than half.

AWS’s Hotel California Pricing

Another oddity of AWS’s pricing is that they charge for data transferred out of their network but not for data transferred into their network. If the only time you’ve paid for bandwidth is with your residential Internet connection, then this may make some sense. Because of some technical limitations of the cable network, download bandwidth is typically higher than upload bandwidth on cable modem connections. But that’s not how wholesale bandwidth is bought or sold.


Wholesale bandwidth isn’t like your home cable connection. Instead, it’s symmetrical. That means that if you purchase a 1 Mbps (1 Megabit per second) connection, then you have the capacity to send 1 Megabit out and receive another 1 Megabit in every second. If you receive 1 Mbps in and simultaneously 1 Mbps out, you pay the same price as if you receive 1 Mbps in and 0 Mbps out or 0 Mbps in and 1 Mbps out. In other words, ingress (data sent to AWS) doesn’t cost them any more or less than egress (data sent from AWS). And yet, they charge customers more to take data out than put it in. It’s a head scratcher.

We’ve tried to be charitable in trying to understand why AWS would charge this way. Disappointingly, there just doesn’t seem to be an innocent explanation. As we dug in, even things like writes versus reads and the wear they put on storage media, as well as the challenges of capacity planning for storage capacity, suggest that AWS should charge less for egress than ingress.

But they don’t.

The only rationale we can reasonably come up with for AWS’s egress pricing: locking customers into their cloud, and making it prohibitively expensive to get customer data back out. So much for being customer-first.

But… But… But…

AWS may object that this doesn’t take into account the cost of things like metro dark fiber between data centers, amortized optical and other networking equipment, and cross connects. In our experience, those costs amount to a rounding error of less than one cent per Mbps when operating at AWS-like scale. And these prices have been falling at a similar rate to the decline in the price of bandwidth over the past 10 years. Yet AWS’s egress prices have barely budged.

All the data above is derived from what’s published on AWS’s simple pricing calculator. There’s no doubt that some large customers are able to negotiate lower prices. But these are the prices charged to small businesses and startups by default. And, when we’ve reviewed pricing even with large AWS customers, the egress fees remain egregious.

It’s Not Too Late!

We have a lot of mutual customers who use Cloudflare and AWS. They’re a great service, and we want to support our mutual customers and provide services in a way that meets their needs and is always as secure, fast, reliable, and efficient as possible. We remain hopeful that AWS will do the right thing, lower their egress fees, join the Bandwidth Alliance — following the lead of the majority of the rest of the hosting industry — and pass along savings from peering with Cloudflare and other networks to all their customers.


…….
1Here’s the calculation to convert a 1 Mbps flow into TB stocks: 1 Mbps @ 100% for 1 month = (1 million bits per second) * (60 seconds / minute) * (60 minutes / hour) * (730 hours on average/month) divided by (eight bits / byte) divided by 10^12 (to convert bytes to Terabytes) = 0.3285 TB/month.

Empowering customers with the Bandwidth Alliance

Post Syndicated from Arjunan Rajeswaran original https://blog.cloudflare.com/empowering-customers-with-the-bandwidth-alliance/


High Egress Fees


Debates over the benefits and drawbacks of walled gardens versus open ecosystems have carried on since the beginnings of the tech industry. As applied to the Internet, we don’t think there’s much to debate. There’s a reason why it’s easier today than ever before to start a company online: open standards. They’ve encouraged a flourishing of technical innovation, made the Internet faster and safer, and easier and less expensive for anyone to have an Internet presence.

Of course, not everyone likes competition. Breaking open standards — with proprietary ones — is a common way to stop competition. In the cloud industry, a more subtle way to gain power over customers and lock them in has emerged. Something that isn’t obvious at the start: high egress fees.

You probably won’t notice them when you embark on your cloud journey. And if you need to bring data into your environment, there’s no data charge. But say you want to get that data out? Or go multi-cloud, and work with another cloud provider who is best-in-class? That’s when the charges start rolling in.

To make matters worse, as the number and diversity of applications in your IT stack increases, the lock-in power of egress fees increases as well. As more data needs to travel between more applications across clouds and there is more data to move to a newer, better cloud service, the egress tax increases, further locking you in. You lose the ability to choose best-of-breed services or to negotiate prices with your provider.

Why We Launched The Bandwidth Alliance

This is not a better Internet. So wherever we can, we’re on the lookout for ways to prevent this from happening — in this case, with our Bandwidth Alliance partners. We launched the Bandwidth Alliance in late 2018 with over fifteen cloud providers who also believe in an open Internet where data can flow freely. In short, partners in the Bandwidth Alliance have agreed to reduce egress fees for data transfer — either in their entirety or at a steep discount.


How did we do this — the power of Cloudflare’s network

Say you’re hosted in a facility in Ashburn, Virginia and a user visits your service from Sydney, Australia. There is a cost to moving the data between the two places. In this example, a cloud provider would use their own global backbone to carry the traffic across the United States and then across the Pacific, eventually handing it off to the users’ ISP. Someone has to maintain the infrastructure that hauls that traffic more than 9,000 miles from Ashburn to Sydney.

Cloudflare has more than 206 data centers globally in almost every major city. Our network automatically receives traffic at whatever data center is nearest to the user and then carries it to the data center closest to where the origin is hosted.

As part of the Bandwidth Alliance, this traffic is then delivered to the partner data center via private network interconnects (PNI) or peered connections. These PNIs typically occur within the same facility through a fiber optic cable between routers, or via a dedicated connection between two facilities at a very low cost. Unlike when there’s a transit provider in between, there’s no middleman, so neither Cloudflare nor our partners bear incremental costs for transferring the data over this PNI.

Cloudflare is one of the most interconnected networks in the world, peering with over 9,500 networks globally, including major ISPs, cloud providers, and enterprises. Cloudflare is connected with partners in many global regions via Private Interconnections, Internet exchanges with private peering, and via public peering.


Customer benefit

Since its inception, the Bandwidth Alliance program has provided many customers significant benefits: both in egress cost savings and more importantly, in choice across their needs of compute, storage, and other services. Providing this choice and preventing vendor lock-in has allowed our customers to choose the right product for their use case while benefiting from significant savings.

We looked at a sample set of our customers benefiting from the Bandwidth Alliance and estimated their egress savings based on the amount of data (GB) flowing from the origin to us. We estimated the potential savings using the $0.08/GB retail price vs. the discounted $0.04/GB for large amounts of data transferred. Of course, customers could save more by using one of our partners with whom the cost is $0/GB. We compared the savings to the amount of money these customers spend on us. These savings are in the range of 7.5% to 27%; in other words, for every $1 spent on Cloudflare, customers are saving up to $0.27. That is a no-brainer offer to take advantage of.

The Bandwidth Alliance also offers customers the option to choose a cloud that meets their price and feature requirements. For a media delivery use case, choosing the best storage provider and Cloudflare has allowed one customer to save up to 85% in storage costs. Another customer who went with a best-of-breed solution across storage and the Cloudflare global network reduced their overall cloud bill by 50%. Customers appreciate these benefits of choice:

“We were looking at moving our data from one storage provider to another, and it was a total no-brainer to use a service that was part of the Bandwidth Alliance. It really makes sense for anyone looking at any cloud service, especially one that’s pushing a lot of traffic.” — James Ross, Co-founder/CTO, Nodecraft

Earlier this month we made it even easier for our joint customers with Microsoft Azure to realize the benefits of discounted egress to Cloudflare with Microsoft Azure’s Data Transfer Routing Preference. With a few clicks on the Azure dashboard, Cloudflare customer egress bills will automatically be discounted. It’s been exciting to hear positive customer feedback:

“Before taking advantage of the Routing Preference by Azure via Cloudflare, Egress fees were one of the key reasons that restricted us from having more multi-cloud solutions since it can be high and unpredictable at times as the traffic scales. Enabling Routing Preference on the Azure dashboard was quick and easy. It was a one-and-done effort, and we get discounted Egress rates on every Azure bill.”  — Darin MacRae, Chief Architect / Cloud Computing, MyRadar.com

If you’re looking to find the right set of cloud storage, networking and security solutions to meet your needs, consider the Bandwidth Alliance as an alternative to being locked-in to a single platform. We hope it helps.

Special surveillance means used in a case with a paralegal purpose

Post Syndicated from nellyo original https://nellyo.wordpress.com/2021/07/23/95-penal-code/

For weeks, the question of whether participants in the 2020 protests were unlawfully wiretapped has been under discussion.

Atanas Atanasov and Boyko Rashkov, the caretaker interior minister, say there was unlawful wiretapping.

The prosecutor's office says there was not.

The BIRD website has published a prosecutor's request for the use of special surveillance means and the justification for extending their use.

Today in parliament, during the hearing of the Minister of Justice and the Minister of the Interior, the following was confirmed:

Special surveillance means were indeed used against participants in the protests.

The prosecutor's office considers their use lawful because it was authorized by the president of the Specialized Criminal Court.

But what matters for this story is the kind of case in which the surveillance was requested: a case under the Special Part of the Criminal Code, Article 95 (coup d'état), in conjunction with Article 321 of the Criminal Code (organized criminal group):

by organizing the protests, the persons aim to overthrow, undermine or weaken the power of the republic and to attempt a coup to seize power by force, centrally or locally, which constitutes a violation of Article 321, paragraph 3, in conjunction with paragraph 1 of the Criminal Code, in conjunction with Article 95 of the Criminal Code.

As members of parliament pointed out today, Article 95 of the Criminal Code has not been applied since the 1950s.

So much for freedom of expression and freedom of political speech. Radio Free Europe quotes attorney Ekimdzhiev, according to whom

for anyone versed in criminal law, the paralegal purpose of this absurd pre-trial proceeding is clear: creating a formal basis for wiretapping, physical surveillance, arrests, and searches.

An absurd criminal case for an attempted coup is opened, which formally legitimizes the excessive use of special surveillance means. On a formal level, Siyka Mileva is right [that there is no evidence of unlawful wiretapping], because according to the prosecutor's office we were wiretapped in order to prevent an attempted coup. All of us, however, know perfectly well whether this was an attempted coup or a legitimate protest against a kleptocracy that had lost all restraint and a delusional prosecutor general.

This is not about media freedom but about something more basic: freedom of expression. The 2020 freedom-of-expression indexes have already been compiled; this information comes too late to be taken into account there.

But with respect to Geshev, it is not too late for it to be taken into account.

Media preferences of the Bulgarian audience in the election campaigns for the parliamentary elections of 4 April and 11 July 2021

Post Syndicated from nellyo original https://nellyo.wordpress.com/2021/07/23/2021/

On 21 July 2021, the Council for Electronic Media (CEM) presented a sociological survey titled "Media preferences of the Bulgarian audience in the election campaigns for the parliamentary elections of 4 April and 11 July 2021."

  • Presentation
  • Report, which among other things shows the questions asked to establish where people got their information during the two 2021 campaigns.

The data were presented by Lidia Yordanova of Exacta. The data from Exacta's two national surveys, conducted in the final weeks of the two campaigns, make it possible to comment on both the more stable and the more dynamic attitudes, the report says at the outset.

My opinion, which for technical reasons I was unable to present at the discussion itself, although I listened to the presentation almost to the end:

1. Such surveys are expected to become a routine activity, with the regulator informing us about news consumption not only during election campaigns. For comparison, if we open the website of the British regulator Ofcom, we see that it is required by law to make evidence-based decisions. In a series of reports, Ofcom informs about news consumption in the United Kingdom and about people's attitudes toward different kinds of content on different platforms. Moreover, Ofcom aligns its regulation with the usage patterns it identifies. One hopes the Bulgarian regulator will also move toward evidence-based decisions.

2. I also have some questions:

  • What concept of quality journalism was used in preparing the questionnaire? The survey examined whether respondents care about the following four characteristics of information: objectivity, comprehensiveness, credibility, and timeliness. It found that three quarters of respondents care about these characteristics, while the remainder (those who do not) were found by the sociologists to have low levels of education and to belong to minorities. Quality journalism deserves special protection, which is why, in my view, this part of the questionnaire deserves special attention. The concept of quality journalism should have a much wider application, even in the process of funding the media.

  • Is a distinction drawn between media preferences and trust? The topic of the survey is "media preferences," yet the questions ask whether you have trust, whom you trust more, and so on. I personally believe that a "source" (or "media preference") is not the same as "a medium I trust." The survey itself finds, for example, that some citizens get their information from several media outlets. I myself follow multiple outlets, including ones I do not trust at all, but if asked, I would answer that I read them.

3. Could the interpretation of the results be broadened and deepened?

The numbers matter, but the real task of such surveys is to allow analysis beyond the numbers. The real task is to track whether the media (those within CEM's remit) ensure an informed choice, especially during an election campaign. It is not just a matter of where people get their information; what matters is whether the media sector guarantees citizens' informed choice.

As an example, I take the question of BNT (Bulgarian National Television) as a source of information (or, as the report puts it, in my view debatably, trust in BNT). The report compares the share of people who got their information from BNT in April and July with the share who got their information from private television channels in April and July, and concludes:

  1. In July, the share of those who trust the public broadcaster BNT more than the private channels fell by 3 points. At the same time, compared with March, the share of those who trust the private channels more than BNT rose by 6 points.

Beyond my reservation that this is about consumption rather than trust, which is a different thing, I believe the interpretation could offer much more than a mere reading of the numbers. In my view, a 41:24 ratio in favor of the private channels means that, if the trend holds, the private channels will soon be twice as preferred as BNT for orientation during election periods, even though BNT is required by law to do exactly that, is financed with public resources precisely for that purpose ("the democratic needs of citizens," in the language of EU law), and has, in addition, just received 20 million beyond what was provided for in the State Budget Act. In my view, it makes sense to continue the substantive analysis beyond the numbers and beyond point 7 of the report quoted above.

4. The last thing I would share in the discussion is that we should take into account the conclusions of the OSCE, according to the report of the Organization for Security and Co-operation in Europe:

  • Contrary to the legal requirements introduced in December 2020, CEM did not conduct systematic monitoring of online audiovisual content;
  • Despite CEM's active media monitoring, the Central Election Commission (CEC) did not take effective legal measures in cases of identified violations in the media.

I will conclude with the wish that CEM increasingly make evidence-based decisions.

Ten years of the Raspberry Pi blog

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/ten-years-of-the-raspberry-pi-blog/

(Buckle yourselves in: this is a long one.)

I had an email last month from UKScone, a Raspberry Pi user I met ten years ago at a Maker Faire in New York.

“Just had a thought. It’ll be 10 years soon since you setup the blog/forums 🙂 Going to do a blog piece about it?

Damn, I feel old.”

Scone was one of a surprisingly large group of people who’d travelled surprisingly long distances to look at a prototype of this Raspberry Pi thing we’d been writing about. That group of people had coalesced around this blog and the Raspberry Pi forums, which both got set up exactly ten years ago tomorrow.

Back in 2011, we thought that perhaps we might sell a few thousand computers.

As of today, we’ve sold more than 40 million of the things.

We’ve seen some spectacular stuff from our community. Remember the Raspberry Pi drawing machine that ran on hamster power?

We’ve kept every single blog post we’ve ever written up on this site, starting way back in July 2011. Ten years is a long time in internet terms, so you’ll find some dead links in some earlier posts; and this website has undergone a number of total redesigns, so early stuff doesn’t tend to have the pretty thumbnail associated with it to show you what it’s all about. (Our page design didn’t use them back then.) But all the same, for the internet archeologists among you, or those interested in the beginnings of Raspberry Pi, those posts from before we even had hardware are worth flicking through.

The incredible dad who recreated the Apollo mission in his son’s bedroom still makes me feel like an inadequate parent.

When we started doing this, I was a freelance writer and copy-editor, writing for several fragrance industry clients alongside the food and travel businesses I drummed work up for through a blog that worked as a kind of portfolio, alongside a food-trivia Twitter account. Blogs were awfully modern back then – I was one of the top three food bloggers by visitor numbers in the country – and Twitter was not yet a cesspool. Because it was modern. (In short, I was not anything approaching a tech writer, although I was a giant nerd already.) Then, one day in 2011, Eben Upton and David Braben showed Rory Cellan-Jones at the BBC a prototype, his YouTube video about it went viral – and Raspberry Pi found itself suddenly in need of somebody to run social media and press. I thought I’d do it for free for a few months, then hand over to someone else and go back to a life of being paid to eat nice things and go on holidays.

Water Droplet Photography created by Dave Hunt using a $25 Raspberry Pi to make a camera rig that would have cost thousands commercially.

I never went back. Ten years on, Eben and I (who met in the 90s and married a few years before the Raspberry Pi project kicked off in 2009) are still here. Raspberry Pi is now two organisations: Raspberry Pi Trading, where I work, which makes the computers, the magazines, the peripherals and all that good stuff; and the Foundation, which is headed up by Philip Colligan, and which runs all our charitable programs. The Foundation trains teachers, gives hardware to deprived kids, advises on the curriculum, offers training programs for free to everybody, allows children to send their code to space, and much more. I’m immensely proud of what Philip’s built over there: it’s more than we could have imagined when we were raising money by selling keyboard stickers from our kitchen table in 2011. (Before you ask, no, we don’t make them any more.) I still remember the envelope-stuffing paper cuts. Let us know in the comments if you’d like us to start making them again. We’re in a position to pay someone who isn’t me to cut them all out this time.

BeetBox – music employing capacitive touch, root veg and a Pi. I was a professional musician before I went into publishing and PR, and music projects have always hit a very special spot for me.

We’re a big team of photographers, videographers, editors, writers and social media people now, producing all the words, videos and pictures that come out of the organisation: Ashley looks after this blog these days, while I look after the team. One thing I’ve always missed about the early days, when I was doing everything (bad photography, social media, press, PR and all the public-facing writing we produced), has been the ability to talk more publicly about hardware development, hiccups in the very early development, and about how the business behind Raspberry Pi was built. Once Raspberry Pi was actually on the market and we started work on follow-up devices, we had to stop talking about that development work in order to avoid getting hit by the Osborne effect – the social phenomenon where people stop or delay buying a product when they know a newer version is in the works. And blogging was so easy right at the start, when every project was new – at a point when there were only 2000 Raspberry Pis in the world, everything somebody did with one felt special! But there’s still a ton of stuff for us to talk about – so many people are doing so many wonderful things with Raspberry Pi that choosing a subject for the day’s blog is one of the hardest parts of Ashley’s job.

Mike Cook is one of my childhood heroes – I used to save my pocket money for Micro User magazine just to read his hardware column. This project comes from very shortly after the first Pis started arriving in people’s houses. I couldn’t believe it when I realised he was using our hardware to do the things I’d loved reading about as a kid.

We have a big anniversary coming up next year, when it’ll be ten years since we sold the first Raspberry Pi. But we’re having a little, premature celebration here at Pi Towers today, as we congratulate ourselves on having kept this stream of news going for ten whole years.

Saved my all-time favourite for last. This paludarium simulating an Amazonian rainforest, complete with weather effects, is one of the most beautiful projects we’ve ever covered.



Using Cloud Fitness Functions to Drive Evolutionary Architecture

Post Syndicated from Hauke Juhls original https://aws.amazon.com/blogs/architecture/using-cloud-fitness-functions-to-drive-evolutionary-architecture/

“It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.” – often attributed to Charles Darwin

One common strategy for businesses that operate in dynamic market conditions (and thus need to continuously correct their course) is to aim for smaller, independent development teams. Microservices and two-pizza teams at Amazon are prominent examples of this strategy. But having smaller units is not the only success factor: to reduce organizational bottlenecks and make high-quality decisions quickly, these two-pizza teams need to be autonomous in most of their decision making.

Architects can no longer rely on static upfront design to meet the change rate required to be successful in such an environment.

This blog shows enterprise architects a mechanism to align decentralized architectural decision making with overall architecture goals.

Gathering data from your fitness functions

“Evolutionary architecture” was coined by Neal Ford and his colleagues from AWS Partner ThoughtWorks in their work on Building Evolutionary Architectures. It is defined as “supporting guided, incremental change as a first principle across multiple dimensions.”

Fitness functions help you obtain the necessary data to allow for the planned evolution of your architecture. They set measurable values to assess how close your solution is to achieving your set goals.

Fitness functions can and should be adapted as the architecture evolves to guide a desired change process. This provides architects with a tool to guide their teams while maintaining team autonomy.

Example of a regression fitness function in action

You’ve identified shorter time-to-market as a key non-functional requirement. You want to lower the risk of regressions and rollbacks after deployments. So, you and your team write automated test cases. To ensure that you have a good set of test cases in place, you measure test coverage: the percentage of code that is exercised by automated tests. This steers the team toward writing tests to mitigate the risk of regressions, so there are fewer rollbacks and a shorter time to market.

Fitness functions like this work best when they’re as automated as possible. But how do you acquire the necessary data points to use this mechanism outside of software architecture? We’ll show you how in the following sections.

AWS Cloud services with built-in fitness functions

AWS Cloud services are highly standardized, fully automated via API operations, and are built with observability in mind. This allows you to generate measurements for fitness functions automatically for areas such as availability, responsiveness, and security.

To start building your evolutionary architecture with fitness functions, use something that can be easily measured. AWS has services that can be used as inputs to fitness functions, including:

  • Amazon CloudWatch aggregates logs and metrics to check for availability, responsiveness, and reliability fitness functions.
  • AWS Security Hub provides a comprehensive view of your security alerts and security posture across your AWS accounts. Security architects could, for example, define a fitness function requiring the number of critical and high findings to be zero. Teams would then be guided toward reducing these findings, resulting in better security (see the sketch after this list).
  • AWS Cost Explorer ensures your costs stay in line with value generated.
  • AWS Well-Architected Tool evaluates teams’ architectures in a consistent and repeatable way. The number of issues identified in the review acts as your fitness function, which can be queried using the API. To improve your architecture based on the results, review the Establishing Feedback Loops Based on the AWS Well-Architected Framework Review blog post.
  • Amazon SageMaker Model Monitor continuously monitors the quality of SageMaker machine learning models in production. Detecting deviations early allows you to take corrective actions like retraining models, auditing upstream systems, or fixing quality issues.
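
As one illustration of turning these services into a measurable fitness function, the sketch below counts active critical and high Security Hub findings and publishes the result as a CloudWatch metric. The metric namespace, metric name, and workflow-status filters are assumptions, not a prescribed implementation.

import boto3

securityhub = boto3.client("securityhub")
cloudwatch = boto3.client("cloudwatch")

def count_open_findings():
    """Count ACTIVE critical and high Security Hub findings that are still open."""
    paginator = securityhub.get_paginator("get_findings")
    filters = {
        "SeverityLabel": [
            {"Value": "CRITICAL", "Comparison": "EQUALS"},
            {"Value": "HIGH", "Comparison": "EQUALS"},
        ],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
        "WorkflowStatus": [
            {"Value": "NEW", "Comparison": "EQUALS"},
            {"Value": "NOTIFIED", "Comparison": "EQUALS"},
        ],
    }
    return sum(len(page["Findings"]) for page in paginator.paginate(Filters=filters))

def publish_fitness_metric():
    """Publish the finding count so teams can track the trend over time."""
    cloudwatch.put_metric_data(
        Namespace="FitnessFunctions",  # assumed namespace
        MetricData=[
            {
                "MetricName": "OpenCriticalAndHighFindings",
                "Value": count_open_findings(),
                "Unit": "Count",
            }
        ],
    )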

Using the observability that the cloud provides

Fitness functions can be derived by evaluating the AWS account activity such as configuration changes. AWS CloudTrail is useful for this. It records account activity and service events from most AWS services, which can then be analyzed with Amazon Athena.


Figure 1. Fitness functions provide feedback to engineers via metrics

Example of a cloud fitness function in action

In this example, we implement a fitness function that monitors the operability of your system.

You have had certain outages due to manual tasks in operations, and you have anecdotal evidence that engineers are spending time on manual work during application rollouts. To improve operations, you want to reduce manual interactions via the shell in favor of automation. First, you prevent direct secure shell (SSH) access by blocking SSH traffic via the managed AWS Config rule restricted-ssh. Second, you make use of AWS Systems Manager Session Manager, which provides a secure and auditable way to access Amazon Elastic Compute Cloud (Amazon EC2) instances.

By counting the logged API events in CloudTrail you can measure the number of shell sessions. This is shown in this sample Athena query to count the number of shell sessions:

SELECT count(*),
       DATE(from_iso8601_timestamp(eventTime)),
       userIdentity.type,
       eventSource,
       eventName
FROM "cloudtrail_logs_partition_projection"
WHERE readonly = 'false'
  AND eventsource = 'ssm.amazonaws.com'
  AND eventname in ('StartSession',
                    'ResumeSession',
                    'TerminateSession')
GROUP BY DATE(from_iso8601_timestamp(eventTime)),
         userIdentity.type,
         eventSource,
         eventName
ORDER BY DATE(from_iso8601_timestamp(eventTime)) DESC

The number of shell sessions now acts as a fitness function for improving operational excellence through operations as code. Coincidentally, the fitness function you defined also rewards teams that move to serverless compute services such as AWS Fargate or AWS Lambda.

Fitness through exercising

Similar to people, your architecture’s fitness can be improved by exercising. It does not take much equipment, but you need to take the first step. To get started, we encourage you to think of the desired outcomes for your architecture that you can measure (and thus guide) through fitness functions. The following lessons learned will help you focus your goals:

  • Requirements and business goals may differ per domain. Thus, your fitness functions might differ. Work closely with your teams when defining fitness functions.
  • Start by taking something that can be easily measured and communicated as a goal.
  • Focus on a positive trendline rather than absolute values.
  • Make sure you and your teams are using the same metrics and the same way to measure them. We have seen examples where central governance departments had access to data the individual teams did not, leading to frustration on all sides.
  • Ensure that your architecture goals fit well into the current context and time horizon.
  • Continuously re-visit the fitness functions to ensure that they evolve with the changing business goals.

Conclusion

Fitness functions help architects focus on building. Once established, teams can use the data points from fitness functions to make decisions and work towards a common and measurable goal. The architects in turn can use the data points they get from fitness functions to confirm their hypothesis of the current state of the architecture. Get started building your fitness functions today by:

  • Gathering the most important system quality attributes.
  • Beginning with approximately three meaningful fitness functions relying on the API operations available.
  • Building a dashboard that shows progress over time, sharing it with your teams, and relying on this data in your daily work.

The three most important AWS WAF rate-based rules

Post Syndicated from Artem Lovan original https://aws.amazon.com/blogs/security/three-most-important-aws-waf-rate-based-rules/

In this post, we explain what the three most important AWS WAF rate-based rules are for proactively protecting your web applications against common HTTP flood events, and how to implement these rules. We share what the Shield Response Team (SRT) has learned from helping customers respond to HTTP floods and show how all AWS WAF customers can benefit from these learnings.

When you have business-critical applications that are internet-facing, you need to protect them from risks such as distributed denial of service (DDoS) attacks. AWS Shield Advanced is a managed DDoS protection service that safeguards applications that are running behind Amazon Web Services (AWS) internet-facing resources. The backend origin of your application can exist anywhere, including on premises, and Shield Advanced can protect it. Shield Advanced provides DDoS protection for Layers 3–7. It also includes 24/7 access to the SRT to help you quickly respond to sophisticated unauthorized activity scenarios that might be unique to your application. To learn more about what resource types are supported to associate AWS WAF, see AWS WAF.

Increasingly, the SRT has been assisting customers in protecting against Layer 7 HTTP flood occurrences that negatively impact application availability or performance by overloading the application with an unusually high number of HTTP requests. In many cases, these malicious events can be automatically mitigated by using AWS WAF. In addition, AWS WAF has an easy-to-configure native rate-based rule capability, which detects source IP addresses that make large numbers of HTTP requests within a 5-minute time span, and automatically blocks requests from the offending source IP until the rate of requests falls below a set threshold. In this post, we show how you can pull insights from the AWS WAF logs to determine what your rate-based rule threshold should be.

The top three most important AWS WAF rate-based rules are:

  • A blanket rate-based rule to protect your application from large HTTP floods.
  • A rate-based rule to protect specific URIs at more restrictive rates than the blanket rate-based rule.
  • A rate-based rule to protect your application against known malicious source IPs.

Solution overview

AWS WAF is a web application firewall that helps protect your web applications against common web exploits that might affect availability, compromise security, or consume excessive resources. AWS WAF gives you control over which web traffic reaches your applications. If you already know the request rates for your application, you have all the necessary information to start creating your AWS WAF rate-based rules. To learn more about how to create rules, see Creating a rule and adding conditions. However, if you don’t have this data and want to learn how to get started, this solution helps you determine appropriate rates for your applications, and how to create AWS WAF rate-based rules.

Figure 1 shows how incoming request information is captured so that the operations team can use it to determine rate-based rules.


Figure 1: The workflow to collect and query logs and apply rate-based rules

Let’s go through the flow to better understand what’s happening at each step:

  1. An application user makes requests to the application.
  2. AWS WAF captures information about the incoming requests and sends this to Amazon Kinesis Data Firehose.
  3. Kinesis Data Firehose delivers the logs to an Amazon Simple Storage Service (Amazon S3) bucket, where they will be stored.
  4. The operations team uses Amazon Athena to analyze the logs with SQL queries.
  5. Athena queries the logs in the S3 bucket and shows the query results.
  6. The operations team uses the query results to determine the appropriate AWS WAF rate-based rule.

The three rate-based rules in detail

Each of the rules helps to protect web applications from unauthorized activity. Each of the rules focuses on a specific aspect of protection. The rules complement each other, and so when they’re combined, they can offer greater help in protecting your web application. We’ll look at each of the rules to understand what they do.

Blanket rate-based rule

A blanket rate-based rule is designed to prevent any single source IP address from negatively impacting the availability of a website. For example, if the threshold for the rate-based rule is set to 2,000, the rule will block all IPs that are making more than 2,000 requests in a rolling 5-minute period. This is the most basic rate-based rule, and one of the most valuable for AWS WAF customers to implement. The SRT often helps customers who are actively under a DDoS attack to quickly implement this rule. In past experiences with HTTP flood cases, if this rule were proactively in place, the customer would have been protected and wouldn’t have needed to reach out to the SRT for assistance. The blanket rate-based rule would have automatically blocked the attempt without any human intervention.
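
As a rough sketch, a blanket rule with the 2,000-request threshold mentioned above could be expressed like this with the AWS WAFv2 API (for example, as an entry in the Rules list passed to the CreateWebACL or UpdateWebACL operations). The rule name, priority, and metric name are placeholders:

blanket_rate_rule = {
    "Name": "blanket-rate-limit",
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 2000,  # requests per source IP in a rolling 5-minute window
            "AggregateKeyType": "IP",
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "blanket-rate-limit",
    },
}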

URI-specific rate-based rule

Some application URI endpoints typically receive a high request volume, but for others it would be unusual and suspicious to see a high request count. For example, multiple requests in a 5-minute period to an application’s login page is suspicious and indicates a potential brute force or credential-stuffing attack against the application. A URI-specific rule can prevent a single source IP address from connecting to the login page as few as 100 times per 5-minute period, while still allowing a much higher request volume to the rest of the application. Some applications naturally have computationally expensive URIs that, when called, require considerably more resources to process the request. An example of this could be a database query or search function. If a bad actor targets these computationally expensive URIs, this can quickly lead to application performance or availability issues. If you assign a URI-specific rate-based rule to these portions of your site, you can configure a much lower threshold than the blanket rate-based rule. It’s beyond the scope of this blog post, but some customers use Application Load Balancer access logs and the target_processing_time information to determine precisely which portions of the site are the slowest to respond and might represent a computationally expensive call. These customers then put additional rate-based rule protections on calls that are made to these URIs.
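
A URI-specific rule adds a scope-down statement so that the lower threshold applies only to the sensitive path. The sketch below uses the login-page example above; the path, threshold, and names are illustrative assumptions:

login_rate_rule = {
    "Name": "login-rate-limit",
    "Priority": 0,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 100,  # much lower than the blanket rule
            "AggregateKeyType": "IP",
            "ScopeDownStatement": {
                "ByteMatchStatement": {
                    "SearchString": b"/login",
                    "FieldToMatch": {"UriPath": {}},
                    "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                    "PositionalConstraint": "STARTS_WITH",
                }
            },
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "login-rate-limit",
    },
}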

IP reputation rate-based rule

Many of the DDoS events the SRT assists customers with include HTTP floods that originate from known malicious source IPs. The AWS WAF Security Automations solution provides AWS WAF customers with a subscription to four open-source threat intelligence lists. Rate-based rules with low thresholds can be applied to requests coming from these suspect sources. Some customers feel comfortable completely blocking web requests from these IPs, but at the very least, requests from these IPs should be rate-limited to protect the application from these well-known malicious sources.

It’s also common to see HTTP floods originate from IP addresses within certain countries. You can use AWS WAF geographical matching rules to assign lower rate-based rule thresholds to requests that originate from certain countries, or countries that don’t contain your web application’s primary user base. For example, suppose your application primarily serves users in the United States. In that case, it could be beneficial to create a rate-based rule with a low threshold for requests that come from any country other than the United States. HTTP floods are also commonly seen originating from IP addresses classified as cloud hosting provider IPs. You can use AWS WAF’s “HostingProviderIPList” Managed Rule to label these requests and then assign a lower rate-based rule threshold to them as well.
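If you want to express that idea as a rule, the following is a minimal sketch of a rate-based rule statement, written as a Python dictionary in the shape the AWS WAF (wafv2) API expects. The rule name, priority, and 500-request limit are illustrative assumptions, not values taken from this post; the scope-down statement applies the lower threshold only to requests that don't originate from the United States.

# Hypothetical geography-aware rate-based rule: a lower limit (500 requests
# per 5-minute period) for requests that originate outside the United States.
# Name, Priority, and Limit are illustrative assumptions.
geo_rate_rule = {
    "Name": "NonUSRateRule",
    "Priority": 3,
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "NonUSRateRule",
    },
    "Statement": {
        "RateBasedStatement": {
            "Limit": 500,
            "AggregateKeyType": "IP",
            "ScopeDownStatement": {
                "NotStatement": {
                    "Statement": {"GeoMatchStatement": {"CountryCodes": ["US"]}}
                }
            },
        }
    },
}

You would include a dictionary like this in the Rules list of your web ACL, alongside the rate-based rules described in the following sections.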

Prerequisites

Before you implement the solution, verify that:

  • AWS WAF is deployed in your AWS account and is associated with an Amazon CloudFront distribution or an Application Load Balancer.
  • Your AWS WAF default action is set to Block. When you create and configure a web ACL, you set the web ACL default action, which determines how AWS WAF handles web requests that don’t match any rules in the web ACL. To learn more about default action for a web ACL, see Deciding on the default action for a web ACL.
  • AWS WAF logging is configured and logs are being stored in an S3 bucket.

    Note: You can follow these instructions to configure delivery of AWS WAF logs to your S3 bucket, and you can also use AWS Firewall Manager to configure centralized AWS WAF logging in a multi-account environment.

Set up Athena to analyze AWS WAF logs

Amazon Athena is an interactive query service that you can use to analyze data in Amazon S3 by using standard SQL. For this solution, you’ll use Athena to connect to the S3 bucket where AWS WAF logs are stored and query the AWS WAF logs. The first step is to open the Athena console and create a database.

Note: The Athena database and table creation is a one-time configuration process. You can then come back and run the queries at any time to see results based on your latest AWS WAF log data.

To create an Athena database, you’ll use a data definition language (DDL) statement. Paste the following query in the Athena query editor, replacing values as described here:

  • Replace <your-bucket-name> with the S3 bucket name that holds your AWS WAF logs.
  • For <bucket-prefix-if-exist>, if AWS WAF logs are stored in an S3 bucket prefix, replace with your prefix name. Otherwise, remove this part from the query, including the slash “/” at the end.
CREATE DATABASE IF NOT EXISTS wafrulesdb
  COMMENT 'AWS WAF logs'
  LOCATION 's3://<your-bucket-name>/<bucket-prefix-if-exist>/';

Choose Run query to run the query and create the database. Successful completion will be indicated by the query result, as shown below.

Results
Query successful. 

Next, you’ll create a table inside the database. Paste the following query in the Athena query editor, replacing values as described here:

  • Replace <your-bucket-name> with the S3 bucket name that holds your AWS WAF logs.
  • For <bucket-prefix-if-exist>, if AWS WAF logs are stored in an S3 bucket prefix, replace with your prefix name. Otherwise, you can remove this part from the query, including the slash “/” at the end.
  • For has_encrypted_data, if your AWS WAF log data is encrypted at rest, change the value to true; otherwise, leave it as false.
CREATE EXTERNAL TABLE IF NOT EXISTS wafrulesdb.waftable (
  `terminatingRuleId` string,
  `httpSourceName` string,
  `action` string,
  `httpSourceId` string,
  `terminatingRuleType` string,
  `webaclId` string,
  `timestamp` float,
  `formatVersion` int,
  `ruleGroupList` array<string>,
  `httpRequest` struct<`headers`:array<struct<name:string,value:string>>,clientIp:string,args:string,requestId:string,httpVersion:string,httpMethod:string,country:string,uri:string>,
  `rateBasedRuleList` string,
  `nonTerminatingMatchingRules` string,
  `terminatingRuleMatchDetails` string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://<your-bucket-name>/<bucket-prefix-if-exist>/'
TBLPROPERTIES ('has_encrypted_data'='false');

Run the query in the Athena console. After the query completes, Athena registers the waftable table, which makes the data in it available for queries.
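If you'd rather run these queries programmatically than in the Athena console, the following boto3 sketch submits a query and waits for it to finish. The Athena results location is a placeholder you'd replace with your own S3 path, and the sample query simply counts the rows in the table you just created.

import time

import boto3

athena = boto3.client("athena")


def run_athena_query(query, database="wafrulesdb",
                     output="s3://<your-athena-results-bucket>/waf/"):
    """Submit a query to Athena and block until it completes."""
    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )
    query_id = execution["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} finished with state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)


# Row 0 of the result set holds the column headers, row 1 the first data row.
results = run_athena_query("SELECT COUNT(*) FROM wafrulesdb.waftable;")
print(results["ResultSet"]["Rows"][1]["Data"][0]["VarCharValue"])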

Run SQL queries to identify rate-based rule thresholds

Now that you have a table in Athena, know where the data is located, and have the correct schema, you can run SQL queries for each of the rate-based rules and see the query results.

Blanket rate-based rule for all application endpoints

You'll start with a SQL query that helps you identify a threshold for the blanket rule. The critical factor is to run the query against AWS WAF log data that represents a healthy, high request volume. The following query defines a time window of 6 hours in the evening, expressed as 2020-12-01 16:00:00 and 2020-12-01 22:00:00. Time windows can span a few hours or several days; however, the window must be a good representation of your traffic volume, because it becomes the basis for identifying the threshold. For example, if your application is busier during certain periods, you should evaluate the log data for that time. In the example shown here, we limit the query results to the top 100 IPs; you can adjust this to your needs by updating the LIMIT value.

SELECT
  httprequest.clientip,
  COUNT(*) AS "count"
FROM wafrulesdb.waftable
WHERE from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
GROUP BY httprequest.clientip, FLOOR("timestamp"/(1000*60*5))
ORDER BY count DESC
LIMIT 100; 

Update the time window to your needs and run the query in the Athena console. The results will show the top requesting IPs in any 5-minute period between two dates, as illustrated in Figure 2.

Figure 2: The top requesting IP in any 5-minute period between dates

You can visualize the results data to see a holistic view of the request count per IP. The chart in Figure 3 illustrates the SQL query results.

Figure 3: Chart: Top requesting IP in any 5-minute period between dates

The results are grouped into 5-minute periods and sorted by request count, showing the IPs with the highest request volume in each period. This means the same IP can appear multiple times if it was among the top requesters in more than one 5-minute interval. In our example, looking at the result, an excellent first blanket rule would limit the request volume to about 7,000 requests within a 5-minute time period. You can either create the AWS WAF rule by using the following JSON and the JSON rule editor, or by using the AWS WAF visual rule editor and following these instructions. If you’re using the following JSON, make sure to replace the Limit value with the value that you identified by running the SQL query earlier.

{
  "Name": "BlanketRule",
  "Priority": 2,
  "Action": {
    "Block": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "BlanketRule"
  },
  "Statement": {
    "RateBasedStatement": {
      "Limit": 7000,
      "AggregateKeyType": "IP"
    }
  }
}
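If you manage your web ACL through the API instead of the console, the following boto3 sketch shows one way to append a rule like the preceding one to an existing web ACL. The web ACL name, ID, and scope are placeholders; note that update_web_acl replaces the entire rule list, so the existing rules and the lock token are read first.

import boto3

wafv2 = boto3.client("wafv2")

# Placeholders: your web ACL's name, ID, and scope
# (REGIONAL for an Application Load Balancer, CLOUDFRONT for a distribution).
WEB_ACL_NAME = "<your-web-acl-name>"
WEB_ACL_ID = "<your-web-acl-id>"
SCOPE = "REGIONAL"

blanket_rule = {
    "Name": "BlanketRule",
    "Priority": 2,
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "BlanketRule",
    },
    "Statement": {
        "RateBasedStatement": {"Limit": 7000, "AggregateKeyType": "IP"}
    },
}

# Read the current configuration, then write it back with the new rule appended.
current = wafv2.get_web_acl(Name=WEB_ACL_NAME, Scope=SCOPE, Id=WEB_ACL_ID)
web_acl = current["WebACL"]

wafv2.update_web_acl(
    Name=WEB_ACL_NAME,
    Scope=SCOPE,
    Id=WEB_ACL_ID,
    DefaultAction=web_acl["DefaultAction"],
    Rules=web_acl["Rules"] + [blanket_rule],
    VisibilityConfig=web_acl["VisibilityConfig"],
    LockToken=current["LockToken"],
)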

Sometimes a client connects to an application through an HTTP proxy or a content delivery network (CDN), which obscures the client origin IP. It's important to identify the actual client IP instead of the proxy or CDN IP, because blocking the latter can cause a much wider unwanted impact. You can use many tools to help you identify whether a source IP belongs to a CDN. In this case, you would need to query and filter on the X-Forwarded-For, True-Client-IP, or other custom headers. CDN providers typically publish which headers they add to requests, but X-Forwarded-For and True-Client-IP are common. The following query shows how you can reference these headers, using X-Forwarded-For as the example, to identify thresholds for rate-based rules. You can replace X-Forwarded-For with the header you expect to hold the client IP.

SELECT
  header.value,
  COUNT(*) AS "count"
FROM wafrulesdb.waftable, UNNEST(httprequest.headers) as t(header)
WHERE
    from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
  AND
    header.name = 'X-Forwarded-For'
GROUP BY header.value, FLOOR("timestamp"/(1000*60*5))
ORDER BY count DESC
LIMIT 100;

URI-based rule for specific application endpoints

Suppose that you want to further limit requests to the login page on your website. To do this, you could add the following string match condition to a rate-based rule:

  • The part of the request to filter on is URI
  • The Match Type is Starts with
  • A Value to match is /login (this needs to be whatever identifies the login page in the URI portion of the web request)

Next, you have to identify the typical request volume to the /login URI for the application. The following SQL query does exactly that.

SELECT
  httprequest.clientip,
  httprequest.uri,
  COUNT(*) AS "count"
FROM wafrulesdb.waftable
WHERE 
  from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
AND
  httprequest.uri = '/login'
GROUP BY httprequest.clientip, httprequest.uri, FLOOR("timestamp"/(1000*60*5))
ORDER BY count DESC
LIMIT 100;

Replace the time window 2020-12-01 16:00:00 and 2020-12-01 22:00:00 and the httprequest.uri value, if applicable, and run the query in the Athena console. The results show the highest requesting IP and /login URI for every 5-minute period between dates, as illustrated in Figure 4.

Figure 4: The highest requesting IP and /login URI for every 5-minute period between dates

Figure 5 illustrates a chart based on the query results for the highest requesting IP and /login URI for every 5-minute period between dates.

Figure 5: Chart: The highest requesting IP and /login URI for every 5-minute period between dates

Based on the SQL query results, you would specify a rate limit of 150 requests per 5 minutes. Adding this rate-based rule to a web ACL will limit requests to your login page per IP address without affecting the rest of your site. Once again, you can either create the AWS WAF rule by using the following JSON and the JSON rule editor, or by using the AWS WAF visual rule editor and following these instructions. If you’re using the following JSON, make sure to replace the Limit value with the value that you identified by running the SQL query earlier.

{
  "Name": "UriBasedRule",
  "Priority": 1,
  "Action": {
    "Block": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "UriBasedRule"
  },
  "Statement": {
    "RateBasedStatement": {
      "Limit": 150,
      "AggregateKeyType": "IP",
      "ScopeDownStatement": {
        "ByteMatchStatement": {
          "FieldToMatch": {
            "UriPath": {}
          },
          "PositionalConstraint": "STARTS_WITH",
          "SearchString": "/login",
          "TextTransformations": [
            {
              "Type": "NONE",
              "Priority": 0
            }
          ]
        }
      }
    }
  }
}

AWS WAF rules with a lower value for Priority are evaluated before rules with a higher value. For the AWS WAF rules to work as expected (first evaluating the more specific URI-based rule, and only after that the more general blanket rule), the URI-based rule must have a lower Priority value than the blanket rule. The preceding JSON examples already do this by setting Priority to 1 for UriBasedRule and 2 for BlanketRule; you can also set the priorities by using the AWS WAF visual rule editor. The expected AWS WAF rule priority should be as illustrated in Figure 6.

Figure 6: AWS WAF rules with priority for UriBasedRule

If you want to know the request volume across all application URIs, the following SQL will accomplish that.

SELECT
  httprequest.clientip,
  httprequest.uri,
  COUNT(*) AS "count"
FROM wafrulesdb.waftable
WHERE from_unixtime(timestamp/1000) BETWEEN TIMESTAMP '2020-12-01 16:00:00' AND TIMESTAMP '2020-12-01 22:00:00'
GROUP BY httprequest.clientip, httprequest.uri, FLOOR("timestamp"/(1000*60*5))
ORDER BY count DESC
LIMIT 100;

Figure 7 shows a chart of what the SQL query results might look like.

Figure 7: The highest requesting IP and URI for every 5-minute period between dates

IP reputation rule groups to block bots or other threats

You can use IP reputation rules to block requests based on their source. AWS WAF offers a wide selection of managed rule groups, and the Amazon IP reputation list is the one that helps reduce your exposure to bot traffic and exploitation attempts.

To add the Amazon IP reputation list rule to your web ACL

  1. Open the AWS WAF console and navigate to the managed rule groups view.

    Figure 8: The managed rule group view in AWS WAF

  2. Expand AWS managed rule groups, and for Amazon IP reputation list, choose Add to web ACL.

    Figure 9: Add the Amazon IP reputation list to the web ACL

  3. Scroll to the bottom of the page and choose Add rule.
  4. At this point, you should see the Set rule priority view. Move the Amazon managed rule group to the top so that it has the highest priority (the lowest Priority value). If a request originates from a bot, you want to deny the request as early as possible, and you achieve exactly that by assigning the highest priority to the Amazon IP reputation list rule. Your final AWS WAF rules order should be as shown in Figure 10.

    Figure 10: Final AWS WAF rules ordered by priority

Considerations for rate-based rules

It’s important to note that the more specific AWS WAF rules should have a higher priority, because you want these rules to limit the request volume first. In our example, the rules strategy is first based on a specific URI, and then on a blanket rule that limits requests across the whole application.

The rate-based rules that we discussed here provide a solid foundation to help you protect your internet-facing applications from common basic HTTP request floods. However, the solution in this blog post shouldn’t be seen as a one-time setup but rather as an iterative activity.

You should determine a healthy cadence for rerunning the Amazon Athena queries so that the rate-based rule thresholds keep pace with the application's growth and increasing request volume. Incorporating this review into your existing processes, such as your software development life cycle, is a good way to make it a recurring activity. Each AWS WAF rule publishes Amazon CloudWatch metrics, which you can use to trigger alerts before thresholds are crossed. You can use these alerts to create tickets for your operations teams based on thresholds you set, so they can review whether a DDoS attack is being thwarted or legitimate traffic is being dropped.
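As one possible implementation of such an alert, the following boto3 sketch creates a CloudWatch alarm on the BlockedRequests metric that the blanket rule publishes. The alarm threshold, web ACL name, SNS topic ARN, and Region dimension are assumptions you'd adjust (for a CloudFront-associated web ACL, the metric dimensions differ).

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholders: your web ACL name, the rule's metric name, and an SNS topic
# that notifies your operations team. The 5,000-request threshold is illustrative.
cloudwatch.put_metric_alarm(
    AlarmName="waf-blanket-rule-blocked-requests",
    Namespace="AWS/WAFV2",
    MetricName="BlockedRequests",
    Dimensions=[
        {"Name": "WebACL", "Value": "<your-web-acl-name>"},
        {"Name": "Rule", "Value": "BlanketRule"},
        {"Name": "Region", "Value": "us-east-1"},
    ],
    Statistic="Sum",
    Period=300,                 # align with the 5-minute rate-based window
    EvaluationPeriods=1,
    Threshold=5000,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:<account-id>:<your-ops-topic>"],
)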

After you determine the request threshold, add a buffer to allow for growth. Rate-based rules should have a reasonable buffer to account for near-future application growth. For instance, when an Athena query result shows a request volume of 500 requests, a rate-based rule with a limit of 1,000 requests gives a buffer for an additional 500 requests to account for application growth.

Summary

In this post, we introduced you to the three most important AWS WAF rate-based rules to protect your web applications from common HTTP flood events. We also covered how to implement these rate-based rules and how to determine an appropriate request threshold for your application by using AWS WAF logs and Amazon Athena queries. To learn more about best practices that help you protect your websites and web applications against various attack vectors by using AWS WAF, see our whitepaper, Guidelines for Implementing AWS WAF.

You can learn more about AWS WAF in other AWS WAF–related Security Blog posts.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS WAF forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Artem Lovan

Artem is a Senior Solutions Architect based in New York. He helps customers architect and optimize applications on AWS. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.

Author

Jesse Lepich

Jesse is a Senior Security Solutions Architect at AWS based in Lake St. Louis, Missouri, focused on helping customers implement native AWS security services. Outside of cloud security, his interests include relaxing with family, barefoot waterskiing, snowboarding/snow skiing, surfing, boating/sailing, and mountain climbing.

Synchronize and control your Amazon Redshift clusters maintenance windows

Post Syndicated from Ahmed Gamaleldin original https://aws.amazon.com/blogs/big-data/synchronize-and-control-your-amazon-redshift-clusters-maintenance-windows/

Amazon Redshift is a data warehouse that can expand to exabyte-scale. Today, tens of thousands of AWS customers (including NTT DOCOMO, Finra, and Johnson & Johnson) use Amazon Redshift to run mission-critical business intelligence dashboards, analyze real-time streaming data, and run predictive analytics jobs.

Amazon Redshift powers analytical workloads for Fortune 500 companies, startups, and everything in between. With the constant increase in generated data, Amazon Redshift customers continue to achieve successes in delivering better service to their end-users, improving their products, and running an efficient and effective business. Availability, therefore, is key in continuing to drive customer success, and AWS will use commercially reasonable efforts to make Amazon Redshift available with a Monthly Uptime Percentage for each multi-node cluster, during any monthly billing cycle, of at least 99.9% (our “Service Commitment”).

In this post, we present a solution to help you provide a predictable and repeatable experience to your Amazon Redshift end-users by taking control of recurring Amazon Redshift maintenance windows.

Amazon Redshift maintenance windows

Amazon Redshift periodically performs maintenance to apply fixes, enhancements, and new features to your cluster. This type of maintenance occurs during a 30-minute maintenance window set by default per Region from an 8-hour block on a random day of the week. You should change the scheduled maintenance window according to your business needs by modifying the cluster, either programmatically or by using the Amazon Redshift console. The window must be at least 30 minutes and not longer than 24 hours. For more information, see Managing clusters using the console.
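For example, the following boto3 sketch sets a cluster's preferred maintenance window; the cluster identifier is a placeholder, and the window shown is just an example in the required ddd:hh24:mi-ddd:hh24:mi (UTC) format.

import boto3

redshift = boto3.client("redshift")

# Placeholder cluster identifier; pick a 30-minute (or longer) window in UTC.
redshift.modify_cluster(
    ClusterIdentifier="<your-cluster-id>",
    PreferredMaintenanceWindow="sat:06:00-sat:06:30",
)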

If a maintenance event is scheduled for a given week, it starts during the assigned 30-minute maintenance window. While Amazon Redshift performs maintenance, it terminates any queries or other operations that are in progress. If there are no maintenance tasks to perform during the scheduled maintenance window, your cluster continues to operate normally until the next scheduled maintenance window. Amazon Redshift uses Amazon Simple Notification Service (Amazon SNS) to send notifications of Amazon Redshift events. You enable notifications by creating an Amazon Redshift event subscription. You can create an Amazon Redshift event notification subscription so you can be notified when an event occurs for a given cluster.

When an Amazon Redshift cluster is scheduled for maintenance, you receive an Amazon Redshift “Pending” event, as described in the following table.

Amazon Redshift category | Event ID | Event severity | Description
Pending | REDSHIFT-EVENT-2025 | INFO | Your database for cluster <cluster name> will be updated between <start time> and <end time>. Your cluster will not be accessible. Plan accordingly.
Pending | REDSHIFT-EVENT-2026 | INFO | Your cluster <cluster name> will be updated between <start time> and <end time>. Your cluster will not be accessible. Plan accordingly.

Amazon Redshift also gives you the option to reschedule your cluster’s maintenance window by deferring your upcoming maintenance by up to 45 days. This option is particularly helpful if you want to maximize your cluster uptime by deferring a future maintenance window. For example, if your cluster’s maintenance window is set to Wednesday 8:30–9:00 UTC and you need to have nonstop access to your cluster for the next 2 weeks, you can defer maintenance to a date 2 weeks from now. We don’t perform any maintenance on your cluster when you have specified a deferment.
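Deferral can also be done programmatically. The following boto3 sketch defers maintenance for 30 days starting tomorrow; the cluster identifier is a placeholder and the duration is illustrative (the maximum is 45 days).

from datetime import datetime, timedelta, timezone

import boto3

redshift = boto3.client("redshift")

# Defer maintenance starting tomorrow for 30 days.
redshift.modify_cluster_maintenance(
    ClusterIdentifier="<your-cluster-id>",
    DeferMaintenance=True,
    DeferMaintenanceStartTime=datetime.now(timezone.utc) + timedelta(days=1),
    DeferMaintenanceDuration=30,
)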

Deferring a maintenance window doesn’t apply to mandatory Amazon Redshift updates, such as vital security patches. Scheduled maintenance is different from Amazon Redshift mandatory maintenance. If Amazon Redshift needs to update hardware or make other mandatory updates during your period of deferment, we notify you and make the required changes. Your cluster isn’t available during these updates and such maintenance can’t be deferred. If a hardware replacement is required, you receive an event notification through the AWS Management Console and your SNS subscription as a “Pending” item, as shown in the following table.

Amazon Redshift category | Event ID | Event severity | Description
Pending | REDSHIFT-EVENT-3601 | INFO | A node on your cluster <cluster name> will be replaced between <start time> and <end time>. You can’t defer this maintenance. Plan accordingly.
Pending | REDSHIFT-EVENT-3602 | INFO | A node on your cluster <cluster name> is scheduled to be replaced between <start time> and <end time>. Your cluster will not be accessible. Plan accordingly.

Challenge

As the number of the Amazon Redshift clusters you manage for an analytics application grows, the end-user experience becomes highly dependent on recurring maintenance. This means that you want to make sure maintenance windows across all clusters happen at the same time and fall on the same day each month. You want to avoid a situation in which, for example, your five Amazon Redshift clusters are each updated on a different day or week. Ultimately, you want to give your Amazon Redshift end-users an uninterrupted number of days where the clusters aren’t subject to any scheduled maintenance. This gives you the opportunity to announce a scheduled maintenance to your users well ahead of the maintenance date.

Amazon Redshift clusters are scheduled for maintenance depending on several factors, including when a cluster was created and its Region. For example, you may have two clusters on the same Amazon Redshift version that are scheduled for different maintenance windows. Regions roll out the latest Amazon Redshift versions and patches at different times; a cluster running in us-east-1 might be scheduled for the same maintenance 1 week before another cluster in eu-west-1. This makes it harder for you to provide your end-users with a predictable maintenance schedule.

To solve this issue, you typically need to frequently check and synchronize the maintenance windows across all your clusters to a specific time, for example sat:06:00-sat:06:30. Additionally, you want to avoid having your clusters scheduled for maintenance at different intervals. For that, you need to defer maintenance across all your clusters to fall on the same exact day. For example, you can defer all maintenance across all clusters to happen 1 month from now regardless of when the cluster was last updated. This way you know that your clusters aren’t scheduled for maintenance for the next 30 days. This gives you enough time to announce the maintenance schedule to your Amazon Redshift users.

Solution overview

Having one day per month when Amazon Redshift scheduled maintenance occurs across all your clusters provides your users with a seamless experience. It gives you control and predictability over when clusters aren’t available. You can announce this maintenance window ahead of time to avoid sudden interruptions.

The following solution deploys an AWS Lambda function (RedshiftMaintenanceSynchronizer) to your AWS account. The function runs on a schedule configurable through an AWS CloudFormation template parameter frequency to run every 6, 12, or 24 hours. This function synchronizes maintenance schedules across all your Amazon Redshift clusters to happen at the same time on the same day. It also gives you the option to defer all future maintenance windows across all clusters by a number of days (deferment days) for up to 45 days. This enables you to provide your users with an uninterrupted number of days when your clusters aren’t subject to maintenance.

We use the following input parameters:

  • Deferment days – The number of days to defer all future scheduled maintenance windows. Amazon Redshift can defer maintenance windows by up to 45 days. The solution adds this number to the date of the last successfully completed maintenance across any of your clusters. The resulting date is the new maintenance date for all your clusters.
  • Frequency – The frequency to run this solution. You can configure it to run every 6, 12, or 24 hours.
  • Day – The preferred day of the week to schedule maintenance windows.
  • Hour – The preferred hour of the day to schedule maintenance windows (24-hour format).
  • Minute – The preferred minute of the hour to schedule maintenance windows.

The Lambda function performs the following steps (a simplified sketch follows the list):

  1. Lists all your Amazon Redshift clusters in the Region.
  2. Updates the maintenance window for all clusters to the same value of the day/hour/minute input parameters from the CloudFormation template.
  3. Checks for the last successfully completed maintenance across all clusters.
  4. Calculates the deferment maintenance date by adding the deferment input parameter to the date of the last successfully completed maintenance. For example, if the deferment parameter is 30 days and the last successful maintenance window completed on July 1, 2020, then the next deferment maintenance date is July 31, 2020.
  5. Defers the next maintenance window across all clusters to the deferment date calculated in the previous step.
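The following is a simplified Python sketch of that logic using boto3; it is not the deployed solution code. For brevity it defers relative to the current time instead of looking up the last successfully completed maintenance, and the window and deferment values stand in for the CloudFormation parameters.

from datetime import datetime, timedelta, timezone

import boto3

redshift = boto3.client("redshift")

# Illustrative values that stand in for the CloudFormation parameters.
PREFERRED_WINDOW = "mon:08:00-mon:08:30"   # day/hour/minute parameters
DEFERMENT_DAYS = 30                        # deferment days parameter


def lambda_handler(event, context):
    clusters = redshift.describe_clusters()["Clusters"]

    # Skip the run if any cluster already has a deferment in place.
    for cluster in clusters:
        if cluster.get("DeferredMaintenanceWindows"):
            return "Existing deferment found, nothing to do"

    # The real solution derives this date from the last completed maintenance;
    # here we simply defer relative to the current time.
    defer_start = datetime.now(timezone.utc) + timedelta(days=1)

    for cluster in clusters:
        cluster_id = cluster["ClusterIdentifier"]
        redshift.modify_cluster(
            ClusterIdentifier=cluster_id,
            PreferredMaintenanceWindow=PREFERRED_WINDOW,
        )
        redshift.modify_cluster_maintenance(
            ClusterIdentifier=cluster_id,
            DeferMaintenance=True,
            DeferMaintenanceStartTime=defer_start,
            DeferMaintenanceDuration=DEFERMENT_DAYS,
        )
    return f"Synchronized {len(clusters)} clusters"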

Launch the solution

To get started, deploy the CloudFormation template to your AWS account.

  1. For Stack name, enter a name for your stack for easy reference.
  2. For Day, choose the day of week for the maintenance window to occur.
  3. For Hour, choose the hour for the window to start.
  4. For Minute, choose the minute within the hour for the window to start.

Your window should be a time with the least cluster activity and during off-peak work hours.

  5. For Deferment Days, choose the number of days to defer all future scheduled maintenance windows.
  6. For Solution Run Frequency, choose the frequency of 6, 12, or 24 hours.


After the template has successfully deployed, the following resources are available:

  • The RedshiftMaintenanceSync Lambda function. This is a Python 3.8 Lambda function that syncs and defers the maintenance windows across all your Amazon Redshift clusters.


  • The RedshiftMaintenanceSyncEventRule Amazon CloudWatch event rule. This rule triggers on a schedule based on the Frequency input parameter. It triggers the RedshiftMaintenanceSync Lambda function to run the solution logic.

When the Lambda function starts and detects an already available deferment on any of your clusters, it doesn’t attempt to modify the existing deferment, and exits instead.

The solution logs any deferment it performs on any cluster in the associated CloudWatch log group.

Synchronize and defer maintenance windows

To demonstrate this solution, I have two Amazon Redshift clusters with different preferred maintenance windows and scheduled intervals.

Cluster A (see the following screenshot) has a preferred maintenance window set to Friday at 4:30–5:00 PM. It’s scheduled for maintenance in 2 days.

Cluster B has a preferred maintenance window set to Tuesday at 9:45–10:15 AM. It’s scheduled for maintenance in 6 days.

This means that my Amazon Redshift users face two 30-minute interruptions: one in 2 days and another in 6 days. Also, these interruptions happen at completely different times of day.

Let’s launch the solution and see how the cluster maintenance windows and scheduling intervals change.

The Lambda function does the following every time it runs:

  1. Lists all the clusters.
  2. Checks if any of the clusters have a deferment enabled and exits if it finds any.
  3. Syncs all preferred maintenance windows across all clusters to fall on the same time and day of week.
  4. Checks for the latest successfully completed maintenance date.
  5. Adds the deferment days to the last maintenance date. This becomes the new deferment date.
  6. Applies the new deferment date to all clusters.

Cluster A’s maintenance window was synced to the input parameters I passed to the CloudFormation template. Additionally, the next maintenance window was deferred until June 21, 2021, 8:06 AM (UTC +02:00).

Cluster B’s maintenance window is also synced to the same value from Cluster A, and the next maintenance window was deferred to the same exact day as Cluster A: June 21, 2021, 8:06 AM (UTC +02:00).

Finally, let’s check the CloudWatch log group to understand what the solution did.

Now both clusters have the same maintenance window and next scheduled maintenance, which is a month from now. In a production scenario, this gives you enough lead time to announce the maintenance window to your Amazon Redshift end-users and provide them with the exact day and time when clusters aren’t available.

Summary

This solution can help you provide a predictable and repeatable experience to your Amazon Redshift end-users by taking control of recurring Amazon Redshift maintenance windows. It enables you to provide your users with an uninterrupted number of days where clusters are always available—barring any scheduled mandatory upgrades. Click here to get started with Amazon Redshift today.


About the Author


Ahmed Gamaleldin is a Senior Technical Account Manager (TAM) at Amazon Web Services. Ahmed helps customers run optimized workloads on AWS and make the best out of their cloud journey.

Accelerate your data warehouse migration to Amazon Redshift – Part 2

Post Syndicated from Michael Soo original https://aws.amazon.com/blogs/big-data/part-2-accelerate-your-data-warehouse-migration-to-amazon-redshift/

This is the second post in a multi-part series. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and potentially reduce your overall cost to migrate to Amazon Redshift. Check out the first post Accelerate your data warehouse migration to Amazon Redshift – Part 1 to learn more about automated macro conversion, case-insensitive string comparison, case-sensitive identifiers, and other exciting new features.

Amazon Redshift is the leading cloud data warehouse. No other data warehouse makes it as easy to gain new insights from your data. With Amazon Redshift, you can query exabytes of data across your data warehouse, operational data stores, and data lake using standard SQL. You can also integrate other services like Amazon EMR, Amazon Athena, and Amazon SageMaker to use all the analytic capabilities in the AWS Cloud.

Many customers have asked for help migrating their self-managed data warehouse engines to Amazon Redshift. In these cases, you may have terabytes (or petabytes) of historical data, a heavy reliance on proprietary features, and thousands of extract, transform, and load (ETL) processes and reports built over years (or decades) of use.

Until now, migrating a data warehouse to AWS was complex and involved a significant amount of manual effort.

Today, we’re happy to share additional enhancements to the AWS Schema Conversion Tool (AWS SCT) to automate your migrations to Amazon Redshift. These enhancements reduce the recoding needed for your data tables, and more importantly, the manual work needed for views, stored procedures, scripts, and other application code that use those tables.

In this post, we introduce automation for INTERVAL and PERIOD data types, automatic type casting, binary data support, and some other enhancements that have been requested by customers. We show you how to use AWS SCT to convert objects from a Teradata data warehouse and provide links to relevant documentation so you can continue exploring these new capabilities.

INTERVAL data types

An INTERVAL is an unanchored duration of time, like “1 year” or “2 hours,” that doesn’t have a specific start or end time. In Teradata, INTERVAL data is implemented as 13 distinct data types depending on the granularity of time being represented. The following table summarizes these types.

Interval granularity | Teradata data types
Year intervals | INTERVAL YEAR, INTERVAL YEAR TO MONTH
Month intervals | INTERVAL MONTH
Day intervals | INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO MINUTE, INTERVAL DAY TO SECOND
Hour intervals | INTERVAL HOUR, INTERVAL HOUR TO MINUTE, INTERVAL HOUR TO SECOND
Minute intervals | INTERVAL MINUTE, INTERVAL MINUTE TO SECOND
Second intervals | INTERVAL SECOND

Amazon Redshift doesn’t support INTERVAL data types natively. Previously, if you used INTERVAL types in your data warehouse, you had to develop custom code as part of the database conversion process.

Now, AWS SCT automatically converts INTERVAL data types for you. AWS SCT converts an INTERVAL column into a CHARACTER VARYING column in Amazon Redshift. Then AWS SCT converts your application code that uses the column to emulate the INTERVAL semantics.

For example, consider the following Teradata table, which has a MONTH interval column. This table stores different types of leave of absence and the allowable duration for each.

CREATE TABLE testschema.loa_durations (
  loa_type_id INTEGER
, loa_name VARCHAR(100) CHARACTER SET LATIN
, loa_duration INTERVAL MONTH(2))
PRIMARY INDEX (loa_type_id);

AWS SCT converts the table to Amazon Redshift as follows. Because Amazon Redshift doesn’t have a native INTERVAL data type, AWS SCT replaces it with a VARCHAR data type.

CREATE TABLE testschema.loa_durations(
  loa_type_id INTEGER
, loa_name VARCHAR(100)
, loa_duration VARCHAR(64)
)
DISTSTYLE KEY
DISTKEY
(
loa_type_id
)
SORTKEY
(
loa_type_id
);

Now, let’s suppose your application code uses the loa_duration column, like the following Teradata view. Here, the INTERVAL MONTH field is added to the current date to compute when a leave of absence ends if it starts today.

REPLACE VIEW testschema.loa_projected_end_date AS
SELECT
  loa_type_id loa_type_id
, loa_name loa_name
, loa_duration
, current_date AS today
, current_date + loa_duration AS end_date
FROM
testschema.loa_durations
;

Because the data is stored as CHARACTER VARYING, AWS SCT injects the proper type CAST into the Amazon Redshift code to interpret the string values as a MONTH interval. It then converts the arithmetic using Amazon Redshift date functions.

CREATE OR REPLACE VIEW testschema.loa_projected_end_date (loa_type_id, loa_name, loa_duration, today, end_date) AS
SELECT
  loa_type_id AS loa_type_id
, loa_name AS loa_name
, loa_duration
, CURRENT_DATE AS today
, dateadd(MONTH, CAST (loa_duration AS INTEGER),CURRENT_DATE)::DATE AS end_date
FROM testschema.loa_durations
;

Also, as a bonus, AWS SCT automatically converts any literal INTERVAL values that you might be using in your code.

For example, consider the following Teradata table. The table contains a DATE column, which records the last date when an employee was promoted.

CREATE TABLE TESTSCHEMA.employees (
  id INTEGER
, name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC
, manager_id INTEGER
, last_promo_date DATE FORMAT 'YY/MM/DD'
)
UNIQUE PRIMARY INDEX ( id );

Now, suppose the database contains a view that computes the next date an employee is eligible for a promotion. We implement a business rule that employees who have never been promoted are eligible for promotion in 1.5 years. All other employees become eligible 2.5 years after their last promotion. See the following code:

REPLACE VIEW testschema.eligible_for_promo AS
SELECT 
  id
, name
, last_promo_date
, CASE WHEN last_promo_date is NULL THEN current_date + INTERVAL '18' MONTH
       ELSE last_promo_date + INTERVAL '2-06' YEAR TO MONTH
  END eligible_date
FROM employees
;

AWS SCT converts the INTERVAL values used in the CASE statement and translates the date expressions accordingly:

CREATE OR REPLACE VIEW testschema.eligible_for_promo (id, name, last_promo_date, eligible_date) AS
SELECT
  id
, name
, last_promo_date
, CASE
    WHEN last_promo_date IS NULL THEN dateadd(MONTH, 18, CURRENT_DATE)::DATE
    ELSE dateadd(MONTH, 30, last_promo_date)::DATE
  END AS eligible_date
FROM testschema.employees
;

We’re excited about INTERVAL automation in AWS SCT and encourage you to give it a try. For more information about getting started with AWS SCT, see Installing, verifying, and updating AWS SCT.

PERIOD data type

A PERIOD data value represents a duration of time with a specified start and end. For example, the Teradata literal “(2021-01-01 to 2021-01-31)” is a period with a duration of 31 days that starts and ends on the first and last day of January 2021, respectively. PERIOD data types can have three different granularities: DATE, TIME, or TIMESTAMP. The following table provides some examples.

Period type | Example
PERIOD(DATE) | "(2021-01-01 to 2021-01-31)"
PERIOD(TIME) | "(12:00:00 to 13:00:00)"
PERIOD(TIMESTAMP) | "(2021-01-31 00:00:00 to 2021-01-31 23:59:59)"

As with INTERVAL, the PERIOD data type isn’t natively supported by Amazon Redshift. Previously, if you used these data types in your tables, you had to write custom code as part of the database conversion process.

Now, AWS SCT automatically converts PERIOD data types for you. AWS SCT converts a PERIOD column into two DATE (or TIME or TIMESTAMP) columns as appropriate on Amazon Redshift. Then AWS SCT converts your application code that uses the column to emulate the source engine semantics.

For example, consider the following Teradata table:

CREATE SET TABLE testschema.period_table (
  id INTEGER
, period_col PERIOD(timestamp)) 
UNIQUE PRIMARY INDEX (id);

AWS SCT converts the PERIOD(TIMESTAMP) column into two TIMESTAMP columns in Amazon Redshift:

CREATE TABLE IF NOT EXISTS testschema.period_table(
  id INTEGER
, period_col_begin TIMESTAMP
, period_col_end TIMESTAMP
)
DISTSTYLE KEY
DISTKEY
(id)
SORTKEY
(id);

Now, let’s look at a simple example of how you can use AWS SCT to convert your application code. A common operation in Teradata is to extract the starting (or ending) timestamps in a PERIOD value using the BEGIN and END built-in functions:

REPLACE VIEW testschema.period_view_begin_end AS 
SELECT 
  BEGIN(period_col) AS period_start
, END(period_col) AS period_end 
FROM testschema.period_table
;

AWS SCT converts the view to reference the transformed table columns:

CREATE OR REPLACE VIEW testschema.period_view_begin_end (period_start, period_end) AS
SELECT
  period_col_begin AS period_start
, period_col_end AS period_end
FROM testschema.period_table;

We’ll continue to build automation for PERIOD data conversion, so stay tuned for more improvements. In the meantime, you can try out the PERIOD data type conversion features in AWS SCT now. For more information, see Installing, verifying, and updating AWS SCT.

Type casting

Some data warehouse engines, like Teradata, provide an extensive set of rules to cast data values in expressions. These rules permit implicit casts, where the target data type is inferred from the expression, and explicit casts, which typically use a function to perform the type conversion.

Previously, you had to manually convert implicit cast operations in your SQL code. Now, we’re happy to share that AWS SCT automatically converts implicit casts as needed. This feature is available now for the following set of high-impact Teradata data types.

Category | Source data type | Target data types
Numeric | CHAR | BIGINT, NUMBER, TIMESTAMP
Numeric | VARCHAR | NUMBER, NUMERIC, DEC, CHAR, GEOMETRY
Numeric | INTEGER | DATE, DEC
Numeric | BIGINT | DATE
Numeric | NUMBER | CHARACTER, VARCHAR, DEC
Numeric | DECIMAL | DATE, TIMESTAMP, SMALLINT, DOUBLE PRECISION
Numeric | FLOAT | DEC
Time | DATE | BIGINT, INTEGER, DECIMAL, FLOAT, NUMBER, CHARACTER, TIMESTAMP
Time | INTERVAL | NUMBER, BIGINT, INTEGER
Other | GEOMETRY | DECIMAL

Let’s look at how to cast numbers to DATE. Many Teradata applications treat numbers and DATE as equivalent values. Internally, Teradata stores DATE values as INTEGER. The rules to convert between an INTEGER and a DATE are well-known and developers have commonly exploited this information to perform date calculations using INTEGER arithmetic.
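To make the encoding concrete, here is a small Python sketch (not AWS SCT code) of the well-known rule: the internal integer is (year - 1900) * 10000 + month * 100 + day, which is why adding 19,000,000 and reading the result as YYYYMMDD recovers the calendar date.

from datetime import date


def teradata_int_to_date(value: int) -> date:
    """Decode a Teradata internal DATE integer: (year - 1900)*10000 + month*100 + day."""
    value += 19000000              # the same +19,000,000 shift used later in this post
    year, remainder = divmod(value, 10000)
    month, day = divmod(remainder, 100)
    return date(year, month, day)


def date_to_teradata_int(d: date) -> int:
    """Encode a date back into the Teradata integer representation."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


print(teradata_int_to_date(1410330))            # 2041-03-30
print(date_to_teradata_int(date(2041, 3, 30)))  # 1410330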

For example, consider the following Teradata table:

CREATE TABLE testschema.employees (
  id INTEGER
, name VARCHAR(20) CHARACTER SET LATIN
, manager_id INTEGER
, last_promo_date DATE FORMAT 'YY/MM/DD')
UNIQUE PRIMARY INDEX ( id );

We insert a single row of data into the table:

select * from employees;

 *** Query completed. One row found. 4 columns returned. 
 *** Total elapsed time was 1 second.

         id  name                   manager_id  last_promo_date
-----------  --------------------  -----------  ---------------
        112  Britney                       201                ?

We use a macro to update the last_promo_date field for id = 112. The macro accepts a BIGINT parameter to populate the DATE field.

replace macro testschema.set_last_promo_date(emp_id integer, emp_promo_date bigint) AS (
update testschema.employees
set last_promo_date = :emp_promo_date
where id = :emp_id;
);

Now, we run the macro and check the value of the last_promo_date attribute:

exec testschema.set_last_promo_date(112, 1410330);

 *** Update completed. One row changed. 
 *** Total elapsed time was 1 second.


select * from employees;

 *** Query completed. One row found. 4 columns returned. 
 *** Total elapsed time was 1 second.

         id  name                   manager_id  last_promo_date
-----------  --------------------  -----------  ---------------
        112  Britney                       201         41/03/30

You can see the last_promo_date attribute is set to the date March 30, 2041.

Now, let’s use AWS SCT to convert the table and macro to Amazon Redshift. As we saw in Part 1 of this series, AWS SCT converts the Teradata macro into an Amazon Redshift stored procedure:

CREATE TABLE IF NOT EXISTS testschema.employees(
  id INTEGER
, name CHARACTER VARYING(20) 
, manager_id INTEGER
, last_promo_date DATE
)
DISTSTYLE KEY
DISTKEY
(id)
SORTKEY
(id);

CREATE OR REPLACE PROCEDURE testschema.set_last_promo_date(par_emp_id INTEGER, par_emp_promo_date BIGINT)
AS $BODY$
BEGIN
    UPDATE testschema.employees
    SET last_promo_date = TO_DATE((par_emp_promo_date + 19000000), 'YYYYMMDD')
        WHERE id = par_emp_id;
END;
$BODY$
LANGUAGE plpgsql;

Note that 20410330 = 1410330 + 19000000; so adding 19,000,000 to the input returns the correct date value 2041-03-30.

Now, when we run the stored procedure, it updates the last_promo_date as expected:

myredshift=# select * from testschema.employees;
 id  |  name   | manager_id | last_promo_date
 112 | Britney |        201 |
(1 row)

myredshift=# call testschema.set_last_promo_date(112, 1410330);
CALL

myredshift=# select * from testschema.employees;
 id  |  name   | manager_id | last_promo_date
 112 | Britney |        201 | 2041-03-30
(1 row)

Automatic data type casting is available in AWS SCT now. You can download the latest version and try it out.

BLOB data

Amazon Redshift doesn’t have native support for BLOB columns, which you use to store large binary objects like text or images.

Previously, if you were migrating a table with a BLOB column, you had to manually move the BLOB values to file storage, like Amazon Simple Storage Service (Amazon S3), then add a reference to the S3 file in the table. Using Amazon S3 as the storage target for binary objects is a best practice because these objects are large and typically have low analytic value.

We’re happy to share that AWS SCT now automates this process for you. AWS SCT replaces the BLOB column with a CHARACTER VARYING column on the target table. Then, when you use the AWS SCT data extractors to migrate your data, the extractors upload the BLOB value to Amazon S3 and insert a reference to the BLOB into the target table.

For example, let’s create a table in Teradata and populate it with some data:

CREATE SET TABLE TESTSCHEMA.blob_table (
  id INTEGER
, blob_col BLOB(10485760))
PRIMARY INDEX ( id );

select * from blob_table;

 *** Query completed. 2 rows found. 2 columns returned. 
 *** Total elapsed time was 1 second.

         id blob_col
----------- ---------------------------------------------------------------
          1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
          2 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Now, we convert the table with AWS SCT and build it on Amazon Redshift:

myredshift=# \d testschema.blob_table;
                  Table "testschema.blob_table"
  Column   |           Type           | Collation | Nullable | Default
-----------+--------------------------+-----------+----------+---------
 id        | integer                  |           |          |
 blob_col  | character varying(1300)  |           |          |

Then we use the AWS SCT data extractors to migrate the table data from Teradata to Amazon Redshift.

When we look at the table in Amazon Redshift, you can see the paths to the S3 files that contain the BLOB values:

myredshift=# select * from testschema.blob_table;
(2 rows)

 id |                                                               blob_col                                                               
  2 | s3://<bucket name>/data/c12f53330dd3427a845a77f143d4a1a1/dbdee8e0485c481dad601fd6170fbfb4_lobs/2/308b6f0a902941e793212058570cdda5.dat
  1 | s3://<bucket name>/data/c12f53330dd3427a845a77f143d4a1a1/dbdee8e0485c481dad601fd6170fbfb4_lobs/2/a7686067af5549479b52d81e83c3871e.dat

And on Amazon S3, you can see the actual data files. There are two, one for each BLOB value:

$ aws s3 ls s3://<bucket name>/data/c12f53330dd3427a845a77f143d4a1a1/dbdee8e0485c481dad601fd6170fbfb4_lobs/2/
2021-05-13 23:59:47         23 522fee54fda5472fbae790f43e36cba1.dat
2021-05-13 23:59:47         24 5de6c53831f741629476e2c2cbc6b226.dat
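If a downstream process needs the original binary content, it can resolve the stored s3:// reference at read time. The following is a minimal boto3 sketch; the URI passed in is a placeholder for a value read from the blob_col column.

from urllib.parse import urlparse

import boto3

s3 = boto3.client("s3")


def fetch_blob(s3_uri: str) -> bytes:
    """Download the object referenced by an s3:// URI stored in blob_col."""
    parsed = urlparse(s3_uri)                  # s3://<bucket>/<key>
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


# Placeholder value copied from the blob_col column in Amazon Redshift.
payload = fetch_blob("s3://<bucket name>/<key-from-blob_col>.dat")
print(len(payload), "bytes")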

BLOB support is available now in AWS SCT and the AWS SCT data extractors. Download the latest version of the application and try it out today.

Multi-byte CHARACTER conversion

Teradata supports multibyte characters in CHARACTER data columns, which are fixed length fields. Amazon Redshift supports multibyte characters in CHARACTER VARYING fields but not in fixed-length CHARACTER columns.

Previously, if you had fixed-length CHARACTER columns, you had to determine if they contained multibyte character data, and increase the target column size as appropriate.

AWS SCT now bridges this gap for you. If your Teradata tables contain CHARACTER columns with multibyte characters, AWS SCT automatically converts these columns to Amazon Redshift CHARACTER VARYING fields and sets the column sizes accordingly. Consider the following example, which contains four columns: a LATIN column that contains only single-byte characters, and UNICODE, GRAPHIC, and KANJISJIS columns that can contain multi-byte characters:

create table testschema.char_table (
  latin_col char(70) character set latin
, unicode_col char(70) character set unicode
, graphic_col char(70) character set graphic
, kanjisjis_col char(70) character set kanjisjis
);

AWS SCT translates the LATIN column to a fixed length CHARACTER column. The multi-byte columns are upsized and converted to CHARACTER VARYING:

CREATE TABLE IF NOT EXISTS testschema.char_table (
  latin_col CHARACTER(70)
, unicode_col CHARACTER VARYING(210)
, graphic_col CHARACTER VARYING(210)
, kanjisjis_col CHARACTER VARYING(210)
)
DISTSTYLE KEY
DISTKEY
(latin_col)
SORTKEY
(latin_col);

Automatic conversion for multibyte CHARACTER columns is available in AWS SCT now.

GEOMETRY data type size

Amazon Redshift has long supported geospatial data with a GEOMETRY data type and associated spatial functions.

Previously, Amazon Redshift restricted the maximum size of a GEOMETRY column to 64 KB, which constrained some customers with large objects. Now, we’re happy to share that the maximum size of GEOMETRY objects has been increased to just under 1 MB (specifically, 1,048,447 bytes).

For example, consider the following Teradata table:

create table geometry_table (
 id INTEGER
, geometry_col1 ST_GEOMETRY 
, geometry_col2 ST_GEOMETRY(1000)
, geometry_col3 ST_GEOMETRY(1048447) 
, geometry_col4 ST_GEOMETRY(10484470)
, geometry_col5 ST_GEOMETRY INLINE LENGTH 1000
)
;

You can use AWS SCT to convert it to Amazon Redshift. The converted table definition is as follows. A size specification isn’t needed on the converted columns because Amazon Redshift implicitly sets the column size.

CREATE TABLE IF NOT EXISTS testschema.geometry_table(
id INTEGER,
geometry_col1 GEOMETRY,
geometry_col2 GEOMETRY,
geometry_col3 GEOMETRY,
geometry_col4 GEOMETRY,
geometry_col5 GEOMETRY
)
DISTSTYLE KEY
DISTKEY
(
id
)
SORTKEY
(
id
);
ALTER TABLE testschema.geometry_table ALTER DISTSTYLE AUTO;
ALTER TABLE testschema.geometry_table ALTER SORTKEY AUTO;

Large GEOMETRY columns are available in Amazon Redshift now. For more information, see Querying spatial data in Amazon Redshift.

Conclusion

We’re happy to share these new features with you. If you’re contemplating a migration to Amazon Redshift, these capabilities can help automate your schema conversion and preserve your investment in existing reports, applications, and ETL, as well as accelerate your query performance.

This post described a few of the dozens of new features we have recently introduced to automate your data warehouse migrations to Amazon Redshift. We will share more in upcoming posts. You’ll hear about additional SQL automation, a purpose-built scripting language for Amazon Redshift with BTEQ compatibility, and automated support for proprietary SQL features.

Check back soon for more information. Until then, you can learn more about Amazon Redshift and the AWS Schema Conversion Tool on the AWS website. Happy migrating!


About the Author

Michael Soo is a database engineer with the AWS DMS and AWS SCT team at Amazon Web Services.

Establish private connectivity between Amazon QuickSight and Snowflake using AWS PrivateLink

Post Syndicated from Maxwell Moon original https://aws.amazon.com/blogs/big-data/establish-private-connectivity-between-amazon-quicksight-and-snowflake-using-aws-privatelink/

Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. QuickSight lets you easily create and publish interactive BI dashboards that include Machine Learning-powered insights. QuickSight dashboards can be accessed from any device, and seamlessly embedded into your applications, portals, and websites.

QuickSight offers several sources for data, including but not limited to Amazon Athena, Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Snowflake. This post presents solutions to enable you to set up Snowflake as a data source for QuickSight regardless of your network configuration requirements.

We cover the following configurations of Snowflake as a data source for QuickSight:

  • QuickSight connection to Snowflake via AWS PrivateLink
  • QuickSight connection to Snowflake via AWS PrivateLink and virtual private cloud (VPC) peering (same Region)
  • QuickSight connection to Snowflake via AWS PrivateLink and VPC peering (cross-Region)
  • QuickSight connection to Snowflake (public network)

Prerequisites

To complete this solution, you need the following:

QuickSight connection to Snowflake via AWS PrivateLink

First, we show you how to connect to Snowflake with QuickSight over AWS PrivateLink. The following diagram illustrates the solution architecture.

Set up the Snowflake AWS PrivateLink integration

To start, we walk through enabling AWS PrivateLink for your Snowflake account. This includes locating resources in your AWS account, access to the Snowflake UI, and creating a support case with Snowflake.

  1. Identify the VPC you want to use to set up the AWS PrivateLink integration. To do so, retrieve a list of VPCs from the command line, then retrieve the VpcId element from the resulting JSON object for the desired VPC. See the following code:
aws ec2 describe-vpcs --output json
  2. Retrieve your AWS account ID. This post assumes that the account you’re targeting is your default account on your AWS CLI configuration.
aws sts get-caller-identity --output json
  3. If you’re setting up multiple accounts, repeat these steps for all accounts and VPCs (this post assumes you’re setting up a single account and VPC and will use this as the context moving forward).
  4. Contact Snowflake Support with your AWS account ID, VPC ID, and the corresponding account URL you use to access Snowflake (for example, <account id>.snowflakecomputing.com).

Enabling AWS PrivateLink for your Snowflake account can take up to two business days.

  5. After AWS PrivateLink is enabled, retrieve the AWS PrivateLink configuration for your Region by running the following command in a Snowflake worksheet, then retrieve the values for privatelink-account-url and privatelink_ocsp-url from the resulting JSON object. Examples of each value are as follows:
select SYSTEM$GET_PRIVATELINK_CONFIG();

privatelink-vpce-id: com.amazonaws.vpce.<region_id>.vpce-svc-xxxxxxxxxxxxxxxxx
privatelink-account-url: xxxxxxxx.<region>.privatelink.snowflakecomputing.com
privatelink_ocsp-url: ocsp.xxxxxxxx.<region>.privatelink.snowflakecomputing.com
  6. Store these values in a text editor for later use.

Next, we configure the VPC endpoint on the Amazon Virtual Private Cloud (Amazon VPC) console and create all the required security groups.

  1. On the Amazon VPC console, choose Endpoints in the navigation menu.
  2. Choose Create endpoint.
  3. Select Find AWS Service by Name.
  4. For Service Name, enter the value for privatelink-vpce-id that we retrieved earlier.
  5. Choose Verify.

A green alert with “Service Name Found” appears and the VPC and subnet options automatically expand.

Depending on your targeted Region, your resulting screen may show another Region name.

  6. Choose the same VPC ID that you sent to Snowflake.
  7. Select the subnets where you want to create endpoints.

AWS recommends using more than one subnet for high availability.

  8. For Security group, choose Create a new security group.

This opens the Security groups page on the Amazon VPC console in a new tab.

  9. Choose Create security group.

  10. Give your new security group a name (for example, quicksight-doc-snowflake-privatelink-connection) and description.
  11. Choose the VPC ID you used in previous steps.

Next, you add two rules that allow traffic from within your VPC to this VPC endpoint.

  12. Retrieve the CIDR block for your targeted VPC:
aws ec2 describe-vpcs --vpc-ids vpc-xxxxxxxxxxxxxxxxx | jq -r '.Vpcs[].CidrBlock'
  13. Choose Add rule in the Inbound rules section.
  14. Choose HTTPS for the type, leave the source as Custom, and enter the value retrieved from the preceding describe-vpcs call (for example, 10.0.0.0/16).
  15. Choose Add rule in the Inbound rules section.
  16. Choose HTTP for the type, leave the source as Custom, and enter the value retrieved from the preceding describe-vpcs call.
  17. Choose Create security group.

  18. Retrieve the security group ID from the newly created security group.
  19. On the VPC endpoint configuration page, remove the default security group.
  20. Search for and select the new security group ID.

  21. Choose Create endpoint.

You’re redirected to a page that has a link to your VPC endpoint configuration, specified by the VPC ID. The next page has a link to view the configuration in full.

  22. Retrieve the topmost record in the DNS names list.

This can be differentiated from other DNS names because it only includes the Region name (such as us-west-2), and no Availability Zone letter notation (such as us-west-2a).

  1. Store this record in a text editor for later use.
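
If you prefer to script this step, the endpoint and its security group can also be created with the AWS CLI. The following is a minimal sketch, assuming a single VPC; the VPC ID, subnet IDs, CIDR block, and service name are placeholders you replace with your own values (the service name is the privatelink-vpce-id value retrieved earlier).

# Create the security group for the Snowflake VPC endpoint
SG_ID=$(aws ec2 create-security-group \
  --group-name quicksight-doc-snowflake-privatelink-connection \
  --description "HTTP/HTTPS from within the VPC to the Snowflake endpoint" \
  --vpc-id vpc-xxxxxxxxxxxxxxxxx \
  --query 'GroupId' --output text)

# Allow HTTPS and HTTP from the VPC CIDR block (10.0.0.0/16 in this example)
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 443 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 80 --cidr 10.0.0.0/16

# Create the interface endpoint against the privatelink-vpce-id service name
VPCE_ID=$(aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxxxxxxxxxxx \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.vpce.<region_id>.vpce-svc-xxxxxxxxxxxxxxxxx \
  --subnet-ids subnet-aaaaaaaa subnet-bbbbbbbb \
  --security-group-ids "$SG_ID" \
  --query 'VpcEndpoint.VpcEndpointId' --output text)

# List the endpoint DNS names; the Regional entry (no Availability Zone letter) is the one to store
aws ec2 describe-vpc-endpoints --vpc-endpoint-ids "$VPCE_ID" \
  --query 'VpcEndpoints[0].DnsEntries[].DnsName' --output text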

Configure DNS for Snowflake endpoints in your VPC

To configure your Snowflake endpoints, complete the following steps:

  1. On the Route 53 console, choose Hosted Zones in the navigation pane.
  2. Choose Create hosted zone.
  3. For Domain name, enter the value you stored for privatelink-account-url from the previous steps.

In this field, we remove the Snowflake account ID from the DNS name and only use the value starting with the Region identifier (for example, <region>.privatelink.snowflakecomputing.com). We create a resource record set later for the subdomain.

  1. For Type, select Private hosted zone.

Your Region code may not be us-west-2; reference the DNS name returned to you by Snowflake.

  1. In the VPCs to associate with the hosted zone section, choose the Region in which your VPC is located and the VPC ID used in previous steps.

  1. Choose Create hosted zone.

Next, we create two records: one for privatelink-account-url and one for privatelink_ocsp-url.

  1. On the Hosted zones page, choose Create record set.
  2. For Record name, enter your Snowflake account ID (the first eight characters in privatelink-account-url).
  3. For Record type, choose CNAME.
  4. For Value, enter the DNS name for the Regional VPC endpoint we retrieved in the previous section.
  5. Choose Create records.

  1. Repeat these steps for the OCSP record we noted earlier as privatelink_ocsp-url, using ocsp followed by the eight-character Snowflake account ID as the record name (for example, ocsp.xxxxxxxx).
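
If you want to script the DNS setup instead, the hosted zone and both CNAME records can be created with the AWS CLI. This is a sketch under the same assumptions as before; the zone name, Snowflake account ID (xxxxxxxx), VPC ID, and the VPC endpoint DNS name are placeholders.

# Create the private hosted zone and associate it with the VPC
ZONE_ID=$(aws route53 create-hosted-zone \
  --name <region>.privatelink.snowflakecomputing.com \
  --vpc VPCRegion=<region>,VPCId=vpc-xxxxxxxxxxxxxxxxx \
  --caller-reference snowflake-privatelink-$(date +%s) \
  --query 'HostedZone.Id' --output text)

# CNAME for privatelink-account-url (the eight-character Snowflake account ID)
aws route53 change-resource-record-sets --hosted-zone-id "$ZONE_ID" \
  --change-batch '{"Changes":[{"Action":"CREATE","ResourceRecordSet":{
    "Name":"xxxxxxxx.<region>.privatelink.snowflakecomputing.com",
    "Type":"CNAME","TTL":300,
    "ResourceRecords":[{"Value":"<regional VPC endpoint DNS name>"}]}}]}'

# CNAME for privatelink_ocsp-url
aws route53 change-resource-record-sets --hosted-zone-id "$ZONE_ID" \
  --change-batch '{"Changes":[{"Action":"CREATE","ResourceRecordSet":{
    "Name":"ocsp.xxxxxxxx.<region>.privatelink.snowflakecomputing.com",
    "Type":"CNAME","TTL":300,
    "ResourceRecords":[{"Value":"<regional VPC endpoint DNS name>"}]}}]}'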

Configure a Route 53 resolver inbound endpoint for your VPC

QuickSight doesn’t use the standard AWS resolver (the VPC’s .2 resolver). To resolve private DNS from QuickSight, you need to set up Route 53 resolver endpoints.

First, we create a security group for the Route 53 resolver inbound endpoint.

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name for your security group (for example, quicksight-doc-route53-resolver-sg) and a description.
  3. Choose the VPC ID used in previous steps.
  4. Create rules that allow for DNS (Port 53) over UDP and TCP from within the VPC CIDR block.
  5. Choose Create security group.
  6. Note the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.

Now we create the Route 53 resolver inbound endpoint for our VPC.

  1. On the Route 53 console, choose Inbound endpoint in the navigation pane.
  2. Choose Create inbound endpoint.
  3. For Endpoint name, enter a name (for example, quicksight-inbound-resolver).
  4. For VPC in the Region, choose the VPC ID used in previous steps.
  5. For Security group for the endpoint, choose the security group ID you saved earlier.

  1. In the IP address section, choose two Availability Zones and subnets, and leave Use an IP address that is selected automatically selected.

  1. Choose Submit.
  2. Choose the inbound endpoint after it’s created and take note of the two IP addresses for the resolvers.
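
The resolver endpoint can likewise be created with the AWS CLI. A minimal sketch follows, assuming the same VPC and two subnets; the IDs and CIDR block are placeholders.

# Security group that allows DNS (port 53) over TCP and UDP from the VPC CIDR block
RSLVR_SG=$(aws ec2 create-security-group \
  --group-name quicksight-doc-route53-resolver-sg \
  --description "DNS from within the VPC to the Route 53 resolver inbound endpoint" \
  --vpc-id vpc-xxxxxxxxxxxxxxxxx \
  --query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id "$RSLVR_SG" \
  --protocol tcp --port 53 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id "$RSLVR_SG" \
  --protocol udp --port 53 --cidr 10.0.0.0/16

# Inbound resolver endpoint across two subnets
ENDPOINT_ID=$(aws route53resolver create-resolver-endpoint \
  --name quicksight-inbound-resolver \
  --direction INBOUND \
  --creator-request-id quicksight-inbound-$(date +%s) \
  --security-group-ids "$RSLVR_SG" \
  --ip-addresses SubnetId=subnet-aaaaaaaa SubnetId=subnet-bbbbbbbb \
  --query 'ResolverEndpoint.Id' --output text)

# The two resolver IP addresses to give QuickSight later
aws route53resolver list-resolver-endpoint-ip-addresses \
  --resolver-endpoint-id "$ENDPOINT_ID" \
  --query 'IpAddresses[].Ip' --output text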

Connect a VPC to QuickSight

To connect a VPC to QuickSight, complete the following steps:

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-snowflake-privatelink-sg) and a description.
  3. Choose the VPC ID used in previous steps.

Security groups for QuickSight are different from other security groups in that they are stateless, rather than stateful. This means you must explicitly allow return traffic from the targeted security group. The inbound rule in your security group must allow traffic on all ports. It needs to do this because the destination port number of any inbound return packets is set to a randomly allocated port number. For more information, see Inbound Rules.

  1. Choose Create security group.
  2. Take note of the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.
  3. On the Security groups page, search for the security group ID that is used for the VPC endpoint.
  4. Choose Edit inbound rules.
  5. Add rules for both HTTPS and HTTP traffic, using the security group ID for the security group you created as the source.
  6. Choose Save rules.
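
The same security group configuration can be expressed with the AWS CLI. In this sketch, QS_SG is the new QuickSight security group and SG_ID is the VPC endpoint security group created earlier; the VPC ID and CIDR block are placeholders.

# Stateless QuickSight security group: inbound must allow all TCP ports from the VPC CIDR block
QS_SG=$(aws ec2 create-security-group \
  --group-name quicksight-snowflake-privatelink-sg \
  --description "QuickSight security group for the Snowflake PrivateLink connection" \
  --vpc-id vpc-xxxxxxxxxxxxxxxxx \
  --query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id "$QS_SG" \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16

# Allow HTTPS and HTTP to the VPC endpoint security group, using the QuickSight group as the source
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 443 --source-group "$QS_SG"
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 80 --source-group "$QS_SG"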

Next, we move to the QuickSight console to configure the VPC connection.

  1. Navigate to the QuickSight console.
  2. Choose the user name and choose Manage QuickSight.

  1. In the navigation pane, choose Manage VPC connections.
  2. Choose Add a VPC connection.
  3. For VPC connection name, enter a name (for example, snowflake-privatelink).
  4. For VPC ID, choose the VPC used in previous steps.
  5. For Subnet ID, choose one of the subnets that has a VPC endpoint, as specified when you created the endpoint earlier.
  6. For Security group ID, enter the ID of the security group you created.
  7. For DNS resolver endpoints, enter the two IPs for the inbound resolver endpoint you created earlier.

  1. Choose Create.
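
Newer versions of the AWS CLI also expose a QuickSight API for this step. The following is a hypothetical sketch, assuming your CLI version includes the quicksight create-vpc-connection command and that you have an IAM role QuickSight can assume; the account ID, subnet and security group IDs, resolver IPs, and role ARN are placeholders. If your CLI version doesn’t include this command, use the console steps above.

# Hypothetical sketch: requires a CLI version that includes quicksight create-vpc-connection
aws quicksight create-vpc-connection \
  --aws-account-id 111122223333 \
  --vpc-connection-id snowflake-privatelink \
  --name snowflake-privatelink \
  --subnet-ids subnet-aaaaaaaa subnet-bbbbbbbb \
  --security-group-ids sg-quicksightxxxxxxx \
  --dns-resolvers 10.0.1.10 10.0.2.10 \
  --role-arn arn:aws:iam::111122223333:role/quicksight-vpc-connection-role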

Set up a Snowflake data source through the VPC

To set up a Snowflake data source, complete the following steps.

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. For Data source name, enter a name (for example, snowflake).
  5. For Connection type, choose the VPC connection you created earlier (snowflake-privatelink).
  6. For Database server, enter privatelink-account-url.
  7. For Database name, enter the name of your database.
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Upon successful validation, choose Create data source.
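
You can also create the data source programmatically. This sketch assumes the QuickSight create-data-source API with Snowflake parameters and a VPC connection ARN; the account ID, host, database, warehouse, credentials, and ARN are placeholders, and in practice you would source the credentials from a secret rather than typing them on the command line.

# Sketch: Snowflake data source over the VPC connection (all values are placeholders)
aws quicksight create-data-source \
  --aws-account-id 111122223333 \
  --data-source-id snowflake-privatelink-source \
  --name snowflake \
  --type SNOWFLAKE \
  --data-source-parameters '{"SnowflakeParameters":{
      "Host":"xxxxxxxx.<region>.privatelink.snowflakecomputing.com",
      "Database":"MY_DATABASE",
      "Warehouse":"MY_WAREHOUSE"}}' \
  --credentials '{"CredentialPair":{"Username":"my_user","Password":"my_password"}}' \
  --vpc-connection-properties VpcConnectionArn=arn:aws:quicksight:<region>:111122223333:vpcConnection/snowflake-privatelink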

Create your first QuickSight dashboard

In this section, we cover creating a dataset in QuickSight, then using this data in a visualization. We’re using a dummy dataset that has information about fictional employees. After you create the data source, QuickSight prompts you to choose the schema and tables for your dataset.

  1. For Schema, choose your schema.
  2. For Tables, select your tables.
  3. Choose Select.

In the Finish dataset creation section, you can determine if QuickSight imports your dataset into SPICE to improve query performance or directly queries your data each time a dashboard is loaded. For more information about SPICE, see Importing Data into SPICE.

  1. For this post, we select Import to SPICE for quicker analytics.
  2. Choose Visualize.

Now that we have the schema, table, and SPICE configuration for the dataset, we can create our first visualization.

  1. Choose a field from the available fields list. For this post, we choose City.
  2. Choose a visualization type in the Visual types pane.

This only scratches the surface of the visualization capabilities of QuickSight. For more information, see Working with Amazon QuickSight Visuals.

Next, we cover a network configuration in which QuickSight is connected to one VPC, the AWS PrivateLink connection to Snowflake resides in another VPC, and VPC peering allows QuickSight to use that AWS PrivateLink connection.

QuickSight connection to Snowflake via AWS PrivateLink and VPC peering within the same Region

In this section, we show you how to connect QuickSight to Snowflake over AWS PrivateLink with two peered VPCs. The following diagram illustrates the solution architecture.

Set up VPC peering

First, we create the VPC peering connection from the requesting VPC.

  1. On the Peering connections page of the Amazon VPC console, choose Create peering connection.
  2. For Select a local VPC to peer with, choose the VPC in which you configured your Snowflake AWS PrivateLink connection.

  1. In the Select another VPC to peer with section, leave the default options for Account and Region (My account and This Region, respectively).
  2. For VPC (Accepter), choose the VPC to which QuickSight is connected.

  1. Choose Create peering connection.

Next, we accept the VPC connection from the accepting VPC.

  1. On the Peering connections page, select the connection you created.
  2. On the Actions menu, choose Accept.
  3. Review the information about the request. If everything looks correct, choose Yes, Accept.

Next, we configure DNS to resolve between the two VPCs.

  1. On the Peering connections page, choose your new peering connection.
  2. On the DNS tab, check if the two options show as Disabled.

If they’re enabled, you can skip to the steps on creating route tables.

  1. On the Actions menu, choose Edit DNS Settings.

This requires your VPC to have DNS host name and resolution enabled.

  1. Select both check boxes to allow DNS to resolve from both the accepter and requester VPCs.
  2. Choose Save.

Next, create the route table entry to allow for routes to propagate between the two VPCs.

  1. On the Route tables page, choose the route tables in your requesting VPC.
  2. On the Route tab, choose Edit routes.
  3. Add a route for the CIDR block that your peered VPC uses (for this post, 172.31.0.0/16).
  4. Choose Save routes.

  1. Repeat for the route tables in your accepter VPC.
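
For reference, the same peering setup can be scripted with the AWS CLI. This is a sketch using the CIDR blocks from this post (10.0.0.0/16 for the requester VPC with the AWS PrivateLink connection, 172.31.0.0/16 for the accepter VPC with QuickSight); the VPC and route table IDs are placeholders.

# Request the peering connection from the requester (PrivateLink) VPC to the accepter (QuickSight) VPC
PCX_ID=$(aws ec2 create-vpc-peering-connection \
  --vpc-id vpc-privatelinkxxxxxxx \
  --peer-vpc-id vpc-quicksightxxxxxxx \
  --query 'VpcPeeringConnection.VpcPeeringConnectionId' --output text)

# Accept it, then allow DNS resolution across the peering connection
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$PCX_ID"
aws ec2 modify-vpc-peering-connection-options \
  --vpc-peering-connection-id "$PCX_ID" \
  --requester-peering-connection-options AllowDnsResolutionFromRemoteVpc=true \
  --accepter-peering-connection-options AllowDnsResolutionFromRemoteVpc=true

# Route each VPC's traffic for the peer CIDR block over the peering connection
aws ec2 create-route --route-table-id rtb-requesterxxxxxxx \
  --destination-cidr-block 172.31.0.0/16 --vpc-peering-connection-id "$PCX_ID"
aws ec2 create-route --route-table-id rtb-accepterxxxxxxxx \
  --destination-cidr-block 10.0.0.0/16 --vpc-peering-connection-id "$PCX_ID"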

Configure DNS in the accepter VPC

In this section, we associate the accepter VPC with the same private hosted zone as the requester VPC (<region>.privatelink.snowflakecomputing.com).

  1. On the Route 53 console, choose Hosted zones in the navigation pane.
  2. Select the hosted zone <region>.privatelink.snowflakecomputing.com and choose Edit.
  3. In the VPCs to associate with the hosted zone section, choose Add VPC.
  4. Choose the Region and VPC ID associated with the accepter VPC.

  1. Choose Save changes.
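
This association can also be made with the AWS CLI; the hosted zone ID and VPC ID below are placeholders.

# Associate the accepter (QuickSight) VPC with the existing private hosted zone
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0123456789EXAMPLE \
  --vpc VPCRegion=<region>,VPCId=vpc-quicksightxxxxxxx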

Configure Route 53 resolver inbound endpoints in the accepter VPC

To configure your Route 53 resolver inbound endpoints, complete the following steps:

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-doc-route53-resolver-sg) and a description.
  3. Choose the VPC ID used in previous steps.
  4. Create rules that allow for DNS (port 53) over UDP and TCP from within the VPC CIDR block (for this post, 172.31.0.0/16).
  5. Choose Create security group.

  1. Take note of the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.

Next, we set up the Route 53 inbound endpoint for this VPC.

  1. On the Route 53 console, choose Inbound endpoint in the navigation pane.
  2. Choose Create inbound endpoint.
  3. Enter a name for the endpoint (for example, quicksight-inbound-resolver).
  4. For VPC in the Region, choose the VPC ID for the accepter VPC.
  5. For Security group, choose the security group ID you saved earlier.
  6. In the IP Address section, select two Availability Zones and subnets, and leave Use an IP address that is selected automatically selected.
  7. Choose Submit.
  8. Choose the inbound endpoint after it’s created.
  9. After the inbound endpoint has provisioned, note the two IP addresses for the resolvers.

Connect the accepter VPC to QuickSight

To start, we need to create a security group for QuickSight to allow traffic to the Route 53 resolver inbound endpoints, the VPC endpoint for AWS PrivateLink, and traffic within the local network.

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-snowflake-privatelink-vpc-peering-sg) and a description.
  3. Choose the VPC ID for the accepter VPC.
  4. Create the following ingress rules:
    1. One rule for the local network for all TCP ports (e.g., 172.31.0.0/16).
    2. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all TCP ports.
    3. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all UDP ports.
    4. One rule allowing traffic to the security group for the VPC endpoint (located in the peered VPC).

As discussed earlier, security groups for QuickSight are different from other security groups. You must explicitly allow return traffic from the targeted security group, and the inbound rule in your security group must allow traffic on all ports. For more information, see Inbound Rules.

Next, we modify the security group for the Route 53 resolver inbound endpoint to allow traffic from the security group we created.

  1. On the Security groups page, search for the security group ID used for the Route 53 resolver inbound endpoint.
  2. Choose Edit inbound rules.
  3. Add rules for both DNS over UDP and DNS over TCP, using the security group ID for the security group we created for QuickSight as the source.

  1. Choose Save rules.

Next, modify the security group that was created for the VPC endpoint for the AWS PrivateLink connection.

  1. On the Security groups page, search for the security group ID used for the VPC endpoint for the AWS PrivateLink connection.
  2. Choose Edit inbound rules.
  3. Add rules for both HTTPS and HTTP, using the security group ID for the security group created for QuickSight as the source.
  4. Choose Save rules.

Next, we set up the VPC connection in QuickSight.

  1. On the QuickSight console, choose the user name and choose Manage QuickSight.
  2. In the navigation pane, choose Manage VPC connections.
  3. Choose Add a VPC connection.
  4. For VPC connection name, enter a name (for example, snowflake-privatelink-vpc-peering).
  5. For Subnet, choose a subnet ID that has a route table with a peering connection to the requester VPC where the AWS PrivateLink connection resides.
  6. For Security group ID, enter the ID of the security group created earlier.
  7. For DNS resolver endpoints, enter the two IPs for the inbound resolver endpoint you created.
  8. Choose Create.

Set up a Snowflake data source in QuickSight through the VPC

To set up a Snowflake data source in QuickSight, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. Enter a data source name (for example, snowflake-dataset).
  5. Choose the VPC connection you created (snowflake-privatelink-vpc-peering).
  6. For Database server, enter the privatelink-account-url.
  7. For Database name, enter the name of your database.
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Upon successful validation, choose Create data source.

For steps to create a dashboard, see the earlier section, Create your first QuickSight dashboard.

In the next section, we cover a similar network configuration, with the difference being that we use cross-Region VPC peering.

QuickSight connection to Snowflake via AWS PrivateLink and VPC peering across Regions

In this section, we show you how to connect to Snowflake with QuickSight over AWS PrivateLink with two VPCs peered across Regions.

We refer to Regions generically throughout this post, denoting the Region that has the Snowflake AWS PrivateLink connection as Region A and the Region in which QuickSight is set up as Region B.

The following diagram illustrates our solution architecture.

Set up VPC peering between two Regions

First, we create the VPC peering connection from the requesting VPC.

  1. Navigate to the Peering connections page on the Amazon VPC console in Region B (the Region in which you plan to use QuickSight to deploy dashboards).
  2. Choose Create peering connection.
  3. In the Select a local VPC to peer with section, for VPC (Requester), choose the VPC in which you have connected or intend to connect QuickSight.

  1. For Select another VPC to peer with, select My account and Another Region.
  2. Choose the Region in which your Snowflake AWS PrivateLink connection exists.
  3. For VPC ID (Accepter), enter the VPC ID for the VPC in which your Snowflake AWS PrivateLink exists.

  1. Choose Create peering connection.
  2. Copy the VPC peering connection ID so we can easily locate it in the next steps (it looks like pcx-xxxxxxxxxxxx).

Next, we accept the VPC peering connection from the Region in which you created your AWS PrivateLink connection.

  1. Navigate to the Amazon VPC console in Region A (where your Snowflake AWS PrivateLink connection exists).
  2. Search for and select the peering connection you created.
  3. On the Actions menu, choose Accept Request.

  1. Review the information about the request. If everything looks correct, choose Yes, Accept.

Next, we configure DNS to resolve between the two VPCs.

  1. On the Peering connections page of the Amazon VPC console, choose your newly created VPC peering connection.
  2. On the DNS tab, check if the two options show Disabled.

If they’re enabled, skip to the steps on creating route tables.

  1. On the Actions menu, choose Edit DNS settings.

This requires your VPC to have DNS host name and resolution enabled.

  1. Select both check boxes to allow DNS to resolve from both the accepter and requester VPCs.
  2. Choose Save.

Next, we create the route table entry to allow for routes to propagate between the two VPCs for Region B.

  1. Navigate to the Amazon VPC console in Region B (the Region in which you plan to use QuickSight to deploy dashboards).
  2. In the navigation pane, choose Route tables.
  3. Select the route tables in your requesting VPC.
  4. On the Route tab, choose Edit routes.
  5. Add a route for the CIDR block that your peered VPC uses (for this post, 10.0.0.0/16 is the CIDR block for the VPC in which the Snowflake AWS PrivateLink connection resides).
  6. Choose Save routes.

Next, create the route table entry to allow for routes to propagate between the two VPCs for Region A.

  1. Navigate to the Amazon VPC console in Region A (where your Snowflake AWS PrivateLink connection exists).
  2. Repeat the previous steps, using the CIDR block for the peered VPC (in this post, 172.16.0.0/16).
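
For reference, the cross-Region variant of the peering setup can be scripted as follows. This is a sketch; <region-a>, <region-b>, and the VPC and route table IDs are placeholders, and the DNS options are set separately in each Region because each side of the connection is managed from its own Region.

# From Region B (QuickSight): request peering to the PrivateLink VPC in Region A
PCX_ID=$(aws ec2 create-vpc-peering-connection --region <region-b> \
  --vpc-id vpc-quicksightxxxxxxx \
  --peer-vpc-id vpc-privatelinkxxxxxxx \
  --peer-region <region-a> \
  --query 'VpcPeeringConnection.VpcPeeringConnectionId' --output text)

# From Region A: accept the request
aws ec2 accept-vpc-peering-connection --region <region-a> \
  --vpc-peering-connection-id "$PCX_ID"

# Allow DNS resolution across the connection, one side per Region
aws ec2 modify-vpc-peering-connection-options --region <region-b> \
  --vpc-peering-connection-id "$PCX_ID" \
  --requester-peering-connection-options AllowDnsResolutionFromRemoteVpc=true
aws ec2 modify-vpc-peering-connection-options --region <region-a> \
  --vpc-peering-connection-id "$PCX_ID" \
  --accepter-peering-connection-options AllowDnsResolutionFromRemoteVpc=true

# Routes: Region B route tables point at Region A's CIDR block, and vice versa
aws ec2 create-route --region <region-b> --route-table-id rtb-regionbxxxxxxxxx \
  --destination-cidr-block 10.0.0.0/16 --vpc-peering-connection-id "$PCX_ID"
aws ec2 create-route --region <region-a> --route-table-id rtb-regionaxxxxxxxxx \
  --destination-cidr-block 172.16.0.0/16 --vpc-peering-connection-id "$PCX_ID"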

Configure DNS in the VPC in Region B

First, we need to associate the VPC in Region B (where you deploy QuickSight) with the same private hosted zone as the VPC in Region A where your Snowflake AWS PrivateLink connection exists (<region>.privatelink.snowflakecomputing.com).

  1. On the Route 53 console, choose Hosted zones in the navigation pane.
  2. Select the private hosted zone <region>.privatelink.snowflakecomputing.com and choose Edit.
  3. In the VPCs to associate with the hosted zone section, choose Add VPC.
  4. Choose the Region and VPC ID associated with the accepter VPC.

  1. Choose Save changes.

Configure the Route 53 resolver inbound endpoint for your VPC in Region B

To configure the resolver inbound endpoint in Region B, complete the following steps:

  1. On the Security groups page on the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-doc-route53-resolver-sg) and a description.
  3. Choose the VPC ID used in previous steps.
  4. Create rules that allow for DNS (port 53) over UDP and TCP from within the VPC CIDR block (for this post, 172.16.0.0/16).

  1. Choose Create security group.
  2. Take note of the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.

Next, we set up the Route 53 inbound endpoint for this VPC.

  1. On the Route 53 console, choose Inbound endpoint in the navigation pane.
  2. Choose Create inbound endpoint.
  3. Enter a name for the endpoint (for example, quicksight-inbound-resolver).
  4. For VPC in the Region, choose the VPC ID used in previous steps.
  5. For Security group, choose the security group ID from the previous step.
  6. In the IP Address section, select two Availability Zones and subnets, and leave Use an IP address that is selected automatically selected.
  7. Choose Submit.
  8. Choose the inbound endpoint after it’s created.
  9. After the inbound endpoint has provisioned, note the two IP addresses for the resolvers.

Connect the VPC to QuickSight in Region B

To start, we need to create a security group for QuickSight to allow traffic to the Route 53 resolver inbound endpoints, the VPC endpoint for AWS PrivateLink, and traffic within the local network.

  1. On the Security groups page of the Amazon VPC console in Region B, choose Create security group.
  2. Enter a name (for example, quicksight-snowflake-sg) and a description.
  3. Choose the VPC ID for the VPC where you previously created the VPC peering connection.
  4. Create the following ingress rules:
    1. One rule for the local network for all TCP ports (for example, 172.16.0.0/16).
    2. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all TCP ports.
    3. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all UDP ports.
    4. One rule allowing traffic for all TCP ports to the CIDR block for the VPC located in Region A, where your Snowflake AWS PrivateLink connection exists (for this post, 10.0.0.0/16).

As discussed earlier, security groups for QuickSight are different from other security groups. You must explicitly allow return traffic from the targeted security group, and the inbound rule in your security group must allow traffic on all ports. For more information, see Inbound Rules.

Next, we modify the security group for the Route 53 resolver inbound endpoint in Region B to allow traffic from the security group we created.

  1. On the Security groups page, search for the security group ID used for the Route 53 resolver inbound endpoint.
  2. Choose Edit inbound rules.
  3. Add rules for both DNS over UDP and DNS over TCP, using the CIDR block for the VPC in Region B (for this post, 172.16.0.0/16).

  1. Choose Save rules.

Next, we need to modify the security group we’re using for the AWS PrivateLink connection.

  1. Navigate to the Security groups page on the Amazon VPC console in Region A.
  2. Search for the security group ID that is used for the VPC endpoint for the AWS PrivateLink connection.
  3. Choose Edit inbound rules.
  4. Add rules for both HTTPS and HTTP, using the CIDR Block for the VPC in Region B as the source (for this post, 172.16.0.0/16).

  1. Choose Save rules.

Finally, we set up the QuickSight VPC connection.

  1. Navigate to the QuickSight console in Region B.
  2. Choose the user name and choose Manage QuickSight.
  3. In the navigation pane, choose Manage VPC connections.
  4. Choose Add a VPC connection.
  5. For VPC connection name, enter a connection name (for example, snowflake-privatelink-cross-region).
  6. For VPC ID, choose the VPC ID of the VPC in Region B.
  7. For Subnet, choose a subnet ID from the VPC in Region B that has a route table with a peering connection to the VPC where the AWS PrivateLink connection resides.
  8. For Security group ID, enter the ID of the security group you created.
  9. For DNS resolver endpoints, enter the two IPs for the inbound resolver endpoint created earlier.

  1. Choose Create.

Set up a Snowflake data source in QuickSight through the VPC

To set up a Snowflake data source in QuickSight, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. Enter a name for your data source (for example, snowflake-dataset).
  5. Choose the VPC connection you created (snowflake-privatelink-cross-region).
  6. For Database server, enter the privatelink-account-url.
  7. For Database name, enter the name of your database.
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Upon successful validation, choose Create data source.

For steps to create a dashboard, see the earlier section, Create your first QuickSight dashboard.

For our last configuration, we cover how to set up a QuickSight connection to Snowflake without AWS PrivateLink.

QuickSight connection to Snowflake without AWS PrivateLink

In this section, we show you how to connect to Snowflake with QuickSight without using AWS PrivateLink.

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. Enter a data source name (for example, snowflake-dataset).
  5. Leave the connection type as Public network.
  6. For Database name, enter the name of your database.
  7. For Database server, enter the URL you use to log in to Snowflake (for example, xxxxxxxx.snowflakecomputing.com).
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Choose Create data source.

For steps to create a dashboard, see the earlier section, Create your first QuickSight dashboard.

Clean up

If your work with QuickSight, Snowflake, and AWS PrivateLink is complete, remove your Route 53 resolver inbound endpoints, Route 53 private hosted zone, and the VPC endpoint for Snowflake to avoid incurring additional fees.

Conclusion

In this post, we covered four ways to connect QuickSight to Snowflake as a data source: using AWS PrivateLink within the same VPC, using AWS PrivateLink with VPC peering in the same Region, using AWS PrivateLink with VPC peering across Regions, and without AWS PrivateLink.

After you set up the data source, you can gain further insights from your data by setting up ML Insights in QuickSight, building graphical representations of your data with QuickSight visuals, or joining data from multiple datasets, along with all the other QuickSight features.


About the Authors

Maxwell Moon is a Senior Solutions Architect at AWS working with Independent Software Vendors (ISVs) to design and scale their applications on AWS. Outside of work, Maxwell is a dad to two cats, is an avid supporter of the Wolverhampton Wanderers Football Club, and is patiently waiting for a new wave of ska music.

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS with over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped large technology companies design data analytics solutions and has led engineering teams in designing and implementing data analytics platforms and data products.

Building well-architected serverless applications: Regulating inbound request rates – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-regulating-inbound-request-rates-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL1: How do you regulate inbound request rates?

Defining, analyzing, and enforcing inbound request rates helps achieve better throughput. Regulation helps you adapt different scaling mechanisms based on customer demand. By regulating inbound request rates, you can achieve better throughput, and adapt client request submissions to a request rate that your workload can support.

Required practice: Control inbound request rates using throttling

Throttle inbound request rates using steady-rate and burst rate requests

Throttling requests limits the number of requests a client can make during a certain period of time. Throttling allows you to control your API traffic. This helps your backend services maintain their performance and availability levels by limiting the number of requests to actual system throughput.

To prevent your API from being overwhelmed by too many requests, Amazon API Gateway throttles requests to your API. These limits are applied across all clients using the token bucket algorithm. API Gateway sets a limit on a steady-state rate and a burst of request submissions. The algorithm is based on an analogy of filling and emptying a bucket of tokens representing the number of available requests that can be processed.

Each API request removes a token from the bucket. The throttle rate then determines how many requests are allowed per second. The throttle burst determines how many concurrent requests are allowed. I explain the token bucket algorithm in more detail in “Building well-architected serverless applications: Controlling serverless API access – part 2”.

Token bucket algorithm

API Gateway limits the steady-state rate and burst requests per second. These are shared across all APIs per Region in an account. For further information on account-level throttling per Region, see the documentation. You can request account-level rate limit increases using the AWS Support Center. For more information, see Amazon API Gateway quotas and important notes.

You can configure your own throttling levels, within the account and Region limits, to improve overall performance across all APIs in your account. This restricts the overall request submissions so that they don’t exceed the account-level throttling limits.

You can also configure per-client throttling limits. Usage plans restrict client request submissions to within specified request rates and quotas. These are applied to clients using API keys that are associated with your usage policy as a client identifier. You can add throttling levels per API route, stage, or method that are applied in a specific order.
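
As a rough illustration of per-client throttling, the following AWS CLI sketch creates a usage plan with a steady-state rate of 50 requests per second and a burst of 200, attaches it to an API stage, and associates an existing API key as the client identifier; the API, stage, and key IDs are placeholders.

# Usage plan with a steady-state rate of 50 requests/second, a burst of 200, and a monthly quota
PLAN_ID=$(aws apigateway create-usage-plan \
  --name premium-clients \
  --throttle burstLimit=200,rateLimit=50 \
  --quota limit=1000000,period=MONTH \
  --api-stages apiId=a1b2c3d4e5,stage=prod \
  --query 'id' --output text)

# Associate an existing API key with the usage plan as the client identifier
aws apigateway create-usage-plan-key \
  --usage-plan-id "$PLAN_ID" \
  --key-id k1l2m3n4o5 \
  --key-type API_KEY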

For more information on API Gateway throttling, see the AWS re:Invent presentation “I didn’t know Amazon API Gateway could do that”.

API Gateway throttling

You can also throttle requests by introducing a buffering layer using Amazon Kinesis Data Streams or Amazon SQS. Kinesis can limit the number of requests at the shard level, while SQS can limit them at the consumer level. For more information on using SQS as a buffer with Amazon Simple Notification Service (SNS), read “How To: Use SNS and SQS to Distribute and Throttle Events”.

Identify steady-rate and burst rate requests that your workload can sustain at any point in time before performance degraded

Load testing your serverless application allows you to monitor the performance of an application before it is deployed to production. Serverless applications can be simpler to load test, thanks to the automatic scaling built into many of the services. During a load test, you can identify quotas that may act as a limiting factor for the traffic you expect and take action.

Perform load testing for a sustained period of time. Gradually increase the traffic to your API to determine your steady-state rate of requests. Also use a burst strategy with no ramp up to determine the burst rates that your workload can serve without errors or performance degradation. There are a number of AWS Marketplace and AWS Partner Network (APN) solutions available for performance testing, such as Gatling Frontline, BlazeMeter, and Apica.

In the serverless airline example used in this series, you can run a performance test suite using Gatling, an open source tool.

To deploy the test suite, follow the instructions in the GitHub repository perf-tests directory. Uncomment the deploy.perftest line in the repository Makefile.

Perf-test makefile

Once the file is pushed to GitHub, AWS Amplify Console rebuilds the application, and deploys an AWS CloudFormation stack. You can run the load tests locally, or use an AWS Step Functions state machine to run the setup and Gatling load test simulation.

Performance test using Step Functions

The Gatling simulation script uses constantUsersPerSec and rampUsersPerSec to add users for a number of test scenarios. You can use the test to simulate load on the application. When the tests complete, Gatling generates a downloadable report.

Gatling performance results

Artillery Community Edition is another open-source tool for testing serverless APIs. You configure the number of requests per second and overall test duration, and it uses a headless Chromium browser to run its test flows. For Artillery, the maximum number of concurrent tests is constrained by your local computing resources and network. To achieve higher throughput, you can use Serverless Artillery, which runs the Artillery package on Lambda functions. As a result, this tool can scale up to a significantly higher number of tests.
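
As a minimal sketch of what an Artillery test looks like, the following writes a short configuration file and runs it from the command line. The endpoint, path, and rates are placeholders, not the demo application’s actual values; an arrival rate of 14 virtual users per second works out to roughly 50,000 requests per hour.

# Write a minimal Artillery test definition (endpoint and path are placeholders)
cat > load-test.yml <<'EOF'
config:
  target: "https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/prod"
  phases:
    - duration: 600      # run for 10 minutes
      arrivalRate: 14    # ~14 new virtual users per second
scenarios:
  - flow:
      - get:
          url: "/questions"
EOF

# Requires Node.js; install and run the Artillery CLI
npm install -g artillery
artillery run load-test.yml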

For more information on how to use Artillery, see “Load testing a web application’s serverless backend”. This runs tests against APIs in a demo application. For example, one of the tests fetches 50,000 questions per hour. This calls an API Gateway endpoint and tests whether the AWS Lambda function, which queries an Amazon DynamoDB table, can handle the load.

Artillery performance test

This is a synchronous API, so the performance directly impacts the user’s experience of the application. This test shows that the median response time is 165 ms with a p95 time of 201 ms.

Performance test API results

Another consideration for API load testing is whether the authentication and authorization service can handle the load. For more information on load testing Amazon Cognito and API Gateway using Step Functions, see “Using serverless to load test Amazon API Gateway with authorization”.

API load testing with authentication and authorization

Conclusion

Regulating inbound requests helps you adapt different scaling mechanisms based on customer demand. You can achieve better throughput for your workloads and make them more reliable by controlling requests to a rate that your workload can support.

In this post, I cover controlling inbound request rates using throttling. I show how to use throttling to control steady-rate and burst rate requests. I show some solutions for performance testing to identify the request rates that your workload can sustain before performance degradation.

This well-architected question continues in part 2, where I look at analyzing and enforcing API quotas and cover mechanisms to protect non-scalable resources.

For more serverless learning resources, visit Serverless Land.

[$] The core of the -stable debate

Post Syndicated from corbet original https://lwn.net/Articles/863505/rss

Disagreements over which patches should find their way into stable updates
are not new — or uncommon. So when the topic came up again recently, there
was little reason to expect anything but more of the same. And, for the
most part, that is what ensued but, in this exchange, we were also able to
see the core issue that drives these discussions. There are, in the
end, two fundamentally different views of what the stable tree should be.

How to Back Up Your Twitch Stream

Post Syndicated from Caitlin Bryson original https://www.backblaze.com/blog/how-to-back-up-your-twitch-stream/

Every month, millions of viewers tune in to their favorite channels live streaming League of Legends, Call of Duty, Dota, and more on Twitch. With over two million streamers creating live content each month, video games and streaming go hand in hand.

Whether you’re streaming for yourself, your friends, an audience, or you’re trying to build a brand, you’re creating a lot of great content when you stream. The problem is that most services will only protect your content for a few weeks before deleting it.

Whether you want to edit or rewatch your content for fun, to build a reel for a sponsor, or to distribute content to your adoring fans, backups of the raw and edited content are essential to make sure your hard work doesn’t disappear forever. Outside of videos, you should also consider backing up other Twitch content like stream graphics including overlays, alerts, emotes, and chat badges; your stream setup; and media files that you use on stream.

Read our guide below to learn:

  • Two methods for downloading your Twitch stream.
  • How to create a backup of your Twitch stream setup.

How to Download Your Twitch Stream

Once you finish a stream, Twitch automatically saves that broadcast as a video on demand. For most accounts, videos are saved for 14 days, but if you are a Twitch Partner or have Twitch linked to your Amazon Prime account, you have access to your videos for up to 60 days. You can also create clips up to a minute long of your streams within Twitch or upload longer videos as highlights, which are stored indefinitely.

Download Method #1

With this method, there’s almost no work required besides hitting the record button in your streaming software. Keep in mind that recording while streaming can put a strain on your output performance, so while it’s the simplest download method, it might not work best depending on your setup.

Continue reading to learn how to simultaneously stream and record a copy of your videos, or skip to method #2 to learn how to download without affecting performance during streaming.

  1. If you, like many streamers, use software like OBS or Streamlabs OBS, you have the option of simultaneously streaming your output and recording a copy of the video locally.
  2. Before you start recording, check to make sure that the folder for your local recordings is included in your computer backup system.
  3. Then, go ahead with streaming. When you’re done, the video will save to your local folder.

Download Method #2

This second method for downloading and saving your videos requires a bit more work, but the benefit is that you can choose which videos you’d like to keep without affecting your streaming performance.

  1. Once you’ve finished streaming, navigate to your Creator Dashboard.
  2. On the left side of the screen, click “Content,” then “Video Producer.” Your clips and highlights live here and can be downloaded from this panel.
  3. Find the video you’d like to download, then click the three vertical dots and choose “Download.” The menu will change to “Preparing” and may take several minutes.
  4. Once the download is ready, a save screen will appear where you can choose where you’d like to save your video on your computer.

How to Download Your Stream Setup

If you’re using streaming software like OBS, most services allow you to export your Scene Profile and back it up, which will allow you to re-import without rebuilding all of your Scenes if you ever need to restore your Profile or switch computers. In OBS, go to the Profile menu, choose “Export” to download your data, and save it in a folder on your computer.

If you also use a caption program for your streams like Webcaptioner, you can follow similar steps to export and back up your caption settings as well.

How to Back Up Your Twitch Streams and Setups

Having a backup of your original videos as well as the edited clips and highlights is fundamental because data loss can happen at any time, and losing all your work is a huge setback. In case any data loss wreaks havoc on your setup or updates change your settings, you’ll always have a backup of all of your content that you can restore to your system. We recommend keeping a local copy on your computer and an off-site backup—you can learn more about this kind of backup strategy here.

Downloading your live streams will mean saving a collection of large files that will put a strain on your system to store. By creating a cloud storage archive of data you don’t need to access regularly, you can free up space on your local system. It’s quick and easy to organize your content using buckets where you simply drag and drop the files or folders you’d like to upload and save to the cloud. Take a look at how to set up and test a cloud storage archive here.

The difference between computer backup and cloud storage is that both keep your data in the cloud, but with backup, the cloud copy mirrors the data on your computer; with cloud storage, files are simply saved as you upload them, without mirroring or versioning.

If you prefer to back up your files, computer backup services automatically scan your computer for new files, so all you have to do is make sure your local recordings folder is included in your backup.

Nowadays with our data scattered across multiple platforms, it’s all the more important to make sure you have a copy saved in case your media becomes inaccessible for any reason. Take a look at our other posts about downloading and backing up your data.

Let us know in the comments about other helpful backup guides you’d like to see!

The post How to Back Up Your Twitch Stream appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.
