Security updates for Friday

Post Syndicated from original https://lwn.net/Articles/869521/rss

Security updates have been issued by CentOS (firefox and thunderbird), Fedora (haproxy, wordpress, and xen), openSUSE (apache2-mod_auth_openidc, fail2ban, ghostscript, haserl, libcroco, nextcloud, and wireshark), Oracle (kernel and kernel-container), Slackware (httpd), SUSE (crmsh, gtk-vnc, libcroco, Mesa, postgresql12, postgresql13, and transfig), and Ubuntu (libgcrypt20, linux-gcp, linux-gcp-4.15, linux-hwe-5.4, linux-oem-5.13, python3.4, python3.5, and qtbase-opensource-src).

SANS 2021 Threat Hunting Survey: How Organizations’ Security Postures Have Evolved in the New Normal

Post Syndicated from Margaret Wei original https://blog.rapid7.com/2021/09/17/sans-2021-threat-hunting-survey-how-organizations-security-postures-have-evolved-in-the-new-normal/

SANS 2021 Threat Hunting Survey: How Organizations' Security Postures Have Evolved in the New Normal

It’s that time of year once again: The SANS Institute — the most trusted resource for cybersecurity research — has conducted its sixth annual Threat Hunting Survey, sponsored by Rapid7. The goal of this survey is to better understand the current threat hunting landscape and the benefits provided to an organization’s security posture as a result of threat hunting.

This year’s survey, “A SANS 2021 Survey: Threat Hunting in Uncertain Times,” has a unique focus, one that’s taken into consideration the impact of COVID-19 and how it’s affected organizations’ threat hunting. The findings indicate that the global pandemic has had a relatively mixed impact on the organizations surveyed, with many respondents unsure of what type of impact it’s had — and will have — on their threat hunting efforts.

Here’s a preview of the survey’s findings and its takeaways for organizations navigating today’s cybersecurity landscape.

Fewer organizations are performing threat hunting in 2021

According to the survey results, 12.6% fewer organizations are performing threat hunting in 2021 when compared to those surveyed in 2020. This is concerning, as threat hunting is an ever-evolving field, and organizations that don’t dedicate resources to it won’t be able to keep pace with the changes in tactics and techniques needed to find threat actors.

But what caused this dip? It seems to be a combination of organizations reducing their external spend with third parties and their overall internal staff in response to COVID-19. That said, this reduction cannot be fully accounted for by the pandemic.

Despite this decrease, there is good news: 93.1% of respondents indicated they have dedicated threat hunting staff, and the majority of respondents plan to increase spending on staffing and tools for threat hunting in the near future. Over the year to come, we’ll likely see an extended detection and response (XDR) approach leveraging tools like InsightIDR playing a key role in these efforts.

The threat hunting toolbox is evolving

The tools organizations are using to conduct threat hunting are evolving — but have they advanced enough to keep up with the modern cybersecurity landscape?

The output of threat hunting depends on three factors: visibility, skills, and threat intelligence. To achieve this output, threat hunters need the right tools. After asking respondents about their organizations’ tool chests, SANS found that over 75% of respondents are using a tool set that includes EDRs, SIEMs, and IDS/IPS.

It should come as no surprise that these tools are at the top — these are essential to establishing visibility. What is interesting, however, is the second-place spot taken by customizable tools, followed by threat intelligence platforms. This indicates there’s room for improvement for solutions vendors regarding threat hunting — and users are looking for deep insights. Tools like Rapid7’s cloud SIEM solution that cut through the noise and surface the threats that really matter are key in today’s complex IT environments.

Overall security posture has improved — but there’s room to grow

The improvements seen in organizations’ overall security posture as a result of threat hunting continue to show steady numbers. According to the study, organizations have seen anywhere from a 10-25% improvement in their security posture from threat hunting over the last year. In addition, 72.3% of respondents claimed threat hunting had a positive improvement on their organization over time.

These are brilliant results to see, and they reinforce the positive impact threat hunting can have, even in the face of today’s extraordinary challenges.

That said, while there are clear benefits to threat hunting, there are some barriers to success for organizations, namely:

  • Over half (51.3%) of all respondents indicated the primary barrier for them as threat hunters is a lack of skilled staff and training.
  • This was closely followed (43%) by an even split of challenges between the limitations of tools or technologies and a lack of defined processes.

Organizations can start addressing these challenges in a variety of ways, including adopting best-in-class detection and response tooling and owning documentation, education, and maintenance at scale. These are manageable barriers that will come down with time, and despite a global pandemic, the overall outlook is good, as the general trend to more threat hunting appears to sustain with this year’s survey.

Hopefully, these numbers continue to increase next year, and more organizations will reap the benefits of threat hunting.

To take a deeper dive into the survey’s findings, download the full report: A SANS 2021 Survey: Threat Hunting in Uncertain Times.

Learn more about how Rapid7’s Incident Detection and Response solutions can help you protect your organization and boost your ability to swiftly thwart attackers.

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/benchmarking-edge-network-performance/

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

During Speed Week we’ve talked a lot about services that make the web faster. Argo 2.0 for better routing around bad Internet weather, Orpheus to ensure that origins are reachable from anywhere, image optimization to send just the right bits to the client, Tiered Cache to maximize cache hit rates and get the most out of Cloudflare’s new 25% bigger network, our expanded fiber backbone and more.

Those things are all great.

But it’s vital that we also measure the performance of our network and benchmark ourselves against industry players large and small to make sure we are providing the best, fastest service.

We recently ran a measurement experiment where we used Real User Measurement (RUM) data from the standard browser API to test the performance of Cloudflare and others in real-world conditions across the globe. We wanted to use third-party tests for this, but they didn’t have the granularity we wanted. We want to drill down to every single ISP in the world to make sure we optimize everywhere. We knew that in some places the answers we got wouldn’t be good, and we’d need to do work to improve our performance. But without detailed analysis across the entire globe we couldn’t know we were really the fastest or where we needed to improve.

In this blog post I’ll describe how that measurement worked and what the results are. But the short version is: Cloudflare is #1 in almost all cases whether you look at all the networks on the globe, or the top 1,000 largest, or the top 1,000 most active, and whether you look at mean timings or 95th percentile, and you measure how quickly a connection can be established, how long it takes for the first byte to arrive in a user’s web browser, or how long the complete download takes. And we’re not resting here, we’re committed to working network by network globally to ensure that we are always #1.

Why we measured

Commercial Internet measurement services (such as Cedexis, Catchpoint, Pingdom, ThousandEyes) already exist and perform the sorts of RUM measurements that Cloudflare used for this test. And we wanted to ensure that our test was as fair as possible and allowed each network to put its best foot forward.

We subscribe to the third party monitoring services already. And, when we looked at their data we weren’t satisfied.

First, we worried that the way they sampled wasn’t globally representative and was often skewed by measuring from the server, rather than the eyeball, side of the network. Or, even if operating from the eyeball side, could be skewed as artificial or tainted by bots and other automated traffic.

Second, it wasn’t granular enough. It showed our performance by country or region, but didn’t dive into individual networks and therefore obscured the details and outliers behind averages. While we looked good in third party tests, we didn’t trust them to be as thorough and accurate as we wanted. The goal isn’t to pick a test where we looked good. The goal was to be accurate and see where we weren’t good enough yet, so we could focus on those areas and improve. That’s why we had to build this ourselves.

We benchmark against others because it’s useful to know what’s possible. If someone else is faster than we are somewhere then it proves it’s possible. We won’t be satisfied until we’re at least as good as everyone else everywhere. Now we have the granular data to ensure that’ll happen. We plan our next update during Birthday Week when our target is to take 10% of networks where we’re not the fastest and become the fastest.

How we measured

To measure performance we did two things. We created a small internal team to do the measurements separate from the team that manages and optimizes our network. The goal was to show the team where we need to improve.

And to ensure that the other CDNs were tested using their most representative assets we used the very same endpoints that commercial measurement services use on the assumption that our competitors will have already ensured that those endpoints are optimized to show their networks’ best performance.

The measurements in this blog post are based on four days just before Speed Week began (2021-09-10 12:25:02 UTC to 2021-09-13 16:21:10 UTC). We took measurements of downloading exactly the same 100KB PNG file. We categorized them by the network the measurement was taken from. It’s identified by its ASN and a name. We’re not done with these measurements and will continue measuring and optimizing.

A 100KB file is a common test measurement used in the industry and allows us to measure network characteristics like the connection time, but also the total download time.

Before we get into results let’s talk a little about how the Internet fits together. The Internet is a network of networks that cooperate to form the global network that we all use. These networks are identified by a strangely named “autonomous system number” or ASN. The idea is that large networks (like ISPs, or cloud providers, or universities, or mobile phone companies) operate autonomously and then join the global Internet through a protocol called BGP (which we’ve written about in the past).

In a way the Internet is these ASNs and because the Internet is made of them we want to measure our performance for each ASN. Why? Because one part of improving performance is improving our connectivity to each ASN and knowing how we’re doing on a per-network basis helps enormously.

There are roughly 70,000 ASNs in the global Internet and during the measurement period we saw traffic from about 21,000 (exact number: 20,728) of them. This makes sense since not all networks are “external” (as in the source of traffic to websites); many ASNs are intermediaries moving traffic around the Internet.

For the rest of this blog we simply say “network” instead of “ASN”.

What we measured

Getting real user measurement data used to be hard but has been made easy for HTTP endpoints thanks to the Resource Timing API, supported by most modern browsers. This API allows a page to measure network timing data of fetched resources using high-resolution timestamps, accurate to 5 µs (microseconds).

The point of this API is to get timing information that shows how a real end-user experiences the Internet (and not a synthetic test that might measure a single component of all the things that happen when a user browses the web).

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

The Resource Timing API is supported by pretty much every browser enabling measurement on everything from old versions of Internet Explorer, to mobile browsers on iOS and Android to the latest version of Chrome. Using this API gives us a view of real world use on real world browsers.

We don’t just instruct the browser to download an image too. To make sure that we’re fair and replicate the real-life end-user experience, we make sure that no local caching was involved in the request, check if the object has been compressed by the server or not, take the HTTP headers size into account, and record if the connection has been pre-warmed or not, to name a few technical details.

Here’s a high-level example on how this works:

await fetch("https://example.com/100KB.png?r=7562574954", {
              mode: "cors",
              cache: "no-cache",
              credentials: "omit",
              method: "GET",
})

performance.getEntriesByType("resource");

{
   connectEnd: 1400.3999999761581
   connectStart: 1400.3999999761581
   decodedBodySize: 102400
   domainLookupEnd: 1400.3999999761581
   domainLookupStart: 1400.3999999761581
   duration: 51.60000002384186
   encodedBodySize: 102400
   entryType: "resource"
   fetchStart: 1400.3999999761581
   initiatorType: "fetch"
   name: "https://example.com/100KB.png"
   nextHopProtocol: "h2"
   redirectEnd: 0
   redirectStart: 0
   requestStart: 1406
   responseEnd: 1452
   responseStart: 1428.5
   secureConnectionStart: 1400.3999999761581
   startTime: 1400.3999999761581
   transferSize: 102700
   workerStart: 0
}

To measure the performance of each CDN we downloaded an image from each, when a browser visited one of our special pages. Once every image is downloaded we record the measurements using a Cloudflare Workers based API.

The three measurements: TCP connection time, TTFB and TTLB

We focused on three measurements to illustrate how fast our network is: TCP connection time, TTFB and TTLB. Here’s why those three values matter.

TCP connection time is used to show how well-connected we are to the global Internet as it counts only the time taken for a machine to establish a connection to the remote host (before any higher level protocols are used). The TCP connection time is calculated as connectEnd – connectStart (see the diagram above).

TTFB (or time to first byte) is the time taken once an HTTP request has been sent for the first byte of data to be returned by the server. This is a common measurement used to show how responsive a server is. We calculate TTFB as responseStart – connectStart – (requestStart – connectEnd).

TTLB (or time to last byte) is the time taken to send the entire response to the web browser. It’s a good measure of how long a complete download takes and helps measure how good the server (or CDN) is at sending data. We calculate TTLB as responseEnd – connectStart – (requestStart – connectEnd).

We then produced two sets of data: mean and p95. The mean is a very well understood number for laypeople and gives the average user experience, but it doesn’t capture the breadth of different speeds people experience very well. Because it averages everything together it can miss skewed distributions of data (where lots of people get good performance and lots bad performance, for example).

To address the mean’s problems we also used p95, the 95th percentile. This number tells us what performance 95% of measurements fall below. That can be a hard number to understand, but you can think of it as the “reasonable worst case” performance for a user. Only 5% of measurements were worse than this number.

An example chart

As this blog contains a lot of data, let’s start with a detailed look at a chart of results. We compared ourselves against industry giants Google and Amazon CloudFront, industry pioneer Akamai and up and comer Fastly.

For each network (represented by an ASN) and for each CDN we calculated which CDN had the best performance. Here, for example, is a count of the number of networks on which each CDN had the best performance for TTFB. This particular chart shows p95 and includes data from the top 1,000 networks (by number of IPv4 addresses advertised).

In these charts, longer bars are better; the bars indicate the number of networks for which that CDN had the lowest TTFB at p95.

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

This shows that Cloudflare had the lowest time to first byte (the TTFB, or the time it took the first byte of content to reach their browser) at the 95th percentile for the largest number of networks in the top 1,000 (based on the number IPv4 addresses they advertise). Google was next, then Fastly followed by Amazon CloudFront and Akamai.

All three measures, TCP connection time, time to first byte and time to last byte, matter to the user experience. For this example, I focused on time to first byte (TTFB) because it’s such a common measure of responsiveness of the web. It’s literally the time it takes a web server to start responding to a request from a browser for a web page.

To understand the data we gathered let’s look at two large US-based ISPs: Cox and Comcast. Cox serves about 6.5 million customers and Comcast has about 30 million customers. We performed roughly 22,000 measurements on Cox’s network and 100,000 on Comcast’s. Below we’ll make use of measurement counts to rank networks by their size, here we see that our measurements and customer counts of Cox and Comcast track nicely.

Cox Communications has ASN 22773 and our data shows that the p95 TTFB for the five CDNs was as follows: Cloudflare 332.6ms, Fastly 357.6ms, Google 380.3ms, Amazon CloudFront 404.4ms and Akamai 441.5ms. In this case Cloudflare was the fastest and about 7% faster than the next CDN (Fastly) which was faster than Google and so on.

Looking at Comcast (with ASN 7922) we see p95 TTFB was 323.7ms (Fastly), 324.2ms (Cloudflare), 353.7ms (Google), 384.6ms (Akamai) and 418.8ms (Amazon CloudFront). Here Fastly (323.7ms) edged out Cloudflare (324.2ms) by 0.2%.

Figures like these go into determining which CDN is the fastest for each network for this analysis and the charts presented. At a granular level they matter to Cloudflare as we can then target networks and connectivity for optimization.

The results

Shown below are the results for three different measurement types (TCP connection time, TTFB and TTLB) with two different aggregations (mean and p95) and for three different groupings of networks.

The first grouping is the simplest: it’s all the networks we have data for. The second grouping is the one used above, the top 1,000 networks by number of IP addresses. But we also show a third grouping, top 1,000 networks by number of observations. That last group is important.

Top 1,000 networks by number of IP addresses is interesting (it includes the major ISPs) but it also includes some networks that have huge numbers of IP addresses available that aren’t necessarily used. These come about because of historical allocations of IP addresses organisations like the US Department of Defense.

Cloudflare’s testing reveals which networks are most used, and so we also report results for the top 1,000 networks by number of observations to get an idea of how we’re performing on networks with heavy usage.

Hence, we have 18 charts showing all combinations of (TCP connection time, TTFB, TTLB), (mean, p95) and (all networks, top 1,000 networks by IP count, top 1,000 networks by observations).

You’ll see that in two of the charts Cloudflare is not #1 of 18 (we’re #2). We’ll be working to make sure we’re #1 for every measurement over the next few weeks. Both of those are average times. We’re most interested in the p95 measurements because they show the “reasonable worst case” scenario for someone using the Internet. But as we go about optimizing performance we want to be #1 on every chart so that we’re top no matter how performance is measured.

TCP Connection Time

Let’s start with TCP connection time to get a sense of how well-connected the five companies we’ve measured. Recall that longer bars are better here, they indicate that the particular CDN was the highest performance for that many networks: more networks is better.

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google
Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google
Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

Time To First Byte (TTFB)

Next up is TTFB for the five companies. Once again, longer bars is better means more networks where that CDN had the lowest TTFB.

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google
Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google
Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

Time To Last Byte (TTLB)

And finally the TTLB measurements. Once again, longer bars is better means more networks where that CDN had the lowest TTLB.

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google
Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google
Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

Optimization Targets

Looking not just at where we are #1 but where we are #1 or #2 helps us see how quickly we can optimize our network to be #1 in more places. Examining the top 1,000 networks by observations we see that we’re currently #1 or #2 for TTFB in 69.9% of networks, for TTLB in 65.0% of networks and for TCP connection time in 70.5%.

To see how much optimization we need to do to go from #2 to #1 we looked at the three measures and see that median TTFB of the #1 network is 92.3%, median TTLB is 94.0% and TCP connection time is 91.5%.

The latter figure is very interesting because it shows that we can make big gains by optimizing network level routing.

Where’s the world map?

It’s very common to present data about Internet performance on maps. World maps look good but they obscure information. The reason is very simple: world maps show geography (and, depending on the projection, a very skewed version of the world’s actual countries and land masses).

Here’s an example world map with countries colored by who had the lowest TTLB at p95. Cloudflare is in orange, Amazon CloudFront in yellow, Google in purple, Akamai in red and Fastly in blue.

Benchmarking Edge Network Performance: Akamai, Cloudflare, AWS CloudFront, Fastly, and Google

What that doesn’t show is population. And Cloudflare is interested in how well we perform for people. Consider Russia and Indonesia. Indonesia has double the population of Russia and about 1/10th of the land.

By focusing on networks we can optimize for the people who use the Internet. For example, Biznet is a major ISP in Indonesia with ASN 17451. Looking at TTFB at p95 (the same statistics we discussed earlier for US ISPs Cox and Comcast) we see that for Biznet users Cloudflare has the fastest p95 TTFB at 677.7ms, Fastly 744.0ms, Google 872.8ms, Amazon CloudFront 1,239.9 and Akamai 1,248.9ms.

What’s next

The data we’ve gathered gives us a granular view of every network that connects to Cloudflare. This allows us to optimize routes and over the coming days and weeks we’ll be using the data to further increase Cloudflare’s performance.

In just over a week it’s Cloudflare’s Birthday Week, and we are aiming to improve our performance in 10% of the networks where we are not #1 today. Stay tuned.

Announcing Project Turpentine: an easy way to get off Varnish

Post Syndicated from Joao Sousa Botto original https://blog.cloudflare.com/announcing-turpentine/

Announcing Project Turpentine: an easy way to get off Varnish

Announcing Project Turpentine: an easy way to get off Varnish

When Varnish and the Varnish Configuration Language (VCL) were first introduced 15 years ago, they were an incredibly powerful combination to configure caching on servers (and your networks). It seemed a logical choice for a language to configure CDNs — caching in the cloud.

A lot has changed on the Internet since then.

In particular, caching is just one of many things that “CDNs” are expected to do: load balancing, DDoS protection, rate limiting, transformations, synthetic responses, routing and more. But above all what “CDNs” need to be is programmable, not just configurable.

Configuration went from a niche activity to a much more common — and often involved — requirement. We’ve heard from a lot of teams that want to remove critical dependencies on the one person they have who knows how to make configuration changes — because they’re the only one on the team who knows how to write in VCL.

But it’s not just about who can write VCL — it’s what VCL is increasingly being asked to do. A lot of our customers have told us that they have much greater expectations for what they expect the network to do: they don’t just want to configure anymore… they want to be able to program it! VCL is being pushed and stretched into things it was never envisaged to do.

These are often the frustrations we hear from customers about the use of VCL. And yet, at the same time, migrations are always hard. Taking thousands of lines of code that have been built up over the years for a mission critical service, and rewriting it from scratch? Nobody wants to do that.

Today, we are excited to announce a solution to all these problems: Turpentine. Turpentine is a true VCL-to-TypeScript converter: it is the easiest and most effective way for you to migrate your legacy Varnish to a modern, Turing-complete programming language and onto the edge. But don’t think that because you’re moving up in terms of language abstraction that you’re giving up performance — it’s the opposite. Turpentine enables porting your VCL-based configuration to Cloudflare Workers — which is known for its speed. Beyond being able to eliminate the use of VCL, Turpentine enables you to take full advantage of Cloudflare: including proxies, firewall, load balancers, tiered caching/shielding and everything else Cloudflare offers.

And you’ll be able to configure it using one of the most widely used languages in the world.
Whether you’re using standard configurations, or have a heavily customized VCL file, Turpentine will generate human-readable and well commented TypeScript code and deploy it to our serverless platform that is… fast.

Announcing Project Turpentine: an easy way to get off Varnish

How It Works

To turn a VCL into TypeScript code that is well optimized, human-readable, and commented, Turpentine takes a three-phase approach:

  1. The VCL is parsed, its meaning is understood, and TypeScript code that preserves its intentions is generated. This is more than transpiling. All the functionality configured in the VCL is rewritten to the best solution available. For example, rate limiting might be custom code on your VCL — but have native functionality on Cloudflare.
  2. The code is cleaned and optimized. During this phase, Turpentine looks for redundancies and inefficiencies, and removes them. Your code will be running on Cloudflare Workers, the fastest serverless platform in the world, and we definitely don’t want to leave behind inefficiencies that would prevent you from enjoying its benefits to the full extent.
  3. The final step is pretty printing. The whole point of this exercise is to make the code easy to understand by as many people as possible on your team — so that virtually any engineer on your team, not just infrastructure experts, can program your network moving forward.

Turpentine in Action

We’ve been perfecting Turpentine in a private beta with some very large customers who have some very complicated VCL files.

Here’s a demo of a VCL file being converted to a Worker:

Legacy to Modern

Happily, this all occurs without the pains of the usual legacy migration. There’s no project plan. No engineers and IT teams trying to replicate each bit of configuration or investigating whether a specific feature even existed.

If you’re interested in finding out more, please register your interest in the sign-up form here — we’d love to explore with you. For now, we’re staying in private beta, so we can walk every new customer through the process. A Cloudflare engineer will help your team get up to speed and comfortable with the new infrastructure.

Announcing Project Turpentine: an easy way to get off Varnish

Sir Clive Sinclair, 1940-2021

Post Syndicated from original https://www.raspberrypi.org/blog/sir-clive-sinclair-1940-2021/

It’s an incredibly sad day for the British computing industry.

We’re always going to be very grateful to Sir Clive for being one of the founding fathers of the UK home computing boom that helped so many of us at Raspberry Pi get hooked on programming as kids.

He was someone from whom the business behind Raspberry Pi has drawn great inspiration. He’ll be very sadly missed.

sir clive sinclair

The post Sir Clive Sinclair, 1940-2021 appeared first on Raspberry Pi.

New Standard Contractual Clauses now part of the AWS GDPR Data Processing Addendum for customers

Post Syndicated from Stéphane Ducable original https://aws.amazon.com/blogs/security/new-standard-contractual-clauses-now-part-of-the-aws-gdpr-data-processing-addendum-for-customers/

Today, we’re happy to announce an update to our online AWS GDPR Data Processing Addendum (AWS GDPR DPA) and our online Service Terms to include the new Standard Contractual Clauses (SCCs) that the European Commission (EC) adopted in June 2021. The EC-approved SCCs give our customers the ability to comply with the General Data Protection Regulation (GDPR) when they transfer personal data subject to GDPR to countries outside the European Economic Area (EEA) that haven’t received an EC adequacy decision (third countries). The new SCCs are now better adapted to how our customers operate their applications or run their workloads in the cloud, because they cover different transfer scenarios, and also provide enhanced safeguards for data transfers.

Achieving compliance with GDPR is critical for hundreds of thousands of AWS customers and AWS. The new SCCs allow all AWS customers that are controllers or processors under GDPR to continue to transfer personal data from their AWS accounts in compliance with GDPR. As part of the online Service Terms, the new SCCs will apply automatically whenever an AWS customer uses AWS services to transfer customer data to third countries.

The updated AWS GDPR DPA incorporating the new SCCs supplements our announcement in February 2021 of strengthened commitments to protect customer data, such as challenging law enforcement requests that conflict with EU law. We have also published the blog post How AWS is helping EU customers navigate the new normal for data protection, and the whitepaper Navigating Compliance with EU Data Transfer Requirements to help AWS customers conduct their data transfer assessments and comply with GDPR, the Schrems II ruling, and the recommendations issued by the European Data Protection Board. AWS is constantly working to ensure that our customers can enjoy the benefits of AWS everywhere they operate, and we welcome the new SCCs because they enable our customers to continue using AWS services in compliance with GDPR. If you have questions or need more information, visit our EU Data Protection page and our GDPR Center.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Stéphane Ducable

Stéphane is Vice President of Public Policy – EMEA at AWS. He is focused on increasing awareness of the benefits of adopting cloud computing technology across the EMEA region.

Introducing Amazon MSK Connect – Stream Data to and from Your Apache Kafka Clusters Using Managed Connectors

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-msk-connect-stream-data-to-and-from-your-apache-kafka-clusters-using-managed-connectors/

Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications. At re:Invent 2018, we announced Amazon Managed Streaming for Apache Kafka, a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data.

When you use Apache Kafka, you capture real-time data from sources such as IoT devices, database change events, and website clickstreams, and deliver it to destinations such as databases and persistent storage.

Kafka Connect is an open-source component of Apache Kafka that provides a framework for connecting with external systems such as databases, key-value stores, search indexes, and file systems. However, manually running Kafka Connect clusters requires you to plan and provision the required infrastructure, deal with cluster operations, and scale it in response to load changes.

Today, we’re announcing a new capability that makes it easier to manage Kafka Connect clusters. MSK Connect allows you to configure and deploy a connector using Kafka Connect with a just few clicks. MSK Connect provisions the required resources and sets up the cluster. It continuously monitors the health and delivery state of connectors, patches and manages the underlying hardware, and auto-scales connectors to match changes in throughput. As a result, you can focus your resources on building applications rather than managing infrastructure.

MSK Connect is fully compatible with Kafka Connect, which means you can migrate your existing connectors without code changes. You don’t need an MSK cluster to use MSK Connect. It supports Amazon MSK, Apache Kafka, and Apache Kafka compatible clusters as sources and sinks. These clusters can be self-managed or managed by AWS partners and 3rd parties as long as MSK Connect can privately connect to the clusters.

Using MSK Connect with Amazon Aurora and Debezium
To test MSK Connect, I want to use it to stream data change events from one of my databases. To do so, I use Debezium, an open-source distributed platform for change data capture built on top of Apache Kafka.

I use a MySQL-compatible Amazon Aurora database as the source and the Debezium MySQL connector with the setup described in this architectural diagram:

Architectural diagram.

To use my Aurora database with Debezium, I need to turn on binary logging in the DB cluster parameter group. I follow the steps in the How do I turn on binary logging for my Amazon Aurora MySQL cluster article.

Next, I have to create a custom plugin for MSK Connect. A custom plugin is a set of JAR files that contain the implementation of one or more connectors, transforms, or converters. Amazon MSK will install the plugin on the workers of the connect cluster where the connector is running.

From the Debezium website, I download the MySQL connector plugin for the latest stable release. Because MSK Connect accepts custom plugins in ZIP or JAR format, I convert the downloaded archive to ZIP format and keep the JARs files in the main directory:

$ tar xzf debezium-connector-mysql-1.6.1.Final-plugin.tar.gz
$ cd debezium-connector-mysql
$ zip -9 ../debezium-connector-mysql-1.6.1.zip *
$ cd ..

Then, I use the AWS Command Line Interface (CLI) to upload the custom plugin to an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region I am using for MSK Connect:

$ aws s3 cp debezium-connector-mysql-1.6.1.zip s3://my-bucket/path/

On the Amazon MSK console there is a new MSK Connect section. I look at the connectors and choose Create connector. Then, I create a custom plugin and browse my S3 buckets to select the custom plugin ZIP file I uploaded before.

Console screenshot.

I enter a name and a description for the plugin and then choose Next.

Console screenshot.

Now that the configuration of the custom plugin is complete, I start the creation of the connector. I enter a name and a description for the connector.

Console screenshot.

I have the option to use a self-managed Apache Kafka cluster or one that is managed by MSK. I select one of my MSK cluster that is configured to use IAM authentication. The MSK cluster I select is in the same virtual private cloud (VPC) as my Aurora database. To connect, the MSK cluster and Aurora database use the default security group for the VPC. For simplicity, I use a cluster configuration with auto.create.topics.enable set to true.

Console screenshot.

In Connector configuration, I use the following settings:

connector.class=io.debezium.connector.mysql.MySqlConnector
tasks.max=1
database.hostname=<aurora-database-writer-instance-endpoint>
database.port=3306
database.user=my-database-user
database.password=my-secret-password
database.server.id=123456
database.server.name=ecommerce-server
database.include.list=ecommerce
database.history.kafka.topic=dbhistory.ecommerce
database.history.kafka.bootstrap.servers=<bootstrap servers>
database.history.consumer.security.protocol=SASL_SSL
database.history.consumer.sasl.mechanism=AWS_MSK_IAM
database.history.consumer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
database.history.consumer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
database.history.producer.security.protocol=SASL_SSL
database.history.producer.sasl.mechanism=AWS_MSK_IAM
database.history.producer.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
database.history.producer.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
include.schema.changes=true

Some of these settings are generic and should be specified for any connector. For example:

  • connector.class is the Java class of the connector.
  • tasks.max is the maximum number of tasks that should be created for this connector.

Other settings are specific to the Debezium MySQL connector:

  • The database.hostname contains the writer instance endpoint of my Aurora database.
  • The database.server.name is a logical name of the database server. It is used for the names of the Kafka topics created by Debezium.
  • The database.include.list contains the list of databases hosted by the specified server.
  • The database.history.kafka.topic is a Kafka topic used internally by Debezium to track database schema changes.
  • The database.history.kafka.bootstrap.servers contains the bootstrap servers of the MSK cluster.
  • The final eight lines (database.history.consumer.* and database.history.producer.*) enable IAM authentication to access the database history topic.

In Connector capacity, I can choose between autoscaled or provisioned capacity. For this setup, I choose Autoscaled and leave all other settings at their defaults.

Console screenshot.

With autoscaled capacity, I can configure these parameters:

  • MSK Connect Unit (MCU) count per worker – Each MCU provides 1 vCPU of compute and 4 GB of memory.
  • The minimum and maximum number of workers.
  • Autoscaling utilization thresholds – The upper and lower target utilization thresholds on MCU consumption in percentage to trigger auto scaling.

Console screenshot.

There is a summary of the minimum and maximum MCUs, memory, and network bandwidth for the connector.

Console screenshot.

For Worker configuration, you can use the default one provided by Amazon MSK or provide your own configuration. In my setup, I use the default one.

In Access permissions, I create a IAM role. In the trusted entities, I add kafkaconnect.amazonaws.com to allow MSK Connect to assume the role.

The role is used by MSK Connect to interact with the MSK cluster and other AWS services. For my setup, I add:

The Debezium connector needs access to the cluster configuration to find the replication factor to use to create the history topic. For this reason, I add to the permissions policy the kafka-cluster:DescribeClusterDynamicConfiguration action (equivalent Apache Kafka’s DESCRIBE_CONFIGS cluster ACL).

Depending on your configuration, you might need to add more permissions to the role (for example, in case the connector needs access to other AWS resources such as an S3 bucket). If that is the case, you should add permissions before creating the connector.

In Security, the settings for authentication and encryption in transit are taken from the MSK cluster.

Console screenshot.

In Logs, I choose to deliver logs to CloudWatch Logs to have more information on the execution of the connector. By using CloudWatch Logs, I can easily manage retention and interactively search and analyze my log data with CloudWatch Logs Insights. I enter the log group ARN (it’s the same log group I used before in the IAM role) and then choose Next.

Console screenshot.

I review the settings and then choose Create connector. After a few minutes, the connector is running.

Testing MSK Connect with Amazon Aurora and Debezium
Now let’s test the architecture I just set up. I start an Amazon Elastic Compute Cloud (Amazon EC2) instance to update the database and start a couple of Kafka consumers to see Debezium in action. To be able to connect to both the MSK cluster and the Aurora database, I use the same VPC and assign the default security group. I also add another security group that gives me SSH access to the instance.

I download a binary distribution of Apache Kafka and extract the archive in the home directory:

$ tar xvf kafka_2.13-2.7.1.tgz

To use IAM to authenticate with the MSK cluster, I follow the instructions in the Amazon MSK Developer Guide to configure clients for IAM access control. I download the latest stable release of the Amazon MSK Library for IAM:

$ wget https://github.com/aws/aws-msk-iam-auth/releases/download/1.1.0/aws-msk-iam-auth-1.1.0-all.jar

In the ~/kafka_2.13-2.7.1/config/ directory I create a client-config.properties file to configure a Kafka client to use IAM authentication:

# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL

# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM

# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;

# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler

I add a few lines to my Bash profile to:

  • Add Kafka binaries to the PATH.
  • Add the MSK Library for IAM to the CLASSPATH.
  • Create the BOOTSTRAP_SERVERS environment variable to store the bootstrap servers of my MSK cluster.
$ cat >> ~./bash_profile
export PATH=~/kafka_2.13-2.7.1/bin:$PATH
export CLASSPATH=/home/ec2-user/aws-msk-iam-auth-1.1.0-all.jar
export BOOTSTRAP_SERVERS=<bootstrap servers>

Then, I open three terminal connections to the instance.

In the first terminal connection, I start a Kafka consumer for a topic with the same name as the database server (ecommerce-server). This topic is used by Debezium to stream schema changes (for example, when a new table is created).

$ cd ~/kafka_2.13-2.7.1/
$ kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVERS \
                            --consumer.config config/client-config.properties \
                            --topic ecommerce-server --from-beginning

In the second terminal connection, I start another Kafka consumer for a topic with a name built by concatenating the database server (ecommerce-server), the database (ecommerce), and the table (orders). This topic is used by Debezium to stream data changes for the table (for example, when a new record is inserted).

$ cd ~/kafka_2.13-2.7.1/
$ kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVERS \
                            --consumer.config config/client-config.properties \
                            --topic ecommerce-server.ecommerce.orders --from-beginning

In the third terminal connection, I install a MySQL client using the MariaDB package and connect to the Aurora database:

$ sudo yum install mariadb
$ mysql -h <aurora-database-writer-instance-endpoint> -u <database-user> -p

From this connection, I create the ecommerce database and a table for my orders:

CREATE DATABASE ecommerce;

USE ecommerce

CREATE TABLE orders (
       order_id VARCHAR(255),
       customer_id VARCHAR(255),
       item_description VARCHAR(255),
       price DECIMAL(6,2),
       order_date DATETIME DEFAULT CURRENT_TIMESTAMP
);

These database changes are captured by the Debezium connector managed by MSK Connect and are streamed to the MSK cluster. In the first terminal, consuming the topic with schema changes, I see the information on the creation of database and table:

Struct{source=Struct{version=1.6.1.Final,connector=mysql,name=ecommerce-server,ts_ms=1629202831473,db=ecommerce,server_id=1980402433,file=mysql-bin-changelog.000003,pos=9828,row=0},databaseName=ecommerce,ddl=CREATE DATABASE ecommerce,tableChanges=[]}
Struct{source=Struct{version=1.6.1.Final,connector=mysql,name=ecommerce-server,ts_ms=1629202878811,db=ecommerce,table=orders,server_id=1980402433,file=mysql-bin-changelog.000003,pos=10002,row=0},databaseName=ecommerce,ddl=CREATE TABLE orders ( order_id VARCHAR(255), customer_id VARCHAR(255), item_description VARCHAR(255), price DECIMAL(6,2), order_date DATETIME DEFAULT CURRENT_TIMESTAMP ),tableChanges=[Struct{type=CREATE,id="ecommerce"."orders",table=Struct{defaultCharsetName=latin1,primaryKeyColumnNames=[],columns=[Struct{name=order_id,jdbcType=12,typeName=VARCHAR,typeExpression=VARCHAR,charsetName=latin1,length=255,position=1,optional=true,autoIncremented=false,generated=false}, Struct{name=customer_id,jdbcType=12,typeName=VARCHAR,typeExpression=VARCHAR,charsetName=latin1,length=255,position=2,optional=true,autoIncremented=false,generated=false}, Struct{name=item_description,jdbcType=12,typeName=VARCHAR,typeExpression=VARCHAR,charsetName=latin1,length=255,position=3,optional=true,autoIncremented=false,generated=false}, Struct{name=price,jdbcType=3,typeName=DECIMAL,typeExpression=DECIMAL,length=6,scale=2,position=4,optional=true,autoIncremented=false,generated=false}, Struct{name=order_date,jdbcType=93,typeName=DATETIME,typeExpression=DATETIME,position=5,optional=true,autoIncremented=false,generated=false}]}}]}

Then, I go back to the database connection in the third terminal to insert a few records in the orders table:

INSERT INTO orders VALUES ("123456", "123", "A super noisy mechanical keyboard", "50.00", "2021-08-16 10:11:12");
INSERT INTO orders VALUES ("123457", "123", "An extremely wide monitor", "500.00", "2021-08-16 11:12:13");
INSERT INTO orders VALUES ("123458", "123", "A too sensible microphone", "150.00", "2021-08-16 12:13:14");

In the second terminal, I see the information on the records inserted into the orders table:

Struct{after=Struct{order_id=123456,customer_id=123,item_description=A super noisy mechanical keyboard,price=50.00,order_date=1629108672000},source=Struct{version=1.6.1.Final,connector=mysql,name=ecommerce-server,ts_ms=1629202993000,db=ecommerce,table=orders,server_id=1980402433,file=mysql-bin-changelog.000003,pos=10464,row=0},op=c,ts_ms=1629202993614}
Struct{after=Struct{order_id=123457,customer_id=123,item_description=An extremely wide monitor,price=500.00,order_date=1629112333000},source=Struct{version=1.6.1.Final,connector=mysql,name=ecommerce-server,ts_ms=1629202993000,db=ecommerce,table=orders,server_id=1980402433,file=mysql-bin-changelog.000003,pos=10793,row=0},op=c,ts_ms=1629202993621}
Struct{after=Struct{order_id=123458,customer_id=123,item_description=A too sensible microphone,price=150.00,order_date=1629115994000},source=Struct{version=1.6.1.Final,connector=mysql,name=ecommerce-server,ts_ms=1629202993000,db=ecommerce,table=orders,server_id=1980402433,file=mysql-bin-changelog.000003,pos=11114,row=0},op=c,ts_ms=1629202993630}

My change data capture architecture is up and running and the connector is fully managed by MSK Connect.

Availability and Pricing
MSK Connect is available in the following AWS Regions: Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), EU (Paris), EU (Stockholm), South America (Sao Paulo), US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon). For more information, see the AWS Regional Services List.

With MSK Connect you pay for what you use. The resources used by your connectors can be scaled automatically based on your workload. For more information, see the Amazon MSK pricing page.

Simplify the management of your Apache Kafka connectors today with MSK Connect.

Danilo

Disaster Recovery (DR) for a Third-party Interactive Voice Response on AWS

Post Syndicated from Priyanka Kulkarni original https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-for-a-third-party-interactive-voice-response-on-aws/

Voice calling systems are prevalent and necessary to many businesses today. They are usually designed to provide a 24×7 helpline support across multiple domains and use cases. Reliability and availability of such systems are important for a good customer experience. The thoughtful design of a cost-optimized solution will allow your business to sustain the system into the future.

We address a scenario in which you are mandated to host the workload on a corporate data center (DC), and configure the backup site on Amazon Web Services (AWS). Since the primary objective of a backup site is disaster recovery (DR) management, this site is often referred to as a DR site.

Disaster Recovery on AWS

DR strategy defines the recovery objectives for downtime and data loss. The workload has a recovery time objective (RTO) and a recovery point objective (RPO). RTO is the maximum acceptable delay between the interruption of service and the restoration of service. RPO is the maximum acceptable amount of time since the last data recovery point. AWS defines four DR strategies in increasing order of complexity, and decreasing order of RTO and RPO. These are backup and restore, active/passive (pilot light or warm standby), or active/active.

Figure 1. Disaster recovery (DR) options

Figure 1. Disaster recovery (DR) options

In our use case, the DR site on AWS must serve the user traffic with RPO and RTO in seconds. Warm standby is the optimal choice in this case. It is a scaled-down version of a fully functional environment, and is always running in the cloud.

Amazon Connect is an omnichannel cloud contact center that helps you provide great customer service at a lower cost. But in some situations, Amazon Connect may not be available. In other cases, the customer may want to use their home developed or third-party contact center application. Our solution is designed to help in both these scenarios.

This architecture enables customers facing challenges of cost overhead with redundant Session Initiation Protocol (SIP) trunks for the DC and DR sites. It allows you to optimize your spend and yet retain a reliable workflow.

SIP trunk communication on AWS

Let’s see how the SIP trunk termination on the AWS network handles the failover scenario of a third-party IVR application installed on Amazon EC2 at the DR site.

There will be two connections made from the AWS Direct Connect location (DX). The first will be for a point-to-point connectivity between the corporate DC and the AWS DR site. The second connection will be originating from the multiplexer (MUX) of the telecom provider who is providing you the SIP trunk.

The telecom provider will lay the SIP trunk from its MUX to the customer router at the DX location. At this point, the mode of communication becomes IP-based. The telecom provider will send the call to the IP address attached to the Network Load Balancer (NLB) in Amazon Virtual Private Cloud (VPC).

Figure 2. Communication circuitry at telecom side

Figure 2. Communication circuitry at telecom side

AWS Network Load Balancers can now distribute traffic to AWS resources using their IP addresses and instance IDs as targets. You can also distribute the traffic with on-premises resources over AWS Direct Connect. Load balancing across AWS and on-premises resources using the same load balancer streamlines migrate-to-cloud, burst-to-cloud, or failover-to-cloud.

In the backup site, the NLB will point to the Session Border Controller (SBC). This is a special-purpose device that protects and regulates IP communications flows. You can bring your own SBC, or you can use an SBC offered in the AWS Marketplace.

Best practices for high availability of IVR solution on AWS

  • Configure the multiple Availability Zone (Multi-AZ) SBC setup
  • Make sure that the telecom provider for the SIP trunk is different from the internet service provider (ISP). This is for last mile connectivity for the DC from Direct Connect
  • Consider redundancy for Direct Connect by using a Site-to-Site VPN tunnel
Figure 3. Solution architecture of DR on AWS for a third-party IVR solution

Figure 3. Solution architecture of DR on AWS for a third-party IVR solution

Communication flow for an IVR solution deployed on a corporate DC and its DR on AWS

  1. The callers are received on the telecom providers SIP line, which terminates on the AWS Direct Connect location.
  2. At the DX location, you will configure a route in the AWS router to send the traffic to the IP address of the NLB. The NLB should be configured to perform health checks on the virtual machine in your on-premises DC. Based on these health checks, the NLB will do the routing and the failover.
  3. In a live scenario with successful health checks at the DC, the NLB will forward the call to the IP of the on-premises virtual machine. This is where the IVR application will be installed.
  4. The communication between the NLB in Amazon VPC and the virtual machine in DC, will happen over Direct Connect.
  5. In a DR scenario, the NLB will failover the communication to SBCs in Amazon VPC.

Conclusion

This solution is useful when a third-party IVR system is deployed in a corporate data center, and the passive DR site is hosted on AWS. Cost optimization on telecom components is an important aspect of this design. AWS Direct Connect provides dedicated connectivity to the AWS environment, from 50 Mbps up to 10 Gbps. This gives you managed and controlled latency. It also provides provisioned bandwidth, so your workload can connect to AWS resources in a reliable, scalable, and cost-effective way.

The solution in this blog explains the end-to-end flow of communication, from the user to the IVR agents. It also provides insights into managing failover and failback between DR and the DR site.

Further Reading:

The collective thoughts of the interwebz

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close