[$] CXL 2: Pooling, sharing, and I/O-memory resources

Post Syndicated from original https://lwn.net/Articles/894626/

During the final day of the 2022 Linux Storage,
Filesystem, Memory-management and BPF Summit
(LSFMM), attention in the
memory-management track turned once again to the challenges posed by the
upcoming Compute Express Link (CXL) technology. Two sessions looked at
different problems posed by CXL memory, which can come and go over the
operation of the system. CXL offers a lot of flexibility, but changes will
be needed for the kernel to be able to take advantage of it.

Huang: Rust: A Critical Retrospective

Post Syndicated from original https://lwn.net/Articles/895773/

Andrew ‘bunnie’ Huang has posted an extensive review of the Rust language derived from the experience of writing “over 100k lines” of code.

Rust is a difficult language for authoring code because it makes
these “cheats” hard – as long as you have the discipline of not
using “unsafe” constructions to make cheats easy. However, really
hard does not mean impossible – there were definitely some cheats
that got swept under the rug during the construction of Xous.

This is where Rust really exceeded expectations for me. The
language’s structure and tooling was very good at hunting down
these cheats and refactoring the code base, thus curing the cancer
without killing the patient, so to speak. This is the point at
which Rust’s very strict typing and borrow checker converts from a
productivity liability into a productivity asset.

CVE-2022-22972: Critical Authentication Bypass in VMware Workspace ONE Access, Identity Manager, and vRealize Automation

Post Syndicated from Jake Baines original https://blog.rapid7.com/2022/05/19/cve-2022-22972-critical-authentication-bypass-in-vmware-workspace-one-access-identity-manager-and-vrealize-automation/

On May 18, 2022, VMware published VMSA-2022-0014 on CVE-2022-22972 and CVE-2022-22973. The more severe of the two vulnerabilities is CVE-2022-22972, a critical authentication bypass affecting VMware’s Workspace ONE Access, Identity Manager, and vRealize Automation solutions. The vulnerability allows attackers with network access to the UI to obtain administrative access without the need to authenticate. CVE-2022-22972 may be chained with CVE-2022-22973 to bypass authentication and obtain root access. A full list of affected products is available in VMware’s advisory.

At time of writing, there is no public proof of concept for CVE-2022-22972, and there have been no reports of exploitation in the wild. We expect this to change quickly, however, since Rapid7 researchers have seen similar VMware vulnerabilities come under attack quickly in recent weeks. In April 2022, we published details on CVE-2022-22954, a server-side template injection flaw that was widely exploited by threat actors targeting internet-facing VMware Workspace ONE and Identity Manager applications.

In conjunction with VMware’s advisory on May 18, the US Cybersecurity and Infrastructure Security Agency (CISA) published Emergency Directive 22-03 in response to VMSA-2022-0014. The directive requires all “Federal Civilian Executive Branch” agencies to either apply the patch or remove affected VMware installations from agency networks by May 24, 2022. CISA also released an additional alert emphasizing that threat actors are known to be chaining recent VMware vulnerabilities — CVE-2022-22954 and CVE-2022-22960 — to gain full control of vulnerable systems. CISA’s alert notes that the new vulnerabilities in VMSA-2022-0014 are likely to be exploited in the wild quickly:

Due to the [likely] rapid exploitation of these vulnerabilities, CISA strongly encourages all organizations with affected VMware products that are accessible from the internet — that did not immediately apply updates — to assume compromise.

Mitigation guidance

VMware customers should patch their Workspace ONE Access, Identity Manager, and vRealize Automation installations immediately, without waiting for a regular patch cycle to occur. VMware has instructions here on patching and applying workarounds.

Additionally, if your installation is internet-facing, consider taking steps to remove direct access from the internet. It may also be prudent to follow CISA’s guidance on post-exploitation detection methods found in Alert (AA22-138B).

Rapid7 customers

InsightVM and Nexpose customers can assess their VMware Workspace ONE Access and Identity Manager systems’ exposure to CVE-2022-22972 and CVE-2022-22973 with authenticated vulnerability checks for Unix-like systems available in the May 19, 2022 content release. (Note that VMware Workspace ONE Access can only be deployed on Linux from 20.x onward.) Additional vulnerability coverage will be evaluated as the need arises.

Eurovision 2022, the Internet effect version

Post Syndicated from João Tomé original https://blog.cloudflare.com/eurovision-2022-internet-trends/

There’s only one song contest that is more than six decades old and not only showcases many new songs (ABBA, Celine Dion, Julio Iglesias and Domenico Modugno all shone there), but also has a global stage involving 40 countries — performers represent those countries and the public votes. The 66th edition of the Eurovision Song Contest, in Turin, Italy, had two semi-finals (May 10 and 12) and a final (May 14), all of them with highlights, including Ukraine’s victory. The Internet was affected in more than one way, from whole countries to fan and official broadcaster sites, as well as video platforms.

On our dedicated Eurovision page, it was possible to see the level of Internet traffic in the 40 participating countries, and we tweeted some highlights during the final.


First, some technicalities. The baseline for the values we use in the following charts is the average of the preceding week, except for the more granular minute-by-minute view, which uses the average traffic of May 9 and 10 as its baseline. To estimate traffic to the several types of websites from the 40 participating countries, we use DNS name resolution data. In this blog post, we’re using CEST, Central European Summer Time.
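As a rough illustration of the arithmetic behind these comparisons (a hypothetical sketch, not Cloudflare’s actual pipeline), each hour’s traffic is expressed as a change against the average of the same hour in the baseline period:

    # Hypothetical sketch (not Cloudflare's pipeline): expressing an hour's
    # traffic as a change against a previous-week-average baseline, which is
    # how the percentages and multiples in the charts below are framed.

    def change_vs_baseline(current, baseline_values):
        """Return the relative change of `current` vs. the baseline average,
        e.g. 0.06 means +6%, -0.06 means -6%."""
        baseline = sum(baseline_values) / len(baseline_values)
        return (current - baseline) / baseline

    # Made-up hourly request counts for 22:00 on the seven preceding days.
    previous_week_2200 = [104.0, 99.0, 101.0, 97.0, 103.0, 100.0, 96.0]
    print(f"{change_vs_baseline(94.0, previous_week_2200):+.1%}")  # about -6%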

It’s not often that an entertainment event has an impact on a country’s Internet. So, was there an impact on Eurovision nights?

Let’s start with aggregate Internet traffic in the 40 participating countries (Australia included). In the first semi-final, on May 10, there seems to be a slight decrease in traffic during the contest — which makes sense if we consider that most people were probably watching the broadcast on national TV (rather than on YouTube, which was also streaming the event live). Traffic was lower than in the previous period between 21:00 and 23:00 (the event ran from 21:00 to 23:14), but it was back to normal at 23:00.

For the second semi-final, that trend is less clear. But the May 14 final (which lasted from 21:00 CEST to 01:10) told a different story. Traffic was 6% lower than on the previous Saturday after 21:00, mostly around 22:00, and after 23:15 it was actually higher (between 4% and 6%) than before, and continued that way until 02:00.

What happened at 23:15 in Eurovision? The last of the 25 songs in the contest was Estonia’s “Hope”, by Stefan, and it ended at 23:14 (later in this blog post we will also see how 23:16 was the highest spike in DNS traffic to fan websites during the final). Here is the Internet traffic chart for the participating countries on May 14:

Several countries showed a similar change in traffic, at least during the final. France, the UK, Germany, Iceland, Greece and Switzerland are examples.

Eurovision & the UK

The UK was one of the countries where the impact around the time of the grand final seems largest — last year, according to the ratings, eight million people watched the BBC transmission with commentator Graham Norton. Traffic started to drop below usual levels at 20:30 (a few minutes before the final) and was 20% lower at 22:00, returning closer to normal after 23:00, when the set of 25 finalists’ songs came to an end.

Here’s the UK’s Internet traffic trend during the Eurovision May 14 final:

Fan sites: what a difference a winner makes

The most obvious place to look for impact is the fan websites. Eurovision has many, some general (there’s the OGAE, General Organisation of Eurovision Fans), others more local. And DNS traffic to them was clearly affected.

The first semi-final, on May 10, had 33x more traffic than the average of the previous week, with a clear spike at 22:00 CEST. The second semi-final, on May 12, topped that with 42x more traffic at the same time. The final, with the 25 finalists, clearly surpassed both: at 22:00 traffic was already 70x higher. But because the final was much longer (in the semi-finals the finalists were announced around 23:00), the peak was reached at 23:00, with 86x more traffic than usual.

“We have a winner. The winner of the Eurovision Song Contest 2022 is… Ukraine!”.
Alessandro Cattelan, Laura Pausini and Mika at 01:01 CEST, May 15, 2022.

Saturday’s final was more than four hours long (the semi-finals took a little over two hours), and it finished a few minutes after 01:00 CEST. DNS traffic to fan websites dropped from 86x to 45x at midnight, but went up again to 49x when it was already 01:00 CEST in most of Europe and Ukraine was announced the winner of Eurovision 2022. This next chart shows the change in traffic to fan sites during Saturday’s May 14 final:

We can also clearly see a 20x peak in traffic to fan sites on Sunday morning at 09:00, and another at 11:00 (17%).

Now, let’s go deeper with a minute-by-minute view (the previous charts show hourly data) of DNS traffic to fan sites. In the two semi-finals it’s easy to see that traffic peaked around 23:12, when the finalists were being announced and the event was ending. Here’s what fan-site traffic growth looked like during the May 10 (yellow) and May 12 (green) semi-finals:

We can also spot some other highlights on fan sites during the semi-finals besides the finalists’ announcements, which, as we saw, were clearly the most popular moments of the two nights. First, on May 10 there was more traffic before the event (21:00) than on May 12, so people seem to have had greater expectations for the first Eurovision 2022 event of the week. For the spikes (before the finalists were announced), we compiled a list of the moments with the most interest in the fan websites and connected them to what was happening in Eurovision at the time (ordered by impact):

First semi-final, May 10
#1. 22:47 Sum up of all the songs.
#2. 22:25 Norway’s song (Subwoolfer, “Give That Wolf a Banana”).
#3. 21:42 Bulgaria’s song (Intelligent Music Project, “Intention”).
#4. 21:51 Moldova’s song (Zdob și Zdub and Advahov Brothers, “Trenulețul”).
#5. 22:20 Greece’s song (Amanda Georgiadi Tenfjord, “Die Together”).

Second semi-final, May 12
#1. 21:22 Between Serbia (Konstrakta, “In corpore sano”) and Azerbaijan (Nadir Rustamli, “Fade to Black”).
#2. 22:48 Voting period starts.
#3. 22:30 Czech Republic’s song (We Are Domi, “Lights Off”).
#4. 22:38 Laura Pausini & Mika performing (“Fragile” Sting cover song).
#5. 22:21 Belgium’s song (Jérémie Makiese, “Miss You”).

How about the May 14 final? This chart (followed by a ranking list) shows DNS traffic spikes in fan sites on Saturday’s final:

Final, May 14
#1. 23:11 Between Serbia (Konstrakta, “In corpore sano”) and Estonia (Stefan, “Hope”).
#2. 23:33 Sum up of all the songs.
#3. 23:57 Voting ended.
#4. 23:19 Sum up of all the songs.
#5. 23:01 Ending of the United Kingdom’s song (Sam Ryder, “Space Man”).


(The UK’s performer and representative Sam Ryder with Graham Norton, the BBC’s Eurovision commentator since 2009 — the BBC has broadcast the event since 1956.)

The broadcasters show

How about official national broadcaster websites? Around 23:00 CEST, traffic to the aggregate of 40 broadcasters was generally higher on the semi-final and final nights (represented in grey on the next chart). That’s clearest in the final at 23:00, when DNS traffic was 18% higher than on the previous Saturday (and 50% higher than on the previous day). During the semi-finals the difference is more subtle, but at 23:00 on both May 10 and 12 traffic was ~6% higher than in previous days.

When we focus on the minute-by-minute view of the broadcaster sites across the three Eurovision evenings, the highest growth in traffic again comes during the final (as we saw with the fan sites), mainly after 23:00. That seems normal, considering that the final ran much longer than the semi-finals, which ended around that time.

During the final (represented in pink in the previous chart), there were some clear spikes. We’ve added them to a ranking that also shows what was happening in the event at that time.

Broadcaster site spikes. Final, May 14
#1. 21:52 Best moments clip of the two semi-finals
#2. 21:00 Contest starts
#3. 00:24 Sam Ryder, the UK representative (with the song “Space Man”) being interviewed after reaching the #1 in the voting process.
#4. 01:09 Ukraine’s (Kalush Orchestra, “Stefania”) performance as the winner
#5. 01:02 Ukraine was announced as the Eurovision 2022 winner.

Video platforms: the post-final growth

Eurovision uses video platforms like YouTube and TikTok to share all the songs and clips of the events and performers, and the three nights were also streamed live on YouTube. Given that, we looked at DNS traffic to video platforms aggregated across the 40 participating countries. So, was there an impact on these well-known and high-performing social and video platforms? The short answer is: yes.

The final was again the most evident example, especially after 23:15, when all 25 finalists’ songs had already been performed and the event still had two more hours of non-participant performances, video clips summarising the songs, and the voting process — the famous moment in Europe when everyone finds out which entries will get the maximum of 12 points from each of the 40 participating countries.

In this comparison between the semi-finals and final day, we can see how on May 10, the day of the first semi-final, video platform traffic had more growth before the contest started, which is not that surprising given that it was the first Eurovision 2022 event and there was perhaps curiosity to check who were the other contestants (by then Eurovision had videos of them all on YouTube).

But the May 14 final shows more DNS traffic growth than the other Eurovision days after 23:16 (as we saw before, that was when all the finalists’ songs had been performed). The difference in traffic compared with the semi-finals was largest at 01:11 CEST, the moment the final came to an end on Saturday night: at that time there was 31% more traffic to video platforms than on May 10, and 38% more than on May 12.

Australia’s impact (with an eight-hour time difference)

Australia was one of the 40 participants, and it had a major time difference (eight hours ahead of CEST). Continuing with video platforms, DNS traffic in Australia was 22% higher at 23:00 CEST (07:00 local time) than on the previous Saturday and remained around 17% higher for a few hours afterwards. Before the 23:00 peak, traffic was 20% higher at 22:00 and 17% higher at 21:00, when the event was beginning.

The winners & social media

Social media in general in the 40 participating countries wasn’t as affected, but there was a 01:00 CEST spike during the final, around the time the winner was being decided between Ukraine and the UK — at 01:01 Ukraine was announced the winner of Eurovision 2022.

We can also see an impact on social media in Ukraine when Kalush Orchestra’s song “Stefania” was announced the winner at Saturday’s May 14 final (by then it was already after midnight, on May 15). The usual night-time slowdown in traffic seen on other days was clearly interrupted after 01:02 CEST (02:02 local time in Ukraine).

Conclusion: the Eurovision effect

When an event like Eurovision happens, there are different patterns on the Internet in the participating countries, usually all in Europe (although this year Australia was also there). Fan and broadcaster websites have specific impact because of the event, but in such a multimedia event, there are also some changes in video platforms’ DNS traffic.

And that trend goes as far as the Internet traffic of the participating countries at a more general level, something that seems to indicate that people, at least for some parts of Eurovision and in some countries, were more focused on their national TV broadcast.

The Internet is definitely a human-centric place, as we have seen before at different moments like the 2022 Oscars, the Super Bowl, the French elections, Ramadan, or even the war in Ukraine and the impact on the open Internet in Russia.

Security updates for Thursday

Post Syndicated from original https://lwn.net/Articles/895771/

Security updates have been issued by Fedora (microcode_ctl, rubygem-nokogiri, and vim), Mageia (htmldoc, python-django, and python-oslo-utils), Red Hat (container-tools:2.0, kernel, kernel-rt, kpatch-patch, and pcs), SUSE (ardana-barbican, grafana, openstack-barbican, openstack-cinder, openstack-heat-gbp, openstack-horizon-plugin-gbp-ui, openstack-ironic, openstack-keystone, openstack-neutron-gbp, python-lxml, release-notes-suse-openstack-cloud, autotrace, curl, firefox, libslirp, php7, poppler, slurm_20_11, and ucode-intel), and Ubuntu (bind9, gnome-control-center, and libxrandr).

The 3 most important steps when buying real estate

Post Syndicated from VassilKendov original http://kendov.com/3-%D1%82%D0%B5-%D0%BD%D0%B0%D0%B9-%D0%B2%D0%B0%D0%B6%D0%BD%D0%B8-%D1%81%D1%82%D1%8A%D0%BF%D0%BA%D0%B8-%D0%BF%D1%80%D0%B8-%D0%BF%D0%BE%D0%BA%D1%83%D0%BF%D0%BA%D0%B0%D1%82%D0%B0-%D0%BD%D0%B0-%D0%BD/


1. Securing financing
2. Choosing the property
3. The preliminary contract

For meetings and consultations on banking troubles, please use the form provided.

If you carry out these 3 steps properly, the chance of fraud or problems with a property purchase becomes minimal.

The most common problems at each step are the following:

1. Securing financing – this step is often done second (after a property has de facto already been chosen), but it shouldn’t be that way. It forces you to rush when arranging financing, which increases the cost of the loan.
Credit consultants are preferable to bank employees. Credit consultants are on your side, while bank employees protect the interests of a particular bank. It is a good idea to get offers from several banks and to read the contract clauses before you sign. After all, you are investing your savings and will be paying for 20 years. It pays to be informed in detail about the contract’s clauses.

2. Choosing the property – here most problems come from real estate brokers and the information they conceal or distort. Lately a scheme has become common in which it is claimed that, for sales of new construction, no commission is taken from the buyer, but in reality you are sold the property at a higher price that includes the broker’s commission.

3. The preliminary contract – there is a widespread belief that “the agency’s lawyer” will draw it up for free.
There is a regulation on minimum lawyers’ fees. Do you really think a lawyer will do “for free” something that costs between 1,200 and 1,600 leva under the fee schedule?
The broker himself receives a 2–3% commission for what he does, yet the lawyer checks the property, checks the company, and drafts the contract for free?!?

Be realistic. If fraud is going to happen, it will happen at the preliminary contract stage.
I ADVISE YOU TO HAVE THE LAWYER PAID BY YOU.
The only free cheese is in the mousetrap.

The next video will be on the topic: Does buying real estate protect your savings from inflation? In which cases is that justified and profitable?

If you found this information useful, please share the article on Facebook and subscribe to the YouTube channel – https://www.youtube.com/channel/UChh1cOXj_FpK8D8C0tV9GKg

Vassil Kendov
Financial consultant

The post The 3 most important steps when buying real estate appeared first on Kendov.com.

Websites that Collect Your Data as You Type

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/05/websites-that-collect-your-data-as-you-type.html

A surprising number of websites include JavaScript keyloggers that collect everything you type as you type it, not just when you submit a form.

Researchers from KU Leuven, Radboud University, and University of Lausanne crawled and analyzed the top 100,000 websites, looking at scenarios in which a user is visiting a site while in the European Union and visiting a site from the United States. They found that 1,844 websites gathered an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Many of the sites seemingly do not intend to conduct the data-logging but incorporate third-party marketing and analytics services that cause the behavior.

After specifically crawling sites for password leaks in May 2021, the researchers also found 52 websites in which third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. The group disclosed their findings to these sites, and all 52 instances have since been resolved.

“If there’s a Submit button on a form, the reasonable expectation is that it does something — that it will submit your data when you click it,” says Güneş Acar, a professor and researcher in Radboud University’s digital security group and one of the leaders of the study. “We were super surprised by these results. We thought maybe we were going to find a few hundred websites where your email is collected before you submit, but this exceeded our expectations by far.”

Research paper.

A teaspoon of computing in every subject: Broadening participation in computer science

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/guzdial-teaspoon-computing-tsp-language-broadening-participation-school/

From May to November 2022, our seminars focus on the theme of cross-disciplinary computing. Through this seminar series, we want to explore the intersections and interactions of computing with all aspects of learning and life, and think about how they can help us teach young people. We were delighted to welcome Prof. Mark Guzdial (University of Michigan) as our first speaker.

Professor Mark Guzdial, University of Michigan

Mark has worked in computer science (CS) education for decades and won many awards for his research, including the prestigious ACM SIGCSE Outstanding Contribution to Computing Education award in 2019. He has written literally hundreds of papers about CS education, and he authors an extremely popular computing education research blog that keeps us all up to date with what is going on in the field.

In his talk, Mark focused on his recent work around developing task-specific programming (TSP) languages, with which teachers can add a teaspoon (also abbreviated TSP) of programming to a wide variety of subject areas in schools. Mark’s overarching thesis is that if we want everyone to have some exposure to CS, then we need to integrate it into a range of subjects across the school curriculum. And he explained that this idea of “adding a teaspoon” embraces some core principles; for TSP languages to be successful, they need to:

  • Meet the teachers’ needs
  • Be relevant to the context or lesson in which it appears
  • Be technically easy to get to grips with

Mark neatly summarised this as ‘being both usable and useful’. 

Historical views on why we should all learn computer science

We can learn a lot from going back in time and reflecting on the history of computing. Mark started his talk by sharing the views of some of the eminent computer scientists of the early days of the subject. C. P. Snow maintained, way back in 1961, that all students should study CS, because it was too important to be left to a small handful of people.

A quote by computer scientist C. P. Snow from 1961: A handful of people, having no relation to the will of society, having no communication with the rest of society, will be taking decisions in secret which are going to affect our lives in the deepest sense.

Alan Perlis, also in 1961, argued that everyone at university should study one course in CS rather than a topic such as calculus. His reason was that CS is about process, and thus gives students tools that they can use to change the world around them. I’d never heard of this work from the 1960s before, and it suggests incredible foresight. Perhaps we don’t need to even have the debate of whether computer science is for everyone — it seems it always was!

What’s the problem with the current situation?

In many of our seminars over the last two years, we have heard about the need to broaden participation in computing in school. Although in England computing is mandatory for ages 5 to 16 (in theory; in practice it is offered to all children only from age 5 to 14), other countries don’t have any computing for younger children. And once computing becomes optional, numbers drop, wherever you are.

""
Not enough students are experiencing computer science in school.

Mark shared with us that in US high schools, only 4.7% of students are enrolled in a CS course. However, students are studying other subjects, which brought him to the conclusion that CS should be introduced where the students already are. For example, Mark described that, at the Advanced Placement (AP) level in the US, many more students choose to take history than CS (399,000 vs 114,000), and the History AP cohort has a more even gender balance and a higher proportion of Black and Hispanic students.

The teaspoon approach to broadening participation

A solution to low uptake of CS being proposed by Mark and his colleagues is to add a little computing to other subjects, and in his talk he gave us some examples from history and mathematics, both subjects taken by a high proportion of US students. His focus is on high school, meaning learners aged 14 and upwards (upper secondary in Europe, or key stage 4 and 5 in England). To introduce a teaspoon of CS to other subjects, Mark’s research group builds tools using a participatory design approach; his group collaborates with teachers in schools to identify the needs of the teachers and students and design and iterate TSP languages in conjunction with them.

Mark demonstrated a number of TSP language prototypes his group has been building for use in particular contexts. The prototypes seem like simple apps, but can be classified as languages because they specify a process for a computational agent to execute. These small languages are designed to be used at a specific point in the lesson and should be learnable in ten minutes. For example, students can use a small ‘app’ specific to their topic, look at a script that generates a visualisation, and change some variables to find out how they impact the output. Students may also be able to access some program code, edit it, and see the impact of their edits. In this way, they discover through practical examples the way computer programs work, and how they can use CS principles to help build an understanding of the subject area they are currently studying. If the language is never used again, the learning cost was low enough that it was worth the value of adding computation to the one lesson.
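To make the “teaspoon” idea more concrete, here is a deliberately tiny, hypothetical Python sketch in the spirit of what Mark described (it is not one of his group’s actual TSP languages). The only thing a student is expected to touch is the block of variables at the top; rerunning the script shows how the output changes.

    # Hypothetical "teaspoon"-style script (not an actual TSP language from
    # Mark's group). Students edit only the three variables at the top and
    # rerun it to see how the projection changes.
    country = "Nigeria"        # try another country name
    population_millions = 160  # made-up starting population
    annual_growth = 0.025      # try 0.01 or 0.04

    print(f"Projected population of {country} (millions)")
    for year in range(2010, 2060, 10):
        bar = "#" * int(population_millions / 10)
        print(f"{year}: {bar} {population_millions:6.0f}")
        population_millions *= (1 + annual_growth) ** 10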

We have recorded the seminar and will be sharing the video very soon, so bookmark this page.

Try TSP languages yourself

You can try out the TSP language prototypes Mark shared yourself, which will give you a good idea of how much a teaspoon is!

DV4L: For history students, the team and participating teachers have created a prototype called DV4L, which visualises historical data. The default example script shows population growth in Africa. Students can change some of the variables in the script to explore data related to other countries and other historical periods.

Pixel Equations: Mathematics and engineering students can use the Pixel Equations tool to learn about the way that pictures are made up of individual pixels. This can be introduced into lessons using a variety of contexts. One example lesson activity looks at images in the contexts of maps. This prototype is available in English and Spanish. 

Counting Sheets: Another example given by Mark was Counting Sheets, an interactive tool to support the exploration of counting problems, such as how many possible patterns can come from flipping three coins. 

Have a go yourself. What subjects could you imagine adding a teaspoon of computing to?

Join our next free research seminar

We’d love you to join us for the next seminar in our series on cross-disciplinary computing. On 7 June, we will hear from Pratim Sengupta, of the University of Calgary, Canada. He has conducted studies in science classrooms and non-formal learning environments, focusing on providing open and engaging experiences for anyone to explore code. Pratim will share his thoughts on the ways that more of us can become involved with code when we open up its richness and depth to a wider audience. He will also introduce us to his ideas about countering technocentrism, a key focus of his new book.

And finally… save another date!

We will shortly be sharing details about the official in-person launch event of the Raspberry Pi Computing Education Research Centre at the University of Cambridge on 20 July 2022. And guess who is going to be coming to Cambridge, UK, from Michigan to officially cut the ribbon for us? That’s right, Mark Guzdial. More information coming soon on how you can sign up to join us for free at this launch event.

The post A teaspoon of computing in every subject: Broadening participation in computer science appeared first on Raspberry Pi.

Change, but for business

Post Syndicated from Емилия Милчева original https://toest.bg/promyana-ama-za-biznesa/

“Left-wing goals with right-wing instruments” – that was the enigma encoding the economic policy of Kiril Petkov, Asen Vassilev and their brand “The Change”. It was never clearly defined. Definitions are limiting, aren’t they? Eight months later there are neither (left-wing) goals nor (right-wing) instruments.

The ruling four-party coalition is drifting like Noah’s ark, waiting for the global flood to stop: for energy prices to stop climbing, for Russia’s war in Ukraine to end, and then for inflation to calm down (the sanctions, though, will certainly continue). And while they wait, the government is frantically throwing money around to “feed” the protests and so buy itself more time.

Those in power swear that the focus of their policies is “the ordinary person”, but reality differs from the messaging and the media flattery. So far, business has benefited most from the support. On May 11, the Brussels-based economic think tank Bruegel published a survey of national policies aimed at protecting consumers from high prices.

Only two of the 27 European countries studied have not supported vulnerable groups – Bulgaria and Hungary.

What the government is rightly criticised for is the lack of a differentiated approach in protecting businesses and citizens.

The moratorium imposed on electricity, heating and water prices for household consumers was not effective, because serious price increases are now on the way. Moreover, businesses pass the cost of expensive electricity on in the prices of the goods and services they produce. Household customers in Bulgaria still do not buy electricity on the free market, but however much the Energy and Water Regulatory Commission restrains the price increases requested by the utilities, it must still ensure they can function more or less normally – supplying electricity and water services.

In its five months of governing, the coalition’s legislative majority never prepared a definition of “energy poverty”, as required by the World Bank (WB) and the European Commission. Around 300,000 people currently receive heating benefits during the winter season, but according to a WB report 61% of Bulgarian households are energy poor. Instead of letting household electricity prices reflect the market and drawing up a methodology to support those who struggle to pay, through household electricity prices

the government subsidises both the rich and the poor – those heating a luxury house as well as those in a panel-block flat or a hastily thrown-together shack.

If the national Recovery and Resilience Plan is finally approved by the Council of the EU as well, it provides for certain reforms, including a definition of “energy poverty”.

There is also no legally established mechanism to help low-income households with water and sewerage services, despite the requirement for a so-called socially affordable price. In 2020 a bill was drafted that provided for a new formula for pricing a cubic metre of water according to income in the respective region and fairer payment to the water utilities, as well as a definition of vulnerable groups to be supported. Adopting such a law was required as far back as Bulgaria’s accession to the EU, as a condition for financing the sector with European funds, but it never saw the light of day.

The water sector remains as unreformed as ever – creating a state water holding company does not count as a reform. Last year the caretaker cabinet granted BGN 450 million to the utilities to cope with high electricity prices, because there was, and still is, a risk that water supplies will be cut off in some settlements. Investment is out of the question, and network losses remain at an average of around 60% for the country.

Against the backdrop of ignoring the problems of Bulgaria’s socially disadvantaged groups, support for business keeps flowing –

to the point that halting the temporary 60/40 scheme, introduced because of the COVID-19 pandemic, caused withdrawal symptoms in big business, accustomed to state bonuses and EU subsidies. It was enough for the large employers’ organisations to announce they were joining the hauliers’ protests – and the government extended the energy support scheme by another two months. Business then gave up protesting, though for appearances’ sake it declared it remained on standby.

For energy aid alone, granted since December because of expensive electricity and continuing through May and June (including for the district heating companies, which buy natural gas on the regulated market), the bill exceeds BGN 3 billion. The money comes from the state budget, from the dividends of the state-owned energy companies, and from the revenue from the sale of carbon emission allowances.

That sum is far larger than the roughly two billion leva at which the announced anti-crisis measures are estimated. Those include 0% VAT on bread, an average 20% increase in pensions, higher tax relief for families with children and, from July 1, a discount of 25 stotinki per litre of 95-octane petrol and diesel (without additives), methane and LPG at every fill-up.

The fuel measure is likewise not differentiated by social status –

so the biggest beneficiaries will be the owners of powerful cars, as well as households with several cars.

That is also the position of experts such as Martin Vladimirov of the Center for the Study of Democracy. Speaking to BNT, he said it would be far more sensible to help the most vulnerable groups, who cannot cover their energy bills and have already cut back their consumption – which applies to about a third of the population:

Giving a fuel discount to the entire population is unwise, because it gives the middle class and the better-off the wrong incentive to consume far more fuel than they need.

In his view, direct cash transfers should be targeted at households and the most vulnerable groups, so that they can decide for themselves what to spend the money on. “For some people, buying basic groceries at the shop will be far more important than filling up with fuel. From that point of view, the measure is rather populist,” Vladimirov commented.

From July 1, all pensions will rise by 10%, topped up by the BGN 60 bonus and by compensations of varying amounts for more than a million pensioners.

The remaining measures serve citizens, but also business.

To the reduced 9% VAT rate introduced for restaurants and venue owners back during the COVID-19 pandemic is now added a 9% VAT rate on district heating and hot water. The same rate will apply to natural gas for end consumers, including households, for a period of one year. Electricity, natural gas and methane are exempted from excise duty. The penalty interest on overdue obligations is cut from 10% to 8% for companies and from 10% to 4% for individuals.

The zero VAT rate on bread is a nice bonus for producers of bread and baked goods. It is a much bigger concession than the 9% VAT that President Rumen Radev called for during his February visit to the Dobrudzhanski Hlyab plant, whose owner Encho Malev is the largest producer in Bulgaria. bTV calculated that the price of the widely bought “Dobrudzha” loaf in an 830 g package would fall from the current BGN 2.30–2.50 back to its level before the last increase – BGN 1.90–2.00. But only on condition that producers do not raise prices again before the measure takes effect. Bread producers have already commented that bread is unlikely to get cheaper, since energy inputs and flour are getting more expensive, and there may even be further increases later.

“This package of measures guarantees that the standard of living of all Bulgarians will be preserved,” Prime Minister Kiril Petkov declared at a press conference at the Council of Ministers.

Besides not being true, that is also practically impossible.

If he wants to be more convincing, instead of scattering “helicopter money” he should find a democratic socialist like the American politician Bernie Sanders. Because no one in the BSP is one, even though at its 14th congress in 1990 the party (then the BKP) declared a course towards “democratic socialism” in the context of perestroika in the USSR. Except that the leadership swerved sharply towards big capital, leaving the nostalgia for socialism to its voters.

It would be worth having a Bernie Sanders in the government of a country as poor and corrupt as Bulgaria. In his 2019 campaign for the Democratic Party’s presidential nomination, Sanders said things that make sense for Bulgaria and Europe – the kind of things that Rumen Radev, re-elected for a second term, has never said.

All over the world, the fight against oligarchy runs in parallel with the strengthening of authoritarian regimes – such as Putin’s in Russia, Xi Jinping’s in China, Mohammed bin Salman’s in Saudi Arabia, Rodrigo Duterte’s in the Philippines, Bolsonaro’s in Brazil and Viktor Orbán’s in Hungary. These leaders combine corporatist economics with xenophobia and authoritarianism. They redirect popular anger over inequality and worsening economic conditions into violent rage against minorities – whether immigrants, racial minorities, religious minorities or the LGBT community. And to suppress dissent, they fight democracy and human rights.

Since coming to power, no one in the four-party coalition has said a word about oligarchs – as if they evaporated when GERB was removed from office. They have not. The president does not dwell on the subject either; he prefers to direct his withering anger at the government. Will he express satisfaction with the new package of measures?

How do Bernie Sanders and his allies understand democratic socialism? Democratically elected politicians using the public sector to promote equality and more opportunity, and better, higher-quality education and healthcare. It genuinely fits Bulgaria, which records some of the widest inequalities in Europe.

Some of Sanders’ statements from 2019 sound as if they were written about Bulgaria in 2022.

For example, those in which he speaks of the challenges facing our world today, comparing them with the “deeply rooted and seemingly insurmountable economic and social disparities” that led to the rise of right-wing nationalist forces in the 1930s.

In Europe, anger and despair were ultimately captured by authoritarian demagogues who fused corporatism, nationalism, racism and xenophobia into a political movement that amassed totalitarian power, destroyed democracy and ultimately killed millions of people – including members of my own family.

The latest surveys by the pollsters Trend, Gallup and Market Links show exactly such demagogues climbing the ladder of public approval. Yet none of the political forces is warning of the threat.

So a democratic socialist would do the government of the EU’s poorest country some good. But only if he is authentic. What remains is to find the right-wing instruments.

Cover photo: European Commission President Ursula von der Leyen at a press conference with Prime Minister Kiril Petkov on April 7, 2022. © Press Office of the Council of Ministers

Source

The (im)moral aspects of war

Post Syndicated from Александър Нуцов original https://toest.bg/nemoralnite-aspekti-na-voynata/

On April 6, 1994, Rwanda’s President Juvénal Habyarimana was killed in a terrorist attack during a flight. Representatives of the Tutsi ethnic group were accused of ordering it, and military formations from the country’s other large ethnic group – the Hutu – seized power. Prime Minister Agathe Uwilingiyimana was executed, and the government-controlled radio station RTLM openly urged the population to exterminate the Tutsi. This set off one of the bloodiest and most intense massacres in history, in which, by various estimates, between 500,000 and 800,000 people were killed within just a few months.

The genocide in Rwanda subsequently became a central topic in international relations – not only because of its exceptional scale and cruelty, but also because of the anaemic reaction of international organisations such as the UN and leading powers such as the US, the UK, France, Russia and China. This in turn raised numerous moral dilemmas about the place of war in politics and the price of the international community’s (in)action when aggression breaks out.

The year of the horror in Rwanda coincided with the publication of that year’s UN Human Development Report, which followed the then-current trend of rethinking concepts such as military intervention and security. The report built on the concept of human security, which shifts the focus from the nation state to the individual as the object of security. Despite the concept’s shortcomings in purely academic terms, human security became a leading factor in international politics thanks to its strongly normative basis and practical orientation.

In the context of the many conflicts that followed the fall of the Iron Curtain, war became an ever more complex phenomenon as established principles and norms of international relations came into tension with one another. The principles of sovereignty and non-interference in internal affairs, for example, were undermined by the concept of human security. The unconditional sovereignty of the individual state became a function of whether it guarantees the security and rights of its own citizens or, on the contrary, contributes to their suffering and insecurity. Under certain conditions, external involvement is even permitted in the form of a humanitarian intervention by a third country or a coalition of states, aimed at restoring security and preserving the rights of citizens who have become victims of their own rulers.

In response to the international community’s failure to prevent the genocides in Rwanda and in Srebrenica, in the former Yugoslavia, the International Commission on Intervention and State Sovereignty (ICISS) was set up as a temporary body in 2001. In the course of its work it coined the concept of the Responsibility to Protect (R2P), which legitimises external intervention through military, humanitarian or other means to protect civilian populations in four specific cases – genocide, war crimes, ethnic cleansing and crimes against humanity.

Over the last two decades, R2P has gradually established itself as a norm in international relations, following the unanimous endorsement it received at the UN World Summit in 2005. Directly or indirectly, the UN Security Council has invoked R2P in resolutions on the conflicts in Syria, Libya, Iraq, Sudan and other flashpoints, mostly in the Middle East and Africa. The concept is also increasingly used in states’ political rhetoric to legitimise military action not authorised by the Security Council.

From the standpoint of responsibility and justice, the international community’s inaction over Rwanda and Srebrenica can be described as a fatal mistake. Then US President Bill Clinton himself called the passivity over Rwanda his greatest foreign policy failure. Why? A preventive and well-coordinated military intervention under those specific circumstances would have led to rapid de-escalation, minimal casualties and the saving of hundreds of thousands of lives.

Herein lies the crucial importance of the balance between a purely human aversion to the use of military force and a sober judgment of when it can be justified. And although international law formally legalises war in only two cases – when authorised by the UN Security Council, and in self-defence – concepts such as R2P and human security raise key questions about the legality and morality of military action and inaction, whatever the nature of the aggression, the type of war and the specific circumstances threatening peace.

Is it possible, though, for people without practical or academic experience to navigate the sea of information and build a deeper picture of the dynamics of a given conflict on their own? Here scholarship offers various analytical tools. One of the most universal, linked to R2P and developed further in the 2001 ICISS report, is rooted in the classical theory of just war, which offers two separate categories – jus ad bellum and jus in bello. The first set of criteria concerns the conditions that confer a moral right to use force; the second lays down the rules for the conduct of hostilities once they are under way.

Jus ad bellum (the quotations are taken and translated from the ICISS report)

• Just cause: “For [a military intervention] to be warranted, there must be serious and irreparable harm being done to human beings, or an imminent threat of such harm.”

• Right intention: “The primary purpose of the intervention, whatever other motives the intervening states may have, must be to halt or avert human suffering.”

• Last resort: “Military intervention can only be justified when every non-military option for the prevention or peaceful resolution of the crisis has been explored and exhausted, and there are reasonable grounds to believe that lesser measures would not succeed.”

• Proportional means: “The scale, duration and intensity of the planned military intervention should be the minimum necessary to secure the defined humanitarian objective.”

• Reasonable prospects: “There must be a reasonable chance of success in halting or averting the suffering that prompted the intervention, and the consequences of action should not be likely to be worse than the consequences of inaction.”

• Right authority: “There is no better or more appropriate body than the UN Security Council to authorise military intervention for the protection of human beings.”

Jus in bello

• Discrimination – distinguishing combatants from civilians, with the aim of prioritising the protection of the civilian population.

• Proportionality – keeping destruction to a minimum, in terms of physical, material and psychological damage.

• Military necessity – focusing on targets of military-strategic significance.

Applying the criteria set out above, we can state firmly that the Russian invasion of Ukraine grossly tramples the norms of legitimacy and legality established in international relations and international law – from the problem of the justice of the cause and the proportionality of the invasion’s scale relative to the pre-war situation, through the extent and nature of the destruction during the hostilities, to the total absence of international support. The mass graves, the destruction of entire cities such as Mariupol, the enormous refugee wave and the growing reports of looting, rape and other war crimes documented by independent bodies and international organisations such as Human Rights Watch speak for themselves.

By contrast, Ukraine is exercising its right of self-defence in accordance with the UN Charter and international law. As for international military and humanitarian support for Ukraine, it is worth mentioning one of the UN’s guiding principles – collective security. In short, it holds that an unjustified act of aggression in the international system is treated as aggression against the entire international community, which bears a shared responsibility to deter the aggressor and restore peace. And although the UN’s hands are tied when it comes to authorising more decisive measures, because of the Russian veto in the Security Council, Western assistance to Ukraine is justified not only morally but also legally.

Why, though, are the concepts laid out here needed right now? The war in Ukraine is unfolding alongside no less fierce information and media propaganda, disinformation and fake news – tools that cloud people’s minds, shape public attitudes and influence political decisions in the country. In other words, the information war blurs our picture of what is happening in Ukraine and weakens our ability to tell good from evil. To counter that, we need to be clear about the concepts. And to read more – not only media analyses and reports by international organisations, but also literature that gives us a broader view of the roots of the war in Ukraine.

Cover collage: © Toest

Source

Nurturing Continued Growth of Our Oak CT Log

Post Syndicated from Let's Encrypt original https://letsencrypt.org/2022/05/19/database-to-app-tls.html

Let’s Encrypt has been running a Certificate Transparency (CT) log since 2019 as part of our commitment to keeping the Web PKI ecosystem healthy. CT logs have become important infrastructure for an encrypted Web 1, but have a well-deserved reputation for being difficult to operate at high levels of trust: Only 5 organizations run logs that are currently considered to be “qualified.” 2

Our Oak log is the only qualified CT log that runs on an entirely open source stack 3. In the interest of lowering the barrier for other organizations to join the CT ecosystem, we want to cover a few recent changes to Oak that might be helpful to anyone else planning to launch a log based on Google’s Trillian backed by MariaDB:

  • The disk I/O workload of Trillian atop MariaDB is easily mediated by front-end rate limits, and

  • It’s worth the complexity to split each new annual CT log into its own Trillian/MariaDB stack.

This post will update some of the information from the previous post How Let’s Encrypt Runs CT Logs.

Growing Oak While Staying Open Source

Oak runs on a free and open source stack: Google’s Trillian data store, backed by MariaDB, running at Amazon Web Services (AWS) via Amazon’s Relational Database Service (RDS). To our knowledge, Oak is the only trusted CT log without closed-source components 3.

Open Source Stack

Other operators of Trillian have opted to use different databases which segment data differently, but the provided MySQL-compatible datastore has successfully kept up with Let’s Encrypt’s CT log volume (currently above 400 GB per month). The story for scaling Oak atop MariaDB is quite typical for any relational database, though the performance requirements are stringent.

Keeping Oak Qualified

The policies that Certificate Transparency Log operators follow require there to be no significant downtime, in addition to the more absolute and difficult requirement that the logs themselves make no mistakes: Given the append-only nature of Certificate Transparency, seemingly minor data corruption prompts permanent disqualification of the log 4. To minimize the impacts of corruption, as well as for scalability reasons, it’s become normal for CT logs to distribute the certificates they contain in different, smaller individual CT logs, called shards.

Splitting Many Years Of Data Among Many Trees

The Let’s Encrypt Oak CT log is actually made up of many individual CT log shards each named after a period of time: Oak 2020 contains certificates which expired in 2020; Oak 2022 contains certificates which expire in 2022. For ease of reference, we refer to these as “temporal log shards,” though in truth each is an individual CT log sharing the Oak family name.

It is straightforward to configure a single Trillian installation to support multiple CT log shards. Each log shard is allocated storage within the backing database, and the Trillian Log Server can then service requests for all configured logs.

The Trillian database schema is quite compact and easy to understand:

  • Each configured log gets a Tree ID, with metadata in several tables.

  • All log entries – certificates in our case – get a row in LeafData.

  • Entries that haven’t been sequenced yet get a row in the table Unsequenced, which is normally kept empty by the Trillian Log Signer service.

  • Once sequenced, entries are removed from the Unsequenced table and added as a row in SequencedLeafData.

Database Layout

In a nutshell: No matter how many different certificate transparency trees and subtrees you set up for a given copy of Trillian, all of them will store the lion’s share of their data, particularly the DER-encoded certificates themselves, interwoven into the one LeafData table. Since Trillian Log Server can only be configured with a single MySQL connection URI, limiting it to a single database, that single table can get quite big.

For Oak, the database currently grows at a rate of about 400 GB per month; that rate is ever-increasing as the use of TLS grows and more Certificate Authorities submit their certificates to our logs.
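A quick way to see how that shared table is divided among the shards is to group it by TreeId. The following is a hypothetical, read-only sketch assuming the upstream Trillian MySQL schema (a LeafData table keyed by TreeId with a LeafValue blob) and the PyMySQL client; the hostname and credentials are placeholders and this is not part of our tooling.

    # Hypothetical read-only sketch: how much of the shared LeafData table
    # each tree (i.e. each temporal log shard) accounts for. Assumes the
    # upstream Trillian MySQL schema and a PyMySQL connection.
    import pymysql

    conn = pymysql.connect(host="trillian-db.example.internal",
                           user="readonly", password="...",
                           database="trillian")
    try:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT TreeId,
                       COUNT(*)               AS leaves,
                       SUM(LENGTH(LeafValue)) AS leaf_bytes
                FROM LeafData
                GROUP BY TreeId
                ORDER BY leaf_bytes DESC
            """)
            for tree_id, leaves, leaf_bytes in cur.fetchall():
                print(f"tree {tree_id}: {leaves} leaves, "
                      f"{(leaf_bytes or 0) / 1e9:.1f} GB of leaf data")
    finally:
        conn.close()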

Amazon RDS Size Limitations

In March 2021 we discovered that Amazon RDS has a 16TB limit per tablespace when RDS is configured to use one file-per-table, as we were doing for all of our CT log shards. Luckily, we reached this limit first in our testing environment, the Testflume log.

Part of Testflume’s purpose was to grow ahead of the production logs in total size, as well as test growth with more aggressive configuration options than the production Oak log had, and in these ways it was highly successful.

Revisiting Database Design

In our blog post, How Let’s Encrypt Runs CT Logs, we wrote that each year we planned “to freeze the previous year’s shard and move it to a less expensive serving infrastructure, reclaiming its storage for our live shards.” However, that is not practical while continuing to serve traffic from the same database instance. Deleting terabytes of rows from an InnoDB table that is in-use is not feasible. Trillian’s MySQL-compatible storage backend agrees: as implemented, Trillian’s built-in Tree Deletion mechanism marks a tree as “soft deleted,” and leaves the removal of data from the LeafData table (and others) as an exercise for the administrator.

Since Trillian’s MySQL-compatible backend does not support splitting the LeafData among multiple tables by itself, and since deleting stale data from those tables yields slow performance across the whole database server, to continue to scale the Oak CT log we have to instead prune out the prior seasons’ data another way.

Single RDS Instance with Distinct Schema per Log Shard

We considered adding new database schemas to our existing MariaDB-backed Amazon RDS instance. In this design, we would run a Trillian CT Front-End (CTFE) instance per temporal log shard, each pointing to individual Trillian Log Server and Log Signer instances, which themselves point to a specific temporally-identified database schema name and tablespace. This is cost-effective, and it gives us ample room to avoid the 16 TB limit.

Distinct Schema per Log Shard in a Single Database

However, if heavy maintenance is required on any part of the underlying database, it would affect every log shard contained within. In particular, we know from using MariaDB with InnoDB inside the Let’s Encrypt CA infrastructure that truncating and deleting a multi-terabyte table causes performance issues for the whole database while the operation runs. Inside the CA infrastructure we mitigate that performance issue by deleting table data only on database replicas; this is more complicated in a more hands-off managed hosting environment like RDS.

Since we wish to clear out old data regularly as a matter of data hygiene, and the performance requirements for a CT log are strict, this option wasn’t feasible.

Distinct RDS Instance per Log Shard

While it increases the number of managed system components, it is much cleaner to give each temporal log shard its own database instance. Like the Distinct Schema per Log Shard model, we now run Trillian CTFE, Log Server, and Log Signer instances for each temporal log shard. However, each log shard gets its own RDS instance for the active life of the log 5. At log shutdown, the RDS instance is simply deprovisioned.

Using Distinct Databases Per Log

With the original specifications for the Oak log, this would require allocating a significant amount of data I/O resources. However, years of experience running the Testflume log showed that Trillian in AWS did not require the highest possible disk performance.

Tuning IOPS

We launched Oak using the highest performance AWS Elastic Block Storage available at the time: Provisioned IOPS SSDs (type io1). Because of the strict performance requirements on CT logs, we worried that without the best possible performance for disk I/O that latency issues might crop up that could lead to disqualification. As we called out in our blog post How Let’s Encrypt Runs CT Logs, we hoped that we could use a simpler storage type in the future.

To test that, we used General Purpose SSD storage type (type gp2) for our testing CT log, Testflume, and obtained nominal results over the lifespan of the log. In practice higher performance was unnecessary because Trillian makes good use of database indices. Downloading the whole log tree from the first leaf entry is the most significant demand of disk I/O, and that manner of operation is easily managed via rate limits at the load balancer layer.

Our 2022 and 2023 Oak shards now use type gp2 storage and are performing well.

Synergistically, the earlier change to run a distinct RDS instance for each temporal log shard has also further reduced Trillian’s I/O load: A larger percentage of the trimmed-down data fits in MariaDB’s in-memory buffer pool.

More Future Improvements

It’s clear that CT logs will continue to accelerate their rate of growth. Eventually, if we remain on this architecture, even a single year’s CT log will exceed the 16 TB table size limit. In advance of that, we’ll have to take further actions. Some of those might be:

  • Change our temporal log sharding strategy to shorter-than-year intervals, perhaps every 3 or 6 months.

  • Reduce the absolute storage requirements for Trillian’s MySQL-compatible storage backend by de-duplicating intermediate certificates.

  • Contribute a patch to add table sharding to Trillian’s MySQL-compatible storage backend.

  • Change storage backends entirely, perhaps to a sharding-aware middleware, or another more horizontally-scalable open-source system.

We’ve also uprooted our current Testflume CT log and brought online a replacement which we’ve named Sapling. As before, this test-only log will evaluate more aggressive configurations that might bear fruit in the future.

As Always, Scaling Data Is The Hard Part

Though the performance requirements for CT logs are strict, the bulk of the scalability difficulty has to do with the large amount of data and the high and ever-increasing rate of growth; this is the way of relational databases. Horizontal scaling continues to be the solution, and is straightforward to apply to the open source Trillian and MariaDB stack.

Supporting Let’s Encrypt

As a nonprofit project, 100% of our funding comes from contributions from our community of users and supporters. We depend on their support in order to provide our services for the public benefit. If your
company or organization would like to sponsor Let’s Encrypt please email us at [email protected]. If you can support us with a donation, we ask that you make an individual contribution.


  1. Chrome and Safari check that certificates include evidence that certificates were submitted to CT logs. If a certificate is lacking that evidence, it won’t be trusted. https://certificate.transparency.dev/useragents/ ↩︎

  2. As of publication, these organizations have logs Google Chrome considers qualified for Certificate Authorities to embed their signed timestamps: Cloudflare, DigiCert, Google, Let’s Encrypt, and Sectigo. https://ct.cloudflare.com/logs ↩︎

  3. DigiCert’s Yeti CT log deployment at AWS uses a custom Apache Cassandra backend; Oak is the only production log using the Trillian project’s MySQL-compatible backend. SSLMate maintains a list of known log software at https://sslmate.com/labs/ct_ecosystem/ecosystem.html ↩︎

  4. In the recent past, a cosmic ray event led to the disqualification of a CT log. Andrew Ayer has a good discussion of this in his post “How Certificate Transparency Logs Fail and Why It’s OK” https://www.agwa.name/blog/post/how_ct_logs_fail, which references the discovery on the ct-policy list https://groups.google.com/a/chromium.org/g/ct-policy/c/PCkKU357M2Q/m/xbxgEXWbAQAJ↩︎

  5. Logs remain online for a period after they stop accepting new entries to give a grace period for mirrors and archive activity. ↩︎

Nurturing Continued Growth of Our Oak CT Log

Post Syndicated from Let's Encrypt original https://letsencrypt.org/2022/05/19/nurturing-ct-log-growth.html

Let’s Encrypt has been running a Certificate Transparency (CT) log since 2019 as part of our commitment to keeping the Web PKI ecosystem healthy. CT logs have become important infrastructure for an encrypted Web 1, but have a well-deserved reputation for being difficult to operate at high levels of trust: Only 6 organizations run logs that are currently considered to be “qualified.” 2

Our Oak log is the only qualified CT log that runs on an entirely open source stack 3. In the interest of lowering the barrier for other organizations to join the CT ecosystem, we want to cover a few recent changes to Oak that might be helpful to anyone else planning to launch a log based on Google’s Trillian backed by MariaDB:

  • The disk I/O workload of Trillian atop MariaDB is easily mediated by front-end rate limits, and

  • It’s worth the complexity to split each new annual CT log into its own Trillian/MariaDB stack.

This post will update some of the information from the previous post How Let’s Encrypt Runs CT Logs.

Growing Oak While Staying Open Source

Oak runs on a free and open source stack: Google’s Trillian data store, backed by MariaDB, running at Amazon Web Services (AWS) via Amazon’s Relational Database Service (RDS). To our knowledge, Oak is the only trusted CT log without closed-source components 3.

Open Source Stack

Other operators of Trillian have opted to use different databases which segment data differently, but the provided MySQL-compatible datastore has successfully kept up with Let’s Encrypt’s CT log volume (currently above 400 GB per month). The story for scaling Oak atop MariaDB is quite typical for any relational database, though the performance requirements are stringent.

Keeping Oak Qualified

The policies that Certificate Transparency Log operators follow require there to be no significant downtime, in addition to the more absolute and difficult requirement that the logs themselves make no mistakes: Given the append-only nature of Certificate Transparency, seemingly minor data corruption prompts permanent disqualification of the log 4. To minimize the impacts of corruption, as well as for scalability reasons, it’s become normal for CT logs to distribute the certificates they contain in different, smaller individual CT logs, called shards.

Splitting Many Years Of Data Among Many Trees

The Let’s Encrypt Oak CT log is actually made up of many individual CT log shards each named after a period of time: Oak 2020 contains certificates which expired in 2020; Oak 2022 contains certificates which expire in 2022. For ease of reference, we refer to these as “temporal log shards,” though in truth each is an individual CT log sharing the Oak family name.

It is straightforward to configure a single Trillian installation to support multiple CT log shards. Each log shard is allocated storage within the backing database, and the Trillian Log Server can then service requests for all configured logs.

The Trillian database schema is quite compact and easy to understand:

  • Each configured log gets a Tree ID, with metadata in several tables.

  • All log entries – certificates in our case – get a row in LeafData.

  • Entries that haven’t been sequenced yet get a row in the table Unsequenced, which is normally kept empty by the Trillian Log Signer service.

  • Once sequenced, entries are removed from the Unsequenced table and added as a row in SequencedLeafData.

Database Layout

In a nutshell: No matter how many different certificate transparency trees and subtrees you set up for a given copy of Trillian, all of them will store the lion’s share of their data, particularly the DER-encoded certificates themselves, interwoven into the one LeafData table. Since Trillian Log Server can only be configured with a single MySQL connection URI, limiting it to a single database, that single table can get quite big.
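
To make this concrete, here is the kind of quick check one can run against the backing MariaDB instance. This is only a sketch: the table names come from the Trillian schema described above, but the TreeId column name and the trillian schema name are assumptions that may differ in your deployment.

-- Approximate on-disk footprint of the shared Trillian tables (InnoDB).
SELECT table_name,
       ROUND((data_length + index_length) / POW(1024, 3), 1) AS size_gib
FROM information_schema.tables
WHERE table_schema = 'trillian'
  AND table_name IN ('LeafData', 'SequencedLeafData', 'Unsequenced');

-- Sequenced entries per tree; each temporal CT shard is one Tree ID.
SELECT TreeId, COUNT(*) AS sequenced_leaves
FROM SequencedLeafData
GROUP BY TreeId;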

For Oak, the database currently grows at a rate of about 400 GB per month; that rate is ever-increasing as the use of TLS grows and more Certificate Authorities submit their certificates to our logs.

Amazon RDS Size Limitations

In March 2021 we discovered that Amazon RDS has a 16 TB limit per tablespace when RDS is configured to use one file-per-table, as we were doing for all of our CT log shards. Luckily, we reached this limit first in our testing environment, the Testflume log.

Part of Testflume’s purpose was to grow ahead of the production logs in total size and to test more aggressive configuration options than the production Oak log used; in both respects it was highly successful.

Revisiting Database Design

In our blog post, How Let’s Encrypt Runs CT Logs, we wrote that each year we planned “to freeze the previous year’s shard and move it to a less expensive serving infrastructure, reclaiming its storage for our live shards.” However, that is not practical while continuing to serve traffic from the same database instance. Deleting terabytes of rows from an InnoDB table that is in use is not feasible. Trillian’s MySQL-compatible storage backend agrees: as implemented, Trillian’s built-in Tree Deletion mechanism marks a tree as “soft deleted,” and leaves the removal of data from the LeafData table (and others) as an exercise for the administrator.

Since Trillian’s MySQL-compatible backend does not support splitting the LeafData among multiple tables by itself, and since deleting stale data from those tables degrades performance across the whole database server, to continue to scale the Oak CT log we instead have to prune out prior shards’ data another way.
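
For illustration, pruning a retired shard in place would amount to a batched delete along these lines. This is a hypothetical sketch, not something we run: the table names are from the Trillian schema above, and the Tree ID is made up. Even batched, a delete like this keeps the buffer pool and undo log churning for a very long time at multi-terabyte scale, which is exactly the load we want to keep away from a serving database.

-- Hypothetical in-place cleanup of a retired shard (Tree ID 42), run in small batches.
DELETE FROM SequencedLeafData WHERE TreeId = 42 LIMIT 10000;
DELETE FROM LeafData          WHERE TreeId = 42 LIMIT 10000;
-- Repeat until both statements affect zero rows, then rebuild the tables
-- (for example, OPTIMIZE TABLE) to actually reclaim the disk space.
OPTIMIZE TABLE SequencedLeafData, LeafData;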

Single RDS Instance with Distinct Schema per Log Shard

We considered adding new database schemas to our existing MariaDB-backed Amazon RDS instance. In this design, we would run a Trillian CT Front-End (CTFE) instance per temporal log shard, each pointing to individual Trillian Log Server and Log Signer instances, which themselves point to a specific temporally-identified database schema name and tablespace. This is cost-effective, and it gives us ample room to avoid the 16 TB limit.

Distinct Schema per Log Shard in a Single Database

However, if heavy maintenance is required on any part of the underlying database, it would affect every log shard contained within. In particular, we know from using MariaDB with InnoDB inside the Let’s Encrypt CA infrastructure that truncating and deleting a multi-terabyte table causes performance issues for the whole database while the operation runs. Inside the CA infrastructure we mitigate that performance issue by deleting table data only on database replicas; this is more complicated in a more hands-off managed hosting environment like RDS.

Since we wish to clear out old data regularly as a matter of data hygiene, and the performance requirements for a CT log are strict, this option wasn’t feasible.

Distinct RDS Instance per Log Shard

While it increases the number of managed system components, it is much cleaner to give each temporal log shard its own database instance. Like the Distinct Schema per Log Shard model, we now run Trillian CTFE, Log Server, and Log Signer instances for each temporal log shard. However, each log shard gets its own RDS instance for the active life of the log 5. At log shutdown, the RDS instance is simply deprovisioned.

Using Distinct Databases Per Log

With the original specifications for the Oak log, this would require allocating a significant amount of disk I/O resources. However, years of experience running the Testflume log showed that Trillian in AWS did not require the highest possible disk performance.

Tuning IOPS

We launched Oak using the highest-performance AWS Elastic Block Storage available at the time: Provisioned IOPS SSDs (type io1). Because of the strict performance requirements on CT logs, we worried that, without the best possible disk I/O performance, latency issues might crop up that could lead to disqualification. As we called out in our blog post How Let’s Encrypt Runs CT Logs, we hoped that we could use a simpler storage type in the future.

To test that, we used the General Purpose SSD storage type (type gp2) for our testing CT log, Testflume, and obtained nominal results over the lifespan of the log. In practice, higher performance was unnecessary because Trillian makes good use of database indices. Downloading the whole log tree from the first leaf entry places the most significant demand on disk I/O, and that manner of operation is easily managed via rate limits at the load balancer layer.

Our 2022 and 2023 Oak shards now use type gp2 storage and are performing well.

Synergistically, the earlier change to run a distinct RDS instance for each temporal log shard has also further reduced Trillian’s I/O load: A larger percentage of the trimmed-down data fits in MariaDB’s in-memory buffer pool.

More Future Improvements

It’s clear that CT logs will continue to accelerate their rate of growth. Eventually, if we remain on this architecture, even a single year’s CT log will exceed the 16 TB table size limit. In advance of that, we’ll have to take further actions. Some of those might be:

  • Change our temporal log sharding strategy to shorter-than-year intervals, perhaps every 3 or 6 months.

  • Reduce the absolute storage requirements for Trillian’s MySQL-compatible storage backend by de-duplicating intermediate certificates.

  • Contribute a patch to add table sharding to Trillian’s MySQL-compatible storage backend.

  • Change storage backends entirely, perhaps to a sharding-aware middleware, or another more horizontally-scalable open-source system.

We’ve also uprooted our current Testflume CT log and brought online a replacement which we’ve named Sapling. As before, this test-only log will evaluate more aggressive configurations that might bear fruit in the future.

As Always, Scaling Data Is The Hard Part

Though the performance requirements for CT logs are strict, the bulk of the scalability difficulty has to do with the large amount of data and the high and ever-increasing rate of growth; this is the way of relational databases. Horizontal scaling continues to be the solution, and is straightforward to apply to the open source Trillian and MariaDB stack.

Supporting Let’s Encrypt

As a nonprofit project, 100% of our funding comes from contributions from our community of users and supporters. We depend on their support in order to provide our services for the public benefit. If your
company or organization would like to sponsor Let’s Encrypt please email us at [email protected]. If you can support us with a donation, we ask that you make an individual contribution.


  1. Chrome and Safari check that certificates include evidence that certificates were submitted to CT logs. If a certificate is lacking that evidence, it won’t be trusted. https://certificate.transparency.dev/useragents/ ↩︎

  2. As of publication, these organizations have logs Google Chrome considers qualified for Certificate Authorities to embed their signed timestamps: Cloudflare, DigiCert, Google, Let’s Encrypt, Sectigo, and TrustAsia. https://ct.cloudflare.com/logs and https://twitter.com/__agwa/status/1527407151660122114 ↩︎

  3. DigiCert’s Yeti CT log deployment at AWS uses a custom Apache Cassandra backend; Oak is the only production log using the Trillian project’s MySQL-compatible backend. SSLMate maintains a list of known log software at https://sslmate.com/labs/ct_ecosystem/ecosystem.html ↩︎

  4. In the recent past, a cosmic ray event led to the disqualification of a CT log. Andrew Ayer has a good discussion of this in his post “How Certificate Transparency Logs Fail and Why It’s OK” https://www.agwa.name/blog/post/how_ct_logs_fail, which references the discovery on the ct-policy list https://groups.google.com/a/chromium.org/g/ct-policy/c/PCkKU357M2Q/m/xbxgEXWbAQAJ↩︎

  5. Logs remain online for a period after they stop accepting new entries to give a grace period for mirrors and archive activity. ↩︎

Analyze Amazon Ion datasets using Amazon Athena

Post Syndicated from Pathik Shah original https://aws.amazon.com/blogs/big-data/analyze-amazon-ion-datasets-using-amazon-athena/

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Amazon Ion is a richly typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format extends JSON (meaning all JSON files are valid Ion files), and is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse. The rich type system provides unambiguous semantics for long-term preservation of data that can survive multiple generations of software evolution.

Athena now supports querying and writing data in Ion format. The Ion format is currently used by internal Amazon teams, by external services such as Amazon Quantum Ledger Database (Amazon QLDB) and Amazon DynamoDB (which can be exported into Ion), and in the open-source SQL query language PartiQL.

In this post, we discuss use cases and the unique features Ion offers, followed by examples of querying Ion with Athena. For demonstration purposes, we use the transformed version of the City Lots San Francisco dataset.

Features of Ion

In this section, we discuss some of the unique features that Ion offers:

  • Type system
  • Dual format
  • Efficiency gains
  • Skip scanning

Type system

Ion extends JSON, adding support for more precise data types to improve interpretability, simplify processing, and avoid rounding errors. These high-precision numeric types are essential for financial services, where fractions of a cent on every transaction add up. The added data types are arbitrary-size integers, binary floating-point numbers, infinite-precision decimals, timestamps, CLOBs, and BLOBs.

Dual format

Users can be presented with a familiar text-based representation while benefiting from the performance efficiencies of a binary format. The interoperability between the two formats enables you to rapidly discover, digest, and interpret data in a familiar JSON-like representation, while underlying applications benefit from a reduction in storage, memory, network bandwidth, and latency from the binary format. This means you can write plain text queries that run against both text-based and binary-based Ion. You can rewrite parts of your data in text-based Ion when you need human-readable data during development and switch to binary in production.

When debugging a process, the ability for systems engineers to locate data and understand it as quickly as possible is vital. Ion provides mechanisms to move between binary and a text-based representation, optimizing for both the human and the machine. Athena supports querying and writing data in both of these Ion formats. The following is an example Ion text document taken from the transformed version of the citylots dataset:

{ "type": "Feature"
, "properties": { "MAPBLKLOT": "0004002"
                 ,"BLKLOT": "0004002"
                 ,"BLOCK_NUM": "0004"
                 , "LOT_NUM": "002"
                 , "FROM_ST": "0"
                 , "TO_ST": "0"
                 , "STREET": "UNKNOWN"
                 , "ST_TYPE": null
                 , "ODD_EVEN": "E" }
, "geometry": { "type": "Polygon"
               , "coordinates": [ [ [ -122.415701204606876, 37.808327252671461, 0.0 ],
                                    [ -122.415760743593196, 37.808630700240904, 0.0 ],
                                    [ -122.413787891332404, 37.808566801319841, 0.0 ],
                                    [ -122.415701204606876, 37.808327252671461, 0.0 ] ] ] } }

Efficiency gains

Binary-encoded Ion reduces file size by moving repeated values, such as field names, into a symbol table. Symbol tables reduce CPU and read latency by limiting the validation of character encoding to the single instance of the value in the symbol table.

For example, a company that operates at Amazon’s scale can produce large volumes of application logs. When compressing Ion and JSON logs, we noticed approximately 35% less CPU time to compress the log, which produced an average of roughly 26% smaller files. Log files are critical when needed but costly to retain, so the reduction in file sizes combined with the read performance gains from symbol tables helps when handling these logs. The following is an example of file size reduction with the citylots JSON dataset when converted to Ion binary with GZIP and ZSTD compression:

77MB    citylots.ion
 17MB    citylots.ion.gz
 15MB    citylots.ion.zst
181MB    citylots.json
 22MB    citylots.json.gz
 18MB    citylots.json.zst

Skip-scanning

In a textual format, every byte must be read and interpreted, but because Ion’s binary format is a TLV (type-length-value) encoding, an application may skip over elements that aren’t needed. This reduces query and application processing costs in proportion to the share of fields that go unexamined.

For example, forensic analysis of application log data involves reading large volumes of data where only a fraction of the data is needed for diagnosis. In these scenarios, skip-scanning allows the binary Ion reader to move past irrelevant fields without the cost of reading the element stored within a field. This results in users experiencing lower resource usage and quicker response times.

Query Ion datasets using Athena

Athena now supports querying and creating Ion-formatted datasets via an Ion-specific SerDe, which in conjunction with IonInputFormat and IonOutputFormat allows you to read and write valid Ion data. Deserialization allows you to run SELECT queries on the Ion data so that it can be queried to gain insights. Serialization through CTAS or INSERT INTO queries allows you to copy datasets from existing tables’ values or generate new data in the Ion format.

The interchangeable nature of Ion text and Ion binary means that Athena can read datasets that contain both types of files. Because Ion is a superset of JSON, a table using the Ion SerDe can also include JSON files. Unlike the JSON SerDe, where every new line character indicates a new row, the Ion SerDe uses a combination of closing brackets and new line characters to determine new rows. This means that if each JSON record in your source documents isn’t on a single line, these files can now be read in Athena via the Ion SerDe.

Create external tables

Athena supports querying Ion-based datasets by defining AWS Glue tables with the user-defined metadata. Let’s start with an example of creating an external table for a dataset stored in Ion text. The following is a sample row from the citylots dataset:

{
    type:"Feature",
    properties:{
        mapblklot:"0579021",
        blklot:"0579024",
        block_num:"0579",
        lot_num:"024",
        from_st:"2160",
        to_st:"2160",
        street:"PACIFIC",
        st_type:"AVE",
        odd_even:"E"
    },
    geometry:{
        type:"Polygon",coordinates:[[[-122.4308798855922, ...]]]
    }
}

To create an external table that has its data stored in Ion, you have two syntactic options.

First, you can specify STORED AS ION. This is a more concise method, and is best used for simple cases, when no additional properties are required. See the following code:

CREATE EXTERNAL TABLE city_lots_ion1 (
  type STRING, 
  properties struct<
    mapblklot:string,
    blklot:string,
    block_num:string,
    lot_num:string,
    from_st:string,
    to_st:string,
    street:string,
    st_type:string,
    odd_even:string>, 
  geometry struct<
    type:string,
    coordinates:array<array<array<decimal(18,15)>>>,
    multi_coordinates:array<array<array<array<decimal(18,15)>>>>>
)
STORED AS ION
LOCATION 's3://aws-bigdata-blog/artifacts/athena-ion-blog/city_lots_ion_binary/'

Alternatively, you can explicitly specify the Ion classpaths in ROW FORMAT SERDE, INPUTFORMAT, and OUTPUTFORMAT. Unlike the first method, you can specify a SERDEPROPERTIES clause here. In our example DDL, we added a SerDe property that allows values that are outside of the Hive data type ranges to overflow rather than fail the query:

CREATE EXTERNAL TABLE city_lots_ion2(
  type STRING, 
  properties struct<
    mapblklot:string,
    blklot:string,
    block_num:string,
    lot_num:string,
    from_st:string,
    to_st:string,
    street:string,
    st_type:string,
    odd_even:string>, 
  geometry struct<
    type:string,
    coordinates:array<array<array<decimal(18,15)>>>,
    multi_coordinates:array<array<array<array<decimal(18,15)>>>>>
)
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
 'ion.fail_on_overflow'='false'
 )
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://aws-bigdata-blog/artifacts/athena-ion-blog/city_lots_ion_binary/'

Athena converts STORED AS ION into the explicit classpaths, so both tables look similar in the metastore. If we look in AWS Glue, we see both tables we just created have the same input format, output format, and SerDe serialization library.

Now that our tables are created, we can run standard SELECT queries on the city_lots_ion1 table. Let’s run a query that specifies the block_num from our example row of Ion data to verify that we can read from the table:

-- QUERY
SELECT * FROM city_lots_ion1 WHERE properties.block_num='0579';

The following screenshot shows our results.

Use path extraction to read from specific fields

Athena supports further customization of how data is interpreted via SerDe properties. To specify these, you can add a WITH SERDEPROPERTIES clause, which is a subfield of the ROW FORMAT SERDE field.

In some situations, we may only care about some parts of the information. Let’s suppose we don’t want any of the geometry info from the citylots dataset, and only need a few of the fields in properties. One solution is to specify a search path using the path extractor SerDe property:

-- Path Extractor property
ion.<column>.path_extractor = <search path>

Path extractors are search paths that Athena uses to map the table columns to locations in the individual document. Full information on what can be done with path extractors is available on GitHub, but for our example, we focus on creating simple paths that use the names of each field as an index. In this case, a search path takes the form of a space-delimited set of indexes, wrapped in parentheses, that indicates the location of each desired piece of information. We map the search paths to table columns by using the path extractor property.

By default, Athena builds path extractors dynamically based on column names unless overridden. This means that when we run our SELECT query on our city_lots_ion1 table, Athena builds the following search paths:

Default Extractors generated by Athena for city_lots_ion1.
-- Extracts the 'type' field to the 'type' column
    'ion.type.path_extractor' = '(type)'

-- Extracts the 'properties' field to the 'properties' column
    'ion.properties.path_extractor' = '(properties)'

-- Extracts the 'geometry' field to the 'geometry' column
    'ion.geometry.path_extractor' = '(geometry)'

Assuming we only care about the block and lot information from the properties struct, and the geometry type from the geometry struct, we can build search paths that map the desired fields from the row of data to table columns. First let’s build the search paths:

(properties mapblklot) - Search path for the mapblklot field in the properties struct
(properties blklot) - Search path for the blklot field in the properties struct
(properties block_num) - Search path for the block_num field in the properties struct
(properties lot_num) - Search path for the lot_num field in the properties struct
(geometry type) - Search path for the type field in the geometry struct

Now let’s map these search paths to table columns using the path extractor SerDe property. Because the search paths specify where to look for data, we are able to flatten and rename our datasets to better serve our purpose. For this example, let’s rename the mapblklot field to map_block_lot, blklot to block_lot, and the geometry type to shape:

 'ion.map_block_lot.path_extractor' = '(properties mapblklot)'
 'ion.block_lot.path_extractor' = '(properties blklot)'
 'ion.block_num.path_extractor' = '(properties block_num)'
 'ion.lot_num.path_extractor' = '(properties lot_num)'
 'ion.shape.path_extractor' = '(geometry type)'

Let’s put all of this together and create the city_blocks table:

CREATE EXTERNAL TABLE city_blocks (
    map_block_lot STRING,
    block_lot STRING,
    block_num STRING,
    lot_num STRING,
    shape STRING
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
 'ion.map_block_lot.path_extractor' = '(properties mapblklot)',
 'ion.block_lot.path_extractor' = '(properties blklot)', 
 'ion.block_num.path_extractor' = '(properties block_num)',
 'ion.lot_num.path_extractor' = '(properties lot_num)',
 'ion.shape.path_extractor' = '(geometry type)'
 )
STORED AS ION
LOCATION 's3://aws-bigdata-blog/artifacts/athena-ion-blog/city_lots_ion_binary/'

Now we can run a select query on the city_blocks table, and see the results:

-- Select Query
SELECT * FROM city_blocks WHERE block_num='0579';

Utilizing search paths in this way enables skip-scan parsing when reading from Ion binary files, which allows Athena to skip over the unneeded fields and reduces the overall time it takes to run the query.

Use CTAS and UNLOAD for data transformation

Athena supports CREATE TABLE AS SELECT (CTAS), which creates a new table in Athena from the results of a SELECT statement from another query. Athena also supports UNLOAD, which writes query results to Amazon S3 from a SELECT statement to the specified data format.

Both CTAS and UNLOAD have a property to specify a format and a compression type. This allows you to easily convert Ion datasets to other data formats, such as Parquet or ORC, and vice versa, without needing to set up a complex extract, transform, and load (ETL) job. This is beneficial for situations when you want to transform your data, or know you will run repeated queries on a subset of your data and want to use some of the benefits inherent to columnar formats. Combining it with path extractors is especially helpful, because we’re only storing the data that we need in the new format.

Let’s use CTAS to convert our city_blocks table from Ion to Parquet, and compress it via GZIP. Because we have path extractors set up for the city_blocks table, we only need to convert a small portion of the original dataset:

CREATE TABLE city_blocks_parquet_gzip
WITH (format = 'PARQUET', write_compression='GZIP')
AS SELECT * FROM city_blocks;

We can now run queries against the city_blocks_parquet_gzip table, and should see the same result. To test this out, let’s run the same SELECT query we ran before on the Parquet table:

SELECT * FROM city_blocks_parquet_gzip WHERE block_num='0579';

When converting tables from another format to Ion, Athena supports the following compression codecs: ZSTD, BZIP2, GZIP, SNAPPY, and NONE. In addition to adding Ion as a new format for CTAS, we added the ion_encoding property, which allows you to choose whether the output files are created in Ion text or Ion binary. This allows for serialization of data from other formats back into Ion.

Let’s convert the original city_lots JSON file back to Ion, but this time we specify that we want to use ZSTD compression and a binary encoding.

The JSON file can be found at the following location: s3://aws-bigdata-blog/artifacts/athena-ion-blog/city_lots_json/

Because Ion is a superset of JSON, we can use the Ion SerDe to read this file:

CREATE EXTERNAL TABLE city_blocks_json_ion_serde (
    map_block_lot STRING,
    block_lot STRING,
    block_num STRING,
    lot_num STRING,
    shape STRING
)
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
'ion.map_block_lot.path_extractor' = '(properties mapblklot)',
'ion.block_lot.path_extractor' = '(properties blklot)',
'ion.block_num.path_extractor' = '(properties block_num)',
'ion.lot_num.path_extractor' = '(properties lot_num)',
'ion.shape.path_extractor' = '(geometry type)'
)
STORED AS ION
LOCATION 's3://aws-bigdata-blog/artifacts/athena-ion-blog/city_lots_json/'

Now let’s copy this table into our desired Ion binary form:

CREATE TABLE city_blocks_ion_zstd
WITH (format = 'ION', write_compression='ZSTD', ion_encoding='BINARY')
AS SELECT * FROM city_blocks_json_ion_serde

Finally, let’s run our verification SELECT statement to verify everything was created properly:

SELECT * FROM city_blocks_ion_zstd WHERE block_num='0579'; 
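
As mentioned earlier, INSERT INTO can also serialize data into Ion. The following is a minimal sketch that reuses the tables defined above; the WHERE clause is arbitrary and only there to illustrate appending a subset of rows. New files are written to the table’s S3 location in its Ion format.

-- Append additional rows to the Ion-format table created above.
INSERT INTO city_blocks_ion_zstd
SELECT * FROM city_blocks
WHERE block_num <> '0579';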

Use UNLOAD to store Ion data in Amazon S3

Sometimes we just want to reformat the data and don’t need to store the additional metadata to query the table. In this case, we can use UNLOAD, which stores the results of the query in the specified format in an S3 bucket.

Let’s test it out, using UNLOAD to convert data from the city_blocks_ion_zstd table from Ion to ORC, compress it via ZLIB, and store it in an S3 bucket:

UNLOAD (SELECT * FROM city_blocks_ion_zstd WHERE block_num='0579') 
TO 's3://<your-s3-bucket>/athena-ion-blog/unload/orc_zlib/'
WITH (format = 'ORC', compression='ZLIB')

When you check in Amazon S3, you can find a new file in the ORC format.

Conclusion

This post talked about the new feature in Athena that allows you to query and create Ion datasets using standard SQL. We discussed use cases and unique features of the Ion format like type system, dual formats (Ion text and Ion binary), efficiency gains, and skip-scanning. You can get started with querying an Ion dataset stored in Amazon S3 by simply creating a table in Athena, and also converting existing datasets to Ion format and vice versa using CTAS and UNLOAD statements.

To learn more about querying Ion using Athena, refer to Amazon Ion Hive SerDe.


About the Authors

Pathik Shah is a Sr. Big Data Architect on Amazon Athena. He joined AWS in 2015 and has been focusing in the big data analytics space since then, helping customers build scalable and robust solutions using AWS analytics services.

Jacob Stein works on the Amazon Athena team as a Software Development Engineer. He led the project to add support for Ion in Athena. He loves working on technical problems unique to internet scale data, and is passionate about developing scalable solutions for distributed systems.

Giovanni Matteo Fumarola is the Engineering Manager of the Athena Data Lake and Storage team. He is an Apache Hadoop Committer and PMC member. He has been focusing in the big data analytics space since 2013.

Pete Ford is a Sr. Technical Program Manager at Amazon.

LWN is hiring

Post Syndicated from original https://lwn.net/Articles/895695/

LWN does its best to provide comprehensive coverage of the free-software
development community, but there is far more going on than our small staff
can handle. When expressed that way, this problem suggests an obvious
solution: make the staff bigger. Thus, LWN is looking to hire a
writer/editor.

AWS Backup Now Supports Amazon FSx for NetApp ONTAP

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-backup-now-supports-amazon-fsx-for-netapp-ontap/

If you are a long-time reader of this blog, you know that I categorize some posts as “chocolate and peanut butter” in homage to an ancient (1970 or so) series of TV commercials for Reese’s Peanut Butter Cups. Today, I am happy to bring you the latest such post, combining AWS Backup and Amazon FSx for NetApp ONTAP. Before I dive into the specifics, let’s review each service:

AWS Backup helps you to automate and centrally manage your backups (read my post, AWS Backup – Automate and Centrally Manage Your Backups, for a detailed look). After you create policy-driven plans, you can monitor the status of ongoing backups, verify compliance, and find/restore backups, all from a central console. We launched in 2019 with support for Amazon EBS volumes, Amazon EFS file systems, Amazon RDS databases, Amazon DynamoDB tables, and AWS Storage Gateway volumes. After that, we added support for EC2 instances, Amazon Aurora clusters, Amazon FSx for Lustre and Amazon FSx for Windows File Server file systems, Amazon Neptune databases, VMware workloads, Amazon DocumentDB clusters, and Amazon S3.

Amazon FSx for NetApp ONTAP gives you the features, performance, and APIs of NetApp ONTAP file systems with the agility, scalability, security, and resiliency of AWS (again, read my post, New – Amazon FSx for NetApp ONTAP to learn more). ONTAP is an enterprise data management product that is designed to provide high-performance storage suitable for use with Oracle, SAP, VMware, Microsoft SQL Server, and so forth. Each file system supports multi-protocol access and can scale up to 176 PiB, along with inline data compression, deduplication, compaction, thin provisioning, replication, and point-in-time cloning. We launched with a multi-AZ deployment type, and introduced a single-AZ deployment type earlier this year.

Chocolate and Peanut Butter
AWS Backup now supports Amazon FSx for NetApp ONTAP file systems. All of the existing AWS Backup features apply, and you can add this support to an existing backup plan or you can create a new one.

Suppose I have a couple of ONTAP file systems:

I go to the AWS Backup Console and click Create Backup plan to get started:

I decide to Start with a template, and choose Daily-Monthly-1yr-Retention, then click Create plan:

Next, I examine the Resource assignments section of my plan and click Assign resources:

I create a resource assignment (Jeff-ONTAP-Resources), and select the FSx resource type. I can leave the assignment as-is in order to include all of my Amazon FSx volumes in the assignment, or I can uncheck All file systems, and then choose volumes on the file systems that I showed you earlier:

I review all of my choices, and click Assign resources to proceed. My backups will be performed in accordance with the backup plan.

I can also create an on-demand backup. To do this, I visit the Protected resources page and click Create on-demand backup:

I choose a volume, set a one week retention period for my on-demand backup, and click Create on-demand backup:

The backup job starts within seconds, and is visible on the Backup jobs page:

After the job completes I can examine the vault and see my backup. Then I can select it and choose Restore from the Actions menu:

To restore the backup, I choose one of the file systems from it, enter a new volume name, and click Restore backup.

Also of Interest
We recently launched two new features for AWS Backup that you may find helpful. Both features can now be used in conjunction with Amazon FSx for ONTAP:

AWS Backup Audit Manager – You can use this feature to monitor and evaluate the compliance status of your backups. This can help you to meet business and regulatory requirements, and lets you generate reports that you can use to demonstrate compliance to auditors and regulators. To learn more, read Monitor, Evaluate, and Demonstrate Backup Compliance with AWS Backup Audit Manager.

AWS Backup Vault Lock – This feature lets you prevent your backups from being accidentally or maliciously deleted, and also enhances protection against ransomware. You can use this feature to make selected backup vaults WORM (write-once-read-many) compliant. Once you have done this, the backups in the vault cannot be modified manually. You can also set minimum and maximum retention periods for each vault. To learn more, read Enhance the security posture of your backups with AWS Backup Vault Lock.

Available Now
This new feature is available now and you can start using it today in all regions where AWS Backup and Amazon FSx for NetApp ONTAP are available.

Jeff;

Use Amazon Redshift RA3 with managed storage in your modern data architecture

Post Syndicated from Bhanu Pittampally original https://aws.amazon.com/blogs/big-data/use-amazon-redshift-ra3-with-managed-storage-in-your-modern-data-architecture/

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.

Over the years, Amazon Redshift has evolved a lot to meet our customer demands. Its journey started as a standalone data warehousing appliance that provided a low-cost, high-performance, cloud-based data warehouse. Support for Amazon Redshift Spectrum compute nodes was later added to extend your data warehouse to data lakes, and the concurrency scaling feature was added to support burst activity and scale your data warehouse to support thousands of queries concurrently. In its latest offering, Amazon Redshift runs on third-generation architecture where storage and compute layers are decoupled and scaled independent of each other. This latest generation powers the several modern data architecture patterns our customers are actively embracing to build flexible and scalable analytics platforms.

When spinning up a new instance of Amazon Redshift, you get to choose either Amazon Redshift Serverless, for when you need a data warehouse that can scale seamlessly and automatically as your demand evolves unpredictably, or you can choose an Amazon Redshift provisioned cluster for steady-state workloads and greater control over your Amazon Redshift cluster’s configuration.

An Amazon Redshift provisioned cluster is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs the Amazon Redshift engine and contains one or more databases. Creating an Amazon Redshift cluster is the first step in your process of building an Amazon Redshift data warehouse. While launching a provisioned cluster, one option that you specify is the node type. The node type determines the CPU, RAM, storage capacity, and storage drive type for each node.

In this post, we cover the current generation node RA3 architecture, different RA3 node types, important capabilities that are available only on RA3 node types, and how you can upgrade your current Amazon Redshift node types to RA3.

Amazon Redshift RA3 nodes

RA3 nodes with managed storage enable you to optimize your data warehouse by scaling and paying for compute and managed storage independently. RA3 node types are the latest node type for Amazon Redshift. With RA3, you choose the number of nodes based on your performance requirements and only pay for the managed storage that you use. RA3 architecture gives you the ability to size your cluster based on the amount of data you process daily or the amount of data that you want to store in your warehouse; there is no need to account for both storage and processing needs together.

Other node types that we previously offered include the following:

  • Dense compute – DC2 nodes enable you to have compute-intensive data warehouses with local SSD storage included. You choose the number of nodes you need based on data size and performance requirements.
  • Dense storage (deprecated) – DS2 nodes enable you to create large data warehouses using hard disk drives (HDDs). If you’re using the DS2 node type, we strongly recommend that you upgrade to RA3 to get twice as much storage and improved performance for the same on-demand cost.

When you use the RA3 node size and choose your number of nodes, you can provision the compute independent of storage. RA3 nodes are built on the AWS Nitro System and feature high bandwidth networking and large high-performance SSDs as local caches. RA3 nodes use your workload patterns and advanced data management techniques to deliver the performance of local SSD while scaling storage automatically to Amazon Simple Storage Service (Amazon S3).

RA3 node types come in three different sizes to accommodate your analytical workloads. You can quickly start experimenting with the RA3 node type by creating a single-node ra3.xlplus cluster and explore various features that are available. If you’re running a medium-sized data warehouse, you can size your cluster with ra3.4xlarge nodes. For large data warehouses, you can start with ra3.16xlarge. The following table gives more information about RA3 node types and their specifications as of this writing.

Node Type      vCPU   RAM (GiB)   Default Slices Per Node   Managed Storage Quota Per Node   Node Range with Create Cluster   Total Managed Storage Capacity
ra3.xlplus     4      32          2                         32 TB                            1-16                             1024 TB
ra3.4xlarge    12     96          4                         128 TB                           2-32                             8192 TB
ra3.16xlarge   48     384         16                        128 TB                           2-128                            16384 TB

Amazon Redshift with managed storage

Amazon Redshift managed storage (RMS) still boasts the same resiliency and industry-leading hardware. With managed storage, Amazon Redshift uses intelligent data prefetching and data eviction based on the temperature of your data, which determines where your most-queried data is stored. The most frequently used blocks (hot data) are cached locally on SSD, and infrequently used blocks (cold data) are stored on an RMS layer backed by Amazon S3. The following diagram depicts the leader node, compute nodes, and Amazon Redshift managed storage.

In the following sections, we discuss the capabilities that Amazon Redshift RA3 with managed storage can provide.

Independently scale compute and storage

As the scale of an organization grows, data continues to grow—reaching petabytes. The amount of data you ingest into your Amazon Redshift data warehouse also grows. You may be looking for ways to cost-effectively analyze all your data and at the same time have control over choosing the right compute or storage resource at the right time. For customers who are looking to be cost-conscious and cost-effective, the RA3 platform provides the option to scale and pay for your compute and storage resources separately.

With RA3 instances with managed storage, you can choose the number of nodes based on your performance requirements, and only pay for the managed storage that you use. This gives you the flexibility to size your RA3 cluster based on the amount of data you process daily without increasing your storage costs. It allows you to pay per hour for compute and to scale your data warehouse storage capacity separately, without adding any additional compute resources, paying only for what you use.

Another benefit of RMS is that Amazon Redshift manages which data should be stored locally for fastest access, and data that is slightly colder is still kept within fast-access reach.

Advanced hardware

RA3 instances use high-bandwidth networking built on the AWS Nitro System to further reduce the time taken for data to be offloaded to and retrieved from Amazon S3. Managed storage uses high-performance SSDs for your hot data and Amazon S3 for your cold data, providing ease of use, cost-effective storage, and fast query performance.

Additional security options

Amazon Redshift managed VPC endpoints enable you to set up a private connection to securely access your Amazon Redshift cluster within your virtual private cloud (VPC) from client applications in another VPC within the same AWS account, another AWS account, or a subnet without using public IPs and without requiring the traffic to traverse across the internet.

The following scenarios describe common reasons to allow access to a cluster using an Amazon Redshift managed VPC endpoint:

  • AWS account A wants to allow a VPC in AWS account B to have access to a cluster
  • AWS account A wants to allow a VPC that is also in AWS account A to have access to a cluster
  • AWS account A wants to allow a different subnet in the cluster’s VPC within AWS account A to have access to a cluster

For information about access options to another VPC, refer to Enable private access to Amazon Redshift from your client applications in another VPC.

Further optimize your workload

In this section, we discuss two ways to further optimize your workload.

AQUA

AQUA (Advanced Query Accelerator) is a new distributed and hardware-accelerated cache that enables Amazon Redshift to run up to 10 times faster than other enterprise cloud data warehouses by automatically boosting certain types of queries. AQUA is available with the ra3.16xlarge, ra3.4xlarge, or ra3.xlplus nodes at no additional charge and with no code changes.

AQUA is an analytics query accelerator for Amazon Redshift that uses custom-designed hardware to speed up queries that scan large datasets. AQUA automatically optimizes query performance on subsets of the data that require extensive scans, filters, and aggregation. With this approach, you can use AQUA to run queries that scan, filter, and aggregate large datasets.

For more information about using AQUA, refer to How to evaluate the benefits of AQUA for your Amazon Redshift workloads.

Concurrency scaling for write operations

With RA3 nodes, you can take advantage of concurrency scaling for write operations, such as extract, transform, and load (ETL) statements. Concurrency scaling for write operations is especially useful when you want to maintain consistent response times when your cluster receives a large number of requests. It improves throughput for write operations contending for resources on the main cluster.

Concurrency scaling supports COPY, INSERT, DELETE, and UPDATE statements. In some cases, you might follow DDL statements, such as CREATE, with write statements in the same commit block. In these cases, the write statements are not sent to the concurrency scaling cluster.

When you accrue credit for concurrency scaling, this credit accrual applies to both read and write operations.

Increased agility to scale compute resources

Elastic resize allows you to scale your Amazon Redshift cluster up and down in minutes to get the performance you need, when you need it. However, there are limits on the nodes that you can add to a cluster. With some RA3 node types, you can increase the number of nodes up to four times the existing count. All RA3 node types support a decrease in the number of nodes to a quarter of the existing count. The following table lists growth and reduction limits for each RA3 node type.

Node Type      Growth Limit                                 Reduction Limit
ra3.xlplus     2 times (from 4 to 8 nodes, for example)     To a quarter of the number
ra3.4xlarge    4 times (from 4 to 16 nodes, for example)    To a quarter of the number (from 16 to 4 nodes, for example)
ra3.16xlarge   4 times (from 4 to 16 nodes, for example)    To a quarter of the number (from 16 to 4 nodes, for example)

RA3 node types also have a shorter duration of snapshot restoration time because of the separation of storage and compute.

Improved resiliency

Amazon Redshift employs extensive fault detection and auto-remediation techniques in order to maximize the availability of a cluster. With the RA3 architecture, you can enable cluster relocation, which provides additional resiliency through the ability to relocate a cluster across Availability Zones without losing any data (the RPO is zero) or having to change your client applications. The cluster’s endpoint remains the same after the relocation occurs, so applications can continue operating without modifications. If the existing cluster fails, a new cluster is created on demand in another Availability Zone, so the cost of a standby replica cluster is avoided.

Accelerate data democratization

In this section, we share two techniques to accelerate data democratization.

Data sharing

Data sharing provides instant, granular, and high-performance access without copying or moving data. You can query live data constantly across all consumers on different RA3 clusters in the same AWS account, in a different AWS account, or in a different AWS Region. Data is shared securely and provides governed collaboration. You can provide access at different levels of granularity, including schema, database, tables, views, and user-defined functions.

This opens up various new use cases where you may have one ETL cluster that is producing data and have multiple consumers such as ad-hoc querying, dashboarding, and data science clusters to view the same data. This also enables bi-directional collaboration where groups such as marketing and finance can share data with one another. Queries accessing shared data use the compute resources of the consumer Amazon Redshift cluster and don’t impact the performance of the producer cluster.
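
As a sketch of what this looks like in SQL (the table, share, and namespace values here are placeholders, not a definitive walkthrough):

-- On the producer cluster: create a datashare and add objects to it.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.sales;
GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '<consumer-namespace-guid>';

-- On the consumer cluster: surface the share as a local database and query it.
CREATE DATABASE sales_db FROM DATASHARE sales_share OF NAMESPACE '<producer-namespace-guid>';
SELECT COUNT(*) FROM sales_db.public.sales;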

For more information about data sharing, refer to Sharing Amazon Redshift data securely across Amazon Redshift clusters for workload isolation.

AWS Data Exchange for Amazon Redshift

AWS Data Exchange for Amazon Redshift enables you to find and subscribe to third-party data in AWS Data Exchange that you can query in an Amazon Redshift data warehouse in minutes. You can also license your data in Amazon Redshift through AWS Data Exchange. Access is automatically granted when a customer subscribes to your data and is automatically revoked when their subscription ends. Invoices are automatically generated, and payments are automatically collected and disbursed through AWS. This feature empowers you to quickly query, analyze, and build applications with third-party data.

For details on how to publish a data product and subscribe to a data product using AWS Data Exchange for Amazon Redshift, refer to New – AWS Data Exchange for Amazon Redshift.

Cross-database queries for Amazon Redshift

Amazon Redshift supports the ability to query across databases in a Redshift cluster. With cross-database queries, you can seamlessly query data from any database in the cluster, regardless of which database you are connected to. Cross-database queries can eliminate data copies and simplify your data organization to support multiple business groups on the same cluster.

One of many use cases where cross-database queries help is when data is organized across multiple databases in a Redshift cluster to support multi-tenant configurations. For example, different business groups and teams that own and manage datasets in their specific database in the same data warehouse need to collaborate with other groups. You might want to perform common ETL staging and processing while your raw data is spread across multiple databases. Organizing data in multiple Redshift databases is also a common scenario when migrating from traditional data warehouse systems. With cross-database queries, you can access data from any of the databases on the Redshift cluster without having to connect to that specific database. You can also join datasets from multiple databases in a single query.
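A minimal sketch of what this looks like in practice: while connected to one database, the query below reads and joins tables that live in two other databases on the same cluster using three-part (database.schema.table) notation, submitted here through the Redshift Data API. The cluster, database, and table names are hypothetical.

```python
import boto3

data_api = boto3.client("redshift-data", region_name="us-east-1")

# Connected to the "reporting" database, but joining tables that live in
# the "marketing" and "sales" databases on the same cluster.
resp = data_api.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="reporting",
    DbUser="awsuser",
    Sql="""
        SELECT c.campaign_id, sum(o.amount) AS revenue
        FROM marketing.public.campaigns AS c
        JOIN sales.public.orders AS o ON o.campaign_id = c.campaign_id
        GROUP BY c.campaign_id;
    """,
)
# The call is asynchronous; poll describe_statement and fetch results
# with get_statement_result using the returned statement ID.
print(resp["Id"])
```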

You can read more about cross-database queries here.

Upgrade to RA3

You can upgrade to RA3 instances within minutes no matter the size of your current Amazon Redshift clusters. Simply take a snapshot of your cluster and restore it to a new RA3 cluster. For more information, refer to Upgrading to RA3 node types.
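For reference, a snapshot-and-restore migration can be scripted along these lines with boto3. The cluster identifiers, node type, and node count below are placeholders, and in practice you would also validate the new cluster before repointing applications.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# 1. Snapshot the existing (for example, DC2) cluster.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="pre-ra3-migration",
    ClusterIdentifier="legacy-dc2-cluster",
)

# Wait until the snapshot is available before restoring from it.
redshift.get_waiter("snapshot_available").wait(
    SnapshotIdentifier="pre-ra3-migration"
)

# 2. Restore the snapshot into a new cluster, choosing the RA3 node type
#    and node count for the target configuration.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="new-ra3-cluster",
    SnapshotIdentifier="pre-ra3-migration",
    NodeType="ra3.4xlarge",
    NumberOfNodes=4,
)
```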

You can also simplify your migration efforts with Amazon Redshift Simple Replay. For more information, refer to Simplify Amazon Redshift RA3 migration evaluation with Simple Replay utility.

Summary

In this post, we talked about the RA3 node types, the benefits of Amazon Redshift managed storage, and the additional capabilities that you get by using Amazon Redshift RA3 with managed storage. Migrating to RA3 node types isn't a complicated effort; you can get started today.


About the Authors

Bhanu Pittampally is an Analytics Specialist Solutions Architect based out of Dallas. He specializes in building analytic solutions. His background is in data warehouses—architecture, development, and administration. He has been in the data and analytics field for over 13 years.

Jason Pedreza is an Analytics Specialist Solutions Architect at AWS with data warehousing experience handling petabytes of data. Prior to AWS, he built data warehouse solutions at Amazon.com. He specializes in Amazon Redshift and helps customers build scalable analytic solutions.

AWS Security Profile: Ely Kahn, Principal Product Manager for AWS Security Hub

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profile-ely-kahn-principal-product-manager-for-aws-security-hub/

In the AWS Security Profile series, I interview some of the humans who work in Amazon Web Services Security and help keep our customers safe and secure. This interview is with Ely Kahn, principal product manager for AWS Security Hub. Security Hub is a cloud security posture management service that performs security best practice checks, aggregates alerts, and facilitates automated remediation.

How long have you been at AWS and what do you do in your current role?

I’ve been with AWS just over 4 years. I came to AWS through the acquisition of a company I co-founded called Sqrrl, which then became Amazon Detective. Shortly after the acquisition, I moved from the Sqrrl/Detective team and helped launch AWS Security Hub. In my current role, I’m the head of product for Security Hub, which means I lead our product roadmap and our product strategy, and I translate customer requirements into technical specifications.

How did you get started in the world of security?

My career started inside the U.S. federal government, first inside the Department of Homeland Security and, specifically, inside the Transportation Security Administration (TSA). At the time, the TSA had uncovered a vulnerability concerning boarding passes and the terrorist no-fly list. I was tasked with figuring out how to close that vulnerability, and I came up with a new way to embed a digital signature inside the barcode to help ensure the authenticity of the boarding pass. After that, people thought I was a cybersecurity expert, and I began working on a lot of cybersecurity strategy and policy at the Department of Homeland Security and then at the White House.

How do you explain your job to your non-tech friends?

I actually explain it the same way to technical and non-technical friends. I head up a service called Security Hub, which is designed to help you do a couple of different things. It helps you understand your security posture on AWS—what sort of risks you face and the most urgent security issues that you need to address across your AWS accounts. It also gives you the tools to improve your security posture and help you fix as many of those security issues as possible. We do that through three primary functions. First, we aggregate all of your security alerts into a standardized data format that’s available in one place. Second, we do our own automated security checks. We look at all the resources you’ve enabled on AWS and help check that those resources are configured in accordance with best practices that we define, and in alignment with various regulatory frameworks. Third, we help you auto-remediate and auto-respond to as many of those issues as possible.

What are you currently working on that you’re excited about?

Our number one priority with Security Hub is to expand coverage of the automated security checks that we provide. We have almost 200 automated security checks today covering several dozen AWS services. Over the next few years, we plan to expand this to more AWS services, which will add a large number of additional security checks. This is important because customers don’t want to have to write these security checks themselves. They want the one-click capability to turn on the checks—or controls, as we call them in Security Hub—and they should be automatically on in all of your accounts. They should only run if you’re using resources that are actually in-scope for those checks, and they should produce a security score to help you quickly understand the security posture of different accounts and of your organization as a whole.

What would you say is the coolest feature of Security Hub?

The coolest feature is probably the one that gets the least attention. It’s what we call our AWS Security Finding Format (ASFF). The ASFF is really just a data standard—it consists of over 1,000 JSON fields and objects, and it’s how you normalize all of your different security alerts. We’ve integrated 75 different services and partner products. The real advantage of Security Hub is that we automatically take all of those different alerts from all of those different integration partners and normalize them into this standardized data format, so that when you’re searching the findings you have a common set of fields to search against if you’re trying to do correlations. For example, you can imagine a situation where Amazon GuardDuty detects unusual activity in an Amazon Simple Storage Service (Amazon S3) bucket, one of our Security Hub checks detects that the bucket is open, and Amazon Macie determines that the bucket contains sensitive information. It’s much easier to do correlations for situations like this when the alerts from those different tools are in the same format. Similarly, building auto-response, auto-remediation workflows is much easier when all of your alerts are in the same format. One of our biggest customers at AWS called the ASFF the gold standard for how to normalize security alerts, which is something we’re super proud of.
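To make the format concrete, here is a deliberately minimal, illustrative ASFF finding pushed into Security Hub with the BatchImportFindings API via boto3. The account ID, Region, resource ARN, and finding ID are placeholders, and only a small subset of the ASFF fields is shown.

```python
import boto3
from datetime import datetime, timezone

securityhub = boto3.client("securityhub", region_name="us-east-1")
now = datetime.now(timezone.utc).isoformat()

# A minimal illustrative finding in the AWS Security Finding Format (ASFF).
finding = {
    "SchemaVersion": "2018-10-08",
    "Id": "example-finding-001",
    "ProductArn": "arn:aws:securityhub:us-east-1:111122223333:product/111122223333/default",
    "GeneratorId": "example-generator",
    "AwsAccountId": "111122223333",
    "Types": ["Software and Configuration Checks/Vulnerabilities"],
    "CreatedAt": now,
    "UpdatedAt": now,
    "Severity": {"Label": "HIGH"},
    "Title": "Example: S3 bucket allows public read access",
    "Description": "Illustrative finding normalized into ASFF.",
    "Resources": [{
        "Type": "AwsS3Bucket",
        "Id": "arn:aws:s3:::example-bucket",
        "Region": "us-east-1",
    }],
}

response = securityhub.batch_import_findings(Findings=[finding])
print(response["SuccessCount"], response["FailedCount"])
```

Because every integrated product emits findings in this same shape, correlation and automation code only has to understand one schema.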

As you mentioned, Security Hub integrates with a lot of other AWS services, like GuardDuty and Macie. How do you work with other service teams?

We work across AWS in a couple of different ways. We build out these integrations with other AWS services to either send or receive findings from those services. So, we receive findings from services like GuardDuty and Macie, and we send our findings to other services like AWS Trusted Advisor to give them the same view of security that we see in Security Hub. In general, we try to make it as simple and as low impact as possible because every service team is extremely busy. Wherever possible, we do the integration work and don’t put the onus of effort on the other service team.

The other way we work with other service teams is to formally define the best practices for that service. We have a security engineering team on Security Hub, and we partner with AWS Professional Services and their security consultants. Together, we have been working through the list of the most popular AWS services using a standard taxonomy of control categories to define security controls and best practices for that service. We then work with product managers and engineers on those service teams to review the controls we’re proposing, get their feedback, and then finally code them up as AWS Config rules before deploying them in Security Hub. We have a very well-honed process now to partner with the service teams to integrate with and define the security controls for each service.

Where do you suggest customers start with Security Hub if they are newer in their cloud journey?

The first step with Security Hub is just to turn it on across all of your accounts and AWS Regions. When you do, you’re likely going to see a lot of alerts. Don’t get overwhelmed with the number of alerts you see. Focus initially on the critical and high-severity alerts and work them as campaigns. Identify the owners for all open critical and high-severity alerts and start tracking burndown on a weekly basis. Coordinate with the leadership in your organization so you can identify which teams are keeping up with the alerts and which ones aren’t.
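One way to start that triage, sketched below with boto3, is to pull the active, unresolved critical- and high-severity findings and group them by account so owners can be assigned. The Region is a placeholder, and the exact filters you use will depend on how your teams track workflow status.

```python
import boto3

securityhub = boto3.client("securityhub", region_name="us-east-1")

# Active CRITICAL and HIGH severity findings that haven't been worked yet.
filters = {
    "SeverityLabel": [
        {"Value": "CRITICAL", "Comparison": "EQUALS"},
        {"Value": "HIGH", "Comparison": "EQUALS"},
    ],
    "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    "WorkflowStatus": [{"Value": "NEW", "Comparison": "EQUALS"}],
}

open_findings = []
paginator = securityhub.get_paginator("get_findings")
for page in paginator.paginate(Filters=filters):
    open_findings.extend(page["Findings"])

# Group by account so each team can see its own burndown.
by_account = {}
for f in open_findings:
    by_account.setdefault(f["AwsAccountId"], []).append(f["Title"])

for account, titles in sorted(by_account.items()):
    print(account, len(titles))
```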

What’s your favorite Leadership Principle and why?

My favorite is one that I initially discounted: frugality. When I first joined AWS, what came to mind was Jeff Bezos using doors as desks. Although that’s certainly a component of frugality, I’ve found that for me, this principle means that we need to be frugal with each other’s time. There are so many competing demands on everyone’s time, and it’s extremely important in a place like AWS to be mindful of that. Make sure you’ve done your due diligence on something before you broadly ask the question or escalate.

What’s the thing you’re most proud of in your career?

There are two things. First is the acquisition of Sqrrl by AWS. I couldn’t have picked a better landing spot for Sqrrl and the team. I feel really lucky that I joined AWS through this acquisition. I’ve really learned a lot here in a short amount of time.

The other thing I’m especially proud of is to have been selected to do a stint through the White House National Security Council staff as the Department of Homeland Security representative to the Council. I sat in the cybersecurity directorate from 2009–2010 as part of that detail to the White House and got a chance to work in the West Wing and attend meetings in the Situation Room, which was just such a special experience.

If you had to pick an industry outside of security, what would you want to do?

This is pretty similar to security, but I got very close to going into the military. Out of high school, I was being recruited for lacrosse at the U.S. Air Force Academy. I had convinced myself that I wanted to go fly jets. I have the utmost respect for our military community, and I certainly could’ve seen myself taking that path.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.

Ely Kahn

Ely Kahn is the Principal Product Manager for AWS Security Hub. Before his time at AWS, Ely was a co-founder for Sqrrl, a security analytics startup that AWS acquired and is now Amazon Detective. Earlier, Ely served in a variety of positions in the federal government, including Director of Cybersecurity at the National Security Council in the White House.
