How to Audit and Report S3 Prefix Level Access Using S3 Access Analyzer

2022-02-11 Somdeb Bhattacharjee

Post Syndicated from Somdeb Bhattacharjee original https://aws.amazon.com/blogs/architecture/how-to-audit-and-report-s3-prefix-level-access-using-s3-access-analyzer/

Data Services teams in all industries are developing centralized data platforms that provide shared access to datasets across multiple business units and teams within the organization. This makes data governance easier, minimizes data redundancy thus reducing cost, and improves data integrity. The central data platform is often built with AWS Simple Storage Service (S3).

A common pattern for providing access to this data is for you to set up cross-account IAM Users and IAM Roles to allow direct access to the datasets stored in S3 buckets. You then enforce the permission on these datasets with S3 Bucket Policies or S3 Access Point policies. These policies can be very granular and you can provide access at the bucket level, prefix level as well as object level within an S3 bucket.

To reduce risk and unintended access, you can use Access Analyzer for S3 to identify S3 buckets within your zone of trust (Account or Organization) that are shared with external identities. Access Analyzer for S3 provides a lot of useful information at the bucket level but you often need S3 audit capability one layer down, at the S3 prefix level, since you are most likely going to organize your data using S3 prefixes.

Common use cases

Many organizations need to ingest a lot of third-party/vendor datasets and then distribute these datasets within the organization in a subscription-based model. Irrespective of how the data is ingested, whether it is using AWS Transfer Family service or other mechanisms, all the ingested datasets are stored in a single S3 bucket with a separate prefix for each vendor dataset. The hierarchy can be represented as:

vendor-s3-bucket
->vendorA-prefix
->vendorA.dataset.csv
->vendorB-prefix
->vendorB.dataset.csv

Based on this design, access is also granted to the data subscribers at the S3 prefix level. Access Analyzer for S3 does not provide visibility at the S3 prefix level so you need to develop custom scripts to extract this information from the S3 policy documents. You also need the information in an easy-to-consume format, for example, a csv file, that can be queried, filtered, readily downloaded and shared across the organization.

To help address this requirement, we show how to implement a solution that builds on the S3 access analyzer findings to generate a csv file on a pre-configured frequency. This solution provides details about:

External Principals outside your trust zone that have access to your S3 buckets
Permissions granted to these external principals (read, write)
List of s3 prefixes these external principals have access to that is configured using S3 bucket policy and/or S3 access point policies.

Architecture Overview

Architecture Diagram showing How to Audit and Report S3 prefix level access using S3 Access Analyzer

Figure 1 – How to Audit and Report S3 prefix level access using S3 Access Analyzer

The solution entails the following steps:

Step 1 – The Access Analyzer ARN and the S3 bucket parameters are passed to an AWS Lambda function via Environment variables.

Step 2 – The Lambda code uses the Access Analyzer ARN to call the list-findings API to retrieve the findings information and store it in the S3 bucket (under json prefix) in JSON format.

Step 3 – The Lambda function then also parses the JSON file to get the required fields and store it as a csv file in the same S3 bucket (under report prefix). It also scans the bucket policy and/or the access point policies to retrieve the S3 prefix level permission granted to the external identity. That information is added to the csv file.

Steps 4 and 5 – As part of the initial deployment, an AWS Glue crawler is provided to discover and create the schema of the csv file and store it in the AWS Glue Data Catalog.

Step 6 – An Amazon Athena query is run to create a spreadsheet of the findings that can be downloaded and distributed for audit.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
S3 buckets that are shared with external identities via cross-account IAM roles or IAM users. Follow these instructions in this user guide to set up cross-account S3 bucket access.
IAM Access Analyzer enabled for your AWS account. Follow these instructions to enable IAM Access Analyzer within your account.

Once the IAM Access Analyzer is enabled, you should be able to view the Analyzer findings from the S3 console by selecting the bucket name and clicking on the ‘View findings’ box or directly going to the Access Analyzer findings on the IAM console.

When you select a ‘Finding id’ for an S3 Bucket, a screen similar to the following will appear:

Figure 2 – IAM Console Screenshot

Setup

Now that your access analyzer is running, you can open the link below to deploy the CloudFormation template. Make sure to launch the CloudFormation in the same AWS Region where IAM Access Analyzer has been enabled.

Launch template

Specify a name for the stack and input the following parameters:

ARN of the Access Analyzer which you can find from the IAM Console.
New S3 bucket where your findings will be stored. The Cloudformation template will add a suffix to the bucket name you provide to ensure uniqueness.

Figure 3 – CloudFormation Template screenshot

Select Next twice and on the final screen check the box allowing CloudFormation to create the IAM resources before selecting Create Stack.
It will take a couple of minutes for the stack to create the resources and launch the AWS Lambda function.
Once the stack is in CREATE_COMPLETE status, go to the Outputs tab of the stack and note down the value against the DataS3BucketName key. This is the S3 bucket the template generated. It would be of the format analyzer-findings-xxxxxxxxxxxx. Go to the S3 console and view the contents of the bucket.
There should be two folders archive/ and report/. In the report folder you should have the csv file containing the findings report.
You can download the csv directly and open it in a excel sheet to view the contents. If would like to query the csv based on different attributes, follow the next set of steps.
Go to the AWS Glue console and click on Crawlers. There should be an analyzer-crawler created for you. Select the crawler to run it.
After the crawler runs successfully, you should see a new table, analyzer-report created under analyzerdb Glue database.
Select the tablename to view the table properties and schema.
To query the table, go to the Athena console and select the analyzerdb database. Then you can run a query like “Select * from analyzer-report where externalaccount = <<valid external account>>” to list all the S3 buckets the external account has access to.

Figure 4 – Amazon Athena Console screenshot

The output of the query with a subset of columns is shown as follows:

Figure 5 – Output of Amazon Athena Query

This CloudFormation template also creates a Cloudwatch event rule, testanalyzer-ScheduledRule-xxxxxxx, that launches the Lambda function every Monday to generate a new version of the findings csv file. You can update the rule to set it to a frequency you desire.

Clean Up

To avoid incurring costs, remember to delete the resources you created. First, manually delete the folders ‘archive’ and ‘report’ in the S3 bucket and then delete the CloudFormation stack you deployed at the beginning of the setup.

Conclusion

In this blog, we showed how you can build audit capabilities for external principals accessing your S3 buckets at a prefix level. Organizations looking to provide shared access to datasets across multiple business units will find this solution helpful in improving their security posture. Give this solution a try and share your feedback!

LGR – Cities: Skylines Airports Review

2022-02-11 LGR

Post Syndicated from LGR original https://www.youtube.com/watch?v=ROOyNiXZhcI

Security updates for Friday

2022-02-11

Post Syndicated from original https://lwn.net/Articles/884516/

Security updates have been issued by Debian (cryptsetup), Fedora (firefox, java-1.8.0-openjdk, microcode_ctl, python-django, rlwrap, and vim), openSUSE (kernel), and SUSE (kernel and ldb, samba).

Security updates for Friday

2022-02-11

Post Syndicated from original https://lwn.net/Articles/884516/rss

The Forecast Is Flipped: How Rapid7 Is Flipping L&D for the Future of Work

2022-02-11 Megan Yawor

Post Syndicated from Megan Yawor original https://blog.rapid7.com/2022/02/11/the-forecast-is-flipped-how-rapid7-is-flipping-l-d-for-the-future-of-work/

The Forecast Is Flipped: How Rapid7 Is Flipping L&D for the Future of Work

The last 2 years have turned the world on its head, and now, companies across the globe are transitioning into a new normal. In this hybrid world, employee engagement is a moving target, the market is more competitive, and historical face-to-face teaching practices are no longer viable. Rapid7’s People Development team is leaning into innovation, striving to define the next best practice, and reimagining the possibilities of hybrid learning.

Focused on supporting employees (or Moose, as we call ourselves here at Rapid7) on their unique development journeys and the embodiment of our Core Values, People Development noted, “Our peoples’ feedback, hybrid work, and our company goals are pointing us in a new direction. So the question becomes, what is People Development’s next evolution in learning? We realize our programs must not only be aligned, impactful, and engaging but also scalable and accessible to our global audience. Additionally, further developing and fostering an environment where continuous learning is woven into our everyday employee experience is key.”

People Dev is flipping over flipped content

So, what could this next evolution look like? People Development’s answer: flipped.

The flipped classroom is a blended learning method that reverses the traditional teaching approach of live session lectures to live sessions focused on discussion and practice. Studies show that flipped learning results in higher engagement, observable behavior shifts, and greater retention of information.

In 2022, through the introduction of flipped learning, we will see People Development raise the bar and innovate as they evolve their current trainings into fuller, more robust learning experiences. These programs will be built around a connected story with a more intentional focus on the “why” and “how,” enabling employees to better see the connection between their learnings, their personal development, and the impact they have on the business. People Dev will strategically design flipped experiences that offer the right learning, at the right time, coupling live sessions with self-paced content, discussion groups, and lab environments. To further engage and enable employees, they will enhance these flipped experiences by utilizing carefully crafted delivery methods such as eLearnings, interactive challenges, virtual scenarios, and customized program hubs.

Technology will be key as we flip our trainings into dynamic learning experiences that provide the same opportunities for learning and engagement to all Moose. Director of People Development Maureen Spalluto said their goal is to “ensure that the skills and behaviors we prioritize in our programs align with the strategic needs of the business,” and moving forward, People Development will utilize flipped content to “maximize ILT (instructor-led training) time… and ensure our Moose are provided with time to practice, gain feedback, and foster accountability around learned skills.”

The unique design of these learning experiences will enhance program effectiveness, empowering employees to own their own development by meeting them where they are, regardless of role, tenure, or location, and further support their ability to embody our core value of Never Done.

Never Done is rooted in the idea of continuous learning, but that’s just the beginning. It’s about individuals bringing their authentic selves, committing to a higher bar, and continuing to learn the skills and behaviors needed to re-engineer experiences. The Never Done mindset is foundational to embodying our other Core Values — supporting innovation to Challenge Convention, honing skills to master complex collaborations and Impact Together, and strengthening practices that create world-class experiences and allow us to Be Advocates for our customers. People Development’s embodiment of Never Done and their passion around Challenging Convention will not only help scale their offerings but also transform their programs into adaptive, active learning environments where employees can learn at their own pace, engage in real-world scenarios, be accountable for their development, and leave trainings feeling confident in their ability to demonstrate and apply new skills and techniques.

A glimpse into People Dev’s flipped content

By Denée DAndrea, Katie Almanzan, Carolyn Kirkman, Kirsty McAleese, and Megan Yawor

New Hires – We will use flipped content to create an onboarding experience that extends beyond one-and-done live sessions and self-paced content. We will create a journey that gives New Moose weekly opportunities to engage with relevant content, reflect on and apply learnings with their peers, connect with their cohort, and participate in experiential learning.
All Moose – Our learning opportunities will utilize flipped learning by combining self-paced digital assignments, live sessions, and partner activities to foster meaningful connections, provide access to learning that supports a growing global business, and encourage employee accountability.
Emerging Moose Development Program (Emerging Leaders) – Focused on content that is customized and aligned to Rapid7’s business strategies, our Emerging Leaders program will use the collaborative, interactive, and application-based aspects of a flipped classroom to maximize classroom time, deepen connections, and encourage a more focused, hybrid learning experience.
Moose Managers – To better support our Managers to optimize the potential of their people, teams, and the company, we will introduce flipped content through community learning opportunities and a variety of digital, self-paced learning materials, allowing for increased practice and application opportunities during live ILT time.
Experience and Technology – The goal is to find innovative uses for existing tools as well as incorporate new technologies, as needed, to ensure our flipped content not only tells a story and feels part of a whole but is also seamless, relevant, and exciting.

As we think about re-engineering our programs for 2022 and beyond, we have to consider what will work best to support a hybrid, global workforce and what will create thoughtful learning experiences that support individuals, their careers, the business, and our customers. The answer is clear: flipped content, and we can’t wait to see it in action.

Flipped content is just the beginning for People Development. As Rapid7 works to establish the new normal, People Dev will continue to reimagine their programs and learning experiences in order to create the best possible training grounds and career pathways for technical talent, emerging leaders, and those early in their careers. The team will look to not only evolve their work but also enable and support our Moose to understand how and where to apply what they are learning so they can develop at their own pace. Ultimately, empowering our people to own their own careers and build their journey with us.

Stay tuned over the next several months to dive deeper into how People Development will be introducing flipped content and other innovative practices into all of their programs for 2022 and beyond in our new blog series, “The Forecast Is Flipped.”

Additional reading:

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Flight of the XP-38

2022-02-11 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=sKYlLVkYwWg

On the Irish Health Services Executive Hack

2022-02-11 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/on-the-irish-health-services-executive-hack.html

A detailed report of the 2021 ransomware attack against Ireland’s Health Services Executive lists some really bad security practices:

The report notes that:

The HSE did not have a Chief Information Security Officer (CISO) or a “single responsible owner for cybersecurity at either senior executive or management level to provide leadership and direction.
It had no documented cyber incident response runbooks or IT recovery plans (apart from documented AD recovery plans) for recovering from a wide-scale ransomware event.
Under-resourced Information Security Managers were not performing their business as usual role (including a NIST-based cybersecurity review of systems) but were working on evaluating security controls for the COVID-19 vaccination system. Antivirus software triggered numerous alerts after detecting Cobalt Strike activity but these were not escalated. (The antivirus server was later encrypted in the attack).
There was no security monitoring capability that was able to effectively detect, investigate and respond to security alerts across HSE’s IT environment or the wider National Healthcare Network (NHN).
There was a lack of effective patching (updates, bug fixes etc.) across the IT estate and reliance was placed on a single antivirus product that was not monitored or effectively maintained with updates across the estate. (The initial workstation attacked had not had antivirus signatures updated for over a year.)
Over 30,000 machines were running Windows 7 (out of support since January 2020).
The initial breach came after a HSE staff member interacted with a malicious Microsoft Office Excel file attached to a phishing email; numerous subsequent alerts were not effectively investigated.

PwC’s crisp list of recommendations in the wake of the incident as well as detail on the business impact of the HSE ransomware attack may prove highly useful guidance on best practice for IT professionals looking to set up a security programme and get it funded.

Coding for kids: Art, games, and animations with our new beginners’ Python path

2022-02-11 Rebecca Franks

Post Syndicated from Rebecca Franks original https://www.raspberrypi.org/blog/coding-for-kids-art-games-animations-beginners-python-programming/

Python is a programming language that’s popular with learners and educators in clubs and schools. It also is widely used by professional programmers, particularly in the data science field. Many educators and young people like how similar the Python syntax is to the English language.

That’s why Python is often the first text-based language that young people learn to program in. The familiar syntax can lower the barrier to taking the first steps away from a block-based programming environment, such as Scratch.

In 2021, Python ranked in first place in an industry-standard popularity index of a major software quality assessment company, confirming its favoured position in software engineering. Python is, for example, championed by Google and used in many of its applications.

Coding for kids in Python

Python’s popularity means there are many excellent resources for learning this language. These resources often focus on creating programs that produce text outputs. We wanted to do something different.

Our new ‘Introduction to Python’ project path focuses on creating digital visuals using the Python p5 library. This library is like a set of tools that allows you to get creative by using Python code to draw shapes, edit images, and create frame-by-frame animations. That makes it the perfect choice for young learners: they can develop their knowledge and skills in Python programming while creating cool visuals that they’ll be proud of.

What is in the ‘Introduction to Python’ path?

The ‘Introduction to Python’ project path is designed according to our Digital Making Framework, encouraging learners to become independent coders and digital makers by gently removing scaffolding as they progress along the projects in a path. Paths begin with three Explore projects, in which learners are guided through tasks that introduce them to new coding skills. Next, learners complete two Design projects. Here, they are encouraged to practise their skills and bring in their own interests to personalise their coding creations. Finally, learners complete one Invent project. This is where they put everything that they have learned together and create something unique that matters to them.

Emoji, archery, rockets, art, and movement are all part of this Python path.

The structure of our Digital Making Framework means that learners experience the structured development process of a coding project and learn how to turn their ideas into reality. The Framework also supports with finding errors in their code (debugging), showing them that errors are a part of computer programming and just temporary setbacks that you can overcome.

What coding skills and knowledge will young people learn?

The Explore projects are where the initial learning takes place. The key programming concepts covered in this path are:

Variables
Performing calculations with variables
Using functions
Using selection (if, elif and else)
Using repetition (for loops)
Using randomisation
Importing from libraries

Learners also explore aspects of digital visual media concepts:

Coordinates
RGB colours
Screen size
Layers
Frames and animation

Learners then develop these skills and knowledge by putting them into practice in the Design and Invent projects, where they add in their own ideas and creativity.

Explore project 1: Hello world emoji

In the first Explore project of this path, learners create an interactive program that uses emoji characters as the visual element.

This is the first step into Python and gets learners used to the syntax for printing text, using variables, and defining functions.

Explore project 2: Target practice

In this Explore project, learners create an archery game. They are introduced to the p5 library, which they use to draw an archery board and create the arrows.

The new programming concept covered in this project is selection, where learners use if, elif and else to allocate points for the game.

Explore project 3: Rocket launch

The final Explore project gets learners to animate a rocket launching into space. They create an interactive animation where the user is asked to enter an amount of fuel for the rocket launch. The animation then shows if the fuel is enough to get the rocket into orbit.

The new programming concept covered here is repetition. Learners use for loops to animate smoke coming from the exhaust of the rocket.

Design project 1: Make a face

The first Design project allows learners to unleash their creativity by drawing a face using the Python coding skills that they have built in the Explore projects. They have full control of the design for their face and can explore three examples for inspiration.

Learners are also encouraged to share their drawings in the community library, where there are lots of fun projects to discover already. In this project, learners apply all of the coding skills and knowledge covered in the Explore projects, including selection, repetition, and variables.

Design project 2: Don’t collide!

In the second Design project, learners code a scrolling game called ‘Don’t collide’, where a character or vehicle moves down the screen while having to avoid obstacles.

Learners can choose their own theme for the game, and decide what will move down the screen and what the obstacles will look like. In this project, they also get to practice everything they learned in the Explore projects.

Invent project: Powerful patterns

This project is the ultimate chance for learners to put all of their skills and knowledge into practice and get creative. They design their own unique patterns and create frame-by-frame animations.

The Invent project offers ingredients, which are short reminders of all the key skills that learners have gained while completing the previous projects in the path. The ingredients encourage them to be independent whilst also supporting them with code snippets to help them along.

Explore the ‘Introduction to Python’ path

Key questions answered

Who is the Introduction to Python path for?

We have written the projects in the path with young people around the age of 9 to 13 in mind. To code in a text-based language, a young person needs to be familiar with using a keyboard, due to the typing involved. A learner may have completed one of our Scratch paths prior to this one, but this isn’t essential. and we encourage beginner coders to take this path first if that is their choice.

A young person codes at a Raspberry Pi computer.

What software do learners need to code these projects?

A web browser. In every project, starter code is provided in a free web-based development environment called Trinket, where learners add their own code. The starter Trinkets include everything that learners need to use Python and access the p5 library.

If preferred, the projects also include instructions for using a desktop-based programming environment, such as Thonny.

How long will the path take to complete?

We’ve designed the path to be completed in around six one-hour sessions, with one hour per project. However, the project instructions encourage learners to upgrade their projects and go further if they wish. This means that young people might want to spend a little more time getting their projects exactly as they imagine them.

What can young people do next after completing this path?

Taking part in Coolest Projects Global

At the end of the path, learners are encouraged to register a project they’re making with their new coding skills for Coolest Projects Global, our world-leading online technology showcase for young people.

Three young tech creators show off their tech project at Coolest Projects.

Taking part is free, all online, and beginners as well as more experienced young tech creators are welcome and invited. This is their unique opportunity to share their ingenuity in an online gallery for the world and the Coolest Projects community to celebrate.

Coding more Python projects with us

Coming very soon is our ‘More Python’ path. In this path, learners will move beyond the basics they learned in Introduction to Python. They will learn how to use lists, dictionaries, and files to create charts, models, and artwork. Keep your eye on our blog and social media for the release of ‘More Python’.

The post Coding for kids: Art, games, and animations with our new beginners’ Python path appeared first on Raspberry Pi.

How Grab built a scalable, high-performance ad server

2022-02-11 Grab Tech

Post Syndicated from Grab Tech original https://engineering.grab.com/scalable-ads-server

Why ads?

GrabAds is a service that provides businesses with an opportunity to market their products to Grab’s consumer base. During the pandemic, as the demand for food delivery grew, we realised that ads could be a service we offer to our small restaurant merchant-partners to expand their reach. This would allow them to not only mitigate the loss of in-person traffic but also grow by attracting more customers.

Many of these small merchant-partners had no experience with digital advertising and we provided an easy-to-use, scalable option that could match their business size. On the other side of the equation, our large network of merchant-partners provided consumers with more choices. For hungry consumers stuck at home, personalised ads and promotions helped them satisfy their cravings, thus fulfilling their intent of opening the Grab app in the first place!

Why build our own ad server?

Building an ad server is an ambitious undertaking and one might rightfully ask why we should invest the time and effort to build a technically complex distributed system when there are several reasonable off-the-shelf solutions available.

The answer is we didn’t, at least not at first. We used one of these off-the-shelf solutions to move fast and build a minimally viable product (MVP). The result of this experiment was a resounding success; we were providing clear value to our merchant-partners, our consumers and Grab’s overall business.

However, to take things to the next level meant scaling the ads business up exponentially. Apart from being one of the few companies with the user engagement to support an ads business at scale, we also have an ecosystem that combines our network of merchant-partners, an understanding of our consumers’ interactions across multiple services in the Grab superapp, and a payments solution, GrabPay, to close the loop. Furthermore, given the hyperlocal nature of our business, the in-app user experience is highly customised by location. In order to integrate seamlessly with this ecosystem, scale as Grab’s overall business grows and handle personalisation using machine learning (ML), we needed an in-house solution.

What we built

We designed and built a set of microservices, streams and pipelines which orchestrated the core ad serving functionality, as shown below.

Targeting – This is the first step in the ad serving flow. We fetch a set of candidate ads specifically targeted to the request based on keywords the user searched for, the user’s location, the time of day, and the data we have about the user’s preferences or other characteristics. We chose ElasticSearch as the data store for our ads repository as it allows us to query based on a disparate set of targeting criteria.
Capping – In this step, we filter out candidate ads which have exceeded various caps. This includes cases where an advertising campaign has already reached its budget goal, as well as custom requirements about the frequency an ad is allowed to be shown to the same user. In order to make this decision, we need to know how much budget has already been spent and how many times an ad has already been shown. We chose ScyllaDB to store these “stats”, which is scalable, low-cost and can handle the large read and write requirements of this process (more on how this data gets written to ScyllaDB in the Tracking step).
Pacing – In this step, we alter the probability that a matching ad candidate can be served, based on a specific campaign goal. For example, in some cases, it is desirable for an ad to be shown evenly throughout the day instead of exhausting the entire ad budget as soon as possible. Similar to Capping, we require access to information on how many times an ad has already been served and use the same ScyllaDB stats store for this.
Scoring – In this step, we score each ad. There are a number of factors that can be used to calculate this score including predicted clickthrough rate (pCTR), predicted conversion rate (pCVR) and other heuristics that represent how relevant an ad is for a given user.
Ranking – This is where we compare the scored candidate ads with each other and make the final decision on which candidate ads should be served. This can be done in several ways such as running a lottery or performing an auction. Having our own ad server allows us to customise the ranking algorithm in countless ways, including incorporating ML predictions for user behaviour. The team has a ton of exciting ideas on how to optimise this step and now that we have our own stack, we’re ready to execute on those ideas.
Pricing – After choosing the winning ads, the final step before actually returning those ads in the API response is to determine what price we will charge the advertiser. In an auction, this is called the clearing price and can be thought of as the minimum bid price required to outbid all the other candidate ads. Depending on how the ad campaign is set up, the advertiser will pay this price if the ad is seen (i.e. an impression occurs), if the ad is clicked, or if the ad results in a purchase.
Tracking – Here, we close the feedback loop and track what users do when they are shown an ad. This can include viewing an ad and ignoring it, watching a video ad, clicking on an ad, and more. The best outcome is for the ad to trigger a purchase on the Grab app. For example, placing a GrabFood order with a merchant-partner; providing that merchant-partner with a new consumer. We track these events using a series of API calls, Kafka streams and data pipelines. The data ultimately ends up in our ScyllaDB stats store and can then be used by the Capping and Pacing steps above.

Principles

In addition to all the usual distributed systems best practices, there are a few key principles that we focused on when building our system.

Latency – Latency is important for ads. If the user scrolls faster than an ad can load, the ad won’t be seen. The longer an ad remains on the screen, the more likely the user will notice it, have their interest piqued and click on it. As such, we set strict limits on the latency of the ad serving flow. We spent a large amount of effort tuning ElasticSearch so that it could return targeted ads in the shortest amount of time possible. We parallelised parts of the serving flow wherever possible and we made sure to A/B test all changes both for business impact and to ensure they did not increase our API latency.
Graceful fallbacks – We need user-specific information to make personalised decisions about which ads to show to a given user. This data could come in the form of segmentation of our users, attributes of a single user or scores derived from ML models. All of these require the ad server to make dependency calls that could add latency to the serving flow. We followed the principle of setting strict timeouts and having graceful fallbacks when we can’t fetch the data needed to return the most optimal result. This could be due to network failures or dependencies operating slower than usual. It’s often better to return a non-personalised result than no result at all.
Global optimisation – Predicting supply (the amount of users viewing the app) and demand (the amount of advertisers wanting to show ads to those users) is difficult. As a superapp, we support multiple types of ads on various screens. For example, we have image ads, video ads, search ads, and rewarded ads. These ads could be shown on the home screen, when booking a ride, or when searching for food delivery. We intentionally decided to have a single ad server supporting all of these scenarios. This allows us to optimise across all users and app locations. This also ensures that engineering improvements we make in one place translate everywhere where ads or promoted content are shown.

What’s next?

Grab’s ads business is just getting started. As the number of users and use cases grow, ads will become a more important part of the mix. We can help our merchant-partners grow their own businesses while giving our users more options and a better experience.

Some of the big challenges ahead are:

Optimising our real-time ad decisions, including exciting work on using ML for more personalised results. There are many factors that can be considered in ad personalisation such as past purchase history, the user’s location and in-app browsing behaviour. Another area of optimisation is improving our auction strategy to ensure we have the most efficient ad marketplace possible.
Expanding the types of ads we support, including experimenting with new types of content, finding the best way to add value as Grab expands its breadth of services.
Scaling our services so that we can match Grab’s velocity and handle growth while maintaining low latency and high reliability.

Join us

Grab is a leading superapp in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across over 400 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Rest and Fluids

2022-02-11

Post Syndicated from original https://xkcd.com/2580/

Remember not to take it easy. Put a hot washcloth on your forehead, remain standing, and breathe dry air while taking lots of histamines. You need to give your body a chance to get sick again.

Build a data sharing workflow with AWS Lake Formation for your data mesh

2022-02-11 Jan Michael Go Tan

Post Syndicated from Jan Michael Go Tan original https://aws.amazon.com/blogs/big-data/build-a-data-sharing-workflow-with-aws-lake-formation-for-your-data-mesh/

A key benefit of a data mesh architecture is allowing different lines of business (LOBs) and organizational units to operate independently and offer their data as a product. This model not only allows organizations to scale, but also gives the end-to-end ownership of maintaining the product to data producers that are the domain experts of the data. This ownership entails maintaining the data pipelines, debugging ETL scripts, fixing data quality issues, and keeping the catalog entries up to date as the dataset evolves over time.

On the consumer side, teams can search the central catalog for relevant data products and request access. Access to the data is done via the data sharing feature in AWS Lake Formation. As the amount of data products grow and potentially more sensitive information is stored in an organization’s data lake, it’s important that the process and mechanism to request and grant access to specific data products are done in a scalable and secure manner.

This post describes how to build a workflow engine that automates the data sharing process while including a separate approval mechanism for data products that are tagged as sensitive (for example, containing PII data). Both the workflow and approval mechanism are customizable and should be adapted to adhere to your company’s internal processes. In addition, we include an optional workflow UI to demonstrate how to integrate with the workflow engine. The UI is just one example of how the interaction works. In a typical large enterprise, you can also use ticketing systems to automatically trigger both the workflow and the approval process.

Solution overview

A typical data mesh architecture for analytics in AWS contains one central account that collates all the different data products from multiple producer accounts. Consumers can search the available data products in a single location. Sharing data products to consumers doesn’t actually make a separate copy, but instead just creates a pointer to the catalog item. This means any updates that producers make to their products are automatically reflected in the central account as well as in all the consumer accounts.

Building on top of this foundation, the solution contains several components, as depicted in the following diagram:

The central account includes the following components:

AWS Glue – Used for Data Catalog purposes.
AWS Lake Formation – Used to secure access to the data as well as provide the data sharing capabilities that enable the data mesh architecture.
AWS Step Functions – The actual workflow is defined as a state machine. You can customize this to adhere to your organization’s approval requirements.
AWS Amplify – The workflow UI uses the Amplify framework to secure access. It also uses Amplify to host the React-based application. On the backend, the Amplify framework creates two Amazon Cognito components to support the security requirements:
- User pools – Provide a user directory functionality.
- Identity pools – Provide federated sign-in capabilities using Amazon Cognito user pools as the location of the user details. The identity pools vend temporary credentials so the workflow UI can access AWS Glue and Step Functions APIs.
AWS Lambda – Contains the application logic orchestrated by the Step Functions state machine. It also provides the necessary application logic when a producer approves or denies a request for access.
Amazon API Gateway – Provides the API for producers to accept and deny requests.

The producer account contains the following components:

Amazon Simple Notification Service (Amazon SNS) – Producers maintain their list of approvers in an SNS topic. The workflow engine publishes a message in the respective data owner’s SNS topic when an approval is required.
AWS Identity and Access Management (IAM) – The ProducerWorkflowRole role is assumed by the workflow engine if it needs to send an approval request to the producer’s SNS topic.

The consumer account contains the following components:

AWS Glue – Used for Data Catalog purposes.
AWS Lake Formation – After the data has been made available, consumers can grant access to its own users via Lake Formation.
AWS Resource Access Manager (AWS RAM) – If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, AWS RAM sends an invitation to the grantee account to accept or reject the resource grant. For more details about Lake Formation cross-account access, see Cross-Account Access: How It Works.

The solution is split into multiple steps:

Deploy the central account backend, including the workflow engine and its associated components.
Deploy the backend for the producer accounts. You can repeat this step multiple times depending on the number of producer accounts that you’re onboarding into the workflow engine.
Deploy the optional workflow UI in the central account to interact with the central account backend.

Workflow overview

The following diagram illustrates the workflow. In this particular example, the state machine checks if the table or database (depending on what is being shared) has the pii_flag parameter and if it’s set to TRUE. If both conditions are valid, it sends an approval request to the producer’s SNS topic. Otherwise, it automatically shares the product to the requesting consumer.

This workflow is the core of the solution, and can be customized to fit your organization’s approval process. In addition, you can add custom parameters to databases, tables, or even columns to attach extra metadata to support the workflow logic.

Prerequisites

The following are the deployment requirements:

A data mesh set up
NodeJS v14.x
Yarn v1.22.10+
The AWS Cloud Development Kit (AWS CDK) CLI v1.119.0+
The Amplify CLI v5.3.0+ (for the workflow UI)
AWS profiles for each of the accounts (central and producers) that you’re deploying in

You can clone the workflow UI and AWS CDK scripts from the GitHub repository.

Deploy the central account backend

To deploy the backend for the central account, go to the root of the project after cloning the GitHub repository and enter the following code:

yarn deploy-central --profile <PROFILE_OF_CENTRAL_ACCOUNT>

This deploys the following:

IAM roles used by the Lambda functions and Step Functions state machine
Lambda functions
The Step Functions state machine (the workflow itself)
An API Gateway

When the deployment is complete, it generates a JSON file in the src/cfn-output.json location. This file is used by the UI deployment script to generate a scoped-down IAM policy and workflow UI application to locate the state machine that was created by the AWS CDK script.

The actual AWS CDK scripts for the central account deployment are in infra/central/. This also includes the Lambda functions (in the infra/central/functions/ folder) that are used by both the state machine and the API Gateway.

Lake Formation permissions

The following table contains the minimum required permissions that the central account data lake administrator needs to grant to the respective IAM roles for the backend to have access to the AWS Glue Data Catalog.

Role	Permission	Grantable
WorkflowLambdaTableDetails	Database: DESCRIBE Tables: DESCRIBE	N/A
WorkflowLambdaShareCatalog	Tables: SELECT, DESCRIBE	Tables: SELECT, DESCRIBE

Workflow catalog parameters

The workflow uses the following catalog parameters to provide its functionality.

Catalog Type	Parameter Name	Description
Database	`data_owner`	(Required) The account ID of the producer account that owns the data products.
Database	`data_owner_name`	A readable friendly name that identifies the producer in the UI.
Database	`pii_flag`	A flag (`true/false`) that determines whether the data product requires approval (based on the example workflow).
Column	`pii_flag`	A flag (`true/false`) that determines whether the data product requires approval (based on the example workflow). This is only applicable if requesting table-level access.

You can use UpdateDatabase and UpdateTable to add parameters to database and column-level granularity, respectively. Alternatively, you can use the CLI for AWS Glue to add the relevant parameters.

Use the AWS CLI to run the following command to check the current parameters in your database:

aws glue get-database --name <DATABASE_NAME> --profile <PROFILE_OF_CENTRAL_ACCOUNT>

You get the following response:

{
  "Database": {
    "Name": "<DATABASE_NAME>",
    "CreateTime": "<CREATION_TIME>",
    "CreateTableDefaultPermissions": [],
    "CatalogId": "<CATALOG_ID>"
  }
}

To update the database with the parameters indicated in the preceding table, we first create the input JSON file, which contains the parameters that we want to update the database with. For example, see the following code:

{
  "Name": "<DATABASE_NAME>",
  "Parameters": {
    "data_owner": "<AWS_ACCOUNT_ID_OF_OWNER>",
    "data_owner_name": "<AWS_ACCOUNT_NAME_OF_OWNER>",
    "pii_flag": "true"
  }
}

Run the following command to update the Data Catalog:

aws glue update-database --name <DATABASE_NAME> --database-input file://<FILE_NAME>.json --profile <PROFILE_OF_CENTRAL_ACCOUNT>

Deploy the producer account backend

To deploy the backend for your producer accounts, go to the root of the project and run the following command:

yarn deploy-producer --profile <PROFILE_OF_PRODUCER_ACCOUNT> --parameters centralMeshAccountId=<central_account_account_id>

This deploys the following:

An SNS topic where approval requests get published.
The ProducerWorkflowRole IAM role with a trust relationship to the central account. This role allows Amazon SNS publish to the previously created SNS topic.

You can run this deployment script multiple times, each time pointing to a different producer account that you want to participate in the workflow.

To receive notification emails, subscribe your email in the SNS topic that the deployment script created. For example, our topic is called DataLakeSharingApproval. To get the full ARN, you can either go to the Amazon Simple Notification Service console or run the following command to list all the topics and get the ARN for DataLakeSharingApproval:

aws sns list-topics --profile <PROFILE_OF_PRODUCER_ACCOUNT>

After you have the ARN, you can subscribe your email by running the following command:

aws sns subscribe --topic-arn <TOPIC_ARN> --protocol email --notification-endpoint <EMAIL_ADDRESS> --profile <PROFILE_OF_PRODUCER_ACCOUNT>

You then receive a confirmation email via the email address that you subscribed. Choose Confirm subscription to receive notifications from this SNS topic.

Deploy the workflow UI

The workflow UI is designed to be deployed in the central account where the central data catalog is located.

To start the deployment, enter the following command:

yarn deploy-ui

This deploys the following:

Amazon Cognito user pool and identity pool
React-based application to interact with the catalog and request data access

The deployment command prompts you for the following information:

Project information – Use the default values.
AWS authentication – Use your profile for the central account. Amplify uses this profile to deploy the backend resources.

UI authentication – Use the default configuration and your username. Choose No, I am done when asked to configure advanced settings.

UI hosting – Use hosting with the Amplify console and choose manual deployment.

The script gives a summary of what is deployed. Entering Y triggers the resources to be deployed in the backend. The prompt looks similar to the following screenshot:

When the deployment is complete, the remaining prompt is for the initial user information such as user name and email. A temporary password is automatically generated and sent to the email provided. The user is required to change the password after the first login.

The deployment script grants IAM permissions to the user via an inline policy attached to the Amazon Cognito authenticated IAM role:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "glue:GetDatabase",
            "glue:GetTables",
            "glue:GetDatabases",
            "glue:GetTable"
         ],
         "Resource":"*"
      },
      {
         "Effect":"Allow",
         "Action":[
            "states:ListExecutions",
            "states:StartExecution"
         ],
         "Resource":[
"arn:aws:states:<REGION>:<AWS_ACCOUNT_ID>:stateMachine:<STATE_MACHINE_NAME>"
]
      },
      {
         "Effect":"Allow",
         "Action":[
             "states:DescribeExecution"
         ],
         "Resource":[
"arn:aws:states:<REGION>:<AWS_ACCOUNT_ID>:execution:<STATE_MACHINE_NAME>:*"
]
      }


   ]
}

The last remaining step is to grant Lake Formation permissions (DESCRIBE for both databases and tables) to the authenticated IAM role associated with the Amazon Cognito identity pool. You can find the IAM role by running the following command:

cat amplify/team-provider-info.json

The IAM role name is in the AuthRoleName property under the awscloudformation key. After you grant the required permissions, you can use the URL provided in your browser to open the workflow UI.

Your temporary password is emailed to you so you can complete the initial login, after which you’re asked to change your password.

The first page after logging in is the list of databases that consumers can access.

Choose Request Access to see the database details and the list of tables.

Choose Request Per Table Access and see more details at the table level.

Going back in the previous page, we request database-level access by entering the consumer account ID that receives the share request.

Because this database has been tagged with a pii_flag, the workflow needs to send an approval request to the product owner. To receive this approval request email, the product owner’s email needs to be subscribed to the DataLakeSharingApproval SNS topic in the product account. The details should look similar to the following screenshot:

The email looks similar to the following screenshot:

The product owner chooses the Approve link to trigger the Step Functions state machine to continue running and share the catalog item to the consumer account.

For this example, the consumer account is not part of an organization, so the admin of the consumer account has to go to AWS RAM and accept the invitation.

After the resource share is accepted, the shared database appears in the consumer account’s catalog.

Clean up

If you no longer need to use this solution, use the provided cleanup scripts to remove the deployed resources.

Producer account

To remove the deployed resources in producer accounts, run the following command for each producer account that you deployed in:

yarn clean-producer --profile <PROFILE_OF_PRODUCER_ACCOUNT>

Central account

Run the following command to remove the workflow backend in the central account:

yarn clean-central --profile <PROFILE_OF_CENTRAL_ACCOUNT>

Workflow UI

The cleanup script for the workflow UI relies on an Amplify CLI command to initiate the teardown of the deployed resources. Additionally, you can use a custom script to remove the inline policy in the authenticated IAM role used by Amazon Cognito so that Amplify can fully clean up all the deployed resources. Run the following command to trigger the cleanup:

yarn clean-ui

This command doesn’t require the profile parameter because it uses the existing Amplify configuration to infer where the resources are deployed and which profile was used.

Conclusion

This post demonstrated how to build a workflow engine to automate an organization’s approval process to gain access to data products with varying degrees of sensitivity. Using a workflow engine enables data sharing in a self-service manner while codifying your organization’s internal processes to be able to safely scale as more data products and teams get onboarded.

The provided workflow UI demonstrated one possible integration scenario. Other possible integration scenarios include integration with your organization’s ticketing system to trigger the workflow as well as receive and respond to approval requests, or integration with business chat applications to further shorten the approval cycle.

Lastly, a high degree of customization is possible with the demonstrated approach. Organizations have complete control over the workflow, how data product sensitivity levels are defined, what gets auto-approved and what needs further approvals, the hierarchy of approvals (such as a single approver or multiple approvers), and how the approvals get delivered and acted upon. You can take advantage of this flexibility to automate your company’s processes to help them safely accelerate towards being a data-driven organization.

About the Author

Jan Michael Go Tan is a Principal Solutions Architect for Amazon Web Services. He helps customers design scalable and innovative solutions with the AWS Cloud.

Extract ServiceNow data using AWS Glue Studio in an Amazon S3 data lake and analyze using Amazon Athena

2022-02-10 Navnit Shukla

Post Syndicated from Navnit Shukla original https://aws.amazon.com/blogs/big-data/extract-servicenow-data-using-aws-glue-studio-in-an-amazon-s3-data-lake-and-analyze-using-amazon-athena/

Many different cloud-based software as a service (SaaS) offerings are available in AWS. ServiceNow is one of the common cloud-based workflow automation platforms widely used by AWS customers. In the past few years, we saw a lot of customers who wanted to extract and integrate data from IT service management (ITSM) tools like ServiceNow for various use cases:

Generate insight from data – When you combine ServiceNow data with data from other services like CRM (such as Salesforce) or Martech data (such as Amazon Pinpoint) to generate better insights (e.g., building complete customer 360 view).
Archive data for future business or regulatory requirements – You can archive the data in raw form in your data lake to work on future use cases or just keep it to satisfy regulatory requirements such as auditing.
Improve performance by decoupling reporting or machine learning use cases from ITSM – When you move your ITSM reporting from ServiceNow to an Amazon Simple Storage Service (Amazon S3) data lake, there is no performance impact on your ServiceNow instance.
Data democratization – You can extract the data and put it into a data lake so it can be available to other business users and units to explore and use.

Many customers have been building modern data architectures on AWS, which includes building data lakes on Amazon S3 and using broad and deep AWS analytics and an AI/ML services to extract meaningful information from data by combining data from different data sources.

In this post, we provide a step-by-step guide to bring data from ServiceNow to an S3 data lake using AWS Glue Studio and analyze the data with Amazon Athena.

Solution overview

In this solution, ServiceNow data is being extracted through AWS Glue using a Marketplace connector. AWS Glue provides built-in support for the most commonly used data stores (such as Amazon Redshift, Amazon Aurora, Microsoft SQL Server, MySQL, MongoDB, and PostgreSQL) using JDBC connections. AWS Glue also allows you to use custom JDBC drivers in your extract, transform, and load (ETL) jobs. For data stores that are not natively supported, such as SaaS applications, you can use connectors and stored in Amazon S3. The data is cataloged in the AWS Glue Data Catalog, and we use Athena to query the data.

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue provides all the capabilities needed for data integration so you can start analyzing your data and put it to use in minutes instead of months.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

ServiceNow is a cloud-based software platform for ITSM that helps to automate IT business management. It’s designed based on ITIL guidelines to provide service orientation for tasks, activities, and processes.

The following diagram illustrates our solution architecture.

To implement the solution, we complete the following high-level steps:

Subscribe to the AWS Glue Connector Marketplace for ServiceNow from AWS Marketplace.
Create a connection in AWS Glue Studio.
Create an AWS Identity and Access Management (IAM) role for AWS Glue.
Configure and run an AWS Glue job that uses the connection.
Run the query against the data lake (Amazon S3) using Athena.

Prerequisites

For this walkthrough, you should have the following:

An AWS account.
A ServiceNow account. To follow along with this post, you can sign up for a developer account, which is pre-populated with sample records in many of the ServiceNow objects.
ServiceNow connection properties credentials stored in AWS Secrets Manager. On the Secrets Manager console, create a new secret (select Other type of secrets) with a key-value pair for each property, for example:
- Username – ServiceNow Instance account user name (for example, admin)
- Password – ServiceNow Instance account password
- Instance – ServiceNow instance name without https and .service-now.com

Copy the secret name to use when configuring the connection in AWS Glue Studio.

Subscribe to the AWS Glue Marketplace Connector for ServiceNow

To connect, we use the AWS Glue Marketplace Connector for ServiceNow. You need to subscribe to the connector from AWS Marketplace.

The AWS Glue Marketplace Connector for ServiceNow is provided by third-party independent software vendor (ISV) listed on AWS Marketplace. Associated subscription fees and AWS usage fees apply once subscribed.

To use the connector in AWS Glue, you need to activate the subscribed connector in AWS Glue Studio. The activation process creates a connector object and connection in your AWS account.

On the AWS Glue console, choose AWS Glue Studio.
Choose Connectors.
Choose Marketplace.
Search for the CData AWS Glue Connector for ServiceNow.

After you subscribe to the connector, a new config tab appears on the AWS Marketplace connector page.

Review the pricing and other relevant information.
Choose Continue to Subscribe.
Choose Accept Terms.

After you subscribe to the connector, the next steps are to configure it.

Retain the default selections for Delivery Method and Software Version to use the latest connector software version.
Choose Continue to Launch.

Choose Usage Instructions.

A pop-up appears with a hyperlink to activate the connector with AWS Glue Studio.

Choose this link to start configuring the connection to your ServiceNow account in AWS Glue Studio.

Create a connection in AWS Glue Studio

Create a connection in AWS Glue Studio with the following steps:

For Name, enter a unique name for your ServiceNow connection.
For Connection credential type, choose username_password.
For AWS Secret, choose the Secrets Manager secret you created as a prerequisite.

Don’t provide any additional details in the optional Credentials section because it retrieves the value from Secrets Manager.

Choose Create connection and activate connector to finish creating the connection.

You should now be able to view the ServiceNow connector you subscribed to and its associated connection.

Create an IAM role for AWS Glue

The next step is to create an IAM role with the necessary permissions for the AWS Glue job. The name of the role must start with the string AWSGlueServiceRole for AWS Glue Studio to use it correctly. You need to grant your IAM role permissions that AWS Glue can assume when calling other services on your behalf. For more information, see Create an IAM Role for AWS Glue.

Attach the following AWS managed policies to the role:

AmazonEC2ContainerRegistryReadOnly for accessing connectors purchased from AWS Marketplace.
AWSGlueServiceRole for accessing related services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon S3, and Amazon CloudWatch Logs.
If you’re accessing data found in Amazon S3, you need to add access to read and write to Amazon S3. Create and attach a policy with write access to the S3 bucket. For instructions, see Amazon S3: Allows read and write access to objects in an S3 Bucket.
Lastly, if you’re using Secrets Manager to store confidential connection properties, you need to add an inline policy similar to the following, granting access to the specific secrets needed for the AWS Glue job:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": [
                "{secret name arn}"
            ]
        }
    ]
}

For more information about permissions, see Review IAM permissions needed for the AWS Glue Studio user.

Configure and run the AWS Glue job

After you configure your connection, you can create and run an AWS Glue job.

Create a job that uses the connection

To create a job, complete the following steps:

In AWS Glue Studio, choose Connectors.
Select the connection you created.
Choose Create job.

The visual job editor appears. A new source node, derived from the connection, is displayed on the job graph. In the node details panel on the right, the Data source properties tab is selected for user input.

Configure the source node properties

You can configure the access options for your connection to the data source on the Data source properties tab. For this post, we provide a simple walkthrough. Refer to the AWS Glue Studio User Guide for more information.

On the Source menu, choose CData AWS Glue Connector for ServiceNow.

On the Data source properties – Connector tab, make sure the source node for your connector is selected.

The Connection field is populated automatically with the name of the connection associated with the marketplace connector.

Enter either a source table name or a query to use to retrieve data from the data source. For this post, we enter the table name incident.

On the Transform menu, choose Apply Mapping.
In a Node Property Tab, Select Node Parents CData AWS Glue Connector for ServiceNow.
As we are connecting to an external data source; when you first look into Transform and Output schema tab; you won’t find the schema extracted from the source.
In order for you to retrieve schema, Go to Data Preview tab, click on Start data preview session and select the IAM role you have created for this job.
Once the Data preview is done, go to Data Source section and click on Use datapreview schema.
Go to Transform and Check all the columns where Data Type showing as NULL.

On the Target menu, choose Amazon S3.
On the Data target properties – S3 tab, for Format, choose Parquet.
For Compression Type, choose GZIP.
For S3 Target Location, enter the Amazon S3 location to store the data.
For Data Catalog update options, select Create a table in the Data Catalog and on subsequent runs, keep existing schema and add new partitions.
For Database, enter sampledb.
For Table name, enter incident.

Edit, save, and run the job

Edit the job by adding and editing the nodes in the job graph. See Editing ETL jobs in AWS Glue Studio for more information.

After you edit the job, enter the job properties.

Choose the Job details tab above the visual graph editor.
For Name, enter a job name.
For IAM Role, choose an IAM role with the necessary permissions, as described previously.
For Type, choose Spark.
For Glue version, choose Glue 3.0 – Supports spark 3.1, Scala 2, Python 3.
For Language, choose Python 3.
Worker type : G.1X
Requested number of workers: 2
Number of retries: 1
Job timeout (minutes): 3
Use the default values for the other parameters.

For more information about job parameters, see Defining Job Properties for Spark Jobs.

12. After you save the job, choose Run to run the job.

Note – Running the Glue Job incur cost. You can learn more about AWS Glue Pricing here.

To view the generated script for the job, choose the Script tab at the top of the visual editor. The Job runs tab shows the job run history for the job. For more information about job run details, see View information for recent job runs.

Query against the data lake using Athena

After the job is complete, you can query the data in Athena.

On the Athena console, choose the sampledb database.

You can view the newly created table called incident.

Choose the options icon (three vertical dots) and choose Preview table to view the data.

Now let’s perform some analyses.

Find all the incident tickets that are escalated by running the following query:
```
SELECT task_effective_number FROM "sampledb"."incident" 
where escalation = 2;
```

Find ticket count with priority:

SELECT priority, count(distinct task_effective_number)  FROM "sampledb"."incident"
group by priority
order by priority asc

Conclusion

In this post, we demonstrated how you can use an AWS Glue Studio connector to connect from ServiceNow and bring data into your data lake for further use cases.

AWS Glue provides built-in support for the most commonly used data stores (such as Amazon Redshift, Amazon Aurora, Microsoft SQL Server, MySQL, MongoDB, and PostgreSQL) using JDBC connections. AWS Glue also allows you to use custom JDBC drivers in your extract, transform, and load (ETL) jobs. For data stores that are not natively supported, such as SaaS applications, you can use connectors.

To learn more, refer to the AWS Glue Studio Connector, AWS Glue Studio User Guide and Athena User Guide.

About the Authors

Navnit Shukla is AWS Specialist Solution Architect in Analytics. He is passionate about helping customers uncover insights from their data. He builds solutions to help organizations make data-driven decisions.

Srikanth Sopirala is a Principal Solutions Architect at AWS. He is a seasoned leader with over 20 years of experience, who is passionate about helping customers build scalable data and analytic solutions to gain timely insights and make critical business decisions. In his spare time, he enjoys reading, spending time with his family, and road biking.

Naresh Gautam is a Principal Solutions Architect at AWS. His role is helping customers architect highly available, high-performance, and cost-effective data analytics solutions to empower customers with data-driven decision-making. In his free time, he enjoys meditation and cooking.

Introducing AWS Virtual Waiting Room

2022-02-10 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-aws-virtual-waiting-room/

This post is written by Justin Pirtle, Principal Solutions Architect, Joan Morgan, Software Developer Engineer, and Jim Thario, Software Developer Engineer.

Today, AWS is introducing an official AWS Virtual Waiting Room solution. You can integrate this new, open-source solution with existing web and mobile applications. It can help buffer users during times of peak demand and sudden bursts of traffic, preventing systems from resource exhaustion.

Events commonly use virtual waiting rooms where there is either unknown demand or expected large bursts of traffic. Examples of such events include concert ticket sales, Black Friday promotions, COVID-19 vaccine registrations, and more. Virtual waiting rooms allow a quota of users to view, select, and complete their transactions directly. They shield the application’s backend environment from traffic by buffering users in a waiting room until it is their turn in line.

Like any real-life queuing system, a user enters the AWS Virtual Waiting Room and requests a number in line. After receiving a number corresponding to the unique device ID, the browser then polls regularly for updates. The update provides the current number being served and anticipated time until they are front of line.

After reaching the front of the line, the user can exchange the number and device ID for a secure session token. This is included with their downstream requests to authenticate users securely.

If a user discovers the backend endpoint and tries to send requests, they are redirected into the waiting room. The API requests are denied access until they have a valid token. This prevents the backend from needing to scale to accommodate all users at a single time.

Integrating the AWS Virtual Waiting Room into your application

Integration steps depend on the integration pattern for your application. You can decide if all users are routed through the waiting room or only during periods of excessive traffic. You can also choose to protect only the web host serving the backend webpages or one or more APIs powering backend commerce services.

There are four common patterns supported for integrating the waiting room into your application:

Upstream redirection of all traffic from the main target site to flow through AWS Virtual Waiting Room. This option sends all user traffic through the waiting room with the initial capacity of users permitted to the protected system. The traffic passes through transparently, then it buffers the remaining users. It admits new users as capacity becomes available. The target system is only accessible by users who pass through the waiting room.
Downstream redirection to the virtual waiting room from the target site. This option sends all traffic to the target site. The target site conditionally redirects requests that need to enter the waiting room. No DNS or upstream modifications are needed. The target site must be able to handle the initial user requests and redirection responses.
Direct target site API integration for buffering users from an existing website without any redirection. Your web or mobile application integrates the virtual waiting room at the API-level. This does not need any redirection to a different waiting room endpoint or site. This can offer a seamless user experience but may require more development for the integration.
OpenID Connect (OIDC) adapter. This option offers no-code native integration of the waiting room with OpenID Connect-enabled system components, such as the AWS Application Load Balancer (ALB). Users are redirected by the load balancer or similar component to the waiting room. They are buffered until issued a signed, time-limited JSON Web Token (JWT). Once the user’s JWT token is issued, the load balancer then forwards user requests to the target backend systems.

Overview of the AWS Virtual Waiting Room solution

The AWS Virtual Waiting Room solution implementation includes three main components:

Core APIs. The main resources deployed include two Amazon API Gateway deployments, a VPC, several AWS Lambda functions, an Amazon DynamoDB table, and an Amazon ElastiCache cluster. This API provides the basic mechanisms for tracking clients entering the waiting room. It requests status of the line progression and an authentication token to enter the target protected site.
Waiting room front-end website. The waiting room static site is shown to users awaiting their turn. This site dynamically updates the position being served and their place in line on a configurable interval. You customize this site’s HTML, CSS, and JavaScript to match your frontend styling and theme.
Lambda authorizer for protected target system. The Lambda authorizer wraps and protects the downstream protected target system’s APIs. This ensures that all user invocations have a validated time-limited token issued by the waiting room core API. It helps to prevent users from bypassing the waiting room.

The Virtual Waiting Room CloudFormation template deploys the following infrastructure:

An Amazon CloudFront distribution to deliver public API calls for the client.
Amazon API Gateway public API resources to process queue requests from the virtual waiting room, track the queue position, and support validation of tokens that allow access to the target website.
An Amazon Simple Queue Service (Amazon SQS) queue to regulate traffic to the AWS Lambda function that processes the queue messages. Instead of invoking the Lambda function for each request, the SQS queue batches the incoming bursts of requests.
API Gateway private API resources to support administrative functions.
Lambda functions to validate and process public and private API requests, and return the appropriate responses.
Amazon Virtual Private Cloud (VPC) to host the Lambda functions that interact directly with the Amazon ElastiCache for Redis cluster. VPC endpoints allow Lambda functions in the VPC to communicate with services within the solution.
An Amazon CloudWatch rule to invoke a Lambda function that works with a custom Amazon EventBridge bus to periodically broadcast status updates.
An Amazon DynamoDB table to store token data.
AWS Secrets Manager to store keys for token operations and other sensitive data.
(Optional) Authorizer component consisting of an AWS Identity and Access Management (IAM) role and a Lambda function to validate signatures for your API calls. The only requirement for the authorizer to protect your API is to use API Gateway.
(Optional) Amazon Simple Notification Service (Amazon SNS), CloudWatch, and Lambda functions to support two inlet strategies.
(Optional) OpenID adaptor component with API Gateway and Lambda functions to allow an OpenID provider to authenticate users to your website. CloudFront distribution with an Amazon Simple Storage Service (Amazon S3) bucket for the waiting room page for this component.
(Optional) A CloudFront distribution with Amazon S3 origin bucket for the optional sample waiting room web application.

Deploying the AWS Virtual Waiting Room

To get started with the AWS Virtual Waiting Room, deploy the Getting Started stack. This deploys the Core APIs stack, the Authorizers stack, and a sample application CloudFormation stack:

Launch the Getting Started CloudFormation stack. The template launches in the US East (N. Virginia) Region by default. To launch the solution in a different AWS Region, use the Region selector in the console navigation bar.
On the Create stack page, verify that the correct template URL is in the Amazon S3 URL text box and choose Next.
On the Specify stack details page, assign a name to your solution stack, and accept all default parameter values. For information about naming character limitations, refer to IAM and STS Limits in the AWS Identity and Access Management User Guide. Choose Next.
On the Configure stack options page, choose Next.
On the Review page, review and confirm the settings. Check the box acknowledging that the template creates AWS Identity and Access Management (IAM) resources.
Choose Create stack to deploy the stack.
You can view the status of the stack in the AWS CloudFormation Console in the Status column. You should receive a CREATE_COMPLETE status in approximately 30 minutes.
Once successfully deployed, browse to the Outputs tab.
Copy the ControlPanelURL and WaitingRoomURL to a scratch pad file for later use.

Configuring the AWS Virtual Waiting Room

After deploying the three stacks, test the waiting room using the sample application:

Navigate to the IAM console. Create a new IAM user or select an existing IAM user in the same account where you deployed the waiting room stack.
Grant the selected IAM user programmatic access. Download the key file or copy the access key ID and secret access key values to your scratch pad for later use.
Add the IAM user to the ProtectedAPIGroup IAM user group created by the getting started template:
Open the control panel in a new tab or browser window using the ControlPanelURL output you saved earlier.
In the control panel, expand the Configuration section.
Enter the access key ID and secret access key that you retrieved in Generate AWS keys to call the IAM secured APIs. The endpoints and event ID are filled in from the URL parameters.
Choose Use. The button activates after you have supplied the credentials.
You now see the status “Connected” shown following the various metrics reported:

Test the sample waiting room

Browse to the sample waiting room in a new browser tab. Use the WaitingRoomURL you captured previously from the CloudFormation stack output values.
Select Reserve to enter the waiting room. If you are unable to proceed with your transaction, your assigned number is not yet reached.
Navigate back to the browser tab with the control panel.
Under Increment Serving Counter, select Change. This manually increments the serving counter and allows 100 users to move on from the waiting room to the target site.
Navigate back to the waiting room and choose Check out now! You are redirected to the target site since your serving number is eligible to proceed beyond the waiting room.
Select Purchase now to finish your transaction at the target site. This page represents the protected system beyond the waiting room. Replace this with the actual system users you are protecting.
After the simulated purchase is complete, you can see that the transaction is successful. This transaction is authorized using the time-limited authorization token, which came from the waiting room previously. If a user bypasses the waiting room, they would not be successful in completing a transaction.

Customizing the AWS Virtual Waiting Room for your application

The sample browser client demonstrates an entire user flow frontend with the AWS Virtual Waiting Room flow for an ecommerce purchase. You can use this code as a starting point for your waiting room or reference the API communication code for integrating the waiting room into your existing website.

This sample code is built with Vue.js and Bootstrap to render the user interface. It uses the Axios and Axios-Retry packages to make API calls to the virtual waiting room stack. The sample code uses the Axios-Retry package to show how to handle throttling conditions and exponential backoff in high-traffic situations.

The control panel client is used to make requests to the private waiting room API that requires IAM-based authorization. The control panel client demonstrates how to construct and sign requests to the private API. It can be used in production or customized further. All of the sample source code room source is available in GitHub including the sample user client and control panel client.

Conclusion

The AWS Virtual Waiting Room solution is available today at no additional cost, provided as open source under the Apache 2 license. It supports customized integration with any front-end application via a variety of integration techniques. You can also customize how and when the waiting room allows users to progress into the protected target system using a variety of strategies.

To learn more about the AWS Virtual Waiting Room solution, visit the solution implementation and implementation guide.

For more serverless learning resources, visit Serverless Land.

Cloudflare acquires Vectrix to expand Zero Trust SaaS security

2022-02-10 Corey Mahan

Post Syndicated from Corey Mahan original https://blog.cloudflare.com/cloudflare-acquires-vectrix-to-expand-zero-trust-saas-security/

Cloudflare acquires Vectrix to expand Zero Trust SaaS security

We are excited to share that Vectrix has been acquired by Cloudflare!

Vectrix helps IT and security teams detect security issues across their SaaS applications. We look at both data and users in SaaS apps to alert teams to issues ranging from unauthorized user access and file exposure to misconfigurations and shadow IT.

We built Vectrix to solve a problem that terrified us as security engineers ourselves: how do we know if the SaaS apps we use have the right controls in place? Is our company data protected? SaaS tools make it easy to work with data and collaborate across organizations of any size, but that also makes them vulnerable.

The growing SaaS security problem

The past two years have accelerated SaaS adoption much faster than any of us could have imagined and without much input on how to secure this new business stack.

Google Workspace for collaboration. Microsoft Teams for communication. Workday for HR. Salesforce for customer relationship management. The list goes on.

With this new reliance on SaaS, IT and security teams are faced with a new set of problems like files and folders being made public on the Internet, external users joining private chat channels, or an employee downloading all customer data from customer relationship tools.

The challenge of securing users and data across even a handful of applications, each with its own set of security risks and a unique way of protecting it, is overwhelming for most IT and security teams. Where should they begin?

One platform, many solutions

Enter the API-driven Cloud Access Security Broker (CASB). We think about an API-driven CASB as a solution that can scan, detect, and continuously monitor for security issues across organization-approved, IT-managed SaaS apps like Microsoft 365, ServiceNow, Zoom, or Okta.

CASB solutions help teams with:

Data security – ensuring the wrong file or folder is not shared publicly in Dropbox.
User activity – alerting to suspicious user permissions changing in Workday at 2:00 AM.
Misconfigurations – keeping Zoom Recordings from becoming publicly accessible.
Compliance – tracking and reporting who modified Bitbucket branch permissions.
Shadow IT – detecting users that signed up for an unapproved app with their work email.

Securing SaaS applications starts with visibility into what users and data reside in a service, and then understanding how they’re used. From there, protective and preventive measures, within the SaaS application and on the network, can be used to ensure data stays safe.

It’s not always the extremely complex things either. A really good example of this came from an early Vectrix customer who asked if we could detect public Google Calendars for them. They recently had an issue where someone on the team had shared their calendar which contained several sensitive meeting links and passcodes. They would have saved themselves a headache if they could have detected this prior, and even better, been able to correct it in a few clicks.

In this SaaS age something as innocent as a calendar invite can introduce risks that IT and security teams now have to think about. This is why we’re excited to grow further at Cloudflare, helping more teams stay one step ahead.

Ridiculously easy setup

A core component of an API-first approach is the access system, which powers integrations via an OAuth 2.0 or vendor marketplace app to authorize secure API access into SaaS services. This means the API-driven CASB works out of band, or not in the direct network path, and won’t cause any network slowdowns or require any network configuration changes.

In just a few clicks, you can securely integrate with SaaS apps from anywhere—no agents, no installs, no downloads.

Over a cup of coffee an IT or security system administrator can connect their company’s critical SaaS apps and start getting visibility into data and user activity right away. In fact, we usually see no more than 15 minutes pass from creating an account to the first findings being reported.

The more, the merrier

By integrating with more and more organization-approved SaaS application patterns that may otherwise not be visible start to emerge.

For example, being alerted that Sam attempted to disable two-factor authentication in multiple SaaS applications may indicate a need for more security awareness training. Or being able to detect numerous users granting sensitive account permissions to an unapproved third-party app could indicate a possible phishing attempt.

The more integrations you protect the better your overall SaaS security becomes.

Better together in Zero Trust

The entire Vectrix team has joined Cloudflare and will be integrating API-driven CASB functionality into the Cloudflare Zero Trust platform, launching later this year.

This means an already impressive set of growing products like Access (ZTNA), Gateway (SWG), and Browser Isolation, will be getting even better, together. Even more exciting though, is that using all of these services will be a seamless experience, managed from a unified Zero Trust platform and dashboard.

A few examples of what we’re looking forward to growing together are:

Shadow IT: use Gateway to detect all your SaaS apps in use, block those that are unapproved, and use CASB to ensure your data stays safe in sanctioned ones.
Secure access: use Access to ensure only users who match your device policies will be allowed into SaaS apps and CASB to ensure the SaaS app stays configured only for your approved authentication method.
Data control: use Browser Isolation’s input controls to prevent users from copy/pasting or printing data and CASB to ensure the data isn’t modified to be shared publicly from within the SaaS app itself for total control.

What’s next?

Vectrix will be integrated into the Cloudflare Zero Trust platform to extend the security of Cloudflare’s global network to the data stored in SaaS applications from a single control plane.

If you’d like early beta access, please click here to join the waitlist. We will send invites out in the sign-up order we received them. You can learn more about the acquisition here.

Adding a CASB to Cloudflare Zero Trust

2022-02-10 Sam Rhea

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/cloudflare-zero-trust-casb/

Adding a CASB to Cloudflare Zero Trust

Earlier today, Cloudflare announced that we have acquired Vectrix, a cloud-access security broker (CASB) company focused on solving the problem of control and visibility in the SaaS applications and public cloud providers that your team uses.

We are excited to welcome the Vectrix team and their technology to the Cloudflare Zero Trust product group. We don’t believe a CASB should be a point solution. Instead, the features of a CASB should be one component of a comprehensive Zero Trust deployment. Each piece of technology, CASB included, should work better together than they would as a standalone product.

We know that this migration is a journey for most customers. That’s true for our own team at Cloudflare, too. We’ve built our own Zero Trust platform to solve problems for customers at any stage of that journey.

Start by defending the resources you control

Several years ago, we protected the internal resources that Cloudflare employees needed by creating a private network with hardware appliances. We deployed applications in a data center and made them available to this network. Users inside the San Francisco office connected to a secure Wi-Fi network that placed them on the network.

For everyone else, we punched a hole in that private network and employees pretended they were in the office by using Virtual Private Network (VPN) clients on their device. We had created a castle-and-moat by attempting to extend the walls of the San Francisco office to the rest of the world.

Our Security team hated this. Once authenticated to the VPN client, a user could generally connect to any destination on our private network – the network trusted them by default. We lacked segmentation over who could reach what resource. Just as terrifying, we had almost no visibility into what was happening inside the network.

One option would have been to build out a traditional segmented network with internal firewalls and a configuration nightmare keeping VPN appliances, firewalls and servers synchronized. We knew that there was a better, more flexible, more modern way.

We built the first product in Cloudflare One, Cloudflare Access, to solve these problems. Cloudflare Access uses our global network to check every request or connection for identity, group membership, device posture, multifactor method and more to determine if it should be allowed. Organizations can build rules that are specific to applications or IP addresses on a private network that runs on Cloudflare. Cloudflare Access also logs every request and connection, providing high-visibility with low-effort.

This migration changed our security model at Cloudflare. We also never had to compromise performance thanks to Cloudflare’s global network and Application Performance products. Decisions about who is allowed are made milliseconds away from the user in data centers in over 250+ cities around the world. For web applications, Cloudflare Access runs in-line with our WAF and works out-of-the-box with our load balancers. Cloudflare’s network accelerates requests and packets, connecting users to the tools they need even faster.

Cloudflare Access let us and thousands of other teams deprecate the legacy VPN security model, but the rest of the Internet posed a different kind of challenge—how do we keep our users, and their devices and data, safe from attack?

Next, protect your team from the rest of the Internet

The public Internet allows just about anyone to connect either as a user or a host. That openness is both powerful and terrifying. When employees on corporate devices need to use the rest of the Internet, they run a risk of encountering phishing websites, malware hosts, and other attempts to steal data and compromise businesses.

Historically, organizations relied on a similar castle-and-moat approach. They backhauled user traffic to any destination on the Internet through a centralized data center. Inside that data center, IT departments installed and monitored physical appliances to provide security like network firewalls, proxies, and secure web gateways.

This model worked fine when employees only needed to connect to the public Internet occasionally. Most work was performed on the desktop in front of the user. When companies began moving to SaaS applications hosted by other teams, and employees spent the majority of their day on the Internet, this security framework fell apart.

User experience suffered when all traffic had to first reach a distant security appliance. IT and Security teams had to maintain and patch appliances while struggling to scale up or down. The cost of backhauling traffic over MPLS links erased the financial savings gained by migrating to SaaS applications on the Internet.

Cloudflare Gateway turns Cloudflare’s network in the other direction to protect users as they connect out to the rest of the Internet. Instead of backhauling traffic to a centralized location, users connect to a nearby Cloudflare data center where we apply one or more layers of security filtering and logging before accelerating their traffic to its final destination.

Customers can choose how they want to start this journey. Cloudflare operates the world’s fastest DNS resolver, on top of which we’ve built DNS filtering powered by the intelligence we collect from handling so much of the Internet every day. Other customers decide to begin by ripping out their network firewall appliances and moving that functionality into Cloudflare’s network by connecting roaming users or entire offices and data centers to Cloudflare.

As threats become more advanced, Cloudflare’s Secure Web Gateway inspects HTTPS traffic for malware hiding in file downloads or the accidental loss of data to unapproved SaaS services. Cloudflare’s Browser Isolation service adds another layer of threat protection by running the browser in our network instead of on the user device. With Cloudflare Gateway and Browser Isolation, security teams also can apply granular data loss control to traffic as it flows through our network—from stopping file uploads to blocking copy-and-paste in the web page itself.

Now, control the data and configurations in your SaaS applications

At this point in a Zero Trust journey, your team can control how users access critical resources and how you keep those users and their data safe from external attack. Both of these require control of the network—inspecting traffic as it leaves devices in your organization or as it arrives in your infrastructure. That leaves one piece missing. As more of your data lives in SaaS applications outside your control, how do you maintain a consistent level of filtering, logging, and auditing?

The Cloudflare Zero Trust platform released many features in the last year to help customers solve this problem and the broader range of “CASB” challenges. First, we built a feature that allows your team to force logins to your SaaS applications through Cloudflare’s Secure Web Gateway where you can control rules and visibility. Next, we used the data from the Secure Web Gateway to provide your team with a comprehensive Shadow IT report to discover what applications your team is using and what they should be using.

Customers use the Shadow IT report in particular to begin building rules to block access to unapproved SaaS applications, or to block actions like file uploads to specific unapproved SaaS applications, but the collaboration available in these tools becomes a risk to your organization.

It’s easy to be a single-click away from a data breach. We could share a document with the public Internet instead of our team. We could leave an S3 bucket unprotected. We could invite the wrong users to a private GitHub repository or install a malicious plugin to our email system. The data-at-rest in these SaaS applications is vulnerable to new types of attacks.

Some of these applications have tried to solve this problem in their own space, but the rapid adoption of SaaS applications and the struggle to configure each separately led to thousands of wasted hours in security teams. The Vectrix founders talked with teams who had to dedicate full-time employees just to manually configure and check permission settings and logs. So they built a better answer.

Vectrix scans the SaaS applications that your team uses to detect anomalies in configuration, permissions, and sharing. Each SaaS application is different – the risks vary from a Google Sheet that is made public to leaked secrets in GitHub – and Vectrix gives customers a single place to control and audit those types of events.

Why Vectrix?

To solve this problem for our customers, we evaluated options including building our own API-driven CASB solution and talking to other companies in this space. Vectrix became the best option after evaluating them against the priorities we have for this group of products.

The Vectrix team is customer obsessed

Vectrix mission focuses on giving organizations of any size, including those without a large security team, “simple, straightforward security scans that anyone can use…” By making the solution accessible and easy to use, Vectrix reduces the barrier to security.

We share that same goal. Cloudflare exists to help build a better Internet. That starts with an Internet made safer by making security tools accessible to anyone. From offering SSL certificates at no cost to any customer to making Zero Trust product group available at no cost to teams of up to 50 users, we are obsessed with helping our customers solve problems previously out of their reach.

Their technology delivers value faster

One of the original pitches of Cloudflare’s Application Security and Performance products was set up that could be completed in less than five minutes. We know that the cost to deploy a new service, especially for smaller teams, can mean that organizations delay making security and performance improvements.

We don’t think that customers should have to compromise and neither does Vectrix. The Vectrix product focuses on delivering immediate value in less than five minutes after the two or three clicks required to configure the first scan of a SaaS application. Customers can begin to flag risks in their organization in a matter of minutes without the need for a complex deployment.

1+1=3 in terms of value for our customers when used with our existing Zero Trust products

The Vectrix product will not be inserted as a point solution add-on. We’re making it a core part of our Zero Trust bundle because integrating features from products like our Secure Web Gateway give customers a comprehensive solution that works better together.

What’s next?

We’re excited to welcome Vectrix to the Cloudflare team. You can learn more about why they decided to join Cloudflare in this blog post published today.

We have already started migrating their services to the Cloudflare global network and plan to open sign-ups for a beta in the next couple of months. If you are interested, please sign up here. Don’t let the beta delay the start of your own journey with these products—we’ll be inviting users off of the waitlist based on when they first started deploying Cloudflare’s Zero Trust products.

Amazon Redshift at AWS re:Invent 2021 recap

2022-02-10 Sunaina Abdul Salah

Post Syndicated from Sunaina Abdul Salah original https://aws.amazon.com/blogs/big-data/amazon-redshift-at-aws-reinvent-2021-recap/

The annual AWS re:Invent learning conference is an exciting time full of new product and program launches. At the first re:Invent conference in 2012, AWS announced Amazon Redshift. Since then, tens of thousands of customers have started using Amazon Redshift as their preferred cloud data warehouse. At re:Invent 2021, AWS announced several new Amazon Redshift features that bring easy analytics for everyone while continuing to increase performance and help you break through data silos to analyze all the data in your data warehouse. With re:Invent packed with information and new announcements, it’s easy to miss the best of the updates in Amazon Redshift. In this post, we summarize these announcements, along with resources for you to get more details.

AWS takes analytics serverless

Adam Selipsky took the re:Invent keynote stage on November 29, 2021, for the first time as AWS CEO, announcing a string of innovations along the theme of pathfinders. He shared the story about Florence Nightingale, a pathfinder and data geek who had a passion for statistics, who collected and analyzed data on sanitation impact on mortality rates to mobilize the army at the time to approve new hygiene standards and restructure health efforts. The theme set the stage for the AWS modern data strategy, which enables everyone in the organization to find patterns in data and mobilize their business with data with the right tools for the right job. Selipsky’s announcement of serverless options for four AWS Analytics services, including Amazon Redshift, emphasized the growing need to be able to run complex analytics without touching infrastructure or managing capacity for your applications. Watch Adam Selipsky’s keynote announcement of AWS Analytics services with new serverless options (00:52).

Amazon Redshift: Under the hood

This year, VP of AWS Machine Learning Services Swami Sivasubramanian expanded his keynote to include the entire data and machine learning (ML) journey for all workloads and data types. In addition to mentioning the new serverless option announcement for Amazon Redshift in preview, he dove under the hood of Amazon Redshift to explore core innovations since its inception in 2012 that drove the service to become a best-in-class, petabyte-scale data warehouse for tens of thousands of customers. Sivasubramanian touched on features that enable Amazon Redshift to power large workloads with top-notch performance, including RA3 instances, AQUA (Advanced Query Accelerator), materialized views, short query acceleration, and automatic workload management. He also elaborated on how Amazon Redshift ML uses SQL to make ML predictions from your data warehouse. Neeraja Rentachintala then took you through a demo set in a gaming environment to outline the value of Amazon Redshift Serverless and Amazon QuickSight Q.

Reinvent your business for the future with AWS Analytics

Rahul Pathak, VP of AWS Analytics Services, talked in detail about how you can put your data to work and transform your businesses through end-to-end AWS Analytics services that help you modernize, unify, and innovate. He talked about how customers like Zynga, Schneider Electric, Magellan Rx, Jobcase, and Nasdaq are benefitting from Amazon Redshift with innovations in performance as data volumes grow. He elaborated on how Amazon Redshift helps you analyze all your data with AWS service integration, and touched upon the quest to make analytics easy for everyone with the introduction of the new Query Editor v2, Amazon Redshift Serverless, and data sharing capabilities. Watch the session to gain a deeper understanding of AWS Analytics services.

What’s new with Amazon Redshift, featuring Schneider Electric

This session is a must-watch for every existing and new Amazon Redshift customer to get a full understanding of the breadth and depth of features that Amazon Redshift offers along the dimensions of easy analytics for everyone, analyze all your data, and performance at scale. Eugene Kawamoto, Director of Amazon Redshift Product, goes into detail about all the new launches in 2021, tracing the architectural evolution of Amazon Redshift to the new serverless option. He explores how Amazon Redshift integrates with other popular AWS services to help you break through data silos, analyze all your data, and derive value from this data.

Democratizing data for self-service analytics and ML

Access to all your data for fast analytics at scale is foundational for 360-degree projects involving data engineers, database developers, data analysts, data scientists, business intelligence professionals, and the line of business. In this session, Greg Khairallah and Shruti Worlikar, leaders in the AWS Analytics GTM organization, team up with our customer Jobcase, represented by Senior Scientist Clay Martin, to show how easy-to-use ML can help your organization imagine new products or services, transform your customer experiences, streamline your business operations, and improve your decision-making. A secure, integrated platform that’s easy to use and supports nonproprietary data formats can improve collaboration through data sharing and also improve customer responsiveness.

Introducing Amazon Redshift Serverless

Following the announcements in the keynotes, this session takes you through the new serverless option on Amazon Redshift, which enables you to get started in seconds and run data warehousing and analytics workloads at scale without worrying about data warehouse management. In this session, learn from Yan Leshinsky, VP of Amazon Redshift, and Neeraja Rentachintala, Principal Product Manager for Amazon Redshift, on how Amazon Redshift Serverless automatically provisions data warehouse capacity and intelligently scales the underlying resources to deliver consistently high performance and simplified operations for even the most demanding and volatile workloads.

Introduction to AWS Data Exchange for Amazon Redshift

We’ve talked about Amazon Redshift Serverless several times, but there were some exciting announcements from the service leading into re:Invent. We launched AWS Data Exchange for Amazon Redshift, which allows you to combine third-party data found on AWS Data Exchange with your own data from your Amazon Redshift cloud data warehouse, requiring no ETL and accelerating time to value. This provides a powerful enhancement to the strong data-sharing capabilities in Amazon Redshift to share secure, live data across Regions, accounts, and organizations. In this session, product managers Neeraja Rentachintala and Ryan Waldorf walk through the value of this integration and how to access and analyze a provider’s data with data providers, which enables licensing this access to their Amazon Redshift cloud data warehouses. Alex Bohl, Director of Data Innovation from Mathematica, joins them in this session to provide a real-world example.

Additional sessions

In addition to these sessions, the hands-on-workshops and chalk talks were packed with customers looking to learn more about Amazon Redshift’s capabilities in ML with Redshift ML, concurrency scaling, and much more. These sessions were not recorded.

Get started with Amazon Redshift

Learn more about the latest and greatest of what Amazon Redshift offers you, and explore the following resources for more information about new releases:

What’s new in Amazon Redshift – 2021, a year in review
Amazon Redshift Serverless (in preview) – AWS News Blog, What’s new, LinkedIn announcement
Amazon Redshift Query Editor v2 – AWS News Blog, What’s new, Demo
AWS Data Exchange for Amazon Redshift – AWS News Blog, What’s new
Redshift ML – AWS News Blog, What’s new, Case study
Amazon Redshift data sharing – Featuring cross-region and cross-account data sharing

About the Author

Sunaina Abdul Salah leads product marketing for Amazon Redshift.

Creating computing quotas on AWS Outposts rack with EC2 Capacity Reservations sharing

2022-02-10 Rachel Zheng

Post Syndicated from Rachel Zheng original https://aws.amazon.com/blogs/compute/creating-computing-quotas-on-aws-outposts-rack-with-ec2-capacity-reservation-sharing/

This post is written by Yi-Kang Wang, Senior Hybrid Specialist SA.

AWS Outposts rack is a fully managed service that delivers the same AWS infrastructure, AWS services, APIs, and tools to virtually any on-premises datacenter or co-location space for a truly consistent hybrid experience. AWS Outposts rack is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies. In addition to these benefits, we have started to see many of you need to share Outposts rack resources across business units and projects within your organization. This blog post will discuss how you can share Outposts rack resources by creating computing quotas on Outposts with Amazon Elastic Compute Cloud (Amazon EC2) Capacity Reservations sharing.

In AWS Regions, you can set up and govern a multi-account AWS environment using AWS Organizations and AWS Control Tower. The natural boundaries of accounts provide some built-in security controls, and AWS provides additional governance tooling to help you achieve your goals of managing a secure and scalable cloud environment. And while Outposts can consistently use organizational structures for security purposes, Outposts introduces another layer to consider in designing that structure. When an Outpost is shared within an Organization, the utilization of the purchased capacity also needs to be managed and tracked within the organization. The account that owns the Outpost resource can use AWS Resource Access Manager (RAM) to create resource shares for member accounts within their organization. An Outposts administrator (admin) can share the ability to launch instances on the Outpost itself, access to the local gateways (LGW) route tables, and/or access to customer-owned IPs (CoIP). Once the Outpost capacity is shared, the admin needs a mechanism to control the usage and prevent over utilization by individual accounts. With the introduction of Capacity Reservations on Outposts, we can now set up a mechanism for computing quotas.

Concept of computing quotas on Outposts rack

In the AWS Regions, Capacity Reservations enable you to reserve compute capacity for your Amazon EC2 instances in a specific Availability Zone for any duration you need. On May 24, 2021, Capacity Reservations were enabled for Outposts rack. It supports not only EC2 but Outposts services running over EC2 instances such as Amazon Elastic Kubernetes (EKS), Amazon Elastic Container Service (ECS) and Amazon EMR. The computing power of above services could be covered in your resource planning as well. For example, you’d like to launch an EKS cluster with two self-managed worker nodes for high availability. You can reserve two instances with Capacity Reservations to secure computing power for the requirement.

Here I’ll describe a method for thinking about resource pools that an admin can use to manage resource allocation. I’ll use three resource pools, that I’ve named reservation pool, bulk and attic, to effectively and extensibly manage the Outpost capacity.

A reservation pool is a resource pool reserved for a specified member account. An admin creates a Capacity Reservation to match member account’s need, and shares the Capacity Reservation with the member account through AWS RAM.

A bulk pool is an unreserved resource pool that is used when member accounts run out of compute capacity such as EC2, EKS, or other services using EC2 as underlay. All compute capacity in the bulk pool can be requested to launch until it is exhausted. Compute capacity that is not under a reservation pool belongs to the bulk pool by default.

An attic is a resource pool created to hold the compute capacity that the admin wouldn’t like to share with member accounts. The compute capacity remains in control by admin, and can be released to the bulk pool when needed. Admin creates a Capacity Reservation for the attic and owns the Capacity Reservation.

The following diagram shows how the admin uses Capacity Reservations with AWS RAM to manage computing quotas for two member accounts on an Outpost equipped with twenty-four m5.xlarge. Here, I’m going to break the idea into several pieces to help you understand easily.

There are three Capacity Reservations created by admin. CR #1 reserves eight m5.xlarge for the attic, CR #2 reserves four m5.xlarge instances for account A and CR #3 reserves six m5.xlarge instances for account B.
The admin shares Capacity Reservation CR #2 and CR #3 with account A and B respectively.
Since eighteen m5.xlarge instances are reserved, the remaining compute capacity in the bulk pool will be six m5.xlarge.
Both Account A and B can continue to launch instances exceeding the amount in their Capacity Reservation, by utilizing the instances available to everyone in the bulk pool.

Once the bulk pool is exhausted, account A and B won’t be able to launch extra instances from the bulk pool.
The admin can release more compute capacity from the attic to refill the bulk pool, or directly share more capacity with CR#2 and CR#3. The following diagram demonstrates how it works.

Based on this concept, we realize that compute capacity can be securely and efficiently allocated among multiple AWS accounts. Reservation pools allow every member account to have sufficient resources to meet consistent demand. Making the bulk pool empty indirectly sets the maximum quota of each member account. The attic plays as a provider that is able to release compute capacity into the bulk pool for temporary demand. Here are the major benefits of computing quotas.

Centralized compute capacity management
Reserving minimum compute capacity for consistent demand
Resizable bulk pool for temporary demand
Limiting maximum compute capacity to avoid resource congestion.

Configuration process

To take you through the process of configuring computing quotas in the AWS console, I have simplified the environment like the following architecture. There are four m5.4xlarge instances in total. An admin account holds two of the m5.4xlarge in the attic, and a member account gets the other two m5.4xlarge for the reservation pool, which results in no extra instance in the bulk pool for temporary demand.

Prerequisites

The admin and the member account are within the same AWS Organization.
The Outpost ID, LGW and CoIP have been shared with the member account.

Creating a Capacity Reservation for the member account

Sign in to AWS console of the admin account and navigate to the AWS Outposts page. Select the Outpost ID you want to share with the member account, choose Actions, and then select Create Capacity Reservation. In this case, reserve two m5.4xlarge instances.

In the Reservation details, you can terminate the Capacity Reservation by manually enabling or selecting a specific time. The first option of Instance eligibility will automatically count the number of instances against the Capacity Reservation without specifying a reservation ID. To avoid misconfiguration from member accounts, I suggest you select Any instance with matching details in most use cases.

Sharing the Capacity Reservation through AWS RAM

Go to the RAM page, choose Create resource share under Resource shares page. Search and select the Capacity Reservation you just created for the member account.

Choose a principal that is an AWS ID of the member account.

Creating a Capacity Reservation for attic

Create a Capacity Reservation like step 1 without sharing with anyone. This reservation will just be owned by the admin account. After that, check Capacity Reservations under the EC2 page, and the two Capacity Reservations there, both with availability of two m5.4xlarge instances.

Launching EC2 instances

Log in to the member account, select the Outpost ID the admin shared in step 2 then choose Actions and select Launch instance. Follow AWS Outposts User Guide to launch two m5.4xlarge on the Outpost. When the two instances are in Running state, you can see a Capacity Reservation ID on Details page. In this case, it’s cr-0381467c286b3d900.

So far, the member account has run out of two m5.4xlarge instances that the admin reserved for. If you try to launch the third m5.4xlarge instance, the following failure message will show you there is not enough capacity.

Allocating more compute capacity in bulk pool

Go back to the admin console, select the Capacity Reservation ID of the attic on EC2 page and choose Edit. Modify the value of Quantity from 2 to 1 and choose Save, which means the admin is going to release one more m5.4xlarge instance from the attic to the bulk pool.

Launching more instances from bulk pool

Switch to the member account console, and repeat step 4 but only launch one more m5.4xlarge instance. With the resource release on step 5, the member account successfully gets the third instance. The compute capacity is coming from the bulk pool, so when you check the Details page of the third instance, the Capacity Reservation ID is blank.

Cleaning up

Terminate the three EC2 instances in the member account.
Unshare the Capacity Reservation in RAM and delete it in the admin account.
Unshare the Outpost ID, LGW and CoIP in RAM to get the Outposts resources back to the admin.

Conclusion

In this blog post, the admin can dynamically adjust compute capacity allocation on Outposts rack for purpose-built member accounts with an AWS Organization. The bulk pool offers an option to fulfill flexibility of resource planning among member accounts if the maximum instance need per member account is unpredictable. By contrast, if resource forecast is feasible, the admin can revise both the reservation pool and the attic to set a hard limit per member account without using the bulk pool. In addition, I only showed you how to create a Capacity Reservation of m5.4xlarge for the member account, but in fact an admin can create multiple Capacity Reservations with various instance types or sizes for a member account to customize the reservation pool. Lastly, if you would like to securely share Amazon S3 on Outposts with your member accounts, check out Amazon S3 on Outposts now supports sharing across multiple accounts to get more details.

The Experiment Podcast: The Salted Wounds of the SPAM Labor Strike

2022-02-10 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=QrMJChDPg-I

Optimize AI/ML workloads for sustainability: Part 1, identify business goals, validate ML use, and process data

2022-02-10 Benoit de Chateauvieux

Post Syndicated from Benoit de Chateauvieux original https://aws.amazon.com/blogs/architecture/optimize-ai-ml-workloads-for-sustainability-part-1-identify-business-goals-validate-ml-use-and-process-data/

Training artificial intelligence (AI) services and machine learning (ML) workloads uses a lot of energy—and they are becoming bigger and more complex. As an example, the Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models study estimates that a single training session for a language model like GPT-3 can have a carbon footprint similar to traveling 703,808 kilometers by car.

Although ML uses a lot of energy, it is also one of the best tools we have to fight the effects of climate change. For example, we’ve used ML to help deliver food and pharmaceuticals safely and with much less waste, reduce the cost and risk involved in maintaining wind farms, restore at-risk ecosystems, and predict and understand extreme weather.

In this series of three blog posts, we’ll provide guidance from the Sustainability Pillar of the AWS Well-Architected Framework to reduce the carbon footprint of your AI/ML workloads.

This first post follows the first three phases provided in the Well-Architected machine learning lifecycle (Figure 1):

Business goal identification
ML problem framing
Data processing (data collection, data preprocessing, feature engineering)

You’ll learn best practices for each phase to help you review and refine your workloads to maximize utilization and minimize waste and the total resources deployed and powered to support your workload.

Figure 1. ML lifecycle

Business goal identification

Define the overall environmental impact or benefit

Measure your workload’s impact and its contribution to the overall sustainability goals of the organization. Questions you should ask:

How does this workload support our overall sustainability mission?
How much data will we have to store and process? What is the impact of training the model? How often will we have to re-train?
What are the impacts resulting from customer use of this workload?
What will be the productive output compared with this total impact?

Asking these questions will help you establish specific sustainability objectives and success criteria to measure against in the future.

ML problem framing

Identify if ML is the right solution

Always ask if AI/ML is right for your workload. There is no need to use computationally intensive AI when a simpler, more sustainable approach might succeed just as well.

For example, using ML to route Internet of Things (IoT) messages may be unwarranted; you can express the logic with a Rules Engine.

Consider AI services and pre-trained models

Once you decide if AI/ML is the right tool, consider whether the workload needs to be developed as a custom model.

Many workloads can use the managed AWS AI services shown in Figure 2. Using these services means that you won’t need the associated resources to collect/store/process data and to prepare/train/tune/deploy an ML model.

Figure 2. Managed AWS AI services

If adopting a fully managed AI service is not appropriate, evaluate if you can use pre-existing datasets, algorithms, or models. AWS Marketplace offers over 1,400 ML-related assets that customers can subscribe to. You can also fine-tune an existing model starting from a pre-trained model, like those available on Hugging Face. Using pre-trained models from third parties can reduce the resources you need for data preparation and model training.

Select sustainable Regions

Select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to host your data and workloads.

Data processing (data collection, data preprocessing, feature engineering)

Avoid datasets and processing duplication

Evaluate if you can avoid data processing by using existing publicly available datasets like AWS Data Exchange and Open Data on AWS (which includes the Amazon Sustainability Data Initiative). They offer weather and climate datasets, satellite imagery, air quality or energy data, among others. When you use these curated datasets, it avoids duplicating the compute and storage resources needed to download the data from the providers, store it in the cloud, organize, and clean it.

For internal data, you can also reduce duplication and rerun of feature engineering code across teams and projects by using a feature storage, such as Amazon SageMaker Feature Store.

Once your data is ready for training, use pipe input mode to stream it from Amazon Simple Storage Service (Amazon S3) instead of copying it to Amazon Elastic Block Store (Amazon EBS). This way, you can reduce the size of your EBS volumes.

Minimize idle resources with serverless data pipelines

Adopt a serverless architecture for your data pipeline so it only provisions resources when work needs to be done. For example, when you use AWS Glue and AWS Step Functions for data ingestion and preprocessing, you are not maintaining compute infrastructure 24/7. As shown in Figure 3, Step Functions can orchestrate AWS Glue jobs to create event-based serverless ETL/ELT pipelines.

Figure 3. Orchestrating data preparation with AWS Glue and Step Functions

Implement data lifecycle policies aligned with your sustainability goals

Classify data to understand its significance to your workload and your business outcomes. Use this information to determine when you can move data to more energy-efficient storage or safely delete it.

Manage the lifecycle of all your data and automatically enforce deletion timelines to minimize the total storage requirements of your workload using Amazon S3 Lifecycle policies. The Amazon S3 Intelligent-Tiering storage class will automatically move your data to the most sustainable access tier when access patterns change.

Define data retention periods that support your sustainability goals while meeting your business requirements, not exceeding them.

Adopt sustainable storage options

Use the appropriate storage tier to reduce the carbon impact of your workload. On Amazon S3, for example, you can use energy-efficient, archival-class storage for infrequently accessed data, as shown in Figure 4. And if you can easily recreate an infrequently accessed dataset, use the Amazon S3 One Zone-IA class to reduce by 3x or more its carbon footprint.

Figure 4. Data access patterns for Amazon S3

Don’t over-provision block storage for notebooks and use object storage services like Amazon S3 for common datasets.

Tip: You can check the free disk space on your SageMaker Notebooks using !df -h.

Select efficient file formats and compression algorithms

Use efficient file formats such as Parquet or ORC to train your models. Compared to CSV, they can help you reduce your storage by up to 87%.

Migrating to a more efficient compression algorithm can also greatly contribute to your storage reduction efforts. For example, Zstandard produces 10–15% smaller files than Gzip at the same compression speed. Some SageMaker built-in algorithms accept x-recordio-protobuf input, which can be streamed directly from Amazon S3 instead of being copied to a notebook instance.

Minimize data movement across networks

Compress your data before moving it over the network.

Minimize data movement across networks when selecting a Region; store your data close to your producers and train your models close to your data.

Measure results and improve

To monitor and quantify improvements, track the following metrics:

Total size of your S3 buckets and storage class distribution, using Amazon S3 Storage Lens
DiskUtilization metric of your SageMaker processing jobs
StorageBytes metric of your SageMaker Studio shared storage volume

Conclusion

In this blog post, we discussed the importance of defining the overall environmental impact or benefit of your ML workload and why managed AI services or pre-trained ML models are sustainable alternatives to custom models. You also learned best practices to reduce the carbon footprint of your ML workload in the data processing phase.

In the next post, we will continue our sustainability journey through the ML lifecycle and discuss the best practices you can follow in the model development phase.

Want to learn more? Check out the Sustainability Pillar of the AWS Well-Architected Framework, the Architecting for sustainability session at re:Invent 2021, and other blog posts on architecting for sustainability.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Ransomware Takeaways From Q4 2021

2022-02-10 Jeremy Milk

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/ransomware-takeaways-from-q4-2021/

Ransomware commanded attention from both the media and governments like never before in 2021. It was an unprecedented year of major breaches, astronomical ransom demands, and attacks on businesses of all sizes. And much of what stood out to us towards the end of the year was the seemingly heightened regulatory response to previous quarters’ developments.

New regulations are hopeful signs that people are taking the ransomware threat more seriously, but they’re not enough to stop ransomware operators just yet. If you’re in charge of managing company data, knowing the latest in ransomware developments can help guide the choices and actions you take to protect company assets. Here are five key takeaways based on what we saw over Q4 2021.

This post is a part of our ongoing series on ransomware. Take a look at our other posts for more information on how businesses can defend themselves against a ransomware attack, and more.

1. U.S. State Department Sweetened the Deal for Reporting Cybercrime.

In Q4, we learned that the U.S. State Department put $10 million bounties on two specific ransomware groups—DarkSide and Sodinokibi—as well as $5 million bounties on their affiliates. This follows a statement issued earlier in 2021 that offered $10 million bounties for information on any person who engages in cybercrime. The bounties have proven effective in the past, with the department paying out more than $200 million since 1984 to individuals who provided intelligence that helped address threats to U.S. security.

2. Cyber Insurers Are Taking a More Conservative Stance.

The rise in attacks in 2021 led to a rise in companies seeking out cyber insurance coverage if they hadn’t already, and subsequently, a rise in claims against cyber insurance policies. The cyber insurance dynamics are evolving in response, and companies may need to think about coverage differently. Lloyds of London, for example, will no longer cover losses stemming from nation-state-affiliated criminals, cyber warfare, and “retaliatory” cyber activity. Whether or not ransomware gangs will be fully accepted as nation-state attackers is still up for debate, but the truth is that the cybersecurity community understands that some big name groups are definitely operating in league with their particular locale’s government branches.

3. Governments Named Names.

Also in November, the Ukrainian Security Service disclosed the names and positions of five members of a major cybercrime syndicate. The disclosure revealed the members’ links to the Crimean branch of the Russian Federal Security Service (FSB). They furthermore released recorded telephone conversations where the members discussed attacks and griped about their FSB salaries. According to the Ukrainian Security Service, the group has heavily targeted the Ukrainian government in more than 5,000 cyberattacks. Despite these efforts to dox major players, the group has continued their attacks as tensions between Russia and Ukraine continue to escalate.

4. Sanctions Tightened Ransomware’s Vice Grip.

In October, a ransomware group linked to a sanctioned entity—Evil Corp—posted information allegedly stolen from the National Rifle Association (NRA). While the NRA has not confirmed the attack, if true, it would potentially put them between a rock and a hard place. If they pay the attackers, they could face penalties from the U.S. government.

The sanctions are also changing the behavior of ransomware groups. Sanctioned groups are less likely to be successful in getting victims to pay. One way they get around this is by creating subsidiary brands or spinoff entities that, to an unknowing victim, seem to be unaffiliated with the sanctioned entity. When victims are unaware of affiliations between groups, they’re more likely to pay ransoms and less likely to disclose attacks to the authorities. However, pleading innocence may not be enough for victims to avoid consequences should the attacks be discovered by authorities.

5. Players in the Ransomware Economy Came Under Fire.

The ransomware economy is a murky web of actors that includes entities beyond just the ransomware operators themselves. In December, researchers linked 15+ ransomware-related crypto exchanges to a single prestigious skyscraper in Moscow—the tallest in the city, in fact. The findings provide more fuel for security experts to argue that Russian authorities give ransomware gangs a wide berth.

What This Means for You

While Q4 saw increased scrutiny on some ransomware operations, stopping ransomware is like a game of Whac-A-Mole. When one group gets exposed or dissolved, the operators and resources just reemerge as a new brand. Ransomware isn’t going away anytime soon, and the stakes for companies who fall victim are only higher with new sanctions. All this makes investing in ransomware protection all the more necessary.

The post Ransomware Takeaways From Q4 2021 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.