Data load made easy and secure in Amazon Redshift using Query Editor V2

Post Syndicated from Raks Khare original https://aws.amazon.com/blogs/big-data/data-load-made-easy-and-secure-in-amazon-redshift-using-query-editor-v2/

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data efficiently and securely. Users such as data analysts, database developers, and data scientists use SQL to analyze their data in Amazon Redshift data warehouses. Amazon Redshift provides a web-based Query Editor V2 in addition to supporting connectivity via ODBC/JDBC or the Amazon Redshift Data API.

Amazon Redshift Query Editor V2 makes it easy to query your data using SQL and gain insights by visualizing your results using charts and graphs with a few clicks. With Query Editor V2, you can collaborate with team members by easily sharing saved queries, results, and analyses in a secure way.

Analysts performing ad hoc analyses in their workspace need to load sample data in Amazon Redshift by creating a table and load data from desktop. They want to join that data with the curated data in their data warehouse. Data engineers and data scientists have test data, and want to load data into Amazon Redshift for their machine learning (ML) or analytics use cases.

In this post, we walk through a new feature in Query Editor V2 to easily load data files either from your local desktop or Amazon Simple Storage Service (Amazon S3).

Prerequisites

Complete the following prerequisite steps:

    1. Create an Amazon Redshift provisioned cluster or Serverless endpoint.
    2. Provide access to Query Editor V2 for your end-users. To enable your users to access Query Editor V2 using IAM, as an administrator, you can attach one of the following AWS-managed policies to the AWS Identity and Access Management (IAM) user or role to grant permission:
      • AmazonRedshiftQueryEditorV2FullAccess – Grants full access to the Query Editor V2 operations and resources.
      • AmazonRedshiftQueryEditorV2NoSharing – Grants the ability to work with Query Editor V2 without sharing resources.
      • AmazonRedshiftQueryEditorV2ReadSharing – Grants the ability to work with Query Editor V2 with limited sharing of resources. The granted principal can read the resources shared with its team but can’t update them.
      • AmazonRedshiftQueryEditorV2ReadWriteSharing – Grants the ability to work with Query Editor V2 with sharing of resources. The granted principal can read and update the resources shared with its team.
    3. Provide access to the S3 bucket to load data from a local desktop file.
      • To enable your users to load data from a local desktop using Query Editor V2, as an administrator, you have to specify a common S3 bucket, and the user account must be configured with proper permissions. You can use the following IAM policy as an example to configure your IAM user or role:
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "s3:ListBucket",
                        "s3:GetBucketLocation"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<staging-bucket-name>>"
                    ]
                },
                {
                    "Effect": "Allow",
                    "Action": [
                        "s3:PutObject",
                        "s3:GetObject",
                        "s3:DeleteObject"
                    ],
                    "Resource": [
                        "arn:aws:s3:::<staging-bucket-name>[/<optional-prefix>]/${aws:userid}/*"
                    ]
                }
            ]
        }
        

      • It’s also recommended to have proper separation of data access when loading data files from your local desktop. You can use the following S3 bucket policy as an example to separate data access between users of the staging bucket you configured:
        {
         "Version": "2012-10-17",
            "Statement": [
                {"Sid": "userIdPolicy",
                    "Effect": "Deny",
                    "Principal": "*",
                    "Action": ["s3:PutObject",
                               "s3:GetObject",
                               "s3:DeleteObject"],
                    "NotResource": [
                        "arn:aws:s3:::<staging-bucket-name>[/<optional-prefix>]/${aws:userid}/*"
                    ]
                 }
            ]
        }
        

Configure Query Editor V2 for your AWS account

As an admin, you must first configure Query Editor V2 before providing access to your end-users. On the Amazon Redshift console, choose Query editor v2 in the navigation pane.

If you’re accessing Query Editor v2 for the first time, you must configure your account by providing AWS Key Management Service (AWS KMS) encryption and, optionally, an S3 bucket.

By default, an AWS-owned key is used to encrypt resources. Optionally, you can create a symmetric customer managed key to encrypt Query Editor V2 resources such as saved queries and query results using the AWS KMS console or AWS KMS API operations.

The S3 bucket URI is required when loading data from your local desktop. You can provide the S3 URI of the same bucket that you configured earlier as a prerequisite.

Configure-QEv2

If you have previously configured Query Editor V2 with only AWS KMS encryption, you can choose Account Settings after launching the interface to update the S3 URI to support loading from your local desktop.

Configure-QEv2

Load data from your local desktop

Users such as data analysts, database developers, and data scientists can now load local files up to 5 MB in size into Amazon Redshift tables from Query Editor V2, without using the COPY command. The supported data formats are CSV, JSON, DELIMITER, FIXEDWIDTH, SHAPEFILE, AVRO, PARQUET, and ORC. Complete the following steps:

      1. On the Amazon Redshift console, navigate to Query Editor V2.
      2. Click on Load data.
        load data
      3. Choose Load from local file and Browse to choose a local file. You can download the student_info.csv file to use as an example.
      4. If your file has column headers as the first row, keep the default selection of Ignore header rows as 1 to ignore first row.
      5. If your file has date columns, choose Data conversion parameters.
        browse and format file
      6. Select Date format, set it to auto and choose Next.
        date format
      7. Choose Load new table to automatically infer the file schema.
      8. Specify the values for Cluster or workgroup, Database, Schema, and Table (for example, Student_info) to load data to.
      9. Choose Create table.
        create-table

A success message appears that the table was created. Now you can load data into the newly created table from a local file.

      1. Choose Load data.
        table created

A message appears that the data load was successful.

      1. Query the Student_info table to see the data.
        query data

Load data from Amazon S3

You can easily load data from Amazon S3 into an Amazon Redshift table using Query Editor V2. Complete the following steps:

      1. On the Amazon Redshift console, launch Query Editor V2 and connect to your cluster.
      2. Browse to the database name (for example, dev), the public schema, and expand Tables.
      3. You can automatically infer the schema of a S3 file similar to Load from local file option shown above however for this demo, we will also show you how to load data to an existing table. Run the following create table script to make a sample table (for this example, public.customer):
CREATE TABLE customer ( 
	c_custkey int8 NOT NULL , 
	c_name varchar(25) NOT NULL, 
	c_address varchar(40) NOT NULL, 
	c_nationkey int4 NOT NULL, 
	c_phone char(15) NOT NULL, 
	c_acctbal numeric(12,2) NOT NULL, 
	c_mktsegment char(10) NOT NULL, 
	c_comment varchar(117) NOT NULL, 
PRIMARY Key(C_CUSTKEY) 
) DISTKEY(c_custkey) sortkey(c_custkey);
      1. Choose Load data.
        Create-Table
      2. Choose Load from S3 bucket.
      3. For this post, we load data from the TPCH Sample data GitHub repo, so for the S3 URI, enter s3://redshift-downloads/TPC-H/2.18/10GB/customer.tbl.
      4. For S3 file location, choose us-east-1.
      5. For File format, choose Delimiter.
      6. For Delimiter character, enter |.
        Load from S3
      7. Choose Data conversion parameters, then select Time format and Date format as auto.
      8. Choose Back.

Refer to Data conversion parameters for more details.

Date Time Format

      1. Choose Load operations.
      2. Select Automatic update for compression encodings.
      3. Select Stop loading when maximum number of errors has been exceeded and specify a value (for example, 100).
      4. Select Statistics update and ON, then choose Next.

Refer to Data load operations for more details.

Load Operations

      1. Choose Load existing table.
      2. Specify the Cluster or workgroup, DatabaseSchema (for example, public) and Table name (for example, customer).
      3. For IAM role, choose a suitable IAM role.
      4. Choose Load data.
        S3 Load Data

Query Editor V2 generates the COPY command and runs it on the Amazon Redshift cluster. The results of the COPY command are displayed in the Result section upon completion.

S3 Load Copy

Conclusion

In this post, we showed how Amazon Redshift Query Editor V2 has simplified the process to load data into Amazon Redshift from Amazon S3 or your local desktop, thereby accelerating the data analysis. It’s an easy-to-use feature that your teams can start using to load and query datasets. If you have any questions or suggestions, please leave a comment.


About the Authors

Raks KhareRaks Khare is an Analytics Specialist Solutions Architect at AWS based out of Pennsylvania. He helps customers architect data analytics solutions at scale on the AWS platform.

Tahir Aziz is an Analytics Solution Architect at AWS. He has worked with building data warehouses and big data solutions for over 13 years. He loves to help customers design end-to-end analytics solutions on AWS. Outside of work, he enjoys traveling and cooking.

Erol MurtezaogluErol Murtezaoglu, a Technical Product Manager at AWS, is an inquisitive and enthusiastic thinker with a drive for self-improvement and learning. He has a strong and proven technical background in software development and architecture, balanced with a drive to deliver commercially successful products. Erol highly values the process of understanding customer needs and problems, in order to deliver solutions that exceed expectations.

Sapna Maheshwari is a Sr. Solutions Architect at Amazon Web Services. She has over 18 years of experience in data and analytics. She is passionate about telling stories with data and enjoys creating engaging visuals to unearth actionable insights.

Karthik Ramanathan is a Software Engineer with Amazon Redshift and is based in San Francisco. He brings close to two decades of development experience across the networking, data storage and IoT verticals. When not at work he is also a writer and loves to be in the water.

Albert Harkema is a Software Development Engineer at AWS. He is known for his curiosity and deep-seated desire to understand the inner workings of complex systems. His inquisitive nature drives him to develop software solutions that make life easier for others. Albert’s approach to problem-solving emphasizes efficiency, reliability, and long-term stability, ensuring that his work has a tangible impact. Through his professional experiences, he has discovered the potential of technology to improve everyday life.

What’s new with Amazon MWAA support for Apache Airflow version 2.4.3

Post Syndicated from Parnab Basak original https://aws.amazon.com/blogs/big-data/whats-new-with-amazon-mwaa-support-for-apache-airflow-version-2-4-3/

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it simple to set up and operate end-to-end data pipelines in the cloud at scale. Amazon MWAA supports multiple versions of Apache Airflow (v1.10.12, v2.0.2, and v2.2.2). Earlier in 2023, we added support for Apache Airflow v2.4.3 so you can enjoy the same scalability, availability, security, and ease of management with Airflow’s most recent improvements. Additionally, with Apache Airflow v2.4.3 support, Amazon MWAA has upgraded to Python v3.10.8, which supports newer Python libraries like OpenSSL 1.1.1 as well as major new features and improvements.

In this post, we provide an overview of the features and capabilities of Apache Airflow v2.4.3 and how you can set up or upgrade your Amazon MWAA environment to accommodate Apache Airflow v2.4.3 as you orchestrate using workflows in the cloud at scale.

New feature: Data-aware scheduling using datasets

With the release of Apache Airflow v2.4.0, Airflow introduced datasets. An Airflow dataset is a stand-in for a logical grouping of data that can trigger a Directed Acyclic Graph (DAG) in addition to regular DAG triggering mechanisms such as cron expressions, timedelta objects, and Airflow timetables. The following are some of the attributes of a dataset:

  • Datasets may be updated by upstream producer tasks, and updates to such datasets contribute to scheduling downstream consumer DAGs.
  • You can create smaller, more self-contained DAGs, which chain together into a larger data-based workflow using datasets.
  • You have an additional option now to create inter-DAG dependencies using datasets besides ExternalTaskSensor or TriggerDagRunOperator. You should consider using this dependency if you have two DAGs related via an irregular dataset update. This type of dependency also provides you with increased observability into the dependencies between your DAGs and datasets in the Airflow UI.

How data-aware scheduling works

You need to define three things:

  • A dataset, or multiple datasets
  • The tasks that will update the dataset
  • The DAG that will be scheduled when one or more datasets are updated

The following diagram illustrates the workflow.

The producer DAG has a task that creates or updates the dataset defined by a Uniform Resource Identifier (URI). Airflow schedules the consumer DAG after the dataset has been updated. A dataset will be marked as updated only if the producer task completes successfully—if the task fails or if it’s skipped, no update occurs, and the consumer DAG will not be scheduled. If your updates to a dataset triggers multiple subsequent DAGs, then you can use the Airflow metric max_active_tasks_per_dag to control the parallelism of the consumer DAG and reduce the chance of overloading the system.

Let’s demonstrate this with a code example.

Prerequisites to build a data-aware scheduled DAG

You must have the following prerequisites:

  • An Amazon Simple Storage Service (Amazon S3) bucket to upload datasets in. This can be a separate prefix in your existing S3 bucket configured for your Amazon MWAA environment, or it can be a completely different S3 bucket that you identify to store your data in.
  • An Amazon MWAA environment configured with Apache Airflow v2.4.3. The Amazon MWAA execution role should have access to read and write to the S3 bucket configured to upload datasets. The latter is only needed if it’s a different bucket than the Amazon MWAA bucket.

The following diagram illustrates the solution architecture.

The workflow steps are as follows:

  1. The producer DAG makes an API call to a publicly hosted API to retrieve data.
  2. After the data has been retrieved, it’s stored in the S3 bucket.
  3. The update to this dataset subsequently triggers the consumer DAG.

You can access the producer and consumer code in the GitHub repo.

Test the feature

To test this feature, run the producer DAG. After it’s complete, verify that a file named test.csv is generated in the specified S3 folder. Verify in the Airflow UI that the consumer DAG has been triggered by updates to the dataset and that it runs to completion.

There are two restrictions on the dataset URI:

  • It must be a valid URI, which means it must be composed of only ASCII characters
  • The URI scheme can’t be an Airflow scheme (this is reserved for future use)

Other notable changes in Apache Airflow v2.4.3:

Apache Airflow v2.4.3 has the following additional changes:

  1. Deprecation of schedule_interval and timetable arguments. Airflow v2.4.0 added a new DAG argument schedule that can accept a cron expression, timedelta object, timetable object, or list of dataset objects.
  2. Removal of experimental Smart Sensors. Smart Sensors were added in v2.0 and were deprecated in favor of deferrable operators in v2.2, and have now been removed. Deferrable operators are not yet supported on Amazon MWAA, but will be offered in a future release.
  3. Implementation of ExternalPythonOperator that can help you run some of your tasks with a different set of Python libraries than other tasks (and other than the main Airflow environment).

For detailed release documentation with sample code, visit the Apache Airflow v2.4.0 Release Notes.

New feature: Dynamic task mapping

Dynamic task mapping was a new feature introduced in Apache Airflow v2.3, which has also been extended in v2.4. Dynamic task mapping lets DAG authors create tasks dynamically based on current data. Previously, DAG authors needed to know how many tasks were needed in advance.

This is similar to defining your tasks in a loop, but instead of having the DAG file fetch the data and do that itself, the scheduler can do this based on the output of a previous task. Right before a mapped task is run, the scheduler will create n copies of the task, one for each input. The following diagram illustrates this workflow.

It’s also possible to have a task operate on the collected output of a mapped task, commonly known as map and reduce. This feature is particularly useful if you want to externally process various files, evaluate multiple machine learning models, or extraneously process a varied amount of data based on a SQL request.

How dynamic task mapping works

Let’s see an example using the reference code available in the Airflow documentation.

The following code results in a DAG with n+1 tasks, with n mapped invocations of count_lines, each called to process line counts, and a total that is the sum of each of the count_lines. Here n represents the number of input files uploaded to the S3 bucket.

With n=4 files uploaded, the resulting DAG would look like the following figure.

Prerequisites to build a dynamic task mapped DAG

You need the following prerequisites:

  • An S3 bucket to upload files in. This can be a separate prefix in your existing S3 bucket configured for your Amazon MWAA environment, or it can be a completely different bucket that you identify to store your data in.
  • An Amazon MWAA environment configured with Apache Airflow v2.4.3. The Amazon MWAA execution role should have access to read to the S3 bucket configured to upload files. The latter is only needed if it’s a different bucket than the Amazon MWAA bucket.

You can access the code in the GitHub repo.

Test the feature

Upload the four sample text files from the local data folder to an S3 bucket data folder. Run the dynamic_task_mapping DAG. When it’s complete, verify from the Airflow logs that the final sum is equal to the sum of the count lines of the individual files.

There are two limits that Airflow allows you to place on a task:

  • The number of mapped task instances that can be created as the result of expansion
  • The number of mapped tasks that can run at once

For detailed documentation with sample code, visit the Apache Airflow v2.3.0 Release Notes.

New feature: Upgraded Python version

With Apache Airflow v2.4.3 support, Amazon MWAA has upgraded to Python v3.10.8, providing support for newer Python libraries, features, and improvements. Python v3.10 has slots for data classes, match statements, clearer and better Union typing, parenthesized context managers, and structural pattern matching. Upgrading to Python v3.10 should also help you align with security standards by mitigating the risk of older versions of Python such as 3.7, which is fast approaching its end of security support.

With structural pattern matching in Python v3.10, you can now use switch-case statements instead of using if-else statements and dictionaries to simplify the code. Prior to Python v3.10, you might have used if statements, isinstance calls, exceptions and membership tests against objects, dictionaries, lists, tuples, and sets to verify that the structure of the data matches one or more patterns. The following code shows what an ad hoc pattern matching engine might have looked like prior to Python v3.10:

def http_error(status):
        if status == 200:
           return 'OK'
        elif status == 400:
            return 'Bad request'
 	    elif status == 401:
      	    return 'Not allowed'
	    elif status == 403:
      	    return 'Not allowed'
 	    elif status == 404:
      	    return 'Not allowed'
 	    else:
	        return 'Something is wrong'

With structural pattern matching in Python v3.10, the code is as follows:

def http_error(status):
    match status:
        case 200:
            return 'OK'
        case 400:
            return 'Bad request'
        case 401 | 403 | 404:
            return 'Not allowed'
        case _:
            return 'Something is wrong'

Python v3.10 also carries forward the performance improvements introduced in Python v3.9 using the vectorcall protocol. vectorcall makes many common function calls faster by minimizing or eliminating temporary objects created for the call. In Python 3.9, several Python built-ins—range, tuple, set, frozenset, list, dict—use vectorcall internally to speed up runs. The second big performance enhancer is more efficient in the parsing of Python source code using the new parser for the CPython runtime.

For a full list of Python v3.10 release highlights, refer to What’s New In Python 3.10.

The code is available in the GitHub repo.

Set up a new Apache Airflow v2.4.3 environment

You can set up a new Apache Airflow v2.4.3 environment in your account and preferred Region using either the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you’re adopting infrastructure as code (IaC), you can automate the setup using either AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform.

When you have successfully created an Apache Airflow v2.4.3 environment in Amazon MWAA, the following packages are automatically installed on the scheduler and worker nodes along with other provider packages:

  • apache-airflow-providers-amazon==6.0.0
  • python==3.10.8

For a complete list of provider packages installed, refer to Apache Airflow provider packages installed on Amazon MWAA environments. Note that some imports and operator names have changed in the new provider package in order to standardize the naming convention across the provider package. For a complete list of provider package changes, refer to the package changelog.

Upgrade from Apache Airflow v2.0.2 or v2.2.2 to Apache Airflow v2.4.3

Currently, Amazon MWAA doesn’t support in-place upgrades of existing environments for older Apache Airflow versions. In this section, we show how you can transfer your data from your existing Apache Airflow v2.0.2 or v2.2.2 environment to Apache Airflow v2.4.3:

  1. Create a new Apache Airflow v2.4.3 environment.
  2. Copy your DAGs, custom plugins, and requirements.txt resources from your existing v2.0.2 or v2.2.2 S3 bucket to the new environment’s S3 bucket.
    • If you use requirements.txt in your environment, you need to update the --constraint to v2.4.3 constraints and verify that the current libraries and packages are compatible with Apache Airflow v2.4.3
    • With Apache Airflow v2.4.3, the list of provider packages Amazon MWAA installs by default for your environment has changed. Note that some imports and operator names have changed in the new provider package in order to standardize the naming convention across the provider package. Compare the list of provider packages installed by default in Apache Airflow v2.2.2 or v2.0.2, and configure any additional packages you might need for your new v2.4.3 environment. It’s advised to use the aws-mwaa-local-runner utility to test out your new DAGs, requirements, plugins, and dependencies locally before deploying to Amazon MWAA.
  3. Test your DAGs using the new Apache Airflow v2.4.3 environment.
  4. After you have confirmed that your tasks completed successfully, delete the v2.0.2 or v2.2.2 environment.

Conclusion

In this post, we talked about the new features of Apache Airflow v2.4.3 and how you can get started using it in Amazon MWAA. Try out these new features like data-aware scheduling, dynamic task mapping, and other enhancements along with Python v.3.10.


About the authors

Parnab Basak is a Solutions Architect and a Serverless Specialist at AWS. He specializes in creating new solutions that are cloud native using modern software development practices like serverless, DevOps, and analytics. Parnab works closely in the analytics and integration services space helping customers adopt AWS services for their workflow orchestration needs.

Three Takeaways from the Gartner® Market Guide for Managed Detection and Response Services

Post Syndicated from Tom Caiazza original https://blog.rapid7.com/2023/05/02/three-takeaways-from-the-gartner-r-market-guide-for-managed-detection-and-response-services/

Three Takeaways from the Gartner® Market Guide for Managed Detection and Response Services

Not all MDR services are created equal, and in order for organizations to find the right partner for their managed detection and response needs, Gartner® has published a Market Guide report offering key insights for businesses of all sizes. At Rapid7, we are proud to offer this complimentary report and share our three key takeaways from it.

MDR services have skyrocketed over the past few years. In the report, Gartner says: “MDR is a high-growth, established market (see Market Share: Managed Security Services, Worldwide, 2021 where MDR is a distinct segment, the MDR market grew 48.9% from 2020 to 2021).”

Because of the high growth in the market, many managed security services use the term MDR. However, organizations looking for a true Managed Detection and Response partner, should look to the Gartner definition to identify the right vendor.

Gartner puts it this way: “MDR services provide customers with remotely delivered, humanled, turnkey, modern SOC functions; ultimately delivering threat disruption and containment.”

But choosing a strong MDR partner goes far beyond these high-level requirements. Below are our key takeaways from the report. Without further ado, let’s dive right in.

Takeaway 1: Beware Providers Mimicking MDR

The key to MDR lies as much in the human-centric nature of the service as the power of the technology behind it. Managed Detection and Response is just that… managed. It requires a human with expertise not only in understanding the detection and remediation of threats and breaches, but how these correlate to your business and its goals. Sadly, not all services claiming to be MDR lead with this human expertise.

Gartner shares: “Misnamed technology-centric offerings and vendor-delivered service wrappers (VDSW), that fail to deliver human-driven managed detection and response (MDR) services, are causing challenges for buyers looking to identify and select an outcome-driven provider.”

Human-analyzed context is critically important to the success of an MDR program and an organization’s outcomes in their security programs. Unfortunately, some providers are not living up to their own marketing materials. For instance, Gartner found that some “deliver a far less human-driven experience, depending on the technology for the bulk of the delivery. Although still valuable, these offerings are often promoted as being more engaged than they actually are and would be better described as managed EDR (MEDR).”

Takeaway 2: Context is King

This could be considered a corollary to the previous takeaway, but we acknowledge how important it is for an MDR provider to understand your organization’s unique environment, the context of threats, and how those threats have potential to impact your business. It is not enough to simply detect and remediate threats; an MDR SOC should understand which threats and types of threats will have the biggest impact on your company or organization.

The human-led nature of successful MDR programs means that a company can rest assured that their MDR SOC is able to provide insights that are actually useful to boost their customer’s outcomes.

Gartner has this to say on the subject: “MDR buyers must focus on the ability to provide context-driven insights that will directly impact their business objectives, as wide-scale collection of telemetry and automated analysis are insufficient when facing uncommon threats.”

We feel this has a direct relationship with the expertise of the MDR provider and the quality of the technology they are providing. Too much information without the context necessary to triage and prioritize could overwhelm any security team. Too little information and threats go unchecked. Finding the right balance between the tech and expertise is critical.

Takeaway 3: Threats Know No Boundaries

Ok, that subhead may be a little hyperbolic, but it should surprise no one that threat actors aren’t clocking out at 5pm on a Friday and taking holidays off. Your MDR SOC can’t either. Gartner recommends “Use MDR services to obtain 24/7, remotely delivered, human-led security operations capabilities when there are no existing internal capabilities, or when the organization needs to accelerate or augment existing security operations capabilities.”

So, what exactly does that mean? Essentially, any MDR SOC you choose should provide round-the-clock security that knows no geographical limitations, and has a team of experts actively detecting, assessing, and providing remediation recommendations for threats whenever they arise.

Gartner says: “Turnkey threat detection, investigation and response (TDIR) capabilities are a core requirement for buyers of MDR services who demand remotely delivered services deployed quickly and predictably.”

A follow-the-sun approach that puts highly competent security experts at your fingertips 24/7, 365, and that melds the human-centric nature of deep cybersecurity and business analysis with a powerful threat-detecting technology solution would make for a compelling MDR service option.

Choosing an MDR partner requires some serious due diligence and understanding of your organization’s priorities. This Market Guide helps MDR buyers understand the state of the market and what to look for in an effective MDR provider. Our three takeaways are in no way comprehensive; download the full report to learn more.

Gartner, “Market Guide for Managed Detection and Response Services” Pete Shoard, Al Price, Mitchell Schneider, Craig Lawson, Andrew Davies. 14 February 2023.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

NIST Draft Document on Post-Quantum Cryptography Guidance

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/05/nist-draft-document-on-post-quantum-cryptography-guidance.html

NIST has released a draft of Special Publication1800-38A: “Migration to Post-Quantum Cryptography: Preparation for Considering the Implementation and Adoption of Quantum Safe Cryptography.” It’s only four pages long, and it doesn’t have a lot of detail—more “volumes” are coming, with more information—but it’s well worth reading.

We are going to need to migrate to quantum-resistant public-key algorithms, and the sooner we implement key agility the easier it will be to do so.

News article.

[$] Ruff: a fast Python linter

Post Syndicated from original https://lwn.net/Articles/930487/

Linters are tools that analyze a program’s source code to detect various
problems such as syntax errors, programming mistakes, style violations, and
more. They are important for maintaining code quality and
readability in a project, as well as for catching bugs early in the
development cycle. Last year, a new Python linter appeared: Ruff. It’s fast, written in Rust, and in less than a year it has
been adopted by some high-profile projects, including FastAPI, Pandas, and SciPy.

The Guix (almost) full-source bootstrap

Post Syndicated from original https://lwn.net/Articles/930650/

The Guix project (“a transactional
package manager and an advanced distribution of the GNU system
“) has announced
a milestone toward its goal of bootstrapping an entire distribution from
source:

If you run guix pull today, you get a package graph of more than
22,000 nodes rooted in a 357-byte program—something that had never
been achieved, to our knowledge, since the birth of Unix.

This is an interesting exercise, but should also be a defense against
“trusting trust” attacks.
(Thanks to Ludovic Courtès and Andy Tai).

How we built Network Analytics v2

Post Syndicated from Alex Forster original http://blog.cloudflare.com/building-network-analytics-v2/

How we built Network Analytics v2

How we built Network Analytics v2

Network Analytics v2 is a fundamental redesign of the backend systems that provide real-time visibility into network layer traffic patterns for Magic Transit and Spectrum customers. In this blog post, we'll dive into the technical details behind this redesign and discuss some of the more interesting aspects of the new system.

To protect Cloudflare and our customers against Distributed Denial of Service (DDoS) attacks, we operate a sophisticated in-house DDoS detection and mitigation system called dosd. It takes samples of incoming packets, analyzes them for attacks, and then deploys mitigation rules to our global network which drop any packets matching specific attack fingerprints. For example, a simple network layer mitigation rule might say “drop UDP/53 packets containing responses to DNS ANY queries”.

In order to give our Magic Transit and Spectrum customers insight into the mitigation rules that we apply to their traffic, we introduced a new reporting system called "Network Analytics" back in 2020. Network Analytics is a data pipeline that analyzes raw packet samples from the Cloudflare global network. At a high level, the analysis process involves trying to match each packet sample against the list of mitigation rules that dosd has deployed, so that it can infer whether any particular packet sample was dropped due to a mitigation rule. Aggregated time-series data about these packet samples is then rolled up into one-minute buckets and inserted into a ClickHouse database for long-term storage. The Cloudflare dashboard queries this data using our public GraphQL APIs, and displays the data to customers using interactive visualizations.

What was wrong with v1?

This original implementation of Network Analytics delivered a ton of value to customers and has served us well. However, in the years since it was launched, we have continued to significantly improve our mitigation capabilities by adding entirely new mitigation systems like Advanced TCP Protection (otherwise known as flowtrackd) and Magic Firewall. The original version of Network Analytics only reports on mitigations created by dosd, which meant we had a reporting system that was showing incomplete information.

Adapting the original version of Network Analytics to work with Magic Firewall would have been relatively straightforward. Since firewall rules are “stateless”, we can tell whether a firewall rule matches a packet sample just by looking at the packet itself. That’s the same thing we were already doing to figure out whether packets match dosd mitigation rules.

However, despite our efforts, adapting Network Analytics to work with flowtrackd turned out to be an insurmountable problem. flowtrackd is “stateful”, meaning it determines whether a packet is part of a legitimate connection by tracking information about the other packets it has seen previously. The original Network Analytics design is incompatible with stateful systems like this, since that design made an assumption that the fate of a packet can be determined simply by looking at the bytes inside it.

Rethinking our approach

Rewriting a working system is not usually a good idea, but in this case it was necessary since the fundamental assumptions made by the old design were no longer true. When starting over with Network Analytics v2, it was clear to us that the new design not only needed to fix the deficiencies of the old design, it also had to be flexible enough to grow to support future products that we haven’t even thought of yet. To meet this high bar, we needed to really understand the core principles of network observability.

In the world of on-premise networks, packets typically chain through a series of appliances that each serve their own special purposes. For example, a packet may first pass through a firewall, then through a router, and then through a load balancer, before finally reaching the intended destination. The links in this chain can be thought of as independent “network functions”, each with some well-defined inputs and outputs.

How we built Network Analytics v2

A key insight for us was that, if you squint a little, Cloudflare’s software architecture looks very similar to this. Each server receives packets and chains them through a series of independent and specialized software components that handle things like DDoS mitigation, firewalling, reverse proxying, etc.

How we built Network Analytics v2

After noticing this similarity, we decided to explore how people with traditional networks monitor them. Universally, the answer is either Netflow or sFlow.

How we built Network Analytics v2

Nearly all on-premise hardware appliances can be configured to send a stream of Netflow or sFlow samples to a centralized flow collector. Traditional network operators tend to take these samples at many different points in the network, in order to monitor each device independently. This was different from our approach, which was to take packet samples only once, as soon as they entered the network and before performing any processing on them.

Another interesting thing we noticed was that Netflow and sFlow samples contain more than just information about packet contents. They also contain lots of metadata, such as the interface that packets entered and exited on, whether they were passed or dropped, which firewall or ACL rule they hit, and more. The metadata format is also extensible, so that devices can include information in their samples which might not make sense for other samples to contain. This flexibility allows flow collectors to offer rich reporting without necessarily having to understand the functions that each device performs on a network.

The more we thought about what kind of features and flexibility we wanted in an analytics system, the more we began to appreciate the elegance of traditional network monitoring. We realized that we could take advantage of the similarities between Cloudflare’s software architecture and “network functions” by having each software component emit its own packet samples with its own context-specific metadata attached.

How we built Network Analytics v2

Even though it seemed counterintuitive for our software to emit multiple streams of packet samples this way, we realized through taking inspiration from traditional network monitoring that doing so was exactly how we could build the extensible and future-proof observability that we needed.

Design & implementation

The implementation of Network Analytics v2 could be broken down into two separate pieces of work. First, we needed to build a new data pipeline that could receive packet samples from different sources, then normalize those samples and write them to long-term storage. We called this data pipeline samplerd – the “sampler daemon”.

The samplerd pipeline is relatively small and straightforward. It implements a few different interfaces that other software can use to send it metadata-rich packet samples. It then normalizes these samples and forwards them for postprocessing and insertion into a ClickHouse database.

How we built Network Analytics v2

The other, larger piece of work was to modify existing Cloudflare systems and make them send packet samples to samplerd. The rest of this post will cover a few interesting technical challenges that we had to overcome to adapt these systems to work with samplerd.

l4drop

The first system that incoming packets enter is our xdp daemon, called xdpd. In a few words, xdpd manages the installation of multiple XDP programs: a packet sampler, l4drop and L4LB. l4drop is where many types of attacks are mitigated. Mitigations done at this level are very cheap, because they happen so early in the network stack.

Before introducing samplerd, these XDP programs were organized like this:

How we built Network Analytics v2

An incoming packet goes through a sampler that will emit a packet sample for some packets. It then enters l4drop, a set of programs that will decide the fate of a particular packet. Finally, L4LB is in charge of layer 4 load balancing.

It’s critical that the samples are emitted even for packets that get dropped further down in the pipeline, because that provides visibility into what’s dropped. That’s useful both from a customer perspective to have a more comprehensive view in dashboards but also to continuously adapt our mitigations as attacks change.

In l4drop’s original configuration, a packet sample is emitted prior to the mitigation decision. Thus, that sample can’t record the mitigation action that’s taken on that particular packet.

samplerd wants packet samples to include the mitigation outcome and other metadata that indicates why a particular mitigation decision was taken. For instance, a packet may be dropped because it matched an attack mitigation signature. Or it may pass because it matched a rate limiting rule and it was under the threshold for that rule. All of this is valuable information that needs to be shown to customers.

Given this requirement, the first idea we had was to simply move the sampler after l4drop and have l4drop just mark the packet as “to be dropped”, along with metadata for the reason why. The sampler component would then have all the necessary details to emit a sample with the final fate of the packet and its associated metadata. After emitting the sample, the sampler would drop or pass the packet.

However, this requires copying all the metadata associated with the dropping decision for every single packet, whether it will be sampled or not. The cost of this copying proved prohibitive considering that every packet entering Cloudflare goes through the xdpd programs.

So we went back to the drawing board. What we actually need to know when making a sampling decision is whether we need to copy the metadata. We only need to copy the metadata if a particular packet will be sampled. That’s why it made sense to effectively split the sampler into two parts by sandwiching the programs that make the mitigation decision together. First, we make the mitigation decision, then we go through the mitigation decision programs. These programs can then decide to copy metadata only when a packet will be sampled. They will however always mark a packet with a DROP or PASS mark. Then the sampler will check the mark for sampling and the DROP/PASS mark. Based on those marks, they’ll build a sample if necessary and drop or pass the packet.

Given how tightly the sampler is now coupled with the rest of l4drop, it’s not a standalone part of xdpd anymore and the final result looks like this:

How we built Network Analytics v2

iptables

Another of our mitigation layers is iptables. We use it for some types of mitigations that we can’t perform in l4drop, like stateful connection tracking. iptables mitigations are organized as a list of rules that an incoming packet will be evaluated against. It’s also possible for a rule to jump to another rule when some conditions are met. Some of these rules will perform rate limiting, which will only drop packets beyond a certain threshold. For instance, we might drop all packets beyond a 10 packet-per-second threshold.

Prior to the introduction of samplerd, our typical rules would match on some characteristics of the packet – say, the IP and port – and make a decision whether to immediately drop or pass the packet.

To adapt our iptables rules to samplerd, we need to make them emit annotated samples, so that we can know why a decision was taken. To this end, one idea would be to just make the rules which drop packets also emit a nflog sample with a certain probability. One of the issues with that approach has to do with rate limiting rules. A packet may match such a rule, but the packet may be under the threshold and so that packet gets passed further down the line. That doesn’t work because we also want to sample those passed packets too, since it’s important for a customer to know what was passed and dropped by the rate limiter. But since a packet that passes the rate limiter may be dropped by further rules down the line, it’ll have multiple chances to be sampled, causing oversampling of some parts of the traffic. That would introduce statistical distortions in the sampled data.

To solve this, we can once again separate these steps like we did in l4drop, and make several sets of rules. First, the sampling decision is made by the first set of rules. Then, the pass-or-drop decision is made by the second set of rules. Finally, the sample can be emitted (if necessary), and then the packet can be passed or dropped by the third set of rules.

To communicate between rules we use Linux packet markings. For instance, a mark will be placed on a packet to signal that the packet will be sampled, and another mark will signify that the packet matched a particular rule and that it needs to be dropped.

For incoming packets, the rule in charge of the random sampling decision is evaluated first. Then the mitigation rules are evaluated next, in a specific order. When one of those rules decides to drop a packet, it jumps straight to the last set of rules, which will emit a sample if necessary before dropping. If no mitigation rule matches, eventually packets fall through to the last set of rules, where they will match a generic pass rule. That rule will emit a sample if necessary and pass the packet down the stack for further processing. By organizing rules in stages this way, we won’t ever double-sample packets.

ClickHouse & GraphQL

Once the samplerd daemon has the samples from the various mitigation systems, it does some light processing and ships those samples to be stored in ClickHouse. This inserter further enriches the metadata present in the sample, for instance by identifying the account associated with a particular destination IP. It also identifies ongoing attacks and adds a unique attack ID to each sample that is part of an attack.

We designed the inserters so that we’ll never need to change the data once it has been written, so that we can sustain high levels of insertion. Part of how we achieved this was by using ClickHouse’s MergeTree table engine. However, for improved performance, we have also used a less common ClickHouse table engine, called AggregatingMergeTree. Let’s dive into this using a simplified example.

Each packet sample is stored in a table that looks like the below:

Attack ID Dest IP Dest Port Sample Interval (SI)
abcd 1.1.1.1 53 1000
abcd 1.0.0.1 53 1000

The sample interval is the number of packets that went through between two samples, as we are using ABR.

These tables are then queried through the GraphQL API, either directly or by the dashboard. This required us to build a view of all the samples for a particular attack, to identify (for example) a fixed destination IP. These attacks may span days or even weeks and so these queries could potentially be costly and slow. For instance, a naive query to know whether the attack “abcd” has a fixed destination port or IP may look like this:

SELECT if(uniq(dest_ip) == 1, any(dest_ip), NULL), if(uniq(dest_port) == 1, any(dest_port), NULL)
FROM samples
WHERE attack_id = ‘abcd’

In the above query, we ask ClickHouse for a lot more data than we should need. We only really want to know whether there is one value or multiple values, yet we ask for an estimation of the number of unique values. One way to know if all values are the same (for values that can be ordered) is to check whether the maximum value is equal to the minimum. So we could rewrite the above query as:

SELECT if(min(dest_ip) == max(dest_ip), any(dest_ip), NULL), if(min(dest_port) == max(dest_port), any(dest_port), NULL)
FROM samples
WHERE attack_id = ‘abcd’

And the good news is that storing the minimum or the maximum takes very little space, typically the size of the column itself, as opposed to keeping the state that uniq() might require. It’s also very easy to store and update as we insert. So to speed up that query, we have added a precomputed table with running minimum and maximum using the AggregatingMergeTree engine. This is the special ClickHouse table engine that can compute and store the result of an aggregate function on a particular key. In our case, we will use the attackID as the key to group on, like this:

Attack ID min(Dest IP) max(Dest IP) min(Dest Port) max(Dest Port) sum(SI)
abcd 1.0.0.1 1.1.1.1 53 53 2000

Note: this can be generalized to many aggregating functions like sum(). The constraint on the function is that it gives the same result whether it’s given the whole set all at once or whether we apply the function to the value it returned on a subset and another value from the set.

Then the query that we run can be much quicker and simpler by querying our small aggregating table. In our experience, that table is roughly 0.002% of the original data size, although admittedly all columns of the original table are not present.

And we can use that to build a SQL view that would look like this for our example:

SELECT if(min_dest_ip == max_dest_ip, min_dest_ip, NULL), if(min_dest_port == max_dest_port, min_dest_port, NULL)
FROM aggregated_samples
WHERE attack_id = ‘abcd’

Attack ID Dest IP Dest Port Σ
abcd 53 2000

Implementation detail: in practice, it is possible that a row in the aggregated table gets split on multiple partitions. In that case, we will have two rows for a particular attack ID. So in production we have to take the min or max of all the rows in the aggregating table. That’s usually only three to four rows, so it’s still much faster than going over potentially thousands of samples spanning multiple days. In practice, the query we use in production is thus closer to:

SELECT if(min(min_dest_ip) == max(max_dest_ip), min(min_dest_ip), NULL), if(min(min_dest_port) == max(max_dest_port), min(min_dest_port), NULL)
FROM aggregated_samples
WHERE attack_id = ‘abcd’

Takeaways

Rewriting Network Analytics was a bet that has paid off. Customers now have a more accurate and higher fidelity view of their network traffic. Internally, we can also now troubleshoot and fine tune our mitigation systems much more effectively. And as we develop and deploy new mitigation systems in the future, we are confident that we can adapt our reporting in order to support them.

Effects of the conflict in Sudan on Internet patterns

Post Syndicated from João Tomé original http://blog.cloudflare.com/sudan-armed-conflict-impact-on-the-internet-since-april-15-2023/

Effects of the conflict in Sudan on Internet patterns

Effects of the conflict in Sudan on Internet patterns

On Saturday, April 15, 2023, an armed conflict between rival factions of the military government of Sudan began. Cloudflare observed a disruption in Internet traffic on that Saturday, starting at 08:00 UTC, which deepened on Sunday. Since then, the conflict has continued, and different ISPs have been affected, in some cases with a 90% drop in traffic. On May 2, Internet traffic is still ~30% lower than pre-conflict levels. This blog post will show what we’ve been seeing in terms of Internet disruption there.

On the day that clashes broke out, our data shows that traffic in the country dropped as much as 60% on Saturday, after 08:00 UTC, with a partial recovery on Sunday around 14:00, but it has consistently been lower than before. Although we saw outages and disruptions on major local Internet providers, the general drop in traffic could also be related to different human usage patterns because of the conflict, with people trying to leave the country. In Ukraine, we saw a clear drop in traffic, not always related to ISP outages, after the war started, when people were leaving the country.

Here’s the hourly perspective of Sudan’s Internet traffic over the past weeks as seen on Cloudflare Radar, with the orange shading highlighting the disruption since April 15.

Effects of the conflict in Sudan on Internet patterns

The next chart of daily traffic in Sudan (that is dominated by mobile device traffic — more on that below) clearly shows a daily drop in traffic after April 15. On that Saturday, traffic was 27% lower than on the previous Saturday, and it was a 43% decrease on Sunday, April 16, compared to the previous week.

Effects of the conflict in Sudan on Internet patterns

Frequent outages on different ISPs

On April 23 and 24, there was a more significant outage affecting multiple ISPs (and their ASNs or autonomous systems) that brought Internet traffic in the country, as the previous chart clearly shows, even lower. There was no official reason given for those major disruptions that had a nationwide impact. That said, the disruptions were also felt in neighbor country Chad in several ISPs, given that Sudan’s Sudatel (AS15706) seems to be an upstream provider.

Cloudflare saw a 74% decrease in traffic on Sunday, April 23, compared to Sunday, April 9, before the conflict, and a 70% drop on Monday, April 24, compared with Monday, April 10. In some ISPs, the impact was bigger.

In the news, ISP MTN (AS36972) reportedly blocked Internet services on April 16, and, according to Reuters, was told by the authorities to restore it a few hours after. We saw a clear outage in that ASN, an almost 90% drop in traffic compared with previous weeks for about 10 hours, after 00:00 UTC on April 16, and it mostly recovered after 10:00 UTC.

Effects of the conflict in Sudan on Internet patterns

The most impacted ISPs were Sudatel (AS15706), Zain (AS36998), and Canar (AS33788) with almost complete outages. Canar was the outage that lasted the longest, with 83 hours, from April 21 to 25. Next, it was the main ISP in the country, Sudatel, with 40 hours of almost complete Internet blackout, followed by Zain, with 10 hours on April 24.

The return of traffic coincided with the time a nationwide ceasefire of 72 hours was agreed upon on April 24.

BGP or Border Gateway Protocol is a mechanism to exchange routing information between networks on the Internet, and a crucial part that enables the existence of the network of networks (the Internet). BGP announcements or updates can signal disruption in connectivity or outages, as we saw in Canada in 2022 with Rogers ISP or in the UK in 2023 with Virgin Media, for example. In this case, highlighted in the next chart, BGP updates biggest spikes from Sudatel (AS15706) are consistent with both the start of the outage, and the return to traffic.

Effects of the conflict in Sudan on Internet patterns

Mobile device traffic percentage grew after April 15

Sudan is typically one of the countries with the highest percentage of mobile device traffic in the world. We’ve written about this in the past (see the 2021 mobile device traffic blog post), and at the time the average was 83%. Observing data from the past week, as seen on our Cloudflare Radar traffic worldwide page, Sudan leads our ranking with 88% of traffic coming from mobile devices.

Effects of the conflict in Sudan on Internet patterns

Looking at the past few weeks, we can see mobile device traffic growing as a percentage of all Internet traffic in Sudan. The April 3 week showed a lower percentage than it is now, with 77% (23% was desktop traffic percentage). In the April 10 week, which includes April 15 and 16, mobile device traffic rose to 80%. In the week of April 17, it was 85%, and the week of April 24, it’s 88%.

Effects of the conflict in Sudan on Internet patterns

How is Internet traffic holding up more recently in Sudan? Looking at a week-over-week hourly comparison, traffic last Friday was still around 55% lower than before April 15, and on May 2, traffic is still around 30% lower than pre-conflict levels (April 11).

Effects of the conflict in Sudan on Internet patterns

In the previous chart, there’s a regular drop in traffic observed at around 16:00 UTC, ~18:00 local time. It’s more evident before April 15, but it generally continues after that. That drop in traffic is consistent with Ramadan trends we discussed recently in a blog post. It is related to the Iftar, the first meal after sunset that breaks the fast and often serves as a family or community event — sunset in Khartoum, Sudan, is at 18:07.

As of this Tuesday, Internet traffic data (from a linear perspective) shows that traffic continues to be much lower than before, and this morning at 08:00 UTC it is ~30% lower than it was three weeks ago (pre-conflict), at the same time, showing some recovery in the past couple of days.

Effects of the conflict in Sudan on Internet patterns

According to the BBC, reporting from Sudan, the Internet continues to be impacted, an observation that is consistent with our data.

Looking more closely at Sudan’s capital, Khartoum, where most people live and the conflict began, traffic was impacted after April 15 (the blue line in the next chart). On April 27, Internet traffic was around 76% lower than it was on the same pre-conflict weekday (April 13). The next chart also shows the typical drop around 18:00, for Ramadan’s Iftar, the first meal after sunset.

Effects of the conflict in Sudan on Internet patterns

Looking at DNS queries (from Cloudflare’s resolver) to websites or domains in Sudan, we saw a clear shift from the use of WhatsApp-related domains for messaging to Signal ones after April 15 — the drop in DNS traffic to WhatsApp was similar to the increase in DNS traffic to Signal domains.

Social media platforms such as LinkedIn, but also TikTok or YouTube, had a clear decrease since April 15. On the other hand, Facebook and Twitter saw an increase, especially on April 15 and 16, with some disruptions (possibly related to Internet access), but with bigger spikes than before, usually at night, since then. Here’s the aggregated view to social media platforms:

Effects of the conflict in Sudan on Internet patterns

Conclusion: ongoing impact

The conflict in Sudan continues, and so does its Internet traffic impact. We will continue to monitor the Internet situation on Cloudflare Radar, where you can check Sudan’s country page and the Outage Center.

On social media, we’re at @CloudflareRadar on Twitter or cloudflare.social/@radar on Mastodon.

Integrating primary computing and literacy through multimodal storytelling

Post Syndicated from Veronica Cucuiat original https://www.raspberrypi.org/blog/primary-computing-programming-literacy-storytelling/

Broadening participation and finding new entry points for young people to engage with computing is part of how we pursue our mission here at the Raspberry Pi Foundation. It was also the focus of our March online seminar, led by our own Dr Bobby Whyte. In this third seminar of our series on computing education for primary-aged children, Bobby presented his work on ‘designing multimodal composition activities for integrated K-5 programming and storytelling’. In this research he explored the integration of computing and literacy education, and the implications and limitations for classroom practice.

Young learners at computers in a classroom.

Motivated by challenges Bobby experienced first-hand as a primary school teacher, his two studies on the topic contribute to the body of research aiming to make computing less narrow and difficult. In this work, Bobby integrated programming and storytelling as a way of making the computing curriculum more applicable, relevant, and contextualised.

Critically for computing educators and researchers in the area, Bobby explored how theories related to ‘programming as writing’ translate into practice, and what the implications of designing and delivering integrated lessons in classrooms are. While the two studies described here took place in the context of UK schooling, we can learn universal lessons from this work.

What is multimodal composition?

In the seminar Bobby made a distinction between applying computing to literacy (or vice versa) and true integration of programming and storytelling. To achieve true integration in the two studies he conducted, Bobby used the idea of ‘multimodal composition’ (MMC). A multimodal composition is defined as “a composition that employs a variety of modes, including sound, writing, image, and gesture/movement [… with] a communicative function”.

Storytelling comes together with programming in a multimodal composition as learners create a program to tell a story where they:

  • Decide on content and representation (the characters, the setting, the backdrop)
  • Structure text they’ve written
  • Use technical aspects (i.e. motion blocks, tension) to achieve effects for narrative purposes
A screenshot showing a Scratch project.
Defining multimodal composition (MMC) for a visual programming context

Multimodality for programming and storytelling in the classroom

To investigate the use of MMC in the classroom, Bobby started by designing a curriculum unit of lessons. He mapped the unit’s MMC activities to specific storytelling and programming learning objectives. The MMC activities were designed using design-based research, an approach in which something is designed and tested iteratively in real-world contexts. In practice that means Bobby collaborated with teachers and students to analyse, evaluate, and adapt the unit’s activities.

A list of learning objectives that could be covered by a multimodal composition activity.
Mapping of the MMC activities to storytelling and programming learning objectives

The first of two studies to explore the design and implementation of MMC activities was conducted with 10 K-5 students (age 9 to 11) and showed promising results. All students approached the composition task multimodally, using multiple representations for specific purposes. In other words, they conveyed different parts of their stories using either text, sound, or images.

Bobby found that broadcast messages and loops were the least used blocks among the group. As a consequence, he modified the curriculum unit to include additional scaffolding and instructional support on how and why the students might embed these elements.

A list of modifications to the MMC curriculum unit based on testing in a classroom.
Bobby modified the classroom unit based on findings from his first study

In the second study, the MMC activities were evaluated in a classroom of 28 K-5 students led by one teacher over two weeks. Findings indicated that students appreciated the longer multi-session project. The teacher reported being satisfied with the project work the learners completed and the skills they practised. The teacher also further integrated and adapted the unit into their classroom practice after the research project had been completed.

How might you use these research findings?

Factors that impacted the integration of storytelling and programming included the teacher’s confidence to teach programming as well as the teacher’s ability to differentiate between students and what kind of support they needed depending on their previous programming experience.

In addition, there are considerations regarding the curriculum. The school where the second study took place considered the activities in the unit to be literacy-light, as the English literacy curriculum is ‘text-heavy’ and the addition of multimodal elements ‘wastes’ opportunities to produce stories that are more text-based.

Woman teacher and female student at a laptop.

Bobby’s research indicates that MMC provides useful opportunities for learners to simultaneously pursue storytelling and programming goals, and the curriculum unit designed in the research proved adaptable for the teacher to integrate into their classroom practice. However, Bobby cautioned that there’s a need to carefully consider both the benefits and trade-offs when designing cross-curricular integration projects in order to ensure a fair representation of both subjects.

Can you see an opportunity for integrating programming and storytelling in your classroom? Let us know your thoughts or questions in the comments below.

You can watch Bobby’s full presentation:

And you can read his research paper Designing for Integrated K-5 Computing and Literacy through Story-making Activities (open access version).

You may also be interested in our pilot study on using storytelling to teach computing in primary school, which we conducted as part of our Gender Balance in Computing programme.

Join our next seminar on primary computing education

At our next seminar, we welcome Kate Farrell and Professor Judy Robertson (University of Edinburgh). This session will introduce you to how data literacy can be taught in primary and early-years education across different curricular areas. It will take place online on Tuesday 9 May at 17.00 UK time, don’t miss out and sign up now.

Yo find out more about connecting research to practice for primary computing education, you can find other our upcoming monthly seminars on primary (K–5) teaching and learning and watch the recordings of previous seminars in this series.

The post Integrating primary computing and literacy through multimodal storytelling appeared first on Raspberry Pi Foundation.

Манол Пейков и карнавалното въображение

Post Syndicated from Светла Енчева original https://www.toest.bg/manol-peikov-i-karnavalnoto-vuobrazhenie/

Манол Пейков и карнавалното въображение

Една не особено прилична реплика на Манол Пейков към Костадин Костадинов породи бурен творчески ентусиазъм. На 26 април депутатът от „Демократична България“ се обърна от парламентарната трибуна към председателя на „Възраждане“ с думите:

Господа, не е цивилизовано да си говорим по този начин. Господин Костадинов, Вие се опитвате да нормализирате език, който е неприемлив за публичното пространство. Вие разчитате на това, че ние се държим цивилизовано и уважително към Вас. Имайте предвид, че ние също имаме богато въображение. Аз винаги мога да Ви назова „Коцето Късопишков“ – имам предвид човек, който пише с къси букви. Въображението ми е безкрайно.

Как се стигна до думите на Пейков

Ако разглеждаме думите на издателя, граждански активист и депутат Манол Пейков изолирано от контекста, на преден план ще излезе фактът, че той е казал нещо неприлично. Репликата му обаче е реакция на систематично поведение на Костадинов, което до този момент беше останало безнаказано.

Конкретният повод за изказването е размяна на реплики между председателя на „Възраждане“ и депутата от коалицията ПП–ДБ Явор Божанков. Причината за тях е предложението на ПП–ДБ да има парламентарна комисия, в чиято работа да се включват въпросите на семейството. Костадинов заподозря, че това е начин коалицията „да угоди на поредното НПО, свързано със Сорос-Морос“. На това Явор Божанков реагира:

Аз не знам кога „Възраждане“ са против НПО-тата и кога ги дразнят. Когато жената на Копейката усвоява пари с НПО-та от лоши западни компании или по принцип не им харесват?

Реторичният въпрос на Божанков се отнася за информацията, че НПО-то, в управлението на което е Велина Костадинова, съпругата на Костадинов (наричан от свои критици „Костя Копейкин“), е получило финансиране от „Lidl България“ чрез Фондация „Работилница за граждански инициативи“ (ФРГИ). ФРГИ е неправителствена организация, разпределяща грантове от организации като „Отворено общество“ на Джордж Сорос и „Америка за България“, които председателят на „Възраждане“ редовно обругава. Освен това през август 2022 г. Костадинов поде кампания за изгонването на Lidl от България. Изглежда, това е станало по същото време, когато НПО-то на съпругата му е кандидатствало за финансиране от търговската верига.

Думи с двойно дъно

На питането на Божанков Костадинов реагира така, както обикновено прави – игнорирайки естеството на въпроса и с аргументи ad hominem, тъй щото да се отвлече вниманието на избирателите му и на учениците, наблюдаващи заседанието в парламентарната зала. Без да се обръща директно към говорещия преди него, той заяви по негов адрес:

Заради младите хора, които са горе, не трябва да допускаме тази трибуна да става място за изява на сбъркани хора с нетрадиционна ориентация. Като под „нетрадиционна“ имам предвид такива, които преминават от лява в дясна партия в рамките на само една седмица без абсолютно никакво замисляне и без никакъв проблем с вземането на 90-градусовия завой. Не позволявайте на сбъркани хора с нетрадиционна ориентация да превръщат това място тук в клоака, в позорище, което да служи за присмех и подигравка.

Председателят на „Възраждане“ не пропуска да отправи обидна квалификация и към коалицията ПП–ДБ:

Аз мога да говоря за коалиция „Сорос–Донос“ от днес до края на света и смятам, че има много неща да си кажем. Не бива да позволявате по никакъв начин такива пропаднали субекти да определят начина, по който Народното събрание функционира.

Освен че напада политическия си противник, за да избегне неудобния факт, Костадинов изрича и невярно твърдение. По-точно, с един удар „застрелва“ няколко неверни „заека“. Явор Божанков беше изключен на 9 декември 2022 г. от парламентарната група на БСП заради позицията си против руската агресия срещу Украйна и до разпускането на 48-мото НС остана независим депутат. Стана ясно, че ще се кандидатира за депутат от коалицията ПП–ДБ, повече от два месеца след изключването – на 16 февруари т.г. Освен това никога не е бил член нито на БСП, нито на някоя от партиите в ПП–ДБ, така че не е преминал от лява в дясна партия. Друг е въпросът, че не всички партии в тази коалиция се идентифицират като десни.

Изричайки по адрес на Божанков два пъти „сбъркани хора с нетрадиционна ориентация“, Костадинов директно се опитва да го обиди, намеквайки предполагаема хомосексуална ориентация на опонента си. Същевременно дава да се разбере, че това за „нетрадиционната ориентация“ го е казал в образен смисъл, имайки предвид преминаването на Божанков в друга парламентарна група.

Що се отнася до квалификацията „Сорос–Донос“, председателят на „Възраждане“ редовно нарича „Демократична България“ „Доносническа България“ и това остава без последствия дори когато се случва пред президента Румен Радев.

Костадин Костадинов извън зоната си на комфорт

Това, което издателят Манол Пейков, станал особено популярен с успешните си благотворителни кампании, направи с репликата си, е да демонстрира какво е, когато към Костадинов се прилагат същите двусмислени (и не непременно фактологически верни) аргументи ad hominem, каквито той използва срещу политическите си противници. Както „нетрадиционната ориентация“ на Божанков се отнася към смяната на парламентарната му група, така и „Късопишков“ може да означава „човек, който пише с къси букви“, нали така?

Другото, което направи Пейков с репликата си, е да покаже, че обидните квалификации по адрес на ДБ, отправяни публично и особено пред президента, няма да останат без последствия:

И запомнете, че за всеки следващ път, когато в присъствието на президента използвате подобни думи, ние ще намерим начин да използваме същите срещу Вас. Просто не го правете. По този начин диалогът – цивилизованият диалог – отива на кино и нещата минават на съвсем друго, махленско ниво. Така политика не се прави.

След изказването на Пейков Костадинов изглеждаше така, както не сме свикнали да го виждаме. За разлика от политическия си предшественик Волен Сидеров, председателят на „Възраждане“ обикновено е изключително овладян и премерен. Той не е склонен към спонтанност – при него обидите и словото на омразата са добре премислени и с калкулиран електорален ефект. По този начин успява да увеличава популярността си методично, година след година, докато „Възраждане“ се превърна от маргинална партия в трета политическа сила в парламента (с тенденция да увеличава още подкрепата си).

И изведнъж в парламентарната зала видяхме един Костадин Костадинов, който не може да се овладее. Той избълва куп обвинения и обиди – „утайка“ (няколко пъти), „просташки тон“ и за пореден път – „Доносническа България“ (въпреки че е интересно какъв е „доносът“ в наричането на някого „Късопишков“). И си позволи нещо като заплаха, ако не стане на неговото:

Господин Председател, направете необходимото да изхвърлите тази утайка от парламента, защото иначе ще изхвърлим утайката ние.

Но си остана в губеща позиция.

Кутията на Пандора и безграничното въображение

Макар от институционална гледна точка сблъсъкът между Пейков и Костадинов да завърши 1:1 (и двамата получиха забележка), това, което последва, е огромна победа за издателя и също толкова огромен дискомфорт за председателя на „Възраждане“. Социалните мрежи буквално преляха от всевъзможни колажи, карикатури и мемета, заиграващи се с „Късопишков“ и „късите букви“. Да се чуди човек как занапред Костадинов ще си поръчва кафе – и късо да го поиска, и дълго, все може да му се смеят.

Самият Пейков публикува на профила си във Facebook някои от произведенията на знайни и незнайни творци (междувременно те станаха още повече), като в коментара си написа:

Истината е, че когато в края на изказването си го предупредих, че въображението ни е безгранично, сам не си давах сметка каква Кутия на Пандора съм открехнал пред смаяния му [на Костадинов, б.р.] поглед. Колективното „ни“ поде моята идея, вдигна я на крилете си и изпрати на бабаита стотици недвусмислени съобщения, под формата на забавни и саркастични мемета – някои от които дълго ще го навестяват в съня му.

Полетът на въображението, неволно тласнат от Манол Пейков, успя да надхвърли рамките на индивидуалните инициативи и вдъхнови реклама на верига ресторанти за бургери. Тя публикува в социалните мрежи постер, в който се рекламират „къси картофки“, и го придружи с обяснителен текст: „С това късо копи искаме да ви кажем, че днес в Skapto можете да си купите евтино К.К. – нашите нови къси картофки. Същите, нарязани на ръка, двойно изпържени, оваляни в домашната ни подправка пържени картофки, но много по-къси. И се продават за евро. 1,49 евро по-точно. В кеш. Рестото ще ви върнем в стабилен лев.“

Закачката тук е не само с името на Костадинов и „късите“ работи, а и с непоследователността и двойните стандарти на „Възраждане“. Конкретно с факта, че партията организира референдум против еврото, а в рамките на кампанията за същия референдум събира дарения в евро. Вместо това партията би могла да използва някоя от платформите, позволяващи финансиране чрез дарения в националната валута на България, за чието „спасяване“ уж се бори – и на България, и на валутата ѝ. Но очевидно не го прави.

Карнавална радост сред дългите политически пости

Репликата на Манол Пейков катализира толкова бурно въображение, защото много хора, които не харесват Костадинов, „Възраждане“ и изобщо националпопулизма, я възприеха като отдушник. От 2017 до 2021 г. ГЕРБ управлява съвместно с националистически партии, чиято идентичност се крепи на омраза към другия, който и да е той. След кратко отдъхване „Възраждане“ влезе в парламента и всеки следващ път получава все повече гласове на парламентарните избори.

Костадинов системно си позволява език на омразата и неверни твърдения по адрес на една или друга социална група, както и на демократичните си политически опоненти. А те в общия случай реагират просто с чакане „и това да мине“ – да не би случайно да им пострада рейтингът, който обаче страда и без реакция от тяхна страна. Това става в контекста на постепенното изместване на ориентацията на България от Европейския съюз към недемократични режими.

В тази ситуация изказването на Манол Пейков имаше спонтанен ефект на психологическо освобождение. Така както средновековните карнавали в католическите страни и области служат за разпускане преди поредните пости. При всички опити карнавалите да бъдат сложени под контрол, те включват различни форми на пародиране на съществуващия социален ред. И правене на неща, които в нормална ситуация не биха останали без последствия.

На някои места (например в Кьолн) в наши дни карнавалът включва и форми на остра политическа критика към силните на деня. В Германия обаче дехуманизиращи квалификации, каквито използва Костадин Костадинов, като „подчовеци“ (нацистка дума, в оригинал Untermenschen), „нечовешка сган“, „паразити“, „сбъркани хора с нетрадиционна ориентация“, „пропаднали субекти“, „утайка“ и пр., не биха останали без сериозни политически (че и юридически) последствия. Защото след Втората световна война страната си е „научила урока“ по трудния начин. В българската институционална среда обаче прагът на непоносимост към тъпченето на човешкото достойнство и дехуманизацията е толкова висок, че понякога комай помага само принципът „с твоите камъни по твоята глава“.

А след спонтанния карнавал?

Както стана дума, след карнавала идват постите. По тази логика може да очакваме, че масовият изблик на въображение, породен от репликата на Манол Пейков към Костадин Костадинов, ще утихне така, както се е появил. И ще се върнем в безрадостното всекидневие, в което възпитано ще търпим все по-големи и по-големи прояви на дехуманизация. Или ще реагираме по начини, от които няма да има особен ефект.

Може обаче да се окаже, че Пейков е поставил началото на края на „Възраждане“. Защото, ако веднъж си видял царя гол, трудно ще забравиш тази гледка в колкото и царствени одежди да го гледаш после. Може пък да е дошло време за възраждане. Но не националпопулистко, а като ренесанс. Ала за да се случи то, не може да се разчита само на единия Манол Пейков.

Safer deployment of streaming applications

Post Syndicated from Grab Tech original https://engineering.grab.com/safer-flink-deployments

The Flink framework has gained popularity as a real-time stateful stream processing solution for distributed stream and batch data processing. Flink also provides data distribution, communication, and fault tolerance for distributed computations over data streams. To fully leverage Flink’s features, Coban, Grab’s real-time data platform team, has adopted Flink as part of our service offerings.

In this article, we explore how we ensure that deploying Flink applications remain safe as we incorporate the lessons learned through our journey to continuous delivery.

Background

Figure 1. Flink platform architecture within Coban

Users interact with our systems to develop and deploy Flink applications in three different ways.

Firstly, users create a Merge Request (MR) to develop their Flink applications on our Flink Scala repository, according to business requirements. After the MR is merged, GitOps Continuous Integration/Continuous Deployment (CI/CD) automatically runs and dockerises the application, allowing the containerised applications to be deployed easily.

Secondly, users create another MR to our infrastructure as a code repository. The GitOps CI/CD that is integrated with Terraform runs and configures the created Spinnaker application. This process configures the Flink application that will be deployed.

Finally, users trigger the actual deployment of the Flink applications on Spinnaker, which orchestrates the deployment of the Docker image onto our Kubernetes cluster. Flink applications are deployed as standalone clusters in Grab to ensure resource isolation.

Problem

The main issue we noticed with streaming pipelines like these, is that they are often interconnected, where application A depends on application B’s output. This makes it hard to find a solution that perfectly includes integration tests and ensures that propagated changes do not affect downstream applications.

However, this problem statement is too large to solve with a single solution. As such, we are narrowing the problem statement to focus on ensuring safety of our applications, where engineers can deploy Flink applications that will be rolled back if they fail health checks. In our case, the definition of a Flink application’s health is limited to the uptime of the Flink application itself.

It is worth noting that Flink applications are designed to be stateful streaming applications, meaning a “state” is shared between events (stream entities) and thus, past events can influence the way current events are processed. This also implies that traditional deployment strategies do not apply to the deployment of Flink applications.

Current strategy

Figure 2. Current deployment stages

In Figure 2, our current deployment stages are split into three parts:

  1. Delete current deployment: Remove current configurations (if applicable) to allow applications to pick up the new configurations.
  2. Bake (Manifest): Bake the Helm charts with the provided configurations.
  3. Deploy (Manifest): Deploy the charts onto Kubernetes.

Over time, we learnt that this strategy can be risky. Part 2 can result in a loss of Flink application states due to how internal CI/CD processes are set up. There is also no easy way to rollback if an issue arises. Engineers will need to revert all config changes and rollback the deployment manually by re-deploying the older Docker image – which results in slower operation recovery.

Lastly, there are no in-built monitoring mechanisms that perform regular health probes. Engineers need to manually monitor their applications to see if their deployment was successful or if they need to perform a rollback.

With all these issues, deploying Flink applications for engineers are often stressful and fraught with uncertainty. Common mitigation strategies are canary and blue-green deployments, which we cover in the next section.

Canary deployments

Figure 3. Canary deployment

In canary deployments, you gradually roll out new versions of the application in parallel with the production version, while serving a percentage of total traffic before promoting it gradually.

This does not work for Flink deployments due to the nature of stream processing. Applications are frequently required to do streaming operations like stream joining, which involves matching related events in different Kafka topics. So, if a Flink application is only receiving a portion of the total traffic, the data generated will be considered inaccurate due to incomplete data inputs.

Blue-green deployments

Figure 4. Blue-green deployment

Blue-green deployments work by running two versions of the application with a Load Balancer that acts as a traffic switch, which determines which version traffic is directed to.

This method might work for Flink applications if we only allow one version of the application to consume Kafka messages at any point in time. However, we noticed some issues when switching traffic to another version. For example, the state of both versions will be inconsistent because of the different data traffic each version receives, which complicates the process of switching Kafka consumption traffic.

So if there’s a failure and we need to rollback from Green to Blue deployment, or vice versa, we will need to take an extra step and ensure that before the failure, the data traffic received is exactly the same for both deployments.

Solution

As previously mentioned, it is crucial for streaming applications to ensure that at any point in time, only one application is receiving data traffic to ensure data completeness and accuracy. Although employing blue-green deployments can technically fulfil this requirement, the process must be modified to handle state consistency such that both versions have the same starting internal state and receive the same data traffic as each other, if a rollback is needed.

Figure 5. Visualised deployment flow

This deployment flow will operate in the following way:

  1. Collect metadata regarding current application
  2. Take savepoint and stop the current application
  3. Clear up high availability configurations
  4. Bake and deploy the new application
  5. Monitor application and rollback if the health check fails

Let’s elaborate on the key changes implemented in this new process.

Savepointing

Flink’s savepointing feature helps address the issue of state consistency and ensures safer deployments.

A savepoint in Flink is a snapshot of a Flink application’s state at the point in time. This savepoint allows us to pause the Flink application and restore the application to this snapshot state, if there’s an issue.

Before deploying a Flink application, we perform a savepoint via the Flink API before killing the current application. This would enable us to save the current state of the Flink application and rollback if our deployment fails – just like how you would do a quick save before attempting a difficult level when playing games. This mechanism ensures that both deployment versions have the same internal state during deployment as they both start from the same savepoint.

Additionally, this feature allows us to easily handle Kafka offsets since these consumed offsets are stored as part of the savepoint. As Flink manages their own state, they don’t need to rely on Kafka’s consumer offset management. With this savepoint feature, we can ensure that the application receives the same data traffic post rollback and that no messages are lost due to processing on the failed version.

Monitoring

To consistently monitor Flink applications, we can conduct health probes to the respective API endpoints to check if the application is stuck in a restart state or if it is running healthily.

We also configured our monitoring jobs to wait for a few minutes for the deployment to stabilise before probing it over a defined duration, to ensure that the application is in a stable running state.

Rollback

If the health checks fail, we then perform an automatic rollback. Typically, Flink applications are deployed as a standalone cluster and a rollback involves changes in one of the following:

  • Application and Flink configurations
  • Taskmanager or Jobmanager resource provision

For configuration changes, we leverage the fact that Spinnaker performs versioned deployment of configmap resources. In this case, a rollback simply involves mounting the old configmap back onto the Kubernetes deployment.

To retrieve the old version of the configmap mount, we can simply utilise Kubernetes’ rollback mechanisms – Kubernetes updates a deployment by creating a new replicaset with an incremental version before attaching it to the current deployment and scaling the previous replicaset to 0. To retrieve previous deployment specs, we just need to list all replicasets related to the deployment and find the previous deployed version, before updating the current deployment to mimic the previous template specifications.

However, this deployment does not contain the number of replicas of previously configured task managers. Kubernetes does not register the number of replicas as part of deployment configuration as this is a dynamic configuration and might be changed during processing due to auto scaling operations.

Our Flink applications are deployed as standalone clusters and do not use native or yarn resource providers. Coupled with the fact that Flink has strict resource provision, we realised that we do not have enough information to perform rollbacks, without the exact number of replicas created.

Taskmanager or Jobmanager resource provision changes

To gather information about resource provision changes, we can simply include the previously configured number of replicas as part of our metadata annotation. This allows us to retrieve it in future during rollback.

Making this change involves creating an additional step of metadata retrieval to retrieve and store previous deployment states as annotations of the new deployment.

Impact

With this solution, the deployment flow on Spinnaker looks like this:

Figure 6. New deployment flow on Spinnaker

Engineers no longer need to monitor the deployment pipeline as closely as they get notified of their application’s deployment status via Slack. They only need to interact or take action when they get notified that the different stages of the deployment pipeline are completed.

Figure 7. Slack notifications on deployment status

It is also easier to deploy Flink applications since failures and rollbacks are handled automatically. Furthermore, application state management is also automated, which reduces the amount of uncertainties.

What’s next?

As we work to further improve our deployment pipeline, we will look into extending the capabilities at our monitoring stage to allow engineers to define and configure their own health probes, allowing our deployment configurations to be more extendable.

Another interesting improvement will be to make this deployment flow seamlessly, ensuring as little downtime as possible by minimising cold start duration.

Coban also looks forward to pushing more features on our Flink platform to enable our engineers to explore more use cases that utilises real-time data to allow our operations to become auto adaptive and make data-driven decisions.

References

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Week in Review – AWS Verified Access, Java 17, Amplify Flutter, Conferences, and More – May 1, 2023

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/week-in-review-aws-verified-access-java-17-amplify-flutter-conferences-and-more-may-1-2023/

Conference season has started and I was happy to meet and talk with iOS and Swift developers at the New York Swifty conference last week. I will travel again to Turino (Italy), Amsterdam (Netherlands), Frankfurt (Germany), and London (UK) in the coming weeks. Feel free to stop by and say hi if you are around. But, while I was queuing for passport control at JFK airport, AWS teams continued to listen to your feedback and innovate on your behalf.

What happened on AWS last week ? I counted 26 new capabilities since last Monday (not counting last Friday, since I am writing these lines before the start of the day in the US). Here are the eight that caught my attention.

Last Week on AWS

Amplify Flutter now supports web and desktop apps. You can now write Flutter applications that target six platforms, including iOS, Android, Web, Linux, MacOS, and Windows with a single codebase. This update encompasses not only the Amplify libraries but also the Flutter Authenticator UI library, which has been entirely rewritten in Dart. As a result, you can now deliver a consistent experience across all targeted platforms.

AWS Lambda adds support for Java 17. AWS Lambda now supports Java 17 as both a managed runtime and a container base image. Developers creating serverless applications in Lambda with Java 17 can take advantage of new language features including Java records, sealed classes, and multi-line strings. The Lambda Java 17 runtime also has numerous performance improvements, including optimizations when running Lambda functions on Graviton 2 processors. It supports AWS Lambda Snap Start (in supported Regions) for fast cold starts, and the latest versions of the popular Spring Boot 3 and Micronaut 4 application frameworks

AWS Verified Access is now generally available. I first wrote about Verified Access when we announced the preview at the re:Invent conference last year. AWS Verified Access is now available. This new service helps you provide secure access to your corporate applications without using a VPN. Built based on AWS Zero Trust principles, you can use Verified Access to implement a work-from-anywhere model with added security and scalability.

AWS Support is now available in Korean. As the number of customers speaking Korean grows, AWS Support is invested in providing the best support experience possible. You can now communicate with AWS Support engineers and agents in Korean when you create a support case at the AWS Support Center.

AWS DataSync Discovery is now generally available. DataSync Discovery enables you to understand your on-premises storage performance and capacity through automated data collection and analysis. It helps you quickly identify data to be migrated and evaluate suggested AWS Storage services that align with your performance and capacity needs. Capabilities added since preview include support for NetApp ONTAP 9.7, recommendations at cluster and storage virtual machine (SVM) levels, and discovery job events in Amazon EventBridge.

Amazon Location Service adds support for long-distance matrix routing. This makes it easier for you to quickly calculate driving time and driving distance between multiple origins and destinations, no matter how far apart they are. Developers can now make a single API request to calculate up to 122,500 routes (350 origins and 350 destinations) within a 180 km region or up to 100 routes without any distance limitation.

AWS Firewall Manager adds support for multiple administrators. You can now create up to 10 AWS Firewall Manager administrator accounts from AWS Organizations to manage your firewall policies. You can delegate responsibility for firewall administration at a granular scope by restricting access based on OU, account, policy type, and Region, thereby enabling policy management tasks to be implemented faster and more effectively.

AWS AppSync supports TypeScript and source maps in JavaScript resolvers. With this update, you can take advantage of TypeScript features when you write JavaScript resolvers. With the updated libraries, you get improved support for types and generics in AppSync’s utility functions. The updated AppSync documentation provides guidance on how to get started and how to bundle your code when you want to use TypeScript.

Amazon Athena Provisioned Capacity. Athena is a query service that makes it simple to analyze data in S3 data lakes and 30 different data sources, including on-premises data sources or other cloud systems, using standard SQL queries. Athena is serverless, so there is no infrastructure to manage, and–until today–you pay only for the queries that you run. Starting last week, you can now get dedicated capacity for your queries and use new workload management features to prioritize, control, and scale your most important queries, paying only for the capacity you provision.

X in Y – We made existing services available in additional Regions and locations:

Upcoming AWS Events
And to finish this post, I recommend you check your calendars and sign up for these AWS events:

AWS Serverless Innovation DayJoin us on May 17, 2023, for a virtual event hosted on the Twitch AWS channel. We will showcase AWS serverless technology choices such as AWS Lambda, Amazon ECS with AWS Fargate, Amazon EventBridge, and AWS Step Functions. In addition, we will share serverless modernization success stories, use cases, and best practices.

AWS re:Inforce 2023 – Now register for AWS re:Inforce, in Anaheim, California, June 13–14. AWS Chief Information Security Officer CJ Moses will share the latest innovations in cloud security and what AWS Security is focused on. The breakout sessions will provide real-world examples of how security is embedded into the way businesses operate. To learn more and get the limited discount code to register, see CJ’s blog post Gain insights and knowledge at AWS re:Inforce 2023 in the AWS Security Blog.

AWS Global Summits – Check your calendars and sign up for the AWS Summit close to where you live or work: Seoul (May 3–4), Berlin and Singapore (May 4), Stockholm (May 11), Hong Kong (May 23), Amsterdam (June 1), London (June 7), Madrid (June 15), and Milano (June 22).

AWS Community Day – Join community-led conferences driven by AWS user group leaders close to your city: Chicago (June 15), Manila (June 29–30), and Munich (September 14). Recently, we have been bringing together AWS user groups from around the world into Meetup Pro accounts. Find your group and its meetups in your city!

AWS User Group Peru Conference – There is more than a new edge location opening in Lima. The local AWS User Group announced a one-day cloud event in Spanish and English in Lima on September 23. Three of us from the AWS News blog team will attend. I will be joined by my colleagues Marcia and Jeff. Save the date and register today!

You can browse all upcoming AWS-led in-person and virtual events and developer-focused events such as AWS DevDay.

Stay Informed
That was my selection for this week! To better keep up with all of this news, don’t forget to check out the following resources:

That’s all for this week. Check back next Monday for another Week in Review!

— seb

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

How to monitor the expiration of SAML identity provider certificates in an Amazon Cognito user pool

Post Syndicated from Karthik Nagarajan original https://aws.amazon.com/blogs/security/how-to-monitor-the-expiration-of-saml-identity-provider-certificates-in-an-amazon-cognito-user-pool/

With Amazon Cognito user pools, you can configure third-party SAML identity providers (IdPs) so that users can log in by using the IdP credentials. The Amazon Cognito user pool manages the federation and handling of tokens returned by a configured SAML IdP. It uses the public certificate of the SAML IdP to verify the signature in the SAML assertion returned by the IdP. Public certificates have an expiry date, and an expired public certificate will result in a SAML user federation failing because it can no longer be used for signature verification. To avoid user authentication failures, you must monitor and rotate SAML public certificates before expiration.

You can configure SAML IdPs in an Amazon Cognito user pool by using a SAML metadata document or a URL that points to the metadata document. If you use the SAML metadata document option, you must manually upload the SAML metadata. If you use the URL option, Amazon Cognito downloads the metadata from the URL and automatically configures the SAML IdP. In either scenario, if you don’t rotate the SAML certificate before expiration, users can’t log in using that SAML IdP.

In this blog post, I will show you how to monitor SAML certificates that are about to expire or already expired in an Amazon Cognito user pool by using an AWS Lambda function initiated by an Amazon EventBridge rule.

Solution overview

In this section, you will learn how to configure a Lambda function that checks the validity period of the SAML IdP certificates in an Amazon Cognito user pool, logs the findings to AWS Security Hub, and sends out an Amazon Simple Notification Service (Amazon SNS) notification with the list of certificates that are about to expire or have already expired. This Lambda function is invoked by an EventBridge rule that uses a rate or cron expression and runs on a defined schedule. For example, if the rate expression is defined as 1 day, the EventBridge rule initiates the Lambda function once each day. Figure 1 shows an overview of this process.

Figure 1: Lambda function initiated by EventBridge rule

Figure 1: Lambda function initiated by EventBridge rule

As shown in Figure 1, this process involves the following steps:

  1. EventBridge runs a rule using a rate expression or cron expression and invokes the Lambda function.
  2. The Lambda function performs the following tasks:
    1. Gets the list of SAML IdPs and corresponding X509 certificates.
    2. Verifies if the X509 certificates are about to expire or already expired based on the dates in the certificate.
  3. Based on the results of step 2, the Lambda function logs the findings in AWS Security Hub. Each finding shows the SAML certificate that is about to expire or is already expired.
  4. Based on the results of step 2, the Lambda function publishes a notification to the Amazon SNS topic with the certificate expiration details. For example, if CERT_EXPIRY_DAYS=60, the details of SAML certificates that are going to expire within 60 days or are already expired are published in the SNS notification.
  5. Amazon SNS sends messages to the subscribers of the topic, such as an email address.

Prerequisites

For this setup, you will need to have the following in place:

Implementation details

In this section, we will walk you through how to deploy the Lambda function and configure an EventBridge rule that invokes the Lambda function.

Step 1: Create the Node.js Lambda package

  1. Open a command line terminal or shell.
  2. Create a folder named saml-certificate-expiration-monitoring.
  3. Install the fast-xml-parser module by running the following command:
    cd saml-certificate-expiration-monitoring
    npm install fast-xml-parser
  4. Create a file named index.js and paste the following content in the file.
    const AWS = require('aws-sdk');
    const { X509Certificate } = require('crypto');
    const { XMLParser} = require("fast-xml-parser");
    const https = require('https');
    
    exports.handler = async function(event, context, callback) {
      
        const cognitoUPID = process.env.COGNITO_UPID;
        const expiryDays = process.env.CERT_EXPIRY_DAYS;
        const snsTopic = process.env.SNS_TOPIC_ARN;
        const postToSh = process.env.ENABLE_SH_MONITORING; //Enable security hub monitoring
        var securityhub = new AWS.SecurityHub({apiVersion: '2018-10-26'});
        
        var shParams = {
          Findings: []
        };
    
        AWS.config.apiVersions = {
          cognitoidentityserviceprovider: '2016-04-18',
        };
    
        // Initialize CognitoIdentityServiceProvider.
        const cognitoidentityserviceprovider = new AWS.CognitoIdentityServiceProvider();
    
        let listProvidersParams = {
          UserPoolId: cognitoUPID /* required */
        };
        
        let hasNext = true;
        const providerNames = [];
        
        while (hasNext) {
          const listProvidersResp = await cognitoidentityserviceprovider.listIdentityProviders(listProvidersParams).promise();
          listProvidersResp['Providers'].forEach(function(provider) {
                if(provider.ProviderType == 'SAML') {
                  providerNames.push(provider.ProviderName);
                }
            });
          
          listProvidersParams.NextToken = listProvidersResp.NextToken;
          hasNext = !!listProvidersResp.NextToken; //Keep iterating if there are more pages
        }
     
        let describeIdentityProviderParams = {
          UserPoolId: cognitoUPID /* required */
        };
        
        //Initialize the options for fast-xml-parser  
        //Parse KeyDescriptor as an array
        const alwaysArray = [
          "EntityDescriptor.IDPSSODescriptor.KeyDescriptor"
        ];
        const options = {
          removeNSPrefix: true,
          isArray: (name, jpath, isLeafNode, isAttribute) => { 
            if( alwaysArray.indexOf(jpath) !== -1) return true;
          },
          ignoreDeclaration: true
        };
        const parser = new XMLParser(options);
        
        let certExpMessage = '';
        const today = new Date();
        
        if(providerNames.length == 0) {
          console.log("There are no SAML providers in this Cognito user pool. ID : " + cognitoUPID);
        }
        
        for (let provider of providerNames) {
          describeIdentityProviderParams.ProviderName = provider;
          const descProviderResp = await cognitoidentityserviceprovider.describeIdentityProvider(describeIdentityProviderParams).promise();
          let xml = '';
          //Read SAML metadata from Cognito if the file is available. Else, read the SAML metadata from URL
          if('MetadataFile' in descProviderResp.IdentityProvider.ProviderDetails) {
            xml = descProviderResp.IdentityProvider.ProviderDetails.MetadataFile;
          } else {
            let metadata_promise = getMetadata(descProviderResp.IdentityProvider.ProviderDetails.MetadataURL);
    		    xml = await metadata_promise;
          }
          let jObj = parser.parse(xml);
          if('EntityDescriptor' in jObj) {
            //SAML metadata can have multiple certificates for signature verification. 
            for (let cert of jObj['EntityDescriptor']['IDPSSODescriptor']['KeyDescriptor']) {
              let certificate = '-----BEGIN CERTIFICATE-----\n' 
              + cert['KeyInfo']['X509Data']['X509Certificate'] 
              + '\n-----END CERTIFICATE-----';
              let x509cert = new X509Certificate(certificate);
              console.log("------ Provider : " + provider + "-------");
              console.log("Cert Expiry: " + x509cert.validTo);
              const diffTime = Math.abs(new Date(x509cert.validTo) - today);
              const diffDays = Math.ceil(diffTime / (1000 * 60 * 60 * 24));
              console.log("Days Remaining: " + diffDays);
              if(diffDays <= expiryDays) {
                
                certExpMessage += 'Provider name: ' + provider + ' SAML certificate (serialnumber : '+ x509cert.serialNumber + ') expiring in ' + diffDays + ' days \n';
                
                if(postToSh === 'true') {
                  //Log finding for security hub
                  logFindingToSh(context, shParams,
                  'Provider name: ' + provider + ' SAML certificate is expiring in ' + diffDays + ' days. Please contact the Identity provider to rotate the certificate.',
                  x509cert.fingerprint, cognitoUPID, provider); 
                }
              }
            }
          }
        }
        //Send a SNS message if a certificate is about to expire or already expired
        if(certExpMessage) {
          console.log("SAML certificates expiring within next " + expiryDays + " days :\n");
          console.log(certExpMessage);
          certExpMessage = "SAML certificates expiring within next " + expiryDays + " days :\n" + certExpMessage;
          // Create publish parameters
          let snsParams = {
            Message: certExpMessage, /* required */
            TopicArn: snsTopic
          };
          // Create promise and SNS service object
          let publishTextPromise = await new AWS.SNS({apiVersion: '2010-03-31'}).publish(snsParams).promise();
          console.log(publishTextPromise);
          
          if(postToSh === 'true') {
            console.log("Posting the finding to SecurityHub");
            let shPromise = await securityhub.batchImportFindings(shParams).promise();
            console.log("shPromise : " + JSON.stringify(shPromise));
          }
          
        } else {
          console.log("No certificates are expiring within " + expiryDays + " days");
        }
    };
    
    function getMetadata(url) {
    	return new Promise((resolve, reject) => {
    		https.get(url, (response) => {
    			let chunks_of_data = [];
    
    			response.on('data', (fragments) => {
    				chunks_of_data.push(fragments);
    			});
    
    			response.on('end', () => {
    				let response_body = Buffer.concat(chunks_of_data);
    				resolve(response_body.toString());
    			});
    
    			response.on('error', (error) => {
    				reject(error);
    			});
    		});
    	});
    }
    
    function logFindingToSh(context, shParams, remediationMsg, certFp, cognitoUPID, provider) {
      const accountID = context.invokedFunctionArn.split(':')[4];
      const region = process.env.AWS_REGION;
      const sh_product_arn = `arn:aws:securityhub:${region}:${accountID}:product/${accountID}/default`;
      const today = new Date().toISOString();
      
      shParams.Findings.push(
            {
          SchemaVersion: "2018-10-08",
          AwsAccountId: `${accountID}`, /* required */
          CreatedAt: `${today}`, /* required */
          UpdatedAt: `${today}`,
          Title: 'SAML Certificate expiration',
          Description: 'SAML certificate expiry', /* required */
          GeneratorId: `${context.invokedFunctionArn}`, /* required */
          Id: `${cognitoUPID}:${provider}:${certFp}`, /* required */
          ProductArn: `${sh_product_arn}`, /* required */
          Severity: {
              Original: '89.0',
              Label: 'HIGH'
          },
          Types: [
                    "Software and Configuration Checks/AWS Config Analysis"
          ],
          Compliance: {Status: 'WARNING'},
          Resources: [ /* required */
            {
              Id: `${cognitoUPID}`, /* required */
              Type: 'AWSCognitoUserPool', /* required */
              Region: `${region}`,
              Details : {
                Other: { 
                           "IdPIdentifier" : `${provider}` 
                }
              }
            }
          ],
          Remediation: {
                    Recommendation: {
                        Text: `${remediationMsg}`,
                        Url: `https://console.aws.amazon.com/cognito/v2/idp/user-pools/${cognitoUPID}/sign-in/identity-providers/details/${provider}`
                    }
          }
        }
      );
    }
  5. To create the deployment package for a .zip file archive, you can use a built-in .zip file archive utility or other third-party zip file utility. If you are using Linux or Mac OS, run the following command.
    zip -r saml-certificate-expiration-monitoring.zip .

Step 2: Create an Amazon SNS topic

  1. Create a standard Amazon SNS topic named saml-certificate-expiration-monitoring-topic for the Lambda function to use to send out notifications, as described in Creating an Amazon SNS topic.
  2. Copy the Amazon Resource Name (ARN) for Amazon SNS. Later in this post, you will use this ARN in the AWS Identity and Access Management (IAM) policy and Lambda environment variable configuration.
  3. After you create the Amazon SNS topic, create email subscribers to this topic.

Step 3: Configure the IAM role and policies and deploy the Lambda function

  1. In the IAM documentation, review the section Creating policies on the JSON tab. Then, using those instructions, use the following template to create an IAM policy named lambda-saml-certificate-expiration-monitoring-function-policy for the Lambda role to use. Replace <REGION> with your Region, <AWS-ACCT-NUMBER> with your AWS account ID, <SNS-ARN> with the Amazon SNS ARN from Step 2: Create an Amazon SNS topic, and <USER_POOL_ID> with your Amazon Cognito user pool ID that you want to monitor.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowLambdaToCreateGroup",
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": "arn:aws:logs:<REGION>:<AWS-ACCT-NUMBER>:*"
            },
            {
                "Sid": "AllowLambdaToPutLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": [
                    "arn:aws:logs:<REGION>:<AWS-ACCT-NUMBER>:log-group:/aws/lambda/saml-certificate-expiration-monitoring:*"
                ]
            },
            {
                "Sid": "AllowLambdaToGetCognitoIDPDetails",
                "Effect": "Allow",
                "Action": [
                    "cognito-idp:DescribeIdentityProvider",
                    "cognito-idp:ListIdentityProviders",
                    "cognito-idp:GetIdentityProviderByIdentifier"
                ],
                "Resource": "arn:aws:cognito-idp:<REGION>:<AWS-ACCT-NUMBER>:userpool/<USER_POOL_ID>"
            },
            {
                "Sid": "AllowLambdaToPublishToSNS",
                "Effect": "Allow",
                "Action": "SNS:Publish",
                "Resource": "<SNS-ARN>"
            } ,
            {
                "Sid": "AllowLambdaToPublishToSecurityHub",
                "Effect": "Allow",
                "Action": [
                    "SecurityHub:BatchImportFindings"
                ],
                "Resource": "arn:aws:securityhub:<REGION>:<AWS-ACCT-NUMBER>:product/<AWS-ACCT-NUMBER>/default"
            }
        ]
    }

  2. After the policy is created, create a role for the Lambda function to use the policy, by following the instructions in Creating a role to delegate permissions to an AWS service. Choose Lambda as the service to assume the role and attach the policy lambda-saml-certificate-expiration-monitoring-function-policy that you created in step 1 of this section. Specify a role named lambda-saml-certificate-expiration-monitoring-function-role, and then create the role.
  3. Review the topic Create a Lambda function with the console within the Lambda documentation. Then create the Lambda function, choosing the following options:
    1. Under Create function, choose Author from scratch to create the function.
    2. For the function name, enter saml-certificate-expiration-monitoring, and for Runtime, choose Node.js 16.x.
    3. For Execution role, expand Change default execution role, select Use an existing role, and select the role created in step 2 of this section.
    4. Choose Create function to open the Designer, and upload the zip file that was created in Step 1: Create the Node.js Lambda package.
    5. You should see the index.js code in the Lambda console.
  4. After the Lambda function is created, you will need to adjust the timeout duration. Set the Lambda timeout to 10 seconds. For more information, see the timeout entry in Configuring functions in the console. If you receive a timeout error, see How do I troubleshoot Lambda function invocation timeout errors?
  5. If you make code changes after uploading, deploy the Lambda function.

Step 4: Create an EventBridge rule

  1. Follow the instructions in creating an Amazon EventBridge rule that runs on a schedule to create a rule named saml-certificate-expiration-monitoring-rule. You can use a rate expression of 24 hours to initiate the event. This rule will invoke the Lambda function once per day.
  2. For Select a target, choose AWS Lambda service.
  3. For Lambda function, select the saml-certificate-expiration-monitoring function that you deployed in Step 3: Configure the IAM role and policies and deploy the Lambda function.

Step 5: Test the Lambda function

  1. Open the Lambda console, select the function that you created earlier, and configure the following environment variables:
    1. Create an environment variable called CERT_EXPIRY_DAYS. This specifies how much lead time, in days, you want to have before the certificate expiration notification is sent.
    2. Create an environment variable called COGNITO_UPID. This identifies the Amazon Cognito user pool ID that needs to be monitored.
    3. Create an environment variable called SNS_TOPIC_ARN and set it to the Amazon SNS topic ARN from Step 2: Create an Amazon SNS topic.
    4. Create an environment variable called ENABLE_SH_MONITORING and set it to true or false. If you set it to true, the Lambda function will log the findings in AWS Security Hub.
  2. Configure a test event for the Lambda function by using the default template and name it TC1, as shown in Figure 2.
    Figure 2: Create a Lambda test case

    Figure 2: Create a Lambda test case

  3. Run the TC1 test case to test the Lambda function. To make sure that the Lambda function ran successfully, check the Amazon CloudWatch logs. You should see the console log messages from the Lambda function. If ENABLE_SH_MONITORING is set to true in the Lambda environment variables, you will see a list of findings in AWS Security Hub for certificates with an expiry of less than or equal to the value of the CERT_EXPIRY_DAYS environment variable. Also, an email will be sent to each subscriber of the Amazon SNS topic.

Cleanup

To avoid future charges, delete the following resources used in this post (if you don’t need them) and disable AWS Security Hub.

  • Lambda function
  • EventBridge rule
  • CloudWatch logs associated with the Lambda function
  • Amazon SNS topic
  • IAM role and policy that you created for the Lambda function

Conclusion

An Amazon Cognito user pool with hundreds of SAML IdPs can be challenging to monitor. If a SAML IdP certificate expires, users can’t log in using that SAML IdP. This post provides the steps to monitor your SAML IdP certificates and send an alert to Amazon Cognito user pool administrators when a certificate is about to expire so that you can proactively work with your SAML IdP administrator to rotate the certificate. Now that you’ve learned the benefits of monitoring your IdP certificates for expiration, I recommend that you implement these, or similar, controls to make sure that you’re notified of these events before they occur.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Karthik Nagarajan

Karthik Nagarajan

Karthik is Security Engineer with AWS Identity Security Team. He helps the Amazon Cognito team to build a secure product for the customers.

Celebrating Australia’s Privacy Awareness Week 2023

Post Syndicated from Emily Hancock original http://blog.cloudflare.com/celebrating-australia-privacy-awareness-week-2023/

Celebrating Australia’s Privacy Awareness Week 2023

Celebrating Australia’s Privacy Awareness Week 2023

When a country throws a privacy party, Cloudflare is there! We are proud to be an official sponsor of the Australian Privacy Awareness Week 2023, and we think this year’s theme of “Privacy 101: Back to Basics” is more important now than ever. In recent months, Australians have been hit with the news of massive personal data privacy breaches where millions of Australian citizens' private and sensitive data was compromised, seemingly easily. Meanwhile, the Australian Attorney General released its Privacy Act Review Report 2022 earlier this year, calling for a number of changes to Australia’s privacy regulations.

You’re probably familiar with the old-school privacy basics of giving users notice and consent. But we think it’s time for some new “privacy basics”. Thanks to rapid developments in new technologies and new security threat vectors, notice and consent can only go so far to protect the privacy of your personal data. New challenges call for new solutions: security solutions and privacy enhancing technologies to keep personal data protected. Cloudflare is excited to play a role in building and using these technologies to help our customers keep their sensitive information private and enable individual consumers to protect themselves. Investing in and offering these technologies is part of our mission to help build a better Internet – one that is more private and more secure.

Cloudflare is fully committed to supporting Australian individuals and organizations in protecting their and their users’ privacy. We’ve been in Australia since Sydney became Cloudflare’s 15th data center in 2012, and we launched our Australian entity in 2019. We support more than 300 customers in Australia and New Zealand, including some of Australia’s largest banks and online digital natives with our world-leading privacy and security products and services.

For example, Australian tech darling Canva, whose online graphic design tool is used by over 35 million people worldwide each month, uses a number of our solutions that help Canva protect its network from attacks, which in turn ensures that the data of its millions of users is not breached. And we are proud to support Citizens of the Great Barrier Reef, which is a participant of Cloudflare’s Project Galileo. Through Project Galileo, we’ve helped them to secure their origin server from large bursts of traffic or malicious actors attempting to access the website.

This is why we’re proud to support Australia’s Privacy Awareness Week 2023, and we want to share our expertise on how to empower Australian organizations in securing and protecting the privacy of their users. So let’s look at a few key privacy basics and how we think about them at Cloudflare:

  • Minimize the data you collect, and then only use that data for the purpose for which it was collected.
  • Employ reasonable and appropriate security measures — with the bar for what this means going higher every day.
  • Create a culture of privacy by default.

Minimizing personal data in the clear

At Cloudflare, we believe in empowering individuals and entities of all sizes with technological tools to reduce the amount of personal data that gets funneled into the data ocean that is the Internet — regardless of whether someone lives in a country with laws protecting the privacy of their personal data. If we can build tools to help individuals share less personal data online, then that’s a win for privacy no matter what their country of residence.

In 2018, Cloudflare launched the 1.1.1.1 public DNS resolver — the Internet's fastest, privacy-first public DNS resolver. Our public resolver doesn’t retain any personal data about web requests. And because we baked anonymization best practices into the 1.1.1.1 resolver when we built it, we were able to demonstrate that we didn’t have any personal data to sell when we asked independent accountants to conduct a privacy examination of the 1.1.1.1 resolver. And when you combine our 1.1.1.1 public resolver with Warp, our VPN, then your Internet service provider can no longer see every site and app you use—even if they’re encrypted. Which means that even if they wanted to, the ISP can’t sell your data or use it to target you with ads.

We’ve also invested heavily in new technologies that aim to secure Internet traffic from bad actors; the prying eyes of ISPs or other man-in-the-middle machines that might find your Internet communications of interest for advertising purposes; or government entities that might want to crack down on individuals exercising their freedom of speech.

For example, DNS records are like the addresses on the outside of an envelope, and the website content you’re viewing is like the letter inside that envelope. In the snail mail world, courts have long recognized that the address on the outside of a letter doesn’t deserve as much privacy protection as the letter itself. But we’re not living in an age where the only thing someone can tell from the outside of the envelope are the “to” and “from” addresses and place of postage. The digital envelopes of DNS requests can contain much more information about a person than you might expect. Not only is there information about the sender and recipient addresses, but there is specific timestamp information about when requests were submitted, the domains and subdomains visited, and even how long someone stayed on a certain site. Since these digital envelopes contain so much personal information, we think it’s just as important to encrypt this information as to encrypt the contents of the digital letter inside. This is why we doubled down on DNS over HTTPS (DoH).

But we thought we could go further. We were an early supporter of Oblivious DoH (ODoH). ODoH is a proposed DNS standard — co-authored by engineers from Cloudflare, Apple, and Fastly — that separates IP addresses from queries, so that no single entity can see both at the same time. ODoH requires a proxy as a key part of the communication path between client and resolver, with encryption ensuring that the proxy does not know the contents of the DNS query (only where to send it), and the resolver knowing what the query is but not who originally requested it (only the proxy’s IP address). This means the identity of the requester and the content of the request are unlinkable. This technology has formed the basis of Apple’s iCloud Private Relay system, which ensures that no single party handling user data has complete information on both who the user is and what they are trying to access. Cloudflare is proud to serve as a second relay for Apple Private Relay.

But wait – there’s more! We’ve also invested heavily in Oblivious HTTP (OHTTP), an emerging IETF standard and is built upon standard hybrid public-key cryptography. Our Privacy Gateway service relays encrypted HTTP requests and responses between a client and application server. With Privacy Gateway, Cloudflare knows where the request is coming from, but not what it contains, and applications can see what the request contains, but not where it comes from. Neither Cloudflare nor the application server has the full picture, improving end-user privacy.

We recently deployed Privacy Gateway for Flo Health Inc., a leading female health app, for the launch of their Anonymous Mode. With Privacy Gateway in place, all request data for Anonymous Mode users is encrypted between the app user and Flo, which prevents Flo from seeing the IP addresses of those users and Cloudflare from seeing the contents of that request data.

And in the area of analytics, we’ve developed a privacy-first, free web analytics tool. Popular analytics vendors glean visitor and site data in return for web analytics. With business models driven by ad revenue, many analytics vendors track visitor behavior on websites and create buyer profiles to retarget website visitors with ads. But we wanted to give our customers a better option, so they wouldn’t have to sacrifice their visitors’ privacy to get essential and accurate metrics on website usage. Cloudflare Web Analytics works by adding a JavaScript snippet to a website instead of using client-side cookies or instead of fingerprinting individuals using their IP address.

Investing in security to protect data privacy

A key “privacy basic” that is also a fundamental element of almost all data protection legislation globally is the requirement to adopt reasonable and appropriate security measures for the personal data that is being processed. And as was the case with the most recent data breaches in Australia, if personal data is accessed without authorization, poor or failed security measures are often to blame.

Cloudflare's security services enable our customers to screen for cybersecurity risks on Cloudflare's network before those risks can reach the customer's internal network. This helps protect our customers and our customers’ data from a range of cyber threats. By doing so, Cloudflare's services are essentially fulfilling a privacy-enhancing function in themselves. From the beginning, we have built our systems to ensure that data is kept private, even from us, and we have made public policy and contractual commitments about keeping that data private and secure.

But beyond securing our network for the benefit of our customers, Cloudflare is most well-known for its application layer security services – Web Application Firewall (WAF), bot management, DDoS protection, SSL/TLS, Page Shield, and more. We also embrace the critical importance of encryption in transit. In fact, we see encryption as so important that in 2014, Cloudflare introduced Universal SSL to support SSL (and now TLS) connections to every Cloudflare customer. And at the same time, we recognize that blindly passing along encrypted packets would undercut some of the very security that we’re trying to provide. Data privacy and security are a balance. If we let encrypted malicious code get to an end destination, then the malicious code may be used to access information that should otherwise have been protected. If data isn’t encrypted in transit, it’s at risk for interception. But by supporting encryption in transit and ensuring malicious code doesn’t get to its intended destination, we can protect private personal information even more effectively.

Let’s take an example – In June 2022, Atlassian released a Security Advisory relating to a remote code execution (RCE) vulnerability affecting Confluence Server and Confluence Data Center products. Cloudflare responded immediately to roll out a new WAF rule for all of our customers. For customers without this WAF protection, all the trade secret and personal information on their instances of Confluence were potentially vulnerable to data breach. These types of security measures are critical to protecting personal data. And it wouldn’t have mattered if the personal data were stored on a server in Australia, Germany, the U.S., or India – the RCE vulnerability would have exposed data wherever it was stored. Instead, the data was protected because a global network was able to roll out a WAF rule immediately to protect all of its customers globally.

Some of the biggest data breaches in recent years have happened as a result of something pretty simple – an attacker uses a phishing email or social engineering to get an employee of a company to visit a site that infects the employee’s computer with malware or enter their credentials on a fake site that lets the bad actor capture the credentials and then use those to impersonate the employee and log into a company’s systems. Depending on the type of information compromised, these kinds of data breaches can have a huge impact on individuals’ privacy. For this reason, Cloudflare has invested in a number of technologies designed to protect corporate networks, and the personal data on those networks.

As we noted during our CIO week earlier this year, the FBI’s latest Internet Crime Report shows that business email compromise and email account compromise, a subset of malicious phishing campaigns, are the most costly – with U.S. businesses losing nearly $2.4 billion. Cloudflare has invested in a number of Zero Trust solutions to help fight this very problem:

  • Link Isolation means that when an employee clicks a link in an email, it will automatically be opened using Cloudflare’s Remote Browser Isolation technology that isolates potentially risky links, downloads, or other zero-day attacks from impacting that user’s computer and the wider corporate network.
  • With our Data Loss Prevention tools, businesses can identify and stop exfiltration of data.
  • Our Area 1 solution identifies phishing attempts, emails containing malicious code, and emails containing ransomware payloads and prevents them from landing in the inbox of unsuspecting employees.

These Zero Trust tools, combined with the use of hardware keys for multifactor authentication, were key in Cloudflare’s ability to prevent a breach by an SMS phishing attack that targeted more than 130 companies in July and August 2022. Many of these companies reported the disclosure of customer personal information as a result of employees falling victim to this SMS phishing effort.

And remember the Atlassian Confluence RCE vulnerability we mentioned earlier? Cloudflare remained protected not only due to our rapid update of our WAF rules, but also because we use our own Cloudflare Access solution (part of our Zero Trust suite) to ensure that only individuals with Cloudflare credentials are able to access our internal systems. Cloudflare Access verified every request made to a Confluence application to ensure it was coming from an authenticated user.

All of these Zero Trust solutions require sophisticated machine learning to detect patterns of malicious activity, and none of them require data to be stored in a specific location to keep the data safe. Thwarting these kinds of security threats aren’t only important for protecting organizations’ internal networks from intrusion – they are critical for keeping large scale data sets private for the benefit of millions of individuals.

How we do privacy at Cloudflare

All the technologies we build are public examples of how at Cloudflare we put our money where our mouth is when it comes to privacy. We also want to tell you about the ways — some public, some not — we infuse privacy principles at all levels at Cloudflare.

  • Employee education and mindset: An understanding of privacy is core to a Cloudflare employee’s experience right from the start. Employees learn about the role privacy and security play in helping to build a better Internet in their first weeks at Cloudflare. During the comprehensive employee orientation, we stress the role each employee plays in keeping the company and our customers secure. All employees are required to take annual data protection training, and we do targeted training for individual teams, depending on their engagement with personal data, throughout the year.
  • Privacy in product development: Cloudflare employees take privacy-by-design seriously. We develop products and processes with the principles of data minimization, purpose limitation, and data security always front of mind. We have a product development lifecycle that includes performing privacy impact assessments when we may process personal data. We retain personal data we process for as short a time as necessary to provide our services to our customers. We do not track customers’ end users across sites. We don’t sell personal information. We don’t monetize DNS requests. We detect, deter, and deflect bad actors — we’re not in the business of looking at what any one person (or more specifically, browser) is doing when they browse the Internet. That’s not what we’re about.
  • Certifications: In addition to the extensive internal security mechanisms we have in place to protect our customers’ data, we also have become certified under industry standards to demonstrate our commitment to data security. We hold the following certifications: ISO 27001, ISO 27701, ISO 27018, AICPA SOC2 Type II, FedRamp Moderate, PCI DSS 3.2.1, WCAG 2.1 AA and Section 508, C5:2020, and, most recently, the EU Cloud Code of Conduct.
  • Privacy-focused response to government and third-party requests for information: Our respect for our customers' privacy applies with equal force to commercial requests and to government or law enforcement requests. Any law enforcement requests that we receive must strictly adhere to the due process of law and be subject to judicial oversight. We believe that U.S. law enforcement requests for the personal data of a non-U.S. person that conflict with the privacy laws of that person’s country of residence (such as Australia’s Privacy Act) should be legally challenged. We commit in our Data Processing Addendum that we will fight government data requests where such a conflict exists. In addition, it is our policy to notify our customers of a subpoena or other legal process requesting their customer or billing information before disclosure of that information, whether the legal process comes from the government or private parties involved in civil litigation, unless legally prohibited. We also publicly report on the types of requests we receive, as well as our responses, in our semi-annual Transparency Report. Finally, we publicly list certain types of actions that Cloudflare has never taken in response to government requests, and we commit that if Cloudflare were asked to do any of the things on this list, we would exhaust all legal remedies in order to protect our customers from what we believe are illegal or unconstitutional requests.

And there’s more to come…

Cloudflare is committed to fully support Australia’s privacy goals, and we are paying close attention to the current conversations around updating Australia’s privacy law and regulatory structure. And our 2023 roadmap includes focusing on the APEC Cross-Border Privacy Rules (CBPR) System as a way to demonstrate our continued commitment to global privacy and paving the way for beneficial cross-border data transfers.

Happy Privacy Awareness Week 2023!

Build an analytics pipeline for a multi-account support case dashboard

Post Syndicated from Sindhura Palakodety original https://aws.amazon.com/blogs/big-data/build-an-analytics-pipeline-for-a-multi-account-support-case-dashboard/

As organizations mature in their cloud journey, they have many accounts (even hundreds) that they need to manage. Imagine having to manage support cases for these accounts without a unified dashboard. Administrators have to access each account either by switching roles or with single sign-on (SSO) in order to view and manage support cases.

This post demonstrates how you can build an analytics pipeline to push support cases created in individual member AWS accounts into a central account. We also show you how to build an analytics dashboard to gain visibility and insights on all support cases created in various accounts within your organization.

Overview of solution

In this post, we go through the process to create a pipeline to ingest, store, process, analyze, and visualize AWS support cases. We use the following AWS services as key components:

The following diagram illustrates the architecture.

The central account is the AWS account that you use to centrally manage the support case data.

Member accounts are the AWS accounts where, whenever the support cases are created, the data flows into an S3 bucket in the central account that can be visualized using the QuickSight dashboard in the central account.

To implement this solution, you complete the following high-level steps:

  1. Determine the AWS accounts to use for the central account and member accounts.
  2. Set up permissions for AWS CloudFormation StackSets on the central account and member accounts.
  3. Create resources on the central account using AWS CloudFormation.
  4. Create resources on the member accounts using CloudFormation StackSets.
  5. Open up support cases on the member accounts.
  6. Visualize the data in a QuickSight dashboard in the central account.

Prerequisites

Complete the following prerequisite steps:

  1. Create AWS accounts if you haven’t done so already.
  2. Before you get started, make sure that you have a Business or Enterprise support plan for your member accounts.
  3. Sign up for QuickSight if you have never used QuickSight in this account before. To use the forecast capability in QuickSight, sign up for the Enterprise Edition.

Preparation for CloudFormation StackSets

In this section, we go through the steps to set up permissions for StackSets in both the central and member accounts.

Set up permissions for StackSets on the central account

To set up permissions on the central account, complete the following steps:

  1. Sign in to the AWS Management Console of the central account.
  2. Download the administrator role CloudFormation template.
  3. On the AWS CloudFormation console, choose Create stack and With new resources.
  4. Leave the Prepare template setting as default.
  5. For Template source, select Upload a template file.
  6. Choose Choose file and supply the CloudFormation template you downloaded: AWSCloudFormationStackSetAdministrationRole.yml.
  7. Choose Next.
  8. For Stack name, enter StackSetAdministratorRole.
  9. Choose Next.
  10. For Configure stack options, we recommend configuring tags, which are key-value pairs that can help you identify your stacks and the resources they create. For example, enter Owner as the key, and your email address as the value.
  11. We don’t use additional permissions or advanced options, so accept the default values and choose Next.
  12. Review your configuration and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  13. Choose Create stack.

The stack takes about 30 seconds to complete.

Set up permissions for StackSets on member accounts

Now that we’ve created a StackSet administrator role on the central account, we need to create the StackSet execution role on the member accounts. Perform the following steps on all member accounts:

  1. Sign in to the console on the member account.
  2. Download the execution role CloudFormation template.
  3. On the AWS CloudFormation console, choose Create stack and With new resources.
  4. Leave the Prepare template setting as default.
  5. For Template source, select Upload a template file.
  6. Choose Choose file and supply the CloudFormation template you downloaded: AWSCloudFormationStackSetExecutionRole.yml.
  7. Choose Next.
  8. For Stack name, use StackSetExecutionRole.
  9. For Parameters, enter the 12-digit account ID for the central account.
  10. Choose Next.
  11. For Configure stack options, we recommend configuring tags. For example, enter Owner as the key and your email address as the value.
  12. We don’t use additional permissions or advanced options, so choose Next.

For more information, see Setting AWS CloudFormation stack options.

  1. Review your configuration and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  2. Choose Create stack.

The stack takes about 30 seconds to complete.

Set up the infrastructure for the central account and member accounts

In this section, we go through the steps to create your resources for both accounts and launch the StackSets.

Create resources on the central account with AWS CloudFormation

To launch the provided CloudFormation template, complete the following steps:

  1. Sign in to the console on the central account.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For Stack name, enter a name. For example, support-case-central-account.
  5. For AWSMemberAccountIDs, enter the member account IDs separated by commas from where support case data is gathered.
  6. For Support Case Raw Data Bucket, enter the S3 bucket in the central account that holds the support case raw data from all member accounts. Note the name of this bucket to use in future steps.
  7. For Support Case Transformed Data Bucket, enter the S3 bucket in central account that holds the support case transformed data. Note the name of this bucket to use in future steps.
  8. Choose Next.
  9. Enter any tags you want to assign to the stack and choose Next.
  10. Select the acknowledgement check boxes and choose Create stack.

The stack takes approximately 5 minutes to complete. Wait until the stack is complete before proceeding to the next steps.

Launch CloudFormation StackSets from the central account

To launch StackSets, complete the following steps:

  1. Sign in to the console on the central account.
  2. On the AWS CloudFormation console, choose StackSets in the navigation pane.
  3. Choose Create StackSet.
  4. Leave the IAM execution role name as AWSCloudFormationStackSetExecutionRole.
  5. If AWS Organizations is enabled, under permissions, select Service-managed permissions.
  6. Leave the Prepare template setting as default.
  7. For Template source, select Amazon S3 URL.
  8. Enter the following Amazon S3 URL under Specify Template: https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/BDB-2583/AWS_MemberAccount_SupportCaseDashboard_CF.yaml
  9. Choose Next.
  10. For StackSet name, enter a name. For example, support-case-member-account.
  11. For CentralSupportCaseRawBucket, enter the name of the Support Case Raw Data Bucket created in the central account, which you noted previously.
  12. For CentralAccountID, enter the account ID of the central account.
  13. For Configure StackSet options, we recommend configuring tags.
  14. Leave the rest as default and choose Next.
  15. If AWS Organizations is enabled, in the Set deployment options step, for Deployment targets, you can either choose Deploy to organization or Deploy to organizational units (OU).
    • If you deploy to OUs, you will need to specify the AWS OU ID.
  16. If AWS Organizations is not enabled, on the Set Deployment Options page, under Accounts, select Deploy stacks in accounts.
    • Under Account numbers, enter the 12-digit account IDs for the member accounts as a comma-separated list. For example: 111111111111,222222222222.
  17. Under Specify regions, choose US East (N. Virginia).

Due to the limitation of EventBridge with the AWS Support API, this StackSet has to be deployed only in the US East (N. Virginia) Region.

  1. Optionally, you can change the maximum concurrent accounts to match the number of member accounts, adjust the failure tolerance to at least 1, and choose Region Concurrency to be Parallel to set up resources in parallel on the member accounts.
  2. Review your selections, select the acknowledgement check boxes, and choose Submit.

The operation takes about 2–3 minutes to complete.

Visualize your support cases in QuickSight in the central account

In this section, we go through the steps to visualize your support cases in QuickSight.

Grant QuickSight permissions

To grant QuickSight permissions, complete the following steps:

  1. Sign in to the console on the central account.
  2. On the QuickSight console, on the Admin drop-down menu in top right-hand corner, choose Manage QuickSight.
  3. In the navigation pane, choose Security & permissions.
  4. Under QuickSight access to AWS services, choose Manage.
  5. Select Amazon Athena.
  6. Select Amazon S3 to edit QuickSight access to your S3 buckets.
  7. Select the bucket you specified during stack creation.
  8. Choose Finish.
  9. Choose Save.

Prepare the datasets

To prepare your datasets, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose Athena.
  4. For Data source name, enter support-case-data-source.
  5. Choose Validate connection.
  6. After your connection is validated, choose Create data source.
  7. For Database, choose support-case-transformed-data.
  8. For Tables, select the table under the database (there should only be one table that matches the name of the S3 bucket you set as the destination for the transformed data).
  9. Choose Edit/Preview data.
  10. Leave Query mode set as Direct Query.
  11. Choose the options menu (three dots) next to the field case_creation_year and set Change data type to Date.
  12. Enter the date format as yyyy, then choose Validate and Update.
  13. Similarly, right-click on the field case_creation_month and set Change data type to Date.
  14. Enter the date format as MM, then choose Validate and Update.
  15. Right-click on the field case_creation_day and set Change data type to Date.
  16. Enter the date format as dd, then choose Validate and Update.
  17. Right-click on the field case_creation_time and set Change data type to Date.
  18. Enter the date format as yyyy-MM-dd’T’HH:mm:ss.SSSZ, then choose Validate and Update.
  19. Change the name of the QuickSight dataset to support-cases-dataset.
  20. Choose Save & publish.
  21. Note the dataset ID from the URL (alpha-numeric string between datasets and view, excluding slashes) to use later for QuickSight dashboard creation.

  1. Choose Cancel to exit this page.

Set up the QuickSight dashboard from a template

To set up your QuickSight dashboard, complete the following steps:

  1. Navigate to the following link, then right-click and choose Save As to download the QuickSight dashboard JSON template from the browser.
  2. On the console, choose the user profile drop-down menu.
  3. Choose the copy icon next to the Account ID: field (of the central account).

  1. Open the JSON file with a text editor and replace xxxxx with the account ID. This will be replaced in two places.
  2. Replace yyyyy with the dataset ID that you previously noted.
  3. Replace rrrrr with the Region where you deployed resources in the central account.

To determine the principal (user) to be used for the dashboard creation, you can use AWS CloudShell.

  1. Navigate to CloudShell on the console. Ensure it’s the same Region where your resources are deployed.

  1. Wait until the environment gets created and you see the CloudShell prompt.

  1. Run the following command, providing your account ID (central account) and Region:
    aws quicksight list-users –region <region> --aws-account-id <account-id> --namespace default

  2. From the output, select the value of the ARN field. Replace the value of zzzzz with the ARN.
  3. Optionally, you can change the name of the dashboard by changing the value of the fields in the JSON file:
    • For DashboardId, enter SupportCaseCentralDashboard.
    • For Name, enter SupportCaseCentralDashboard.
  4. Save the changes to the JSON file.

Now we use CloudShell to upload the JSON file provided in the previous step.

  1. On the Actions menu, choose Upload file.

  1. To create the QuickSight dashboard from the JSON template, use the following AWS Command Line Interface (AWS CLI) command and pass the updated JSON file as an argument, providing your Region:
    aws quicksight create-dashboard –region <region> --cli-input-json file://support-case-dashboard-template.json

The output of the command looks similar to the following screenshot.

  1. In case of any issues or if you want to see more details about the dashboard, you can use the following command:
    aws quicksight describe-dashboard --region <region> --aws-account-id <central-account-id> --dashboard-id <DashboardId in screenshot above>

  2. On the QuickSight console, choose Dashboards in the navigation pane.
  3. Choose Support Cases Dashboard.

You should see a dashboard similar to the screenshot shown at the beginning of this post, but there should only be one case.

Add additional member accounts

If you want to add additional member accounts, you need to update the CloudFormation stack that you created earlier on the central account. If you followed our name recommendation, the stack is called support-case-central-account-stack. Add the additional account number in the Member Account IDs parameter.

Next, go to the StackSet in the central account. If you followed our naming recommendation, the StackSet is called support-case-member-account. Select the StackSet and on the Actions menu, choose Add stacks to StackSet. Then follow the same instructions that you followed previously when you created the StackSet.

Monitor support cases created in the central account

So far, our setup will monitor all support cases created in the member accounts that you specified. However, it doesn’t include support cases that you create in the central account. To set up monitoring for the central account, complete the following steps:

  1. Update the CloudFormation stack that you created earlier on the central account. If you followed our name recommendation, the stack is called support-case-central-account-stack. Add the central account ID in the Member Account IDs parameter.
  2. Sign in to the CloudFormation console in the central account.
  3. Choose Launch Stack:
  4. Choose Next.
  5. For Stack name, enter a name. For example, support-case-central-as-member-account.
  6. For CentralAccountIDs, enter the central account ID.
  7. For CentralSupportCaseRawBucket, enter the S3 bucket in the central account that holds the support case raw data from all member accounts.
  8. Choose Next.
  9. Enter any tags you want to assign to the stack and choose Next.
  10. Select the acknowledgement check boxes and choose Create stack.

Clean up

To avoid incurring future charges, delete the resources you created as part of this solution.

Troubleshooting

Note the following troubleshooting tips:

  • Make sure that you create the CloudFormation stacks and StackSet in the correct accounts: central and member.
  • If you get a permission denied error from Athena on the S3 path (see the following screenshot), review the steps to grant QuickSight permissions.

  • When creating the QuickSight dashboard using the template, if you get an error similar to the following, make sure that you use the ARN value from the output generated by the aws quicksight list-users --region <region> --aws-account-id <account-id> --namespace default command.

An error occurred (InvalidParameterValueException) when calling the CreateDashboard operation: Principal ARN xxxx is not part of the same account yyyy

  • When deleting the stack, if you encounter the DELETE_FAILED error, it means that your S3 bucket is not empty. To fix it, empty the contents of the bucket and try to delete the Stack again.

Conclusion

Congratulations! You have successfully built an analytics pipeline to push support cases created in individual member accounts into a central account. You have also built an analytics dashboard to gain visibility and insights on all support cases created in various accounts. As you start creating support cases in your member accounts, you will be able to view them in a single pane of glass.

With the steps and resources described in this post, you can build your own analytics dashboard to gain visibility and insights on all support cases created in various accounts within your organization.


About the authors

Sindhura Palakodety is a Solutions Architect at AWS. She is passionate about helping customers build enterprise-scale Well-Architected solutions on the AWS platform and specializes in the data analytics domain.

Shu Sia Lukito is a Partner Solutions Architect at AWS. She is on a mission to help AWS partners build successful AWS practices and help their customers accelerate their journey to the cloud. In her spare time, she enjoys spending time with her family and making spicy food.

The collective thoughts of the interwebz