AWS CSA Consensus Assessment Initiative Questionnaire version 4 now available

Post Syndicated from Sonali Vaidya original https://aws.amazon.com/blogs/security/aws-csa-consensus-assessment-initiative-questionnaire-version-4-now-available/

Amazon Web Services (AWS) has published an updated version of the AWS Cloud Security Alliance (CSA) Consensus Assessment Initiative Questionnaire (CAIQ). The questionnaire has been completed using the current CSA CAIQ standard, v4.0.2 (06.07.2021 update), and is now available for download.

The CSA is a not-for-profit organization dedicated to “defining and raising awareness of best practices to help ensure a secure cloud computing environment.” For more information, see the Cloud Security Alliance website. A wide range of industry security practitioners, corporations, and associations participate in CSA.

What is CSA CAIQ and how can you use it?

The CSA Consensus Assessments Initiative Questionnaire provides a set of questions that CSA anticipates a cloud consumer or a cloud auditor would ask of a cloud provider. The AWS CSA CAIQ provides the AWS control implementation descriptions for a series of cloud-specific security questions based on the Cloud Controls Matrix (CCM). The AWS CSA CAIQ also reflects the AWS customer responsibilities according to the shared responsibility model, which can help customers comply with the CSA CCM.

At AWS, we’re committed to helping you achieve and maintain the highest standards of security and compliance. We value your feedback and questions. You can contact the AWS HITRUST team at AWS Compliance Contact Us. If you have feedback about this post, submit comments in the Comments section below.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Sonali Vaidya

Sonali leads multiple AWS global compliance programs, including HITRUST, ISO 27001, ISO 27017, ISO 27018, ISO 27701, ISO 9001, and CSA STAR. Sonali has over 20 years of experience in information security and privacy management and holds multiple certifications such as CISSP, C-GDPR|P, CCSK, CEH, CISA, PCIP, ISO 27001, and ISO 22301 Lead Auditor.

Mozilla releases a machine-translation plugin

Post Syndicated from original https://lwn.net/Articles/896934/

Mozilla has announced
the release of a translation plugin for Firefox as part of the Project Bergamot initiative.

The ultimate goal of this consortium was to build a set of neural
machine translation tools that would enable Mozilla to develop a
website translation add-on that operates locally, i.e. the engines,
language models and in-page translation algorithms would need to
reside and be executed entirely in the user’s computer, so none of
the data would be sent to the cloud, making it entirely private.

Introducing the newest AWS Heroes – June 2022

Post Syndicated from Ross Barich original https://aws.amazon.com/blogs/aws/introducing-the-newest-aws-heroes-june-2022/

AWS Heroes are some of the worlds most active and vocal leaders in AWS communities, recognized for their unwavering focus on sharing insights and technical knowledge with others. Heroes have a variety of contributions to community learning: they host events, meetups, and workshops, author blogs, contribute to open source projects, speak at conferences, and more. You can view some of their prominent content in the AWS Heroes Content Library.

Today we are thrilled to introduce to the world the latest cohort of AWS Heroes:

Adam Bien – Munich, Germany

DevTools Hero Adam Bien is an independent Architect, Consultant, Developer, Trainer, conference speaker, and podcaster. Adam started with Java since JDK 1.0 and still enjoys writing serverless Java, often in Amazon Corretto. He also codes live on YouTube. Adam uses CDK in greenfield serverless Java applications, as well as to help his clients migrate their on-premise Java applications to the AWS cloud. He likes to apply Java’s pragmatic patterns and best practices to serverless runtimes, especially AWS Lambda and AWS Fargate. High productivity, reduction of complexity, and cost effectiveness are his main focuses.

Adam Elmore – Nixa, USA

DevTools Hero Adam Elmore is an independent cloud consultant who helps startups build products on AWS. He’s also the host of AWS FM, a podcast with guests from around the AWS community, and the creator of the AWS Community on Twitter. Adam is passionate about open source and has made a handful of contributions to the AWS CDK over the years. In 2020 he created Ness, an open source CLI tool for deploying web sites and apps to AWS. Previously, Adam co-founded StatMuse—a Disney backed startup building technology that answers sports questions—and served as CTO for five years.

Brooke Jamieson – Brisbane, Australia

Machine Learning Hero Brooke Jamieson is the Head of Enablement – AI/ML and Data at Blackbook.ai, and is an international conference speaker. Brooke specializes in researching & developing technically robust solutions that help “non-data people” harness the power of Artificial Intelligence and Machine Learning for their industry. Outside of their ‘day job’, Brooke is a dedicated member of the AWS Community and is a regular speaker at local user groups, global events, and guest lectures at multiple Australian Universities. They also make entry-level cloud career and technical content on TikTok, to reach broad audiences and diverse groups wanting to transition to careers in AI/ML and Cloud. Brooke is an Advisory Board member of Women in Digital, and strives to promote STEM pathways to young people in regional Australia & members of the LGBTIQA+ community.

Chao Cai – Beijing, China

Community Hero Chao Cai has 15 years of world-class experience in software development, including more than 10 years as a software architect. He is currently the VP and Chief Architect at Mobvista Inc. Chao is passionate about sharing his knowledge and experience to the community. His WeChat public account has more than 4,000 followers and over 34,000 engineers have taken his online courses. Chao is a respected leader in the China tech community. He is invited as the speaker to the global tech conferences, such as, QCon and ArchSummit each year. As an active advocate for AWS, Chao is also a regular speaker at AWS tech events.

Cyril Bandolo – Douala, Cameroon

Machine Learning Hero Cyril Bandolo is a data scientist working as a Senior Manager Data Analytics at Yoomee Mobile. Cyril has a natural talent and passion for teaching and transferring knowledge in his machine learning blog, where he focuses on building and deploying end-to-end machine learning projects on AWS. On his YouTube channel, he recently launched a weekly live hands-on series called “Sagemaker Saturdays” during which, every weekend he walks the viewers through end-to-end machine learning projects with Sagemaker Studio Lab and Sagemaker Studio. Cyril is always trying to encounter and apply new machine learning solutions to make lives better and help move the bottom line.

Kristi Perreault – Denver, USA

Serverless Hero Kristi Perreault is a Principal Software Engineer at Liberty Mutual Insurance, where her focus is serverless first development, and enterprise enablement. She holds an M.S. in Electrical & Computer Engineering specializing in cloud computing & IoT, and is very passionate about promoting women in technology. She organizes the Serverless Denver user group as part of ServerlessDays, co-organizes CDK Day, and writes extensively about serverless and diversity on her dev.to and Medium blog sites. You’ll find her speaking about embracing and scaling serverless first initiatives on dozens of podcasts, webinars, conferences, and meetups both virtually and on stage.

Sanchit Jain – Mumbai, India

Community Hero Sanchit Jain is the AWS Analytics Practice Lead and a certified expert specializing in AWS Cloud at Quantiphi Inc. He is also the AWS User Group Mumbai Lead and actively contributes to the AWS community by delivering sessions at AWS User Groups, AWS Community Days, and various educational institutes. He also shares his knowledge by publishing blogs about AWS services, architectures, and best practices. Recently, Sanchit hosted an AWS Solution Architect Certification Bootcamp, which spanned over two months, with 7500+ viewers. He also delivered a session recently at AWS Summit India 2022 on Building a data lake with AWS Lake Formation.

Shigeru Oda – Saitama, Japan

Community Hero Shigeru Oda is an expert system engineer at NSD CO., LTD. Since 2020 he has run 25 events with the JAWS-UG Beginners Chapter (about 4200 registered members). In September 2020 he promoted the 24-hour online event JAWS SONIC 2020 & MIDNIGHT JAWS 2020, which was attended by about 1500 people. In March 2021, he promoted JAWS DAYS 2021 as a steering member, which was attended by about 4,000 people. And in November 2021, he promoted JAWS PANKRATION 2021, a second 24-hour online event, providing 900 AWS users in Japan, as well as around the world with an opportunity to learn beyond the language barrier of English and Japanese through simultaneous interpretation. He received the AWS Samurai 2021 Award for these activities.

Yasunori Kirimoto – Sapporo, Japan

DevTools Hero Yasunori Kirimoto is currently the Co-Founder and CTO of MIERUNE Inc. and owner of dayjournal. He specializes in the field of GIS (Geographic Information System) and FOSS4G (Free Open Source Software for GeoSpatial). Yasunori’s work includes contributions to the Amazon Location Service samples on GitHub and open source projects such as AWS Amplify and AWS CDK. He has also published numerous blog posts on AWS CloudFormation and other AWS architectures. When he’s not acting as a bridge between AWS and the location-based information field, he engages with the open-source community and enjoys participating in venture projects to gain a broader understanding of technology.

 

 

 

If you’d like to learn more about the new Heroes, or connect with a Hero near you, please visit the AWS Heroes website or browse the AWS Heroes Content Library.

Ross

Automate your validated dataset deployment using Amazon QuickSight and AWS CloudFormation

Post Syndicated from Jeremy Winters original https://aws.amazon.com/blogs/big-data/automate-your-validated-dataset-deployment-using-amazon-quicksight-and-aws-cloudformation/

A lot of the power behind business intelligence (BI) and data visualization tools such as Amazon QuickSight comes from the ability to work interactively with data through a GUI. Report authors create dashboards using GUI-based tools, then in just a few clicks can share the dashboards with business users and decision-makers. This workflow empowers authors to create and manage the QuickSight resources and dashboards they’re responsible for providing.

Developer productivity is a great benefit of UI-based development, but enterprise customers often need to consider additional factors in their BI implementation:

  • Promoting objects through environments (development, testing, production, and so on)
  • Scaling for hundreds of authors and thousands of users
  • Implementing data security, such as row-level and column-level rules to filter the data elements visible to specific users
  • Regulatory requirements, processes, and compliance controls.

Approaches such as source control-backed CI/CD pipelines allow you to address compliance requirements and security gates with automation. For example, a hypothetical build pipeline for a Java Springboot application may enable developers to build and deploy freely to a dev environment, but the code must pass tests and vulnerability scans before being considered for promotion to upper environments. A human approval step then takes place before the code is released into production. Processes such as this provide quality, consistency, auditability, and accountability for the code being released.

The QuickSight API provides functionality for automation pipelines. Pipeline developers can use the API to migrate QuickSight resources from one environment to another. The API calls that facilitate handling QuickSight datasets enables inspection of the JSON representation of the dataset definition.

This post presents an example of how a QuickSight administrator can automate data resource management and security validation through use of the QuickSight API and AWS CloudFormation.

Solution overview

The model implements security rules that rely on naming conventions for tables and columns as an integral part of the security model. Instead of relying on naming conventions, you may want to use a lookup table or similar approach to store the relationships between data tables and security tables.

We guide you through the following steps:

  1. Create relational database tables to be secured.
  2. Create a QuickSight dataset in your dev account.
  3. Generate a CloudFormation template using a Python script that allows you to enforce row-level and column-level security in each environment. You can customize this script to the needs of your organization.
  4. Use the generated CloudFormation template to deploy through dev, test, and prod using your change management process.

You can use AWS CloudFormation to manage several types of QuickSight resources, but dataset resources are a critical junction for security, so they are our focus in this post.

To implement data security rules in a large organization, controls must be in place to agree upon and implement the rules from a process perspective. This post dives deep into using code to validate security aspects of your QuickSight deployment, but data security requires more than code. The approaches put forward are intended as a part of a larger change management process, much of which is based around human review and approval.

In addition to having a change management process in place, we suggest managing your AWS resources using a CI/CD pipeline. The nature of change management and CI/CD processes can vary greatly, and are outside the scope of this post.

Prerequisites

This post assumes a basic command of the following:

We don’t go into the broader picture of integrating into a full CI/CD process, so an understanding of CI/CD is helpful, but not required.

Security rules for your organization

Before we can write a script to confirm security rules have been applied correctly, we need to know what the security rules actually are. This means we need to determine the following:

  • What – What is the data we are trying to secure? Which fields in the database are sensitive? Which field values will be used to filter access?
  • Who – Who are the users and groups that should be provided access to the data and fields we have identified?

In concrete terms, we need to match identities (users and groups) to actual data values (used in row-level security) and sensitive fields (for column-level security). Identities such as users and groups typically correlate to entities in external systems such as Active Directory, but you can use native QuickSight users and groups.

For this post, we define the following rules that indicate the relationship between database objects (tables and fields) and how they should be secured. Keep in mind that these example rules may not apply to every organization. Security should be developed to match your requirements and processes.

  • Any field name with _sensitive appended to it is identified as containing sensitive data. For example, a column named salary_usd_sensitive should be restricted. For our scenario, we say that the user should be a member of the QuickSight restricted group in order to access sensitive fields. No other groups are allowed access to these fields.
  • For a given table, a companion table with _rls appended to the name contains the row-level security rules used to secure the table. In this model, the row-level security rules for the employees table are found in the employees_rls table.
  • Row-level security rules must be sourced 100% from the underlying data store. This means that you can’t upload rules via the QuickSight console, or use custom SQL in QuickSight to create the rules. Rules can be provided as views (if supported by the underlying data store) as long as the view definition is managed using a change management process.
  • The dataset name should match the name of the underlying database table.

These rules rely on a well-structured change management process for the database. If users and developers have access to change database objects in production, the rules won’t carry much weight. For examples of automated schema management using open-source CI/CD tooling, refer to Deploy, track, and roll back RDS database code changes using open source tools Liquibase and Jenkins and How to Integrate Amazon RDS Schema Changes into CI/CD Pipelines with GitLab and Liquibase.

From the QuickSight perspective, our database becomes the source of the “what” and “who” we discussed earlier. QuickSight doesn’t own the security rules, it merely implements the rules as defined in the database.

Security rule management with database objects

For this post, we source data from a Postgres database using a read-only user created for QuickSight.

First, we create our schema and a data table with a few rows inserted:

create schema if not exists ledger;

--the table we are securing
drop table if exists ledger.transactions;
create table if not exists ledger.transactions (
    txn_id integer,
    txn_type varchar(100),
    txn_desc varchar(100),
    txn_amt float,
    department varchar(100),
    discount_sensitive float
);

insert into ledger.transactions (
    txn_id,
    txn_type,
    txn_desc,
    txn_amt,
    department,
    discount_sensitive
) 
values
(1, 'expense', 'commission', -1000.00, 'field sales', 0.0),
(2, 'revenue', 'widgets',  15000.00, 'field sales', 1000.00),
(3, 'revenue', 'interest', 1000.00, 'corporate', 0.0),
(4, 'expense', 'taxes', -1234.00, 'corporate', 0.0),
(5, 'revenue', 'doodads', 1000.00, 'field sales', 100.0)
;

Note the field discount_sensitive. In our security model, any field name with _sensitive appended to it is identified as containing sensitive data. This information is used later when we implement column-level security. In our example, we have the luxury of using naming conventions to tag the sensitive fields, but that isn’t always possible. Other options could involve the use of SQL comments, or creating a table that provides a lookup for sensitive fields. Which method you choose depends upon your data and requirements, and should be supported by a change management process.

Row-level security table

The following SQL creates a table containing the row-level security rules for the ledger.transactions table, then inserts rules that match the example discussed earlier:

drop table if exists ledger.transactions_rls;
create table ledger.transactions_rls (
    groupname varchar(100),
    department varchar(1000)
);


insert into ledger.transactions_rls (groupname, department) 
values
('restricted', null), --null indicates all values
('anybody', 'field sales');

For more information about how to restrict access to a dataset using row-level security, refer to Using row-level security (RLS) with user-based rules to restrict access to a dataset

These rules match the specified QuickSight user groups to values in the department field of the transactions table.

Our last step in Postgres is to create a user that has read-only access to our tables. All end-user or SPICE refresh queries from QuickSight are run using this user. See the following code:

drop role if exists qs_user;
create role qs_user login password 'GETABETTERPASSSWORD';
grant connect on database quicksight TO qs_user;
grant usage on schema ledger to qs_user;
grant select on ledger.transactions to qs_user;
grant select on ledger.transactions_rls to qs_user;

Create user groups

Our security model provides permissions based on group membership. Although QuickSight allows for these groups to be sourced from external systems such as Active Directory, our example uses native QuickSight groups.

We create our groups using the following AWS Command Line Interface (AWS CLI) commands. Take note of the restricted group we’re creating; this is the group we use to grant access to sensitive data columns.

aws quicksight create-group \
--aws-account-id YOUR_AWS_ACCOUNT_ID_HERE \
--namespace default \
--group-name restricted

aws quicksight create-group \
--aws-account-id YOUR_AWS_ACCOUNT_ID_HERE \
--namespace default \
--group-name anybody

You can also add a user to your group with the following code:

aws quicksight create-group-membership \
--aws-account-id YOUR_AWS_ACCOUNT_ID_HERE \
--namespace default \
--group-name anybody \
--member-name [email protected]

The Python script

Now that we have set up our database and groups, we switch focus to the Python script used for the following actions:

  • Extracting the definition of a manually created dataset using the QuickSight API
  • Ensuring that the dataset definition meets security standards
  • Restructuring the dataset definition into the format of a CloudFormation template
  • Writing the CloudFormation template to a JSON file

In the header of the script, you can see the following variables, which you should set to values in your own AWS environment:

# Parameters for the source data set
region_name = 'AWS_REGION_NAME'
aws_account_id = "AWS_ACCOUNT_ID"
source_data_set_id = "ID_FOR_THE_SOURCE_DATA_SET"

# Parameters are used when creating the cloudformation template
target_data_set_name = "DATA_SET_DISPLAY_NAME"
target_data_set_id = "NEW_DATA_SET_ID"
template_file_name = "dataset.json"

QuickSight datasets have a name and an ID. The name is displayed in the QuickSight UI, and the ID is used to reference the dataset behind the scenes. The ID must be unique for a given account and Region, which is why QuickSight uses UUIDs by default, but you can use any unique string.

Create the datasets

You can use the QuickSight GUI or Public API to create a dataset for the transactions_rls and transactions tables. For instructions, refer to Creating a dataset from a database. Connect to the database, create the datasets, then apply transactions_rls as the row-level security for the transactions dataset. You can use the following list-data-sets AWS CLI call to verify that your tables were created successfully:

$ aws quicksight list-data-sets --aws-account-id YOURACCOUNT            
{
    "DataSetSummaries": [
       {
            "Arn": "arn:aws:quicksight:us-west-2:YOURACCOUNT:dataset/<ID>",
            "DataSetId": "<ID>",
            "Name": "transactions",
            "CreatedTime": "2021-09-15T15:41:56.716000-07:00",
            "LastUpdatedTime": "2021-09-15T16:38:03.658000-07:00",
            "ImportMode": "SPICE",
            "RowLevelPermissionDataSet": {
                "Namespace": "default",
                "Arn": "arn:aws:quicksight:us-west-2: YOURACCOUNT:dataset/<RLS_ID>",
                "PermissionPolicy": "GRANT_ACCESS",
                "FormatVersion": "VERSION_1",
                "Status": "ENABLED"
            },
            "RowLevelPermissionTagConfigurationApplied": false,
            "ColumnLevelPermissionRulesApplied": true
        },
        {
            "Arn": "arn:aws:quicksight:us-west-2: YOURACCOUNT:dataset/<RLS_ID>",
            "DataSetId": "<RLS_ID>",
            "Name": "transactions_rls",
            "CreatedTime": "2021-09-15T15:42:37.313000-07:00",
            "LastUpdatedTime": "2021-09-15T15:42:37.520000-07:00",
            "ImportMode": "SPICE",
            "RowLevelPermissionTagConfigurationApplied": false,
            "ColumnLevelPermissionRulesApplied": false
        }
    ]
}

Script overview

Our script is based around the describe_data_set method of the Boto3 QuickSight client. This method returns a Python dictionary containing all the attributes associated with a dataset resource. Our script analyzes these dictionaries, then coerces them into the structure required for dataset creation using AWS CloudFormation. The structure of the describe_data_set method and the AWS::QuickSight::DataSet CloudFormation resource are very similar, but not quite identical.

The following are the top-level fields in the response for the Boto3 QuickSight client describe_data_set method:

{
    'DataSet': {
        'Arn': 'string',
        'DataSetId': 'string',
        'Name': 'string',
        'CreatedTime': datetime(2015, 1, 1),
        'LastUpdatedTime': datetime(2015, 1, 1),
        'PhysicalTableMap': {},
        'LogicalTableMap': {...},
        'OutputColumns': [...],
        'ImportMode': 'SPICE'|'DIRECT_QUERY',
        'ConsumedSpiceCapacityInBytes': 123,
        'ColumnGroups': [...],
        'FieldFolders': {...},
        'RowLevelPermissionDataSet': {...},
        'ColumnLevelPermissionRules': [...]
    },
    'RequestId': 'string',
    'Status': 123
}

Our script converts the response from the API to the structure required for creating a dataset using AWS CloudFormation.

The following are the top-level fields in the AWS::QuickSight::DataSet CloudFormation resource:

{
  "Type" : "AWS::QuickSight::DataSet",
  "Properties" : {
      "AwsAccountId" : String,
      "ColumnGroups" : [ ColumnGroup, ... ],
      "ColumnLevelPermissionRules" : [ ColumnLevelPermissionRule, ... ],
      "DataSetId" : String,
      "FieldFolders" : {Key : Value, ...},
      "ImportMode" : String,
      "IngestionWaitPolicy" : IngestionWaitPolicy,
      "LogicalTableMap" : {Key : Value, ...},
      "Name" : String,
      "Permissions" : [ ResourcePermission, ... ],
      "PhysicalTableMap" : {Key : Value, ...},
      "RowLevelPermissionDataSet" : RowLevelPermissionDataSet,
      "Tags" : [ Tag, ... ]
    }
}

The key differences between both JSON structures are as follows:

  • describe_data_set contains Arn, CreatedTime, and LastUpdatedTime, which are useful fields but only relevant to an existing resource
  • AWS CloudFormation requires AwsAccountId when creating the resource
  • AWS CloudFormation accepts tags for the dataset, but describe_data_set doesn’t provide them
  • The AWS CloudFormation Permissions property allows for assigning AWS Identity and Access Management (IAM) permissions at the time of creation

Our script is able to selectively choose the top-level properties we want from the describe_data_set response, then add the fields that AWS CloudFormation requires for resource creation.

Validate security

Before the script creates the CloudFormation template, it performs validations to ensure that our dataset conforms to the defined security rules.

The following is the snippet from our script that performs validation for row-level security:

if 'RowLevelPermissionDataSet' in describe_response['DataSet']:
    if describe_response['DataSet']['RowLevelPermissionDataSet'] is None:
        raise Exception("row level permissions must be applied!")
    else:
        # now we look up the rls data set so that we can confirm that it conforms to our rules
        rls_dataset_id = describe_response['DataSet']['RowLevelPermissionDataSet']['Arn'].split('/')[-1]
        rls_response = client.describe_data_set(
            AwsAccountId = aws_account_id,
            DataSetId = rls_dataset_id
        )
        
        rls_table_map = rls_response['DataSet']['PhysicalTableMap']

        # rls table must not be custom SQL
        if 'CustomSql' in rls_table_map[list(rls_table_map.keys())[0]]:
            raise Exception("RLS data set can not contain custom SQL!")

        # confirm that the database table name is what we expect it to be 
        if rls_response['DataSet']['Name'] != describe_response['DataSet']['Name'] + '_rls':
            raise Exception("RLS data set name must match pattern tablename_rls!")

The steps in the code are as follows:

  1. Ensure that any row-level security is applied (this is the bare minimum).
  2. Look up the dataset that contains the row-level security rules using another Boto3 call.
  3. Confirm that the row-level security dataset is not custom SQL.
  4. Confirm that the name of the table is as expected, with _rls appended to the name of the table being secured.

The use of custom SQL for sourcing row-level security rules isn’t secure in our case, because a QuickSight developer could use SQL to alter the underlying rules. Because of this, our model requires that a physical table from the dataset is used as the row-level security rule source. Of course, it’s possible to use a view in the database to provide the rules. A view is okay because the definition (in our scenario) is governed by a change management process, as opposed to the custom SQL, which the QuickSight developer can create.

The rules being implemented for your specific organization will be different. You may need to connect to a database directly from your Python script in order to validate the dataset was created in a secure manner. Regardless of your actual rules, the describe_data_set API method provides you the details you need to begin validation of the dataset.

Column-level security

Our model for column-level security indicates that any database field name that ends in _sensitive should only be accessible to members of a QuickSight group named restricted. Instead of validating that the dataset has the column-level security rules applied correctly, we simply enforce the rules directly in two steps:

  1. Identify the sensitive fields.
  2. Create a dictionary and add it to our dataset with the key ColumnLevelPermissionRules.

To identify the sensitive fields, we create a list and iterate through the input columns of the physical table:

sensitive_fields = []
input_columns = physical_table_map[list(physical_table_map.keys())[0]]["RelationalTable"]["InputColumns"]
for input_column in input_columns:
    field_name = input_column['Name']
    if field_name[-10:len(field_name)] == '_sensitive':
        sensitive_fields.append(field_name)

The result is a list of sensitive fields. We can then take this list and integrate it into the dataset through the use of a dictionary:

if len(sensitive_fields) > 0:
    data_set["ColumnLevelPermissionRules"] = [
        {
            "Principals": [
                {"Ref": "RestrictedUserGroupArn"}
            ],
            "ColumnNames": sensitive_fields
        }
    ]

Instead of specifying a specific principal, we reference the CloudFormation template parameter RestrictedUserGroupArn. The ARN for the restricted group is likely to vary, especially if you’re deploying to another AWS account. Using a template parameter allows us to specify the ARN at the time of dataset creation in the new environment.

Access to the dataset QuickSight resources

The Permissions structure is added to the definition for each dataset:

"Permissions": [
    {
        "Principal": {
            "Ref": "QuickSightAdminPrincipal"
        },
        "Actions": [
            "quicksight:DescribeDataSet",
            "quicksight:DescribeDataSetPermissions",
            "quicksight:PassDataSet",
            "quicksight:DescribeIngestion",
            "quicksight:ListIngestions",
            "quicksight:UpdateDataSet",
            "quicksight:DeleteDataSet",
            "quicksight:CreateIngestion",
            "quicksight:CancelIngestion",
            "quicksight:UpdateDataSetPermissions"
        ]
    }
]

A value for the QuickSightAdminPrincipal CloudFormation template parameter is provided at the time of stack creation. The preceding structure provides the principal access to manage the QuickSight dataset resource itself. Note that this is not the same as data access (though an admin user could manually remove the row-level security rules). Row-level and column-level security rules indicate whether a given user has access to specific data, whereas these permissions allow for actions on the definition of the dataset, such as the following:

  • Updating or deleting the dataset
  • Changing the security permissions
  • Initiating and monitoring SPICE refreshes

End-users don’t require this access in order to use a dashboard created from the dataset.

Run the script

Our script requires you to specify the dataset ID, which is not the same as the dataset name. To determine the ID, use the AWS CLI list-data-sets command.

To set the script parameters, you can edit the following lines to match your environment:

# parameters for the source data set
region_name = 'us-west-2'
aws_account_id = "<YOUR_ACCOUNT_ID>"
source_data_set_id = "<SOURCE_DATA_SET_ID>"

# parameters for the target data set
target_data_set_name = "DATA_SET_PRESENTATION_NAME"
target_data_set_id = "NEW_DATA_SET_ID"

The following snippet runs the Python script:

$ quicksight_security % python3 data_set_to_cf.py                                                       
row level security validated!
the following sensitive fields were found: ['discount_sensitive']
cloudformation template written to dataset.json
cli-input-json file written to params.json

CloudFormation template

Now that the security rules have been validated, our script can generate the CloudFormation template. The describe_response_to_cf_data_set method accepts a describe_data_set response as input (along with a few other parameters) and returns a dictionary that reflects the structure of an AWS::QuickSight::DataSet CloudFormation resource. Our code uses this method once for the primary dataset, and again for the _rls rules. This method handles selecting values from the response, prunes some unnecessary items (such as empty tag lists), and replaces a few values with CloudFormation references. These references allow us to provide parameter values to the template, such as QuickSight principals and the data source ARN.

You can view the template using the cat command:

$ quicksight_security % cat dataset.json 
{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Creates a QuickSight Data Set",
    "Parameters": {
        "DataSourceArn": {
            "Type": "String",
            "Description": "ARN for Postgres data source resource"
        },
        "QuickSightOwnerPrincipal": {
            "Type": "String",
            "Description": "ARN for a QuickSight principal who will be granted API access to the datasets"
        },
        "RestrictedUserGroupArn": {
            "Type": "String",
            "Description": "ARN for a QuickSight principal who will be granted access to sensitive fields"
        }
    },
    "Resources": {
        "NewDataSet": {
            "Type": "AWS::QuickSight::DataSet",
            "Properties": {
                "DataSetId": "NEW_DATA_SET_ID",
                "Name": "DATA_SET_PRESENTATION_NAME",
                "AwsAccountId": {
                    "Ref": "AWS::AccountId"
                },
                "Permissions": [
                    {
                        "Principal": {
                            "Ref": "QuickSightAdminPrincipal"
                        },
                        "Actions": [
                            "quicksight:DescribeDataSet",
                            "quicksight:DescribeDataSetPermissions",
                            "quicksight:PassDataSet",
                            "quicksight:DescribeIngestion",
                            "quicksight:ListIngestions",
                            "quicksight:UpdateDataSet",
                            "quicksight:DeleteDataSet",
                            "quicksight:CreateIngestion",
                            "quicksight:CancelIngestion",
                            "quicksight:UpdateDataSetPermissions"
                        ]
                    }
                ],
                "FieldFolders": {},
                "ImportMode": "DIRECT_QUERY",
                "LogicalTableMap": {
                    "e2305db4-2c79-4ac4-aff5-224b8c809767": {
                        "Alias": "transactions",
                        "DataTransforms": [
                            {
                                "ProjectOperation": {
                                    "ProjectedColumns": [
                                        "txn_id",
                                        "txn_type",
                                        "txn_desc",
                                        "txn_amt",
                                        "department",
                                        "discount_sensitive"
                                    ]
                                }
                            }
                        ],
                        "Source": {
                            "PhysicalTableId": "someguid-2c79-4ac4-aff5-224b8c809767"
                        }
                    }
                },
                "PhysicalTableMap": {
                    "e2305db4-2c79-4ac4-aff5-224b8c809767": {
                        "RelationalTable": {
                            "DataSourceArn": {
                                "Ref": "DataSourceArn"
                            },
                            "Schema": "ledger",
                            "Name": "transactions",
                            "InputColumns": [
                                {
                                    "Name": "txn_id",
                                    "Type": "INTEGER"
                                },
                                {
                                    "Name": "txn_type",
                                    "Type": "STRING"
                                },
                                {
                                    "Name": "txn_desc",
                                    "Type": "STRING"
                                },
                                {
                                    "Name": "txn_amt",
                                    "Type": "DECIMAL"
                                },
                                {
                                    "Name": "department",
                                    "Type": "STRING"
                                },
                                {
                                    "Name": "discount_sensitive",
                                    "Type": "DECIMAL"
                                }
                            ]
                        }
                    }
                },
                "RowLevelPermissionDataSet": {
                    "Namespace": "default",
                    "Arn": {
                        "Fn::GetAtt": [
                            "NewDataSetRLS",
                            "Arn"
                        ]
                    },
                    "PermissionPolicy": "GRANT_ACCESS",
                    "FormatVersion": "VERSION_1"
                },
                "ColumnLevelPermissionRules": [
                    {
                        "Principals": [
                            {
                                "Ref": "RestrictedUserGroupArn"
                            }
                        ],
                        "ColumnNames": [
                            "discount_sensitive"
                        ]
                    }
                ]
            }
        },
        "NewDataSetRLS": {
            "Type": "AWS::QuickSight::DataSet",
            "Properties": {
                "DataSetId": "NEW_DATA_SET_ID_rls",
                "Name": "DATA_SET_PRESENTATION_NAME_rls",
                "AwsAccountId": {
                    "Ref": "AWS::AccountId"
                },
                "Permissions": [
                    {
                        "Principal": {
                            "Ref": "QuickSightAdminPrincipal"
                        },
                        "Actions": [
                            "quicksight:DescribeDataSet",
                            "quicksight:DescribeDataSetPermissions",
                            "quicksight:PassDataSet",
                            "quicksight:DescribeIngestion",
                            "quicksight:ListIngestions",
                            "quicksight:UpdateDataSet",
                            "quicksight:DeleteDataSet",
                            "quicksight:CreateIngestion",
                            "quicksight:CancelIngestion",
                            "quicksight:UpdateDataSetPermissions"
                        ]
                    }
                ],
                "FieldFolders": {},
                "ImportMode": "SPICE",
                "LogicalTableMap": {
                    "someguid-51d7-43c4-9f8c-c60a286b0507": {
                        "Alias": "transactions_rls",
                        "DataTransforms": [
                            {
                                "ProjectOperation": {
                                    "ProjectedColumns": [
                                        "groupname",
                                        "department"
                                    ]
                                }
                            }
                        ],
                        "Source": {
                            "PhysicalTableId": "someguid-51d7-43c4-9f8c-c60a286b0507"
                        }
                    }
                },
                "PhysicalTableMap": {
                    "someguid-51d7-43c4-9f8c-c60a286b0507": {
                        "RelationalTable": {
                            "DataSourceArn": {
                                "Ref": "DataSourceArn"
                            },
                            "Schema": "ledger",
                            "Name": "transactions_rls",
                            "InputColumns": [
                                {
                                    "Name": "groupname",
                                    "Type": "STRING"
                                },
                                {
                                    "Name": "department",
                                    "Type": "STRING"
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}

You can deploy this template directly into AWS via the CloudFormation console. You are required to provide the following parameters:

  • DataSourceArn – A QuickSight dataset is a reference to a table or other database object. In order for this object to be accessed, we need to specify a QuickSight data source resource that facilitates the connection.
  • QuickSightAdminPrincipal – The IAM principal allowing access to the data source resource via AWS API calls. You can exclude the IAM permissions from this script and template if your existing security policies automatically provide access to the appropriate users and groups.
  • RestrictedUserGroupArn – The ARN of the QuickSight group that is granted access to the sensitive columns.

You can also deploy the template using the AWS CLI. Although it’s possible to pass in all the parameters directly via the command line, you may find it a bit clunky when entering long values. To simplify this, our script generates a params.json file structured to capture all the parameters required by the template:

{
    "Parameters": [
        {
            "ParameterKey": "DataSourceArn",
            "ParameterValue": "YOUR_DATA_SOURCE_ARN_HERE"
        },
        {
            "ParameterKey": "QuickSightAdminPrincipal",
            "ParameterValue": "YOUR_ADMIN_GROUP_PRINCIPAL_HERE"
        },
        {
            "ParameterKey": "RestrictedUserGroupArn",
            "ParameterValue": "YOUR_RESTRICTED_USER_GROUP_ARN_HERE"
        }
    ]
}

Use the following command to build the stack, with params.json as input:

aws cloudformation create-stack \
--stack-name SecuredDataSet \
--template-body file://dataset.json \
--cli-input-json file://params.json

You can use the AWS CloudFormation console to monitor the stack progress. When the creation is complete, you should see your new dataset in QuickSight!

Conclusion

Though the functionality is relatively new, I consider the API and AWS CloudFormation capabilities to be one of QuickSight’s biggest strengths. Automated validation and enforcement of security rules allows for scale and better security. Being able to manage dataset definitions using AWS CloudFormation provides repeatability, and all of this sets you up for automation. The API and AWS CloudFormation provide tooling to customize QuickSight to suit your workflow, bringing BI into your organization’s cloud management strategy.

If you are looking for related information about dashboard management and migration in QuickSight, refer to Migrate Amazon QuickSight across AWS accounts.


About the Author

Jeremy Winters is an Architect in the AWS Data Lab, where he helps customers design and build data applications to meet their business needs. Prior to AWS, Jeremy built cloud and data applications for consulting customers across a variety of industries.

Trigger an AWS Glue DataBrew job based on an event generated from another DataBrew job

Post Syndicated from Nipun Chagari original https://aws.amazon.com/blogs/big-data/trigger-an-aws-glue-databrew-job-based-on-an-event-generated-from-another-databrew-job/

Organizations today have continuous incoming data, and analyzing this data in a timely fashion is becoming a common requirement for data analytics and machine learning (ML) use cases. As part of this, you need clean data in order to gain insights that can enable enterprises to get the most out of their data for business growth and profitability. You can now use AWS Glue DataBrew, a visual data preparation tool that makes it easy to transform and prepare datasets for analytics and ML workloads.

As we build these data analytics pipelines, we can decouple the jobs by building event-driven analytics and ML workflow pipelines. In this post, we walk through how to trigger a DataBrew job automatically on an event generated from another DataBrew job using Amazon EventBridge and AWS Step Functions.

Overview of solution

The following diagram illustrates the architecture of the solution. We use AWS CloudFormation to deploy an EventBridge rule, an Amazon Simple Queue Service (Amazon SQS) queue, and Step Functions resources to trigger the second DataBrew job.

The steps in this solution are as follows:

  1. Import your dataset to Amazon Simple Storage Service (Amazon S3).
  2. DataBrew queries the data from Amazon S3 by creating a recipe and performing transformations.
  3. The first DataBrew recipe job writes the output to an S3 bucket.
  4. When the first recipe job is complete, it triggers an EventBridge event.
  5. A Step Functions state machine is invoked based on the event, which in turn invokes the second DataBrew recipe job for further processing.
  6. The event is delivered to the dead-letter queue if the rule in EventBridge can’t invoke the state machine successfully.
  7. DataBrew queries data from an S3 bucket by creating a recipe and performing transformations.
  8. The second DataBrew recipe job writes the output to the same S3 bucket.

Prerequisites

To use this solution, you need the following prerequisites:

Load the dataset into Amazon S3

For this post, we use the Credit Card customers sample dataset from Kaggle. This data consists of 10,000 customers, including their age, salary, marital status, credit card limit, credit card category, and more. Download the sample dataset and follow the instructions. We recommend creating all your resources in the same account and Region.

Create a DataBrew project

To create a DataBrew project, complete the following steps:

  1. On the DataBrew console, choose Projects and choose Create project.
  2. For Project name, enter marketing-campaign-project-1.
  3. For Select a dataset, select New dataset.
  4. Under Data lake/data store, choose Amazon S3.
  5. For Enter your source from S3, enter the S3 path of the sample dataset.
  6. Select the dataset CSV file.
  7. Under Permissions, for Role name, choose an existing IAM role created during the prerequisites or create a new role.
  8. For New IAM role suffix, enter a suffix.
  9. Choose Create project.

After the project is opened, a DataBrew interactive session is created. DataBrew retrieves sample data based on your sampling configuration selection.

Create the DataBrew jobs

Now we can create the recipe jobs.

  1. On the DataBrew console, in the navigation pane, choose Projects.
  2. On the Projects page, select the project marketing-campaign-project-1.
  3. Choose Open project and choose Add step.
  4. In this step, we choose Delete to drop the unnecessary columns from our dataset that aren’t required for this exercise.

You can choose from over 250 built-in functions to merge, pivot, and transpose the data without writing code.

  1. Select the columns to delete and choose Apply.
  2. Choose Create job.
  3. For Job name, enter marketing-campaign-job1.
  4. Under Job output settings¸ for File type, choose your final storage format (for this post, we choose CSV).
  5. For S3 location, enter your final S3 output bucket path.
  6. Under Settings, for File output storage, select Replace output files for each job run.
  7. Choose Save.
  8. Under Permissions, for Role name¸ choose an existing role created during the prerequisites or create a new role.
  9. Choose Create job.

Now we repeat the same steps to create another DataBrew project and DataBrew job.

  1. For this post, I named the second project marketing-campaign-project2 and named the job marketing-campaign-job2.
  2. When you create the new project, this time use the job1 output file location as the new dataset.
  3. For this job, we deselect Unknown and Uneducated in the Education_Level column.

Deploy your resources using CloudFormation

For a quick start of this solution, we deploy the resources with a CloudFormation stack. The stack creates the EventBridge rule, SQS queue, and Step Functions state machine in your account to trigger the second DataBrew job when the first job runs successfully.

  1. Choose Launch Stack:
  2. For DataBrew source job name, enter marketing-campaign-job1.
  3. For DataBrew target job name, enter marketing-campaign-job2.
  4. For both IAM role configurations, make the following choice:
    1. If you choose Create a new Role, the stack automatically creates a role for you.
    2. If you choose Attach an existing IAM role, you must populate the IAM role ARN manually in the following field or else the stack creation fails.
  5. Choose Next.
  6. Select the two acknowledgement check boxes.
  7. Choose Create stack.

Test the solution

To test the solution, complete the following steps:

  1. On the DataBrew console, choose Jobs.
  2. Select the job marketing-campaign-job1 and choose Run job.

This action automatically triggers the second job, marketing-campaign-job2, via EventBridge and Step Functions.

  1. When both jobs are complete, open the output link for marketing-campaign-job2.

You’re redirected to the Amazon S3 console to access the output file.

In this solution, we created a workflow that required minimal code. The first job triggers the second job, and both jobs deliver the transformed data files to Amazon S3.

Clean up

To avoid incurring future charges, delete all the resources created during this walkthrough:

  • IAM roles
  • DataBrew projects and their associated recipe jobs
  • S3 bucket
  • CloudFormation stack

Conclusion

In this post, we walked through how to use DataBrew along with EventBridge and Step Functions to run a DataBrew job that automatically triggers another DataBrew job. We encourage you to use this pattern for event-driven pipelines where you can build sequence jobs to run multiple jobs in conjunction with other jobs.


About the Authors

Nipun Chagari is a Senior Solutions Architect at AWS, where he helps customers build highly available, scalable, and resilient applications on the AWS Cloud. He is passionate about helping customers adopt serverless technology to meet their business objectives.

Prarthana Angadi is a Software Development Engineer II at AWS, where she has been expanding what is possible with code in order to make life more efficient for AWS customers.

Extending PowerShell on AWS Lambda with other services

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/extending-powershell-on-aws-lambda-with-other-services/

This post expands on the functionality introduced with the PowerShell custom runtime for AWS Lambda. The previous blog explains how the custom runtime approach makes it easier to run Lambda functions written in PowerShell.

You can add additional functionality to your PowerShell serverless applications by importing PowerShell modules, which are shareable packages of code. Build your own modules or import from the wide variety of existing vendor modules to manage your infrastructure and applications.

You can also take advantage of the event-driven nature of Lambda, which allows you to run Lambda functions in response to events. Events can include an object being uploaded to Amazon S3, a message placed on an Amazon SQS queue, a scheduled task using Amazon EventBridge, or an HTTP request from Amazon API Gateway. Lambda functions support event triggers from over 200 AWS services and software as a service (SaaS) applications.

Adding PowerShell modules

You can add PowerShell modules from a number of locations. These can include modules from the AWS Tools for PowerShell, from the PowerShell Gallery, or your own custom modules. Lambda functions access these PowerShell modules within specific folders within the Lambda runtime environment.

You can include PowerShell modules via Lambda layers, within your function code package, or container image. When using .zip archive functions, you can use layers to package and share modules to use with your functions. Layers reduce the size of uploaded deployment archives and can make it faster to deploy your code. You can attach up to five layers to your function, one of which must be the PowerShell custom runtime layer. You can include multiple modules per layer.

The custom runtime configures PowerShell’s PSModulePath environment variable, which contains the list of folder locations to search to find modules. The runtime searches the folders in the following order:

1. User supplied modules as part of function package

You can include PowerShell modules inside the published Lambda function package in a /modules subfolder.

2. User supplied modules as part of Lambda layers

You can publish Lambda layers that include PowerShell modules in a /modules subfolder. This allows you to share modules across functions and accounts. Lambda extracts layers to /opt within the Lambda runtime environment so the modules are located in /opt/modules. This is the preferred solution to use modules with multiple functions.

3. Default/user supplied modules supplied with PowerShell

You can also include additional default modules and add them within a /modules folder within the PowerShell custom runtime layer.

For example, the following function includes four Lambda layers. One layer includes the custom runtime. Three additional layers include further PowerShell modules; the AWS Tools for PowerShell, your own custom modules, and third-party modules. You can also include additional modules with your function code.

Lambda layers

Lambda layers

Within your PowerShell code, you can load modules during the function initialization (init) phase. This initializes the modules before the handler function runs, which speeds up subsequent warm-start invocations.

Adding modules from the AWS Tools for PowerShell

This post shows how to use the AWS Tools for PowerShell to manage your AWS services and resources. The tools are packaged as a set of PowerShell modules that are built on the functionality exposed by the AWS SDK for .NET. You can follow similar packaging steps to add other modules to your functions.

The AWS Tools for PowerShell are available as three distinct packages:

The AWS.Tools package is the preferred modularized version, which allows you to load only the modules for the services you want to use. This reduces package size and function memory usage. The AWS.Tools cmdlets support auto-importing modules without having to call Import-Module first. However, specifically importing the modules during the function init phase is more efficient and can reduce subsequent invoke duration. The AWS.Tools.Common module is required and provides cmdlets for configuration and authentication that are not service specific.

The accompanying GitHub repository contains the code for the custom runtime, along with a number of example applications. There are also module build instructions for adding a number of common PowerShell modules as Lambda layers, including AWS.Tools.

Building an event-driven PowerShell function

The repository contains an example of an event-driven demo application that you can build using serverless services.

A clothing printing company must manage its t-shirt size and color inventory. The printers store t-shirt orders for each day in a CSV file. The inventory service is one service that must receive the CSV file. It parses the file and, for each order, records the details to manage stock deliveries.

The stores upload the files to S3. This automatically invokes a PowerShell Lambda function, which is configured to respond to the S3 ObjectCreated event. The Lambda function receives the S3 object location as part of the $LambdaInput event object. It uses the AWS Tools for PowerShell to download the file from S3. It parses the contents and, for each line in the CSV file, sends the individual order details as an event to an EventBridge event bus.

In this example, there is a single rule to log the event to Amazon CloudWatch Logs to show the received event. However, you could route each order, depending on the order details, to different targets. For example, you can send different color combinations to SQS queues, which the dyeing service can use to order dyes. You could send particular size combinations to another Lambda function that manages cloth orders.

Example event-driven application

Example event-driven application

The previous blog post shows how to use the AWS Serverless Application Model (AWS SAM) to build a Lambda layer, which includes only the AWS.Tools.Common module to run Get-AWSRegion. To build a PowerShell application to process objects from S3 and send events to EventBridge, you can extend this functionality by also including the AWS.Tools.S3 and AWS.Tools.EventBridge modules in a Lambda layer.

Lambda layers, including S3 and EventBridge

Lambda layers, including S3 and EventBridge

Building the AWS Tools for PowerShell layer

You could choose to add these modules and rebuild the existing layer. However, the example in this post creates a new Lambda layer to show how you can have different layers for different module combinations of AWS.Tools. The example also adds the Lambda layer Amazon Resource Name (ARN) to AWS Systems Manager Parameter Store to track deployed layers. This allows you to reference them more easily in infrastructure as code tools.

The repository includes build scripts for both Windows and non-Windows developers. Windows does not natively support Makefiles. When using Windows, you can use either Windows Subsystem for Linux (WSL)Docker Desktop, or native PowerShell.

When using Linux, macOS, WSL, or Docker, the Makefile builds the Lambda layers. After downloading the modules, it also extracts the additional AWS.Tools.S3 and AWS.Tools.EventBridge modules.

# Download AWSToolsLayer module binaries
curl -L -o $(ARTIFACTS_DIR)/AWS.Tools.zip https://sdk-for-net.amazonwebservices.com/ps/v4/latest/AWS.Tools.zip
mkdir -p $(ARTIFACTS_DIR)/modules

# Extract select AWS.Tools modules (AWS.Tools.Common required)
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.Common/**/*' -d $(ARTIFACTS_DIR)/modules/
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.S3/**/*' -d $(ARTIFACTS_DIR)/modules/
unzip $(ARTIFACTS_DIR)/AWS.Tools.zip 'AWS.Tools.EventBridge/**/*' -d $(ARTIFACTS_DIR)/modules/

When using native PowerShell on Windows to build the layer, the build-AWSToolsLayer.ps1 script performs the same file copy functionality as the Makefile. You can use this option for Windows without WSL or Docker.

### Extract entire AWS.Tools modules to stage area but only move over select modules
…
Move-Item "$PSScriptRoot\stage\AWS.Tools.Common" "$PSScriptRoot\modules\" -Force
Move-Item "$PSScriptRoot\stage\AWS.Tools.S3" "$PSScriptRoot\modules\" -Force
Move-Item "$PSScriptRoot\stage\AWS.Tools.EventBridge" "$PSScriptRoot\modules\" -Force

The Lambda function code imports the required modules in the function init phase.

Import-Module "AWS.Tools.Common"
Import-Module "AWS.Tools.S3"
Import-Module "AWS.Tools.EventBridge"

For other combinations of AWS.Tools, amend the example build-AWSToolsLayer.ps1 scripts to add the modules you require. You can use a similar download and copy process, or PowerShell’s Save-Module to build layers for modules from other locations.

Building and deploying the event-driven serverless application

Follow the instructions in the GitHub repository to build and deploy the application.

The demo application uses AWS SAM to deploy the following resources:

  1. PowerShell custom runtime.
  2. Additional Lambda layer containing the AWS.Tools.Common, AWS.Tools.S3, and AWS.Tools.EventBridge modules from AWS Tools for PowerShell. The layer ARN is stored in Parameter Store.
  3. S3 bucket to store CSV files.
  4. Lambda function triggered by S3 upload.
  5. Custom EventBridge event bus and rule to send events to CloudWatch Logs.

Testing the event-driven application

Use the AWS CLI or AWS Tools for PowerShell to copy the sample CSV file to S3. Replace BUCKET_NAME with your S3 SourceBucket Name from the AWS SAM outputs.

AWS CLI

aws s3 cp .\test.csv s3://BUCKET_NAME

AWS Tools for PowerShell

Write-S3Object -BucketName BUCKET_NAME -File .\test.csv

The S3 file copy action generates an S3 notification event. This invokes the PowerShell Lambda function, passing the S3 file location details as part of the function $LambdaInput event object.

The function downloads the S3 CSV file, parses the contents, and sends the individual lines to EventBridge, which logs the events to CloudWatch Logs.

Navigate to the CloudWatch Logs group /aws/events/demo-s3-lambda-eventbridge.

You can see the individual orders logged from the CSV file.

EventBridge logs showing CSV lines

EventBridge logs showing CSV lines

Conclusion

You can extend PowerShell Lambda applications to provide additional functionality.

This post shows how to import your own or vendor PowerShell modules and explains how to build Lambda layers for the AWS Tools for PowerShell.

You can also take advantage of the event-driven nature of Lambda to run Lambda functions in response to events. The demo application shows how a clothing printing company builds a PowerShell serverless application to manage its t-shirt size and color inventory.

See the accompanying GitHub repository, which contains the code for the custom runtime, along with additional installation options and additional examples.

Start running PowerShell on Lambda today.

For more serverless learning resources, visit Serverless Land.

Optimize Your Media Production Workflow With iconik, LucidLink, and Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/optimize-your-media-production-workflow-with-iconik-lucidlink-and-backblaze-b2/

In late April, thousands of professionals from all corners of the media, entertainment, and technology ecosystem assembled in Las Vegas for the National Association of Broadcasters trade show, better known as the NAB Show. We were delighted to sponsor NAB after its two year hiatus due to COVID-19. Our staff came in blazing hot and ready to hit the tradeshow floor.

One of the stars of the 2022 event was Backblaze partner LucidLink, named a Cloud Computing and Storage category winner in the NAB Show Product of the Year Awards. In this blog post, I’ll explain how to combine LucidLink’s Filespaces product with Backblaze B2 Cloud Storage and media asset management from iconik, another Backblaze partner, to optimize your media production workflow. But first, some context…

How iconik, LucidLink, and Backblaze B2 Fit in a Media Storage Architecture

The media and entertainment industry has always been a natural fit for Backblaze. Some of our first Backblaze Computer Backup customers were creative professionals looking to protect their work, and the launch of Backblaze B2 opened up new options for archiving, backing up, and distributing media assets.

As the media and entertainment industry moved to 4K Ultra HD for digital video recording over the past few years, file sizes ballooned. An hour of high quality 4K video shot at 60 frames per second can require up to one terabyte of storage. Backblaze B2 matches well with today’s media and entertainment storage demands, as customers such as Fortune Media, Complex Networks, and Alton Brown of “Good Eats” fame have discovered.

Alongside Backblaze B2, an ecosystem of tools has emerged to help professionals manage their media assets, including iconik and LucidLink. iconik’s cloud-native media management and collaboration solution gathers and organizes media securely from a wide range of locations, including Backblaze B2. iconik can scan and index content from a Backblaze B2 bucket, creating an asset for each file. An iconik asset can combine a lower resolution proxy with a link to the original full-resolution file in Backblaze B2. For a large part of the process, the production team can work quickly and easily with these proxy files, previewing and selecting clips and editing them into a sequence.

Complementing iconik and B2 Cloud Storage, LucidLink provides a high-performance, cloud-native, network-attached storage (NAS) solution that allows professionals to collaborate on files stored in the cloud almost as if the files were on their local machine. With LucidLink, a production team can work with multi-terabyte 4K resolution video files, making final edits and rendering the finished product at full resolution.

It’s important to understand that the video editing process is non-destructive. The original video files are immutable—they are never altered during the production process. As the production team “edits” a sequence, they are actually creating a series of transformations that are applied to the original videos as the final product is rendered.

You can think of B2 Cloud Storage and LucidLink as tiers in a media storage architecture. Backblaze B2 excels at cost-effective, durable storage of full-resolution video assets through their entire lifetime from acquisition to archive, while LucidLink shines during the later stages of the production process, from when the team transitions to working with the original full-resolution files to the final rendering of the sequence for release.

iconik brings B2 Cloud Storage and LucidLink together; not only can an iconik asset include a proxy and links to copies of the original video in both B2 Cloud Storage and LucidLink, iconik Storage Gateway can copy the original file from Backblaze B2 to LucidLink when full-resolution work commences, and later delete the LucidLink copy at the end of the production process, leaving the original archived in Backblaze B2. All that’s missing is a little orchestration.

The Backblaze B2 Storage Plugin for iconik

The Backblaze B2 Storage Plugin for iconik allows creative professionals to copy files from B2 Cloud Storage to LucidLink, and later delete them from LucidLink, in a couple of mouse clicks. The plugin adds a pair of custom actions to iconik: “Add to LucidLink” and “Remove from LucidLink,” applicable to one or many assets or collections, accessible from the Search page and the Asset/Collection page. You can see them on the lower right of this screenshot:

The user experience could hardly be simpler, but there is a lot going on under the covers.

There are several components involved:

  • The plugin, deployed as a serverless function. The initial version of the plugin is written in Python for deployment on Google Cloud Functions, but it could easily be adapted for other serverless cloud platforms.
  • A LucidLink Filespace.
  • A machine with both the LucidLink client and iconik Storage Gateway installed. The iconik Storage Gateway accesses the LucidLink Filespace as if it were local file storage.
  • iconik, accessed both by the user via its web interface and by the plugin via the iconik API. iconik is configured with two iconik “storages”, one for Backblaze B2 and one for the iconik Storage Gateway instance.

When the user selects the “Add to LucidLink” custom action, iconik sends an HTTP request, containing the list of selected entities, to the plugin. The plugin calls the iconik API with a request to copy those entities from Backblaze B2 to the iconik Storage Gateway. The gateway writes the files to the LucidLink Filespace, exactly as if it were writing to the local disk, and the LucidLink client sends the files to LucidLink. Now the full-resolution files are available for the production team to access in the Filespace, while the originals remain in B2 Cloud Storage.

Later, when the user selects the “Remove from LucidLink” custom action, iconik sends another HTTP request containing the list of selected entities to the plugin. This time, the plugin has more work to do. Collections can contain other collections as well as assets, so the plugin must access each collection in turn, calling the iconik API for each file in the collection to request that it be deleted from the iconik Storage Gateway. The gateway simply deletes each file from the Filespace, and the LucidLink client relays those operations to LucidLink. Now the files are no longer stored in the Filespace, but the originals remain in B2 Cloud Storage, safely archived for future use.

This short video shows the plugin in action, and walks through the flow in a little more detail:

Deploying the Backblaze B2 Storage Plugin for iconik

The plugin is available open-source under the MIT license at https://github.com/backblaze-b2-samples/b2-iconik-plugin. Full deployment instructions are included in the plugin’s README file.

Don’t have a Backblaze B2 account? You can get started here, and the first 10GB are on us. We can also set up larger scale trials involving terabytes of storage—enter your details and we’ll get back to you right away.

Customize the Plugin to Your Requirements

You can use the plugin as is, or modify it to your requirements. For example, the plugin is written to be deployed on Google Cloud Functions, but you could adapt it to another serverless cloud platform. Please report any issues with the plugin via the issues tab in the GitHub repository, and feel free to submit contributions via pull requests.

The post Optimize Your Media Production Workflow With iconik, LucidLink, and Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS Security Profile: CJ Moses, CISO of AWS

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws_security_profile_cj_moses_ciso_of_aws/

AWS Security Profile: CJ Moses, CISO of AWS

In the AWS Security Profile series, I interview the people who work in Amazon Web Services (AWS) Security and help keep our customers safe and secure. This interview is with CJ Moses—previously the AWS Deputy Chief Information Security Officer (CISO), he began his role as CISO of AWS in February of 2022.

How did you get started in security? What about it piqued your interest?

I was serving in the United States Air Force (USAF), attached to the 552nd Airborne Warning and Control (AWACS) Wing, when my father became ill. The USAF reassigned me to McGuire Air Force Base (AFB) in New Jersey so that I’d be closer to him in New York. Because I was an unplanned resource, they added me to the squadron responsible for base communications. I ended up being the Base CompuSec (Computer Security) Manager, who was essentially the person who had to figure out what a firewall was and how to install it. That role required me to have a lot of interaction with the Air Force Office of Special Investigations (AFOSI), which led to me being recruited as a Computer Crime Investigator (CCI). Normally, when I’m asked what kind of plan I followed to get where I am today, I like to say, one modeled after Forrest Gump.

How has your time in the Air Force influenced your approach to cybersecurity?

It provided a strong foundation that I’ve built on with each and every experience since. My years as a CCI had me chasing hackers around the world on what was the “Wild West” of the internet. I’ve been kicked out of countries, asked (told) never to come back to others, but in the end the thing that stuck is that there is always a human on the other side of the connection. Keyboards don’t type for themselves, and therefore understanding your opponent and their intent will inform the measures you must put in place to deal with them. In the early days, we were investigating Advanced Persistent Threats (APTs) long before anyone had created that acronym, or given the actors names or fancy number designators. I like to use that experience to humanize the threats we face.

You were recently promoted to CISO of AWS. What are you most excited about in your new role?

I’m most excited by the team we have at AWS, not only the security team I’m inheriting, but also across AWS. As a CISO, it’s a dream to have an organization that truly believes security is the top priority, which is what we have at AWS. This company has a strong culture of ownership, which allows the security team to partner with the service owners to enable their business, rather than being the office of, “no, you can’t do that.” I prefer my team to answer questions with “Yes, but” or “Yes, and,” and then talk about how they can do what they need in a more secure manner.

What’s the most challenging part of being CISO?

There’s a right balance I’m working to find between how much time I’m able to spend focusing on the details and doing security, and communicating with customers about what we do. I lean on our Office of the CISO (OCISO) team to make sure we keep up a high level of customer engagement. I strive to keep the right balance between involvement in details, leading our security efforts, and engaging with our customers.

What’s your short- and long-term vision for AWS Security?

In the short term, my vision is to continue on the strong path that Steve Schmidt, former CISO of AWS and current chief security officer of Amazon, provided. In the longer term, I intend to further mechanize, automate, and scale our abilities, while increasing visibility and access for our customers.

If you could give one piece of advice to all AWS customers at scale, what would it be?

My advice to customers is to take advantage of the robust security services and resources we offer. We have a lot of content that is available for little to no cost, and an informed customer is less likely to encounter challenging security situations. Enabling Amazon GuardDuty on a customer’s account can be done with only a few clicks, and the threat detection monitoring it offers will provide organization-wide visibility and alerting.

What’s been the most dramatic change you’ve seen in the industry?

The most dramatic change I’ve seen is the elevated visibility of risk to the C-suite. These challenges used to be delegated lower in the organization to someone, maybe the CISO, who reported to the chief information officer. In companies that have evolved, you’ll find that the CISO reports to the CEO, with regular visibility to the board of directors. This prioritization of information security ensures the right level of ownership throughout the company.

Tell me about your work with military veterans. What drives your passion for this cause?

I’ve aligned with an organization, Operation Motorsport, that uses motorsports to engage with ill, injured, and wounded service members and disabled veterans. We present them with educational and industry opportunities to aid in their recovery and rehabilitation. Over the past few years we’ve sponsored a number of service members across our race teams, and I’ve personally seen the physical, and even more importantly, mental improvements for the beneficiaries who have become part of our race teams. Having started my military career during Operation Desert Shield/Storm (the buildup to and the first Gulf War), I can connect with these vets and help them to find a path and a new team to be part of.

If you had to pick any other industry, what would you want to do?

Professional motorsports. There is an incredible and not often visible alignment between the two industries. The use of data analytics (metrics focus), the culture, leadership principles, and overall drive to succeed are in complete alignment, and I’ve applied lessons learned between the two interchangeably.

What are you most proud of in your career?

I am very fortunate to come from rather humble beginnings and I’m appreciative of all the opportunities provided for me. Through those opportunities, I’ve had the chance to serve my country and, since joining AWS, to serve many customers across disparate industries and geographies. The ability to help people is something I’m passionate about, and I’m lucky enough to align my personal abilities with roles that I can use to leave the world a better place than I found it.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.

CJ Moses

CJ Moses

CJ Moses is the Chief Information Security Officer (CISO) at AWS. In his role, CJ leads product design and security engineering for AWS. His mission is to deliver the economic and security benefits of cloud computing to business and government customers. Prior to joining Amazon in 2007, CJ led the technical analysis of computer and network intrusion efforts at the U.S. Federal Bureau of Investigation Cyber Division. CJ also served as a Special Agent with the U.S. Air Force Office of Special Investigations (AFOSI). CJ led several computer intrusion investigations seen as foundational to the information security industry today.

[$] Per-file OOM badness

Post Syndicated from original https://lwn.net/Articles/896738/

The kernel tries hard to keep memory available for its present and future
needs. Should that effort fail, though, the tool of last resort is the
dreaded out-of-memory (OOM) killer, which is tasked with killing processes
on the system to free their memory and alleviate the problem. The results
of invoking the OOM killer are never going to be good, but they can be
distinctly worse if the wrong processes are chosen for an untimely end. As
one might expect, the effort to properly choose the right processes is an
ongoing effort. Most recently, Christian
König has proposed a
new mechanism
to address a blind spot in the OOM killer’s
deliberations.

Mazzoli: How fast are Linux pipes anyway?

Post Syndicated from original https://lwn.net/Articles/896872/

Francesco Mazzoli delves
deeply into the kernel’s implementation of pipes
(and more) in an
attempt to maximize the throughput of data.

The post was inspired by reading a highly optimized FizzBuzz
program, which pushes output to a pipe at a rate of ~35GiB/s on my
laptop. Our first goal will be to match that speed, explaining
every step as we go along. We’ll also add an additional
performance-improving measure, which is not needed in FizzBuzz
since the bottleneck is actually computing the output, not IO, at
least on my machine.

Security updates for Thursday

Post Syndicated from original https://lwn.net/Articles/896896/

Security updates have been issued by Debian (firefox-esr), Fedora (thunderbird and vim), Red Hat (firefox, postgresql:10, postgresql:12, and postgresql:13), Scientific Linux (firefox and rsyslog), SUSE (hdf5, hdf5, suse-hpc, postgresql14, rubygem-yajl-ruby, and udisks2), and Ubuntu (imagemagick and influxdb).

The Average SIEM Deployment Takes 6 Months. Don’t Be Average.

Post Syndicated from Margaret Wei original https://blog.rapid7.com/2022/06/02/the-average-siem-deployment-takes-6-months-dont-be-average/

The Average SIEM Deployment Takes 6 Months. Don’t Be Average.

If you’re part of the huge growth in demand for cloud-based SIEM (Security Information and Event Management), claim your copy of the new Gartner® Report: “How to Deploy a SIEM Solution Successfully.”

Depending on what SIEM you choose, and how you approach the process, getting to operational and effective can take days, or months, or a lot longer.

Here are the Gartner report’s key findings:

  1. “Ineffective security information and event management (SIEM) deployments occur when requirements and use cases are not aligned with the organization’s risks and risk tolerance.”
  2. “Clients deploying SIEM solutions continue to take an unstructured approach when deciding which event and data sources to onboard, with the goal of getting every source in from the beginning. This leads to long and complex implementations, cost overruns, and higher probabilities of stalled or failed implementations.”
  3. “SIEM buyers struggle to choose between on-premises, cloud, or hybrid deployments due to the complexities created by the various environments that need to be monitored, e.g., on-premises, SaaS, cloud infrastructure and platform services (CIPS), remote workers.”

SIEM centralizes and visualizes your security data to help you identify anomalies in your environment. But nearly all SIEMs require you to do a ton of customizing and configuration. Nearly all disappoint with their detections. And nearly all will exhaust you with false-positive alerts… every hour of every day… until analysts start ignoring alerts, which will surely doom you someday.

Now, here’s what we think

Rapid7 began building InsightIDR nearly a decade ago. While the threat landscape keeps changing, our mission never has: to empower you to find and extinguish evil earlier, faster, easier.

InsightIDR has never been a traditional SIEM. You should consider it if:

Fast deployment is a priority to you. InsightIDR leads the SIEM market in deployment times. With SaaS delivery and a native cloud foundation, customers can be deployed and operational in days and weeks – not months and years.

Time-to-value and tangible ROI matter to your leadership team. InsightIDR combines the best of next-gen SIEM with native extended detection and response (XDR). Get highly correlated UEBA, EDR, NDR, and Cloud detections alongside your critical security logs and policy monitoring, compliance dashboards, and reporting in a single pane of glass.

Your team is tired of false positives. InsightIDR’s expertly vetted detection library provides holistic threat coverage across your entire attack surface. An emphasis on high-fidelity, low-noise detections ensures that all alerts are relevant and ready for action.

You’re ready to accelerate your security posture. InsightIDR empowers teams to up-level their security and achieve sophisticated outcomes – without the complexity of traditional SIEMs. Embedded security orchestration and automation (SOAR) capabilities give you enviable security operations center (SOC) automation and enable even new analysts to respond like experts.

Don’t forget your copy of the new Gartner® Report: “How to Deploy a SIEM Solution Successfully.”

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Gartner, How to Deploy a SIEM Solution Successfully, Andrew Davies, Mitchell Schneider, Toby Bussa, Kelly Kavanagh, 7 July 2021

Additional reading:

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Optimizing your AWS Lambda costs – Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-your-aws-lambda-costs-part-2/

This post is written by Chris Williams, Solutions Architect and Thomas Moore, Solutions Architect, Serverless.

Part 1 of this blog series looks at optimizing AWS Lambda costs through right-sizing a function’s memory, and code tuning. We also explore how using Graviton2, Provisioned Concurrency and Compute Savings Plans can offer a reduction in per-millisecond billing.

Part 2 continues to explore cost optimization techniques for Lambda with a focus on architectural improvements and cost-effective logging.

Event filtering

A common serverless architecture pattern is Lambda reading events from a queue or a stream, such as Amazon SQS or Amazon Kinesis Data Streams. This uses an event source mapping, which defines how the Lambda service handles incoming messages or records from the event source.

Event filtering

Sometimes you don’t want to process every message in the queue or stream because the data is not relevant. For example, if IoT vehicle data is sent to a Kinesis Stream and you only want to process events where tire_pressure is < 32, then the Lambda code may look like this:

def lambda_handler(event, context):
   if(event[“tire_pressure”] >=32):
      return

# business logic goes here

This is inefficient as you are paying for Lambda invocations and execution time when there is no business value beyond filtering.

Lambda now supports the ability to filter messages before invocation, simplifying your code and reducing costs. You only pay for Lambda when the event matches the filter criteria and triggers an invocation.

Filtering is supported for Kinesis Streams, Amazon DynamoDB Streams and SQS by specifying filter criteria when setting up the event source mapping. For example, using the following AWS CLI command:

aws lambda create-event-source-mapping \
--function-name fleet-tire-pressure-evaluator \
--batch-size 100 \
--starting-position LATEST \
--event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/fleet-telemetry \
--filter-criteria '{"Filters": [{"Pattern": "{\"tire_pressure\": [{\"numeric\": [\"<\", 32]}]}"}]}'

After applying the filter, Lambda is only invoked when tire_pressure is less than 32 in messages received from the Kinesis Stream. In this example, it may indicate a problem with the vehicle and require attention.

For more information on how to create filters, refer to examples of event pattern rules in EventBridge, as Lambda filters messages in the same way. Event filtering is explored in greater detail in the Lambda event filtering launch blog.

Avoid idle wait time

Lambda function duration is one dimension used for calculating billing. When function code makes a blocking call, you are billed for the time that it waits to receive a response.

This idle wait time can grow when Lambda functions are chained together, or a function is acting as an orchestrator for other functions. For customers who have workflows such as batch operations or order delivery systems, this adds management overhead. Additionally, it may not be possible to complete all workflow logic and error handling within the maximum Lambda timeout of 15 minutes.

Instead of handling this logic in function code, re-architect your solution to use AWS Step Functions as an orchestrator of the workflow. When using a standard workflow, you are billed for each state transition within the workflow rather than the total duration of the workflow. In addition, you can move support for retries, wait conditions, error workflows and callbacks into the state condition allowing your Lambda functions to focus on business logic.

The following example shows an example Step Functions state machine, where a single Lambda function is split into multiple states. During the wait period, there is no charge. You are only billed on state transition.

State machine

Direct integrations

If a Lambda function is not performing custom logic when it integrates with other AWS services, it may be unnecessary and could be replaced by a lower-cost direct integration.

For example, you may be using API Gateway together with a Lambda function to read from a DynamoDB table:

With Lambda

This could be replaced using a direct integration, removing the Lambda function:

Without Lambda

API Gateway supports transformations to return the output response in a format the client expects. This avoids having to use a Lambda function to do the transformation. You can find more detailed instructions on creating an API Gateway with an AWS service integration in the documentation.

You can also benefit from direct integration when using Step Functions. Today, Step Functions supports over 200 AWS services and 9,000 API actions. This gives greater flexibility for direct service integration and in many cases removes the need for a proxy Lambda function. This can simplify Step Function workflows and may reduce compute costs.

Reduce logging output

Lambda automatically stores logs that the function code generates through Amazon CloudWatch Logs. This may be useful for understanding what is happening within your application in near real-time. CloudWatch Logs includes a charge for the total data ingested throughout the month. Therefore, reducing output to include only necessary information can help reduce costs.

When you deploy workloads into production, review the logging level of your application. For example, in a pre-production environment, debug logs can be beneficial in providing additional information to tune the function. Within your production workloads, you may disable debug level logs and use a logging library (such as the Lambda Powertools Python Logger). You can define a minimum logging level to output by using an environment variable, which allows configuration outside of the function code.

Structuring your log format enforces a standard set of information through a defined schema, instead of allowing variable formats or large volumes of text. Defining structures such as error codes and adding accompanying metrics leads to a reduction in the volume of text that repeats throughout your logs. This also improves the ability to filter logs for specific error types and reduces the risk of a mistyped character in a log message.

Use Cost-Effective Storage for Logs

Once CloudWatch Logs ingests data, by default it is persisted forever with a per-GB monthly storage fee. As log data ages, it typically becomes less valuable in the immediate time frame, and is instead reviewed historically on an ad-hoc basis. However, the storage pricing within CloudWatch Logs remains the same.

To avoid this, set retention policies on your CloudWatch Logs log groups to delete old log data automatically. This retention policy applies to both your existing and future log data.

Some applications logs may need to persist for months or years for compliance or regulatory requirements. Instead of keeping the logs in CloudWatch Logs, export them to Amazon S3. By doing this, you can take advantage of lower-cost storage object classes while factoring in any expected usage patterns for how or when data is accessed.

Conclusion

Cost optimization is an important part of creating well-architected solutions and this is no different when using serverless. This blog series explores some best practice techniques to help reduce your Lambda bill.

If you are already running AWS Lambda applications in production today, some techniques are easier to implement than others. For example, you can purchase Savings Plans with zero code or architecture changes, whereas avoiding idle wait time will require new services and code changes. Evaluate which technique is right for your workload in a development environment before applying changes to production.

If you are still in the design and development stages, use this blog series as a reference to incorporate these cost optimization techniques at an early stage. This ensures that your solutions are optimized from day one.

To get hands-on implementing some of the techniques discussed, take the Serverless Optimization Workshop.

For more serverless learning resources, visit Serverless Land.

Optimizing your AWS Lambda costs – Part 1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-your-aws-lambda-costs-part-1/

This post is written by Chris Williams, Solutions Architect and Thomas Moore, Solutions Architect, Serverless.

When you develop and architect solutions, cost-optimization should always be part of the process. This is no different for serverless applications that have been developed using AWS Lambda.

Your workloads may vary in terms of complexity, usage patterns, and technology. However, the following advice is applicable to all customers when deciding how to prioritize cost optimization with the tradeoffs of developing the application:

  • Efficient code makes better use of resources.
  • Consider the downstream services in architectural decisions.
  • Optimization should be a continuous cycle of improvement.
  • Prioritize changes that make the greatest improvements first.

The Optimizing your AWS Lambda costs blog series reviews operational and architectural guidance. It can be applied to existing Lambda functions and those that you develop in the future.

Introduction to Lambda pricing

Lambda pricing is calculated as a combination of:

  • Total number of requests
  • Total duration of invocations
  • Configured memory allocated

When you optimize Lambda functions, each of these components impacts the total monthly cost. This pricing applies after you exceed the AWS Free Tier that is offered for Lambda.

Right-Sizing

Right-sizing is a good starting point in the process of cost optimization. This exercise helps to identify the lowest cost for applications without affecting performance or requiring code changes.

For Lambda, this is accomplished by configuring the memory for a function, ranging anywhere from 128 MB up to 10,240 MB (10 GB). By adjusting the memory configuration, you also adjust the amount of vCPU that is available to the function during invocation. Tuning these settings provides memory- or CPU-bound applications with access to additional resources during the execution, which may lead to an overall reduced duration of invocation.

Identifying the optimal configuration for your Lambda functions may be manually intensive, especially if changes are made frequently. The AWS Lambda Power Tuning tool provides a solution powered by AWS Step Functions that can help identify the appropriate configuration. It analyzes a set of memory configurations against an example payload.

In the following example, as the memory increases for this Lambda function, the total invocation time improves. This leads to a reduction in the cost for the total execution without affecting the original performance of the function. For this function, the optimal memory configuration for the function is 512 MB, as this is where the resource utilization is most efficient for the total cost of each invocation. This varies per function, and using the tool on your Lambda functions can identify if they benefit from right-sizing.

Power Tuning results

This exercise should be completed regularly, especially if code is released frequently. Once you have identified the appropriate memory setting for your Lambda functions, you should add right-sizing to your processes. The AWS Lambda Power Tuning tool generates programmatic output that can be used by your CI/CD workflows during the release of new code, allowing the automation of memory configuration.

Performance efficiency

A key component to Lambda pricing is the total duration of the invocation. The longer the function takes to run, the more it costs and the higher the latency in your application. For this reason, it is important to ensure the code you write is as efficient as possible and follows the Lambda best practices.

At a high level, to optimize code :

  • Minimize deployment package size to its runtime necessities. This reduces the amount of time it takes for the package to be downloaded and unpacked.
  • Minimize the complexity of dependencies. Simpler frameworks often load faster.
  • Take advantage of execution reuse. Initialize SDK clients and database connections outside the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations can reuse open connections and resources in memory and in /tmp.
  • Follow general coding performance best practices for your chosen language and runtime.

To help visualize the components of your application and identify performance bottlenecks, use AWS X-Ray with Lambda. You can enable X-Ray active tracing on new and existing functions by editing the function configuration. For example, with the AWS CLI:

aws lambda update-function-configuration --function-name my-function \
--tracing-config Mode=Active

The AWS X-Ray SDK can be used to trace all AWS SDK calls inside Lambda functions. This helps to identify any bottlenecks in the application performance. The X-Ray SDK for Python can be used to capture data for other libraries such as requests, sqlite3, and httplib, as shown in the following example:

Segments timeline

Amazon CodeGuru Profiler is another tool that can help with code optimization. It uses machines learning algorithms to help find the most expensive lines of code and suggests ways to improve efficiency. It can be enabled on existing Lambda functions by enabling code profiling in the function configuration.

The CodeGuru console shows the results as a series of visualizations and recommendations.

Code Guru recommendations

Use these tools together with the documented best practices to evaluate your code’s performance when developing your serverless applications. Efficient code often means faster applications and reduced costs.

AWS Graviton2

In September 2021, Lambda functions powered by Arm-based AWS Graviton2 processors became generally available. Graviton2 functions are designed to deliver up to 19% better performance at 20% lower cost than x86. In addition to the lower billing cost when using Arm, you could also see a reduction in the function duration due to the CPU performance improvement, reducing costs even further.

You can configure both new and existing functions to target the AWS Graviton2 processor. Functions are invoked in the same way and integrations with services, applications and tools are not affected by the architecture change. Many functions only need the configuration change to take advantage of the price/performance of Graviton2. Others may require repackaging to use Arm-specific dependencies.

It’s always recommended to test your workloads before making the change. To see how much your code benefits from using Graviton2, use the Lambda Power Tuning tool to compare against x86. The tool allows you to compare two results on the same chart:

AWS Lambda Power Tuning Results

Provisioned concurrency

Where customers are looking to reduce their Lambda function cold starts or avoid burst throttling, provisioned concurrency provides execution environments that are ready to be invoked. It can also reduce total Lambda cost when there is a consistent volume of traffic. This is because the provisioned concurrency pricing model offers a lower total price when it is fully used.

Similar to the standard Lambda function pricing, there are price components for total requests, total duration, and memory configuration. In addition, there is the cost of each provisioned concurrency environment (based on its memory configuration). When this execution environment is fully utilized, the combined cost of the invocation and the execution environment can offer up to 16% savings on duration cost when compared to the regular on-demand pricing.

If you cannot maximize usage in an execution environment, provisioned concurrency can still offer a lower total price per invocation. In the following example, once it’s consumed for more than 60% of the available time, it becomes cheaper than using the on-demand pricing model. The savings increase in line with capacity usage.

PC vs on-demand comparison

To identify the invocation baseline of a Lambda function, look at the average concurrent execution metrics per hour over the previous 24 hours. This helps you to find a consistent baseline throughout the day where you are consistently using multiple execution environments.

For Lambda functions where peak invocation levels are only expected during particular windows of time, take advantage of a scheduled scaling action. Where traffic patterns are not easy to determine, you can implement Application Auto Scaling to adjust based on the current level of utilization.

Compute savings plans

AWS Savings Plans is a flexible pricing model offering lower prices compared to on-demand pricing, in exchange for a specific usage commitment (measured in $/hour) for a one- or three-year period.

Compute Savings Plans include Amazon EC2, AWS Fargate and Lambda. Lambda usage for duration and provisioned concurrency are charged at a discounted Savings Plans rate of up to 17% for a 1- or 3-year term.

You can implement Savings Plans without any function code or configuration changes. They can be a simpler way to save money for Lambda-based workloads. Before deciding to use a savings plan, analyze previous patterns to understand any variations in your month to month usage.

Conclusion

This blog post explains how Lambda pricing works and how right-sizing applications and tuning them for performance efficiency offers a more cost-efficient utilization model. The results can also reduce latency, creating a better experience for your end users.

The post explores how architectural changes such as moving to Graviton2 and configuring provisioned concurrency can provide cost reductions for the same operations. Finally, you can use Compute Savings Plans to add an additional cost reduction once you establish a baseline of usage per month.

Part 2 introduces further optimization opportunities for reducing Lambda invocations, moving to an asynchronous model, and reducing logging costs.

For more serverless learning resources, visit Serverless Land.

Как дървената мафия си остава непокътната БСП ударно назначава в горите съпартийци и кадри на ДПС и ГЕРБ

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%B1%D1%81%D0%BF-%D1%83%D0%B4%D0%B0%D1%80%D0%BD%D0%BE-%D0%BD%D0%B0%D0%B7%D0%BD%D0%B0%D1%87%D0%B0%D0%B2%D0%B0-%D0%B2-%D0%B3%D0%BE%D1%80%D0%B8%D1%82%D0%B5-%D1%81%D1%8A%D0%BF%D0%B0%D1%80%D1%82%D0%B8.html

четвъртък 2 юни 2022


Българска социалистическа партия (БСП), от чиято квота е и министърът на земеделието, храните и горите Иван Иванов, ударно си назначава областните, общинските и др. кадри на местно ниво в държавните…

Как преподаваме и говорим за комунизма

Post Syndicated from Йоанна Елми original https://toest.bg/kak-prepodavame-i-govorim-za-komunizma/

Луиза Славкова е основател и изпълнителен директор на Фондация „Софийска платформа“, която за поредна година организира лятно училище в гр. Белене по темите за паметта и демокрацията. Славкова е член на Консултативния съвет на Европейската мрежа за гражданско образование (NECE), през годините е била програмен мениджър в Европейския съвет за външна политика, гост-изследовател в Института за изследване на правата на човека към Колумбийския университет и съветник на министъра на външните работи Николай Младенов. Автор и редактор е на няколко книги и публикации, свързани с международни политики, демократично развитие и гражданско образование, и е съавтор на учебник по гражданско образование по учебната програма на новия предмет. 

Йоанна Елми разговаря с Луиза Славкова за лятното училище, за паметта за комунистическата диктатура и за важността на гражданското образование при малки и големи.


Лятното училище се провежда под надслов „Защо ни е да помним?“. Безспорно отговорът е важен и питам и за него, но първо ми е интересно нещо друго – не е ли важно и как помним? Защото българската образователна система тежко залага на помненето на определени исторически моменти, дори на механично наизустяване, но има ли смисъл от такова помнене? 

Разговорът за паметта е многопластов. И в теорията, и в практиката има различни подходи, включително радикални, според които е по-добре да не се помни нищо. Защото паметта само репродуцирала определени нагласи и никога не можело да има трайно разбирателство, особено където е имало конфликт между две групи. Ако една група от обществото е представена като палач, а друга – като жертва, помненето затвърждава тази представа. Известен представител на този подход е синът на Сюзън Зонтаг – Дейвид Рийф, който години наред отразява войните в бивша Югославия и е убеден, че е по-добре на Балканите нищо да не се помни. Аз не съм съгласна с него, но това не означава, че не обсъждаме това гледище в нашето лятно училище. Ние питаме всеки участник дали трябва да помним комунизма с неговите репресии като тоталитарен режим, или не. И оттам: ако го помним – защо да го помним?

В лятното училище не бягаме от различните пластове на разказа за комунизма, но правим разлика между наративите – които се менят – и фактите. Факт е, че комунизмът е осъден като престъпен режим редом с фашизма и националсоциализма, независимо че има българи, които никога не са чували за лагерите за политзатворници в България, тогава носещи срамното име „трудово-възпитателни общежития“ (ТВО). Носталгията по живота по време на комунизма, ако и да е измамно усещане за предвидимост, е легитимно усещане и никой от нас няма право да се присмива на тези емоции у някои хора. Но това не променя факта, че други стотици хиляди са белязани – често произволно и безвъзвратно – от режима. Така че и в лятното училище ние отваряме темата като един голям сноп, в който побираме всякакви детайли от историята на комунизма, в рамките на една седмица отсяваме и подреждаме кое е факт, кое е спомен и кое е част от държавната политика за паметта (или отсъствието на такава).

Разбира се, много по-лесно се помнят, разказват и преподават славните откъси от историята; онези, в които ние сме победители или пък жертви. Обаче как говорим за един период, в който системата на комунизма е палач, ние като общество сме и жертва, и палач, но сме и онова сиво пространство между двете крайности, където има всякакви хора – тихи свидетели на издевателства; хора, които под натиска на службите стават доносници; хора, наивно вярващи във фасадата на комунизма, и т.н. Разговорът за тази сива зона ни е особено важен. Защото нищо в живота не е само черно или бяло. И колкото по-добре разбира един млад човек, че историята на близкото минало е нещо комплексно, толкова по-вероятно е, занимавайки се с нея, да се учи на критично мислене, да размишлява за противоречията и за собствените си пристрастия, за пропагандата или за ролята на гражданското участие.

Участници в лятното училище заедно с Цветана Джерманова, оцеляла лагеристка от ТВО „Босна“ и „Белене“ © „Софийска платформа“

По Ваши наблюдения какви са промените, ако има такива, в преподаването на историята на тоталитарния режим на деца и студенти в последните години? 

В последните години се случиха много неща. Имаше промяна в учебните програми в класовете, в които се изучава съвременна българска история, и комунизмът е застъпен много по-детайлно и най-вече правдиво. В старите програми бяха изпускани термини като „Народен съд“, „Държавна сигурност“ и още много, а в така или иначе малкото отделени часове по тази тема учениците остават с усещането, че комунизмът не е някаква история, заслужаваща внимание. По време на пандемията не успяхме да проследим какво е въздействието на промяната на учебните програми, но се надявам, че с повечето часове, посветени на този период, и познанията ще набъбват.

Разбира се, преподаването е другият препъникамък. Особено учителите, живели преди 89-та, трудно могат да се дистанцират от личния си опит и да си признаят, че историята, която те са учили като педагози, е продукт на същия пропаганден апарат, който би трябвало да могат да осъдят в клас. При по-младите учители невинаги е по-различно, защото, за съжаление, в университетите ни сред историците продължава да има ядра, които не гледат на комунизма като на престъпен режим. Самите методи на преподаване пък са другата трудност – у нас не е много популярно да се преподава с интерактивни методи, а дидактическият подход все още е най-следван. Това често води и до асоцииране на историята със скучна фактология – освен когато става въпрос за история, свързана с националните ни герои Ботев и Левски, за които впрочем също рядко се знае нещо отвъд клишетата.

Засегнахте какво липсва в преподаването и какво може да се подобри. Вие правите ли усилия в тази посока? 

Усилията, които ние полагаме, са за демократизиране на самото преподаване и за постоянно създаване на актуални материали, помагала, възможности за участие в проекти в помощ на учители и местни активни граждани, които работят по темите за историята на комунизма и гражданското образование в цялата страна. В обученията си за учители наблягаме на интерактивните методи, така че в час по история или гражданско образование преподавателите да насочват учениците, да модерират процеса на работа в час, а не да държат лекции от по 45 минути. Обученията, които организираме, са базирани на тези методи, така че преподавателите изпитват сами добавената стойност на ученето през преживяване, симулация – или направо през топография, на терен. Именно това е и основен метод в работата ни в лятното училище – работа с история през преживяване на мястото, неговите обитатели сега, артефактите, запечатаните истории, спомените на очевидци.

За нас разговорите за историята на комунизма не са самоцел, те са част от опита ни да усилим значението на демократичната култура в нашето общество. Войната в Украйна е най-радикалното доказателство защо познаването на миналото е важно. То показва не само защо е от съществено значение едно общество да е демократично, в общност с други демокрации, които имат колективен ангажимент да си помагат, но и защо паметта за тоталитаризмите наистина предпазва от това една държава и най-вече гражданите ѝ да се плъзнат надолу по авторитарната крива.

Тази пролет, веднага след началото на войната в Украйна, стартирахме мащабен проект в Белене, посветен на паметта за лагера. Освен че ще разработим виртуален тур, започваме записването на разказите на малкото останали сред нас оцелели от лагера посредством нови технологии, така че техните свидетелства да останат запечатани за следващите поколения. След година вие ще можете от екраните си да водите разговор с тях и те да отговарят на въпросите ви, свързани с живота в лагера, без да сте физически на едно и също място. Тези подходи не само съхраняват паметта и запечатват гласовете на малкото оцелели сред нас, но вървят и в крак с времето и с начина, по който младите хора консумират информация.



Из сградите на о. Персина © „Софийска платформа“

Умишлено казвам „тоталитарен режим“ във въпросите си и забелязвам, че и Вие използвате тази формулировка в анонсите на събитието. Засегнахме темата за Украйна, затова питам: в България като че ли асоциираме тоталитаризма единствено с „комунизъм“, но днес виждаме например един руски тоталитарен режим, който няма общо с конкретна идеология и по-скоро заема от всичко по малко. Трудно ли се обясняват тези динамики на ученици и студенти? 

Разбира се, ние сме запознати с дискусиите около термина „тоталитарен режим“ и водим такива помежду си като екип, но и с хората, които срещаме в различните си формати. Спорът за термина е дали един режим може да е толкова репресивен, че да е тотален в подтисничеството си и да не оставя нито една брънка от личния и обществен живот незасегната. И честно казано, като чета за броя на доносниците в България по време на комунизма, като гледам снимки за абсурдните граждански ритуали, с които партията се опитва да изземе ролята на Църквата (като гражданското кръщене например), и т.н., съм склонна да кажа, че комунизмът е тоталитарен, наистина се опитва да присъства навсякъде. Затова и щетите, които нанася на гражданската ни култура, са толкова големи, че продължаваме да не можем да се отърсим от тях. И до днес имаме очаквания държавата да се погрижи за нас, и то не в онзи социалдемократичен смисъл, а с готовност жертваме някоя от свободите си за тази грижа. Това граничи с поданическите нагласи, не с гражданските, и принадлежи към друг, отдавна отминал век.

По сходен начин са белязани и руснаците, струва ми се, въпреки че не познавам Русия толкова добре. Това ги прави и лесна жертва на пропагандата, с която ги облъчва режимът на Путин. След края на Студената война антидемократичните режими трудно влизат в калъпа на идеологиите, както ги познаваме от ХХ век, и дори на нас, възрастните, ни е трудно да се ориентираме в тази нова координатна система. Помислете само колко ни е трудно да кажем какво е популизъм, защо някои държави са полудемокрации, а други – полуавторитарни. Освен че е сложно да се обясни, защото е сложно по принцип, допълнително се затрудняваме, когато средата, в която растат младите хора, е пропутински настроена. Тези нагласи не са дълбоки, те са базирани на клишета, зад които дори няма сериозни размисли. Но те са базирани и на тази наследена култура на пасивност, а Путин има образ на мъж с твърда ръка, който брани руснаците от всички реални и (най-вече) конструирани врагове.

В този дух ми направи впечатление и преподаването на „демократични ценности“. Пандемията показа според мен, че трудно разбираме границата между лична свобода и колективно добруване, между необходими за съвместния живот правила и репресия. Същевременно виждаме, че през последните години някои европейски лидери бяха по-склонни да загърбят базови ценности в името на определена realpolitik, която всъщност често пъти води до повече беди в дългосрочен план. Как обясняваме и изобщо дефинираме демократичните ценности в такива времена? Успяваме ли да излезем от клишето, че Западът е задължително демократичен, а Изтокът – задължително недемократичен? 

Аз мисля, че човек може да губи за моменти вярата си в общото благо, демокрацията, солидарността, равноправието, но това не значи, че демократичните ценности губят валидност по принцип. И да, дори сегашната ситуация с руския газ и с това кой какво е договорил с „Газпром“ може да накара човек да погледне обединена Европа с повдигната вежда. Но да капитулира тотално и цинично да каже, че всички са маскари и големите винаги печелят, е твърде елементарно. Демократичната култура, или по-скоро демократичните компетенции, които ние се опитваме да насърчаваме, не повеляват това. Те възпитават точно обратното – че демокрацията не е перфектна, но е най-свободната и най-справедлива система, която сме могли да измислим от атиняните насам. Тъй като тя е форма на управление на съжителстването ни в големи групи, няма как да бъде лесна, без конфликти, защото всички имаме различни интереси и предпочитания. Менажирането на динамиките в едно семейство е сложно, какво остава в една нация. И както нямаме очаквания отношенията в най-малките ядра да са елементарни, така не би трябвало да ги имаме и на ниво общество.

В този толкова комплексен свят е нелогично да си обясняваме нещата едновалентно или с дихотомии, въпреки че разбирам защо е примамливо, оттам и защо хората стават жертви на конспирации. Те, конспирациите, правят точно това – яхват разочарованието от сложната реалност, особено на онези, които са ощетени от нея по някакъв начин, и намират лесно, елементарно обяснение за несгодата. За съжаление обаче, рядко тези обяснения са правдиви. Те не само че създават илюзии у хората, но и елиминират изцяло собствената роля в битието, отнемат свободата на действие на човека. За мен ценностите трябва винаги с еднаква сила да се насърчават и в особено трудно моменти усещаме последствията от тяхната липса.

В работата си ние говорим повече за компетенции, отколкото за ценности. Без да навлизам в подробности, на практика компетенцията е онова, което те прави демократичен гражданин, и е комбинация от ценности, нагласи, умения – имаме безплатен видеоурок на тази тема с платформата „Уча.се“, както, между другото, и на всички други теми от учебната програма по гражданско образование за ХI и ХII клас.


Работа по групи в Държавен архив – Плевен © „Софийска платформа“

Да поговорим и за самите ученици. Как приемат те програмата? Какви въпроси задават най-често? С какви впечатления идват в лятното училище и как те се променят към края на програмата? 

В лятното ни училище в Белене идват всякакви младежи. Такива с познания и такива без; участници, чиито семейства имат история на репресии или пък са били облагодетелствани по време на комунизма; деца от малки и големи населени места, от цялата страна. Не изискваме знания или успех, търсим грамотно написана мотивация и обоснован интерес, та дори и той да е артикулиран като „Не зная нищо, но ми е любопитно“. Срещите с оцелели от лагера, обходът на разрушените сгради от Обект 2 в лагера „Белене“, разговорите с лектори, креативните саморефлексии със собствените снимки и написани мисли, работата с архиви от особено голямо значение в Държавния архив в Плевен, срещите с беленската общност – всичко това изисква време, за да се осмисли, преживее и подреди. Преживяването със сигурност е едно от онези неща, които остават за цял живот, защото е емоционално, свързано с мястото и с живите следи.

Въпросите, които чуваме, са от всякакво естество, но последната година ни направи впечатление, че участниците имаха нужда да говорим и за Прехода като исторически период, който им помага да осмислят комунизма. Очевидно са достатъчно далеч от Прехода, за да е той в тяхната перцепция едно безвъзвратно приключено минало. Със сигурност си тръгват с много повече познания за периода, с осъзнаване за дупките от училище, за тишината или носталгията по темата у дома, но най-вече за влиянието на периода и до днес върху нас като общество. Началото на лятното ни училище е посветено на демокрацията, защото тя е реалността на участниците. Оттам вървим хронологично назад към комунизма и завършваме с паметта в демократичното ни общество, вкл. и политизирането ѝ.

Като че ли винаги сме склонни да залагаме надеждите си на по-младите поколения. Според Вас нужни ли са обаче усилия за образоване на предишни поколения относно историческото наследство на близкото минало? Как биха изглеждали такива усилия? Планирате ли нещо подобно? 

Като фондация работим точно толкова с възрастни, колкото и с млади хора. В България т.нар. гражданско образование за възрастни не е толкова популярно. Най-вероятно защото гражданското образование не е много популярно като цяло, но и защото с млади хора се работи по-лесно. Нагласите им не са толкова циментирани като тези на възрастните, затова и с тях се постига по-голямо въздействие. С възрастните са нужни по-различни подходи.

Ще дам пример – няколко пъти организираме представления със стендъп комедия, посветена на ЕС и на Прехода. Така през хумора и най-вече през интересите и естествената среда на възрастните инициираме размисли на теми като членството ни в ЕС, финансовите измерения на отворените пазари за стоки и услуги, но и обратната страна на медала – обезлюдяването и липсата на медицински кадри заради емиграцията на Запад. Работим много с преподаватели, както споменах, но нагласите не се променят бързо, възрастните не стават по-демократични от една вечер със стендъп комедия или един семинар. Изисква се работа в продължение на месеци, дори години и разбира се, е много трудно да се изолира влиянието на гражданското образование от всякакви други случки от живота на възрастния.

В този смисъл е трудно и да се каже, че именно гражданското образование за възрастни ги прави по-демократични. Но има достатъчно изследвания, които показват, че то води до по-голяма ангажираност. Че наличието на граждански организации и пространства е предпоставка за повече граждански инициативи и съответно липсата им се свързва с липсата на организиран граждански живот.

Заглавна снимка: © „Софийска платформа“

Източник

Не стреляйте по Калина Константинова

Post Syndicated from Светла Енчева original https://toest.bg/ne-strelyayte-po-kalina-kostadinova/

През последната седмица либерално-демократичната общност у нас (към която се причислява и авторката на този текст) намери нова персонализация на гражданското си недоволство. Става въпрос за вицепремиерката Калина Константинова. Обвинителният ѝ тон по адрес на украинските бежанци трудно може да остави безразлични хората с чувствителност към човешките права. Още повече че бягащите от агресията на Русия срещат у обществото ни многократно повече съпричастност, отколкото тези от Сирия или Афганистан. Макар в последно време това да се променя – не на последно място, в резултат на масираната проруска пропаганда у нас.

Кратко опресняване на паметта

Правителството интерпретира като изключителна щедрост от страна на държавата настаняването на украинците в луксозни хотели по морето вместо в бежански центрове и палаткови лагери. Мярката беше ограничена до 31 май, а краят ѝ съвпада с началото на туристическия сезон у нас. Говорейки за тази необичайна управленска мярка обаче, като че вече сме позабравили как точно се породи тя.

Всъщност благородната идея да се настаняват украинските бежанци в хотели съвсем не е на правителството. Хотелиерите сами решиха да проявят инициатива и гостоприемство. А после поискаха държавата да им плати за човещината, та да поспечелят нещо извън активния сезон. За разлика от частните лица, които прибраха украинци в домовете си безвъзмездно. Така правителството се оказа пред свършен факт. И тъй като държавата не беше измислила нищо по-добро, реши поне да финансира създадената система за подкрепа. При известни условия – до 40 лв. на човек за нощувка и храна при настаняване в хотели и къщи за гости, вписани в Националния туристически регистър.

Нови времена – нови мерки

Според новата мярка за настаняване на бежанците, анонсирана от заместник-премиерката по ефективно управление, те следваше да бъдат настанени в държавни бази, хотели с по-ниски категории и къщи за гости срещу 15 лв. на ден с храна или 10 – без храна. Ако се чудите колко качествени три хранения може да си осигурите за общо 5 лв., не сте единствените, които си задават този въпрос.

Все пак са се намерили, ако се вярва на правителството, достатъчно собственици на места за настаняване, които да се включат в програмата. Голяма част от тях са в планински курорти и в населени места, които не са известни туристически дестинации. Собствениците им вероятно са преценили, че и малкото държавни пари ще са повече, отколкото те обичайно биха получили. А украинците… все ще се намери с какво да ги хранят.

Къде се „изгубиха в превода“ Константинова и украинските бежанци?

Бежанците трябваше да напуснат хотелите, в които са настанени, до 31 май, а държавата – да осигури автобуси и влакове за извозването им. На тях обаче не се качи почти никой, въпреки че много от украинците вече бяха попълнили онлайн въпросник за нуждите и намеренията си. Именно това предизвика гнева на Константинова – от нейна гледна точка „неблагодарниците“ са пратили усилията на правителството на вятъра и имат претенции да продължават да си живеят в луксозни хотели по Черноморието.

Само че украинците бяха третирани като чували с картофи. От тяхна гледна точка ситуацията изглежда съвсем иначе – те просто не са получили никаква информация въпреки обещанието за отговор до 48 часа след попълването на въпросника. Ако не са научили по някакъв неформален начин, е нямало как да разберат къде и кога ще ги чакат автобусите. А къде точно ще ги закарат – това съвсем не са знаели. Проблемът е особено сериозен за имащите личен автомобил. За много от тях това е най-скъпото им имущество, оцеляло през войната, и не биха го зарязали просто ей така, за да се отправят в неизвестна посока.

Който не иска да слуша, трябва да почувства, гласи стара немска поговорка, оправдаваща боя над деца.

Обидена на „претенциозните“ украинци, Калина Константинова заяви, че вече няма да се допускат празни автобуси и влакове. Вместо това най-нуждаещите се бежанци ще бъдат транспортирани до буферни центрове, а оттам – до държавни бази във вътрешността на страната. Ако бягащите от руската агресия просто бяха получили информация, тази по същество наказателна мярка щеше да е излишна, но уви.

И така, стотици украинци бяха настанени в буферни центрове – фургони без климатици, хладилници и топла вода. Там вечерта не получиха храна, на сутринта закуска имаше, но не и подходяща за малки деца. Държавата оправдава тези условия с аргумента, че те все пак са само за 72 часа. А поуката за бягащите от Украйна е да видят какво им е на „истинските бежанци“, които „не ги глезят“. На сирийските и афганистанските майки с деца в бежански лагери да не им е леко? Щом са нуждаещи се, трябва да са благодарни и да търпят.

И няма да имат претенции, когато автобусите отново дойдат да ги отведат в неизвестна посока.

Отделен въпрос е, че някои от украинците вече имат личен лекар и не знаят дали на новото място ще могат да се сдобият с такъв. Както и че в изолирани курорти и в обезлюдени градчета не е лесно и за българи да си намерят работа, а какво остава за бежанци, които искат да се устроят и да се интегрират. В същото време има немалко фирми, които биха наели украински бежанци, две трети от които са с висше образование, но освен административните спънки, като например признаване на дипломи, трудно може да се наеме висококвалифицирана специалистка, разпределена в Паничище.

Държавата обаче не мисли особено как да облекчи пълноценното включване на бежанците от Украйна в българското общество. Едно – заради аргумента „ами те искат да си ходят“, друго – заради устойчивата нагласа у нас, че интеграцията е проблем единствено на интегриращите се. Именно тази нагласа е в основата на известната фраза „те не искат да се интегрират“. Независимо кои са „те“.

Интеграция? Това пък какво е?

Интеграцията на чужденци (и в частност на бежанци) е нещо, с което българската държава традиционно не желае да се занимава. След влизането на страната ни в Европейския съюз беше приета Национална стратегия на Република България по миграция и интеграция 2008–2015 г. Тази стратегия вървеше заедно с европейски пари за интеграция, разпределяни от българските институции за проекти. В екипа на един от първите спечелили проекти бях и аз, така че ще споделя малко личен опит.

Задачата ни беше да изработим индикатори за интеграция на имигранти. За тази цел проучихме европейския опит. Силно впечатление ми направи германската система, в която фигурираха включително такива индикатори за качеството на живот – дали чужденецът се задъхва, когато се качва по стълби. За контраст, наличните български индикатори бяха „брой подадени проекти“, „брой спечелени проекти“, „брой проведени кръгли маси“ – все в този дух. Представихме предложенията си за индикатори и… на следващата година беше обявен проект по същата тема. Така държавата плащаше за разработване на индикатори след индикатори, без да въведе никой от тях.

Ето за толкова „смислени“ дейности отиваха европейските пари за интеграция.

Стратегията за периода 2015–2020 г. вече включваше в заглавието си и думата „убежище“. И този документ си остана само на думи. Отношението към сирийските бежанци в този период показва на какво е способна държавата, ако от нея, а не от предприемчиви хотелиери зависи настаняването на бягащите от война. Единайсет хиляди бежанци от Сирия (колкото беше максималният им брой у нас в един и същи момент) се оказаха непосилно бреме. Те бяха поставени в условия, които дори при огромните усилия на доброволците си оставаха нечовешки.

В заглавието на проекта за стратегия за периода 2021–2025 г., предложен в края на управлението на ГЕРБ и „Обединени патриоти“, интеграцията изобщо отсъства. Както и убежището. До ден днешен документът не е приет, така че понастоящем България е без официална стратегия какво да прави с чуждите граждани на територията си. Като се има предвид въздействието на предишните стратегии, загубата не е голяма. Отсъствието ѝ обаче е показателно за липсата на ангажимент на държавата към чужденците (и бежанците в частност) у нас.

Това е ситуацията, която завари настоящото правителство в началото на бежанската вълна от Украйна.

На фона на приоритетите за правосъдна реформа и борба с корупцията чужденците не бяха на дневен ред. Докато един ден украинските майки с деца не започнаха да идват у нас. За разлика от комай всички премиери в най-новата ни история, отношението на Кирил Петков към бежанците беше позитивно и приемащо, а не ксенофобско. Правителството предприе редица мерки за оптимизиране на работата на институциите, включително смени ръководството на Държавната агенция за бежанците.

Ала няма как цялата система за предоставяне на публични услуги, имащи отношение към бежанците, да се подмени като с магическа пръчица. В нея работят администратори, експерти и всевъзможни служители, много от които не могат да си представят, че трябва да служат на хората, а не да упражняват власт над тях, да ги дисциплинират, наказват и унижават. Това важи за публичните услуги в България като цяло, не само по отношение на бежанците, но спрямо най-уязвимите дехуманизиращото отношение достига невъобразими висини.

И неусетно правителството претърпя трансформация.

Вместо управляващите да успеят да реформират държавната администрация, свикнала, че над бежанците се издевателства по дефиниция, се получи обратното. Не успявайки да овладее бюрократичната и ксенофобска система, правителството неусетно възприе нейната реторика. Която неприятно напомня псевдопатриотичния патос на предишното управление, от чиито прийоми настоящата власт се опитва да се еманципира.

Тази неприятна еволюция обаче си има и обратна страна – управляващите поне се опитват да поемат лична отговорност за действията на администрацията, макар и по не най-подходящия начин. Провалът на първия опит за разселване на украинците и въвеждането на новите „дисциплиниращи“ мерки се представят лично от Калина Константинова.

Затова и в личността на Константинова се персонифицира цялата промяна на отношението към украинските бежанци,

която всъщност е плод на институционална некадърност. А тя – резултат на десетилетно възпроизвеждани вредни модели. Поведението на вицепремиерката не е похвално. Но то е само върхът на айсберга, чиято основа стига надълбоко, далеч отвъд настоящото правителство. По-лесно е да хвърляме вината върху един човек, отколкото да си даваме сметка за принципите на функциониране на социалната система, част от която впрочем сме и самите ние.

За управлението на Кирил Петков настоящата ситуация е тест – дали ще продължат усилията за „продължаване на промяната“, или системата ще претопи опитващите се да я реформират. Случващото се с украинските бежанци е само един пример за ситуацията в почти всяка сфера на държавата.

Заглавна снимка: Стопкадър от видеоизявлението на заместник министър-председателката по ефективно управление Калина Константинова

Източник

Graph concepts and applications

Post Syndicated from Grab Tech original https://engineering.grab.com/graph-concepts

Introduction

In an introductory article, we talked about the importance of Graph Networks in fraud detection. In this article, we will be adding some further context on graphs, graph technology and some common use cases.

Connectivity is the most prominent feature of today’s networks and systems. From molecular interactions, social networks and communication systems to power grids, shopping experiences or even supply chains, networks relating to real-world systems are not random. This means that these connections are not static and can be displayed differently at different times. Simple statistical analysis is insufficient to effectively characterise, let alone forecast, networked system behaviour.

As the world becomes more interconnected and systems become more complex, it is more important to employ technologies that are built to take advantage of relationships and their dynamic properties. There is no doubt that graphs have sparked a lot of attention because they are seen as a means to get insights from related data. Graph theory-based approaches show the concepts underlying the behaviour of massively complex systems and networks.

What are graphs?

Graphs are mathematical models frequently used in network science, which is a set of technological tools that may be applied to almost any subject. To put it simply, graphs are mathematical representations of complex systems.

Origin of graphs

The first graph was produced in 1736 in the city of Königsberg, now known as Kaliningrad, Russia. In this city, there were two islands with two mainland sections that were connected by seven different bridges.

Famed mathematician Euler wanted to plot a journey through the entire city by crossing each bridge only once. Euler proceeded to abstract the four regions of the city and the seven bridges into edges but he demonstrated that the problem was unsolvable. A simplified abstract graph is shown in Fig 1.

Fig 1 Abstraction graph

The graph’s four dots represent Königsberg’s four zones, while the lines represent the seven bridges that connect them. Zones connected by an even number of bridges is clearly navigable because several paths to enter and exit are available. Zones connected by an odd number of bridges can only be used as starting or terminating locations because the same route can only be taken once.

The number of edges associated with a node is known as the node degree. If two nodes have odd degrees and the rest have even degrees, the Königsberg problem could be solved. For example, exactly two regions must have an even number of bridges while the rest have an odd number of bridges. However, as illustrated in Fig 1, no Königsberg location has an even number of bridges, rendering this problem unsolvable.

Definition of graphs

A graph is a structure that consists of vertices and edges. Vertices, or nodes, are the objects in a problem, while edges are the links that connect vertices in a graph.  

Vertices are the fundamental elements that a graph requires to function; there should be at least one in a graph. Vertices are mathematical abstractions that refer to objects that are linked by a condition.

On the other hand, edges are optional as graphs can still be defined without any edges. An edge is a link or connection between any two vertices in a graph, including a connection between a vertex and itself. The idea is that if two vertices are present, there is a relationship between them.

We usually indicate V={v1, v2, …, vn} as the set of vertices, and E = {e1, e2, …, em} as the set of edges. From there, we can define a graph G as a structure G(V, E) which models the relationship between the two sets:

Fig 2 Graph structure

It is worth noting that the order of the two sets within parentheses matters, because we usually express the vertices first, followed by the edges. A graph H(X, Y) is therefore a structure that models the relationship between the set of vertices X and the set of edges Y, not the other way around.

Graph data model

Now that we have covered graphs and their typical components, let us move on to graph data models, which help to translate a conceptual view of your data to a logical model. Two common graph data formats are Resource Description Framework (RDF) and Labelled Property Graph (LPG).

Resource Description Framework (RDF)

RDF is typically used for metadata and facilitates standardised exchange of data based on their relationships. RDFs typically consist of a triple: a subject, a predicate, and an object. A collection of such triples is an RDF graph. This can be depicted as a node and a directed edge diagram, with each triple representing a node-edge-node graph, as shown in Fig 3.

Fig 3 RDF graph

The three types of nodes that can exist are:

  • Internationalised Resource Identifiers (IRI) – online resource identification code.
  • Literals – data type value, i.e. text, integer, etc.
  • Blank nodes – have no identification; similar to anonymous or existential variables.

Let us use an example to illustrate this. We have a person with the name Art and we want to plot all his relationships. In this case, the IRI is http://example.org/art and this can be shortened by defining a prefix like ex.

In this example, the IRI http://xmlns.com/foaf/0.1/knows defines the relationship knows. We define foaf as the prefix for http://xmlns.com/foaf/0.1/. The following code snippet shows how a graph like this will look.

@prefix foaf: <http://xmlns.com/foaf/0.1/>
@prefix ex: <http://example.org/>

ex:art foaf:knows ex:bob
ex:art foaf:knows ex:bea
ex:bob foaf:knows ex:cal
ex:bob foaf:knows ex:cam
ex:bea foaf:knows ex:coe
ex:bea foaf:knows ex:cory
ex:bea foaf:age 23
ex:bea foaf:based_near_:o1

In the last two lines, you can see how a literal and blank node would be depicted in an RDF graph. The variable foaf:age is a literal node with the integer value of 23, while foaf:based_near is an anonymous spatial entity with a node identifier of underscore. Outside the context of this graph, o1 is a data identifier with no meaning.

Multiple IRIs, intended for use in RDF graphs, are typically stored in an RDF vocabulary. These IRIs often begin with a common substring known as a namespace IRI. In some cases, namespace IRIs are also associated with a short name known as a namespace prefix. In the example above, http://xmlns.com/foaf/0.1/ is the namespace IRI and foaf and ex are namespace prefixes.

Note: RDF graphs are considered atemporal as they provide a static snapshot of data. They can use appropriate language extensions to communicate information about events or other dynamic properties of entities.

An RDF dataset is a set of RDF graphs that includes one or more named graphs as well as exactly one default graph. A default graph is one that can be empty, and has no associated IRI or name, while each named graph has an IRI or a blank node corresponding to the RDF graph and its name. If there is no named graph specified in a query, the default graph is queried (hence its name).

Labelled Property Graph (LPG)

A labelled property graph is made up of nodes, links, and properties. Each node is given a label and a set of characteristics in the form of arbitrary key-value pairs. The keys are strings, and the values can be any data type. A relationship is then defined by adding a directed edge that is labelled and connects two nodes with a set of properties.

In Fig 4, we have an LPG that shows two nodes: art and bea. The bea node has two characteristics, age and proximity, that are connected by a known edge. This edge has the attribute since because it commemorates the year that art and bea first met.

Fig 4 Labelled Property Graph: Example 1

Nodes, edges and properties must be defined when designing an LPG data model. In this scenario, based_near might not be applicable to all vertices, but they should be defined. You might be wondering, why not represent the city Seattle as a node and add an edge marked as based_near that connects a person and the city?

In general, if there is a value linked to a large number of other nodes in the network and it requires additional properties to correlate  with other nodes, it should be represented as a node. In this scenario, the architecture defined in Fig 5 is more appropriate for traversing based_near connections. It also gives us the ability to link any new attributes to the based_near relationship.

Fig 5 Labelled Property Graph: Example 2

Now that we have the context of graphs, let us talk about graph databases, how they help with large data queries and the part they play in Graph Technology.

Graph database

A graph database is a type of NoSQL database that stores data using network topology. The idea is derived from LPG, which represents data sets with vertices, edges, and attributes.

  • Vertices are instances or entities of data that represent any object to be tracked, such as people, accounts, locations, etc.
  • Edges are the critical concepts in graph databases which represent relationships between vertices. The connections have a direction that can be unidirectional (one-way) or bidirectional (two-way).
  • Properties represent descriptive information associated with vertices. In some cases, edges have properties as well.

Graph databases provide a more conceptual view of data that is closer to reality. Modelling complex linkages becomes simpler because interconnections between data points are given the same weight as the data itself.

Graph database vs. relational database

Relational databases are currently the industry norm and take a structured approach to data, usually in the form of tables. On the other hand, graph databases are agile and focus on immediate relationship understanding. Neither type is designed to replace the other, so it is important to know what each database type has to offer.

Fig 6 Graph database vs relational database

There is a domain for both graph and relational databases. Graph databases outperform typical relational databases, especially in use cases involving complicated relationships, as they take a more naturalistic and flowing approach to data.

The key distinctions between graph and relational databases are summarised in the following table:

Type Graph Relational
Format Nodes and edges with properties Tables with rows and columns
Relationships Represented with edges between nodes Created using foreign keys between tables
Flexibility Flexible Rigid
Complex queries Quick and responsive Requires complex joins
Use case Systems with highly connected relationships Transaction focused systems with more straightforward relationships

Table. 1 Graph vs. Relational Databases

Advantages and disadvantages

Every database type has its advantages and disadvantages; knowing the distinctions as well as potential options for specific challenges is crucial. Graph databases are a rapidly evolving technology with improved functions compared with other database types.

Advantages

Some advantages of graph databases include:

  • Agile and flexible structures.
  • Explicit relationship representation between entities.
  • Real-time query output – speed depends on the number of relationships.

Disadvantages

The general disadvantages of graph databases are:

  • No standardised query language; depends on the platform used.
  • Not suitable for transactional-based systems.
  • Small user base, making it hard to find troubleshooting support.

Graph technology

Graph technology is the next step in improving analytics delivery. Traditional analytics is insufficient to meet complicated business operations, distribution, and analytical concerns as data quantities expand.

Graph technology aids in the discovery of unknown correlations in data that would otherwise go undetected or unanalysed. When the term graph is used to describe a topic, three distinct concepts come to mind: graph theory, graph analytics, and graph data management.

  • Graph theory – A mathematical notion that uses stack ordering to find paths, linkages, and networks of logical or physical objects, as well as their relationships. Can be used to model molecules, telephone lines, transport routes, manufacturing processes, and many other things.
  • Graph analytics – The application of graph theory to uncover nodes, edges, and data linkages that may be assigned semantic attributes. Can examine potentially interesting connections in data found in traditional analysis solutions, using node and edge relationships.
  • Graph database – A type of storage for data generated by graph analytics. Filling a knowledge graph, which is a model in data that indicates a common usage of acquired knowledge or data sets expressing a frequently held notion, is a typical use case for graph analytics output.

While the architecture and terminology are sometimes misunderstood, graph analytics’ output can be viewed through visualisation tools, knowledge graphs, particular applications, and even some advanced dashboard capabilities of business intelligence tools. All three concepts above are frequently used to improve system efficiency and even to assist in dynamic data management. In this approach, graph theory and analysis are inextricably linked, and analysis may always rely on graph databases.

Graph-centric user stories

Fraud detection

Traditional fraud prevention methods concentrate on discrete data points such as individual accounts, devices, or IP addresses. However, today’s sophisticated fraudsters avoid detection by building fraud rings using stolen and fake identities. To detect such fraud rings, we need to look beyond individual data points to the linkages that connect them.

Graph technology greatly transcends the capabilities of a relational database, by revealing hard-to-find patterns. Enterprise businesses also employ Graph technology to supplement their existing fraud detection skills to tackle a wide range of financial crimes, including first-party bank fraud, fraud, and money laundering.

Real-time recommendations

An online business’s success depends on systems that can generate meaningful recommendations in real time. To do so, we need the capacity to correlate product, customer, inventory, supplier, logistical, and even social sentiment data in real time. Furthermore, a real-time recommendation engine must be able to record any new interests displayed during the consumer’s current visit in real time, which batch processing cannot do.

Graph databases outperform relational and other NoSQL data stores in terms of delivering real-time suggestions. Graph databases can easily integrate different types of data to get insights into consumer requirements and product trends, making them an increasingly popular alternative to traditional relational databases.

Supply chain management

With complicated scenarios like supply chains, there are many different parties involved and companies need to stay vigilant in detecting issues like fraud, contamination, high-risk areas or unknown product sources. This means that there is a need to efficiently process large amounts of data and ensure transparency throughout the supply chain.

To have a transparent supply chain, relationships between each product and party need to be mapped out, which means there will be deep linkages. Graph databases are great for these as they are designed to search and analyse data with deep links. This means they can process enormous amounts of data without performance issues.

Identity and access management

Managing multiple changing roles, groups, products and authorisations can be difficult, especially in large organisations. Graph technology integrates your data and allows quick and effective identity and access control. It also allows you to track all identity and access authorisations and inheritances with significant depth and real-time insights.

Network and IT operations

Because of the scale and complexity of network and IT infrastructure, you need a configuration management database (CMDB) that is far more capable than relational databases. Neptune is an example of a CMDB and graph database that allows you to correlate your network, data centre, and IT assets to aid troubleshooting, impact analysis, and capacity or outage planning.

A graph database allows you to integrate various monitoring tools and acquire important insights into the complicated relationships that exist between various network or data centre processes. Possible applications of graphs in network and IT operations range from dependency management to automated microservice monitoring.

Risk assessment and monitoring

Risk assessment is crucial in the fintech business. With multiple sources of credit data such as ecommerce sites, mobile wallets and loan repayment records, it can be difficult to accurately assess an individual’s credit risk. Graph Technology makes it possible to combine these data sources, quantify an individual’s fraud risk and even generate full credit reviews.

One clear example of this is IceKredit, which employs artificial intelligence (AI) and machine learning (ML) techniques to make better risk-based decisions. With Graph technology, IceKredit has also successfully detected unreported links and increased efficiency of financial crime investigations.

Social network

Whether you’re using stated social connections or inferring links based on behaviour, social graph databases like Neptune introduce possibilities for building new social networks or integrating existing social graphs into commercial applications.

Having a data model that is identical to your domain model allows you to better understand your data, communicate more effectively, and save time. By decreasing the time spent data modelling, graph databases increase the quality and speed of development for your social network application.

Artificial intelligence (AI) and machine learning (ML)

AI and ML use statistical and analytical approaches to find patterns in data and provide insights. However, there are two prevalent concerns that arise – the quality of data and effectiveness of the analytics. Some AI and ML solutions have poor accuracy because there is not enough training data or variants that have a high correlation to the outcome.

These ML data issues can be solved with graph databases as it’s possible to connect and traverse links, as well as supplement raw data. With Graph technology, ML systems can recognise each column as a “feature” and each connection as a distinct characteristic, and then be able to identify data patterns and train themselves to recognise these relationships.

Conclusion

Graphs are a great way to visually represent complex systems and can be used to easily detect patterns or relationships between entities. To help improve graphs’ ability to detect patterns early, businesses should consider using Graph technology, which is the next step in improving analytics delivery.

Graph technology typically consists of:

  • Graph theory – Used to find paths, linkages and networks of logical or physical objects.
  • Graph analytics – Application of graph theory to uncover nodes, edges, and data linkages.
  • Graph database – Storage for data generated by graph analytics.

Although predominantly used in fraud detection, Graph technology has many other use cases such as making real-time recommendations based on consumer behaviour, identity and access control, risk assessment and monitoring, AI and ML, and many more.

Check out our next blog article, where we will be talking about how our Graph Visualisation Platform enhances Grab’s fraud detection methods.

References

  1. https://www.baeldung.com/cs/graph-theory-intro
  2. https://web.stanford.edu/class/cs520/2020/notes/What_Are_Graph_Data_Models.html

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

The collective thoughts of the interwebz