Now Open–AWS Region in the United Arab Emirates (UAE)

2022-08-30 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/now-open-aws-region-in-the-united-arab-emirates-uae/

The AWS Region in the United Arab Emirates is now open. The official name is Middle East (UAE), and the API name is me-central-1. You can start using it today to deploy workloads and store your data in the United Arab Emirates. The AWS Middle East (UAE) Region is the second Region in the Middle East, joining the AWS Middle East (Bahrain) Region.

The Middle East (UAE) Region has three Availability Zones that you can use to reliably spread your applications across multiple data centers. Each Availability Zone is a fully isolated partition of AWS infrastructure that contains one or more data centers.

Availability Zones are in separate and distinct geographic locations with enough distance to reduce the risk of a single event affecting the availability of the Region but near enough for business continuity for applications that require rapid failover and synchronous replication. This gives you the ability to operate production applications that are more highly available, more fault-tolerant, and more scalable than would be possible from a single data center.

Instances and Services
Applications running in this three-AZ Region can use C5, C5d, C6g, M5, M5d, M6g, M6gd, R5, R5d, R6g, I3, I3en, T3, and T4g instances, and can use a long list of AWS services including: Amazon API Gateway, Amazon Aurora, AWS AppConfig, Amazon CloudWatch, Amazon CloudWatch Logs, Amazon DynamoDB, Amazon EC2 Auto Scaling, Amazon ElastiCache, Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service (Amazon ECS), Amazon EMR, Amazon OpenSearch Service, Amazon EventBridge, Amazon Kinesis Data Streams, Amazon Redshift, Amazon Relational Database Service (Amazon RDS), Amazon Route 53, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), Amazon Simple Storage Service (Amazon S3), Amazon Simple Workflow Service (Amazon SWF), Amazon Virtual Private Cloud (Amazon VPC), AWS Application Auto Scaling, AWS Certificate Manager, AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Direct Connect, AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), AWS Lambda, AWS Marketplace, AWS Health Dashboard, AWS Secrets Manager, AWS Step Functions, AWS Support API, AWS Systems Manager, AWS Trusted Advisor, VM Import/Export, AWS VPN, and AWS X-Ray.

AWS in the Middle East
In addition to the two Regions—Bahrain and UAE—the Middle East has two AWS Direct Connect locations, allowing customers to establish private connectivity between AWS and their data centers and offices, as well as two Amazon CloudFront edge locations, one in Dubai and another in Fujairah. The UAE Region also offers low-latency connections to other AWS Regions in the area, as shown in the following chart:

Since 2017, AWS has established offices in Dubai and Bahrain along with a broad network of partners. We continue to build our teams in the Middle East by adding account managers, solutions architects, business developers, and professional services consultants to help customers of all sizes build or move their workloads to the cloud. Visit the Amazon career page to check out the roles we are hiring for.

In addition to Infrastructure, AWS continues to make investments in education initiatives, training, and start-up enablement to support UAE’s digital transformation and economic development plans.

AWS Activate – This global program provides start-ups with credits, training, and support so they can build their business on AWS.
AWS Training and Certification – This program helps developers build cloud skills using digital or classroom training and to validate those skills by earning an industry-recognized credential.
AWS Educate and AWS Academy – These programs provide higher education institutions, educators, and students with cloud computing courses and certifications.

AWS Customers in the Middle East
We have many amazing customers in the Middle East that are doing incredible things with AWS, for example:

The Ministry of Health and Prevention (MoHAP) implements the health care policy in the UAE. MoHAP is working with AWS to modernize their patient experience. With AWS, MoHAP can connect 100 percent of care providers—public and private—to further enhance their data strategy to support predictive and population health programs.

GEMS Education is one of the largest private K–12 operators in the world. Using AWS services like artificial intelligence and machine learning, GEMS developed an all-in-one integrated ED Tech platform called LearnOS. This platform supports teachers and creates personalized learning experiences. For example, with the use of Amazon Rekognition, they reduced 93 percent of the time spent in marking attendance. They also developed an automated quiz generation and assessment platform using Amazon EC2 and Amazon SageMaker. In addition, the algorithms can predict student year-end performance with up to 95 percent accuracy and recommend personalized reading materials.

YAP is a fast-growing regional financial super app that focuses on improving the digital banking experience. It functions as an independent app with no physical branches, making it the first of its kind in the UAE. AWS has helped fuel YAP’s growth and enabled them to scale to become a leading regional FinTech, giving them the elasticity to control costs as their user base has grown to over 130,000 users. With AWS, YAP can scale fast as they launch new markets, reducing the time to build and deploy complete infrastructure from months to weeks.

Available Now
The new Middle East (UAE) Region is ready to support your business. You can find a detailed list of the services available in this Region on the AWS Regional Service List.

With this launch, AWS now spans 87 Availability Zones within 27 geographic Regions around the world. We also have announced plans for 21 more Availability Zones and seven more AWS Regions in Australia, Canada, India, Israel, New Zealand, Spain, and Switzerland.

For more information on our global infrastructure, upcoming Regions, and the custom hardware we use, visit the Global Infrastructure page.

— Marcia

Let’s reward some of your ideas with SwitchBot giveway

2022-08-30 BeardedTinker

Post Syndicated from BeardedTinker original https://www.youtube.com/watch?v=iCBUNiJD4Vg

AWS achieves FedRAMP P-ATO for 20 services in the AWS US East/West Regions and AWS GovCloud (US) Regions

2022-08-30 Steve Earley

Post Syndicated from Steve Earley original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-p-ato-for-20-services-in-the-aws-us-east-west-regions-and-aws-govcloud-us-regions/

Amazon Web Services (AWS) is pleased to announce that 20 additional AWS services have achieved Provisional Authority to Operate (P-ATO) from the Federal Risk and Authorization Management Program (FedRAMP) Joint Authorization Board (JAB). The following are the 20 AWS services with FedRAMP authorization for the U.S. federal government and organizations with regulated workloads:

AWS App Mesh provides application-level networking to help your services communicate with each other across multiple types of compute infrastructure.
AWS Audit Manager helps you to continuously audit your AWS usage to simplify how risk and compliance are assessed with regulations and industry standards.
AWS Chatbot is an interactive agent that helps you monitor, operate, and troubleshoot AWS workloads in your chat channels.
Amazon Chime SDK is a collection of client software development kits that use resources in your AWS account to add collaborative audio calling, video calling, and screen share features to your web or mobile applications.
AWS Cloud9 is a cloud-based integrated development environment (IDE) that helps you write, run, and debug your code with just a browser.
Amazon Detective helps you analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities.
EC2 Image Builder simplifies the building, testing, and deployment of virtual machine and container images for use on AWS or on-premises.
Amazon FinSpace is a data management and analytics service that is purpose built for the financial services industry (FSI).
AWS Firewall Manager is a security management service that allows you to centrally configure and manage firewall rules across your accounts and applications in AWS Organizations.
Amazon Forecast is a fully managed service that uses machine learning to deliver highly accurate forecasts.
Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service.
Amazon Kinesis Data Analytics is a fully managed service that you can use to process and analyze streaming data using Java, SQL, or Scala.
Amazon Lex is an AWS service for building conversational interfaces into applications using voice and text.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is an AWS streaming data service that manages Apache Kafka infrastructure and operations.
Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ to help you set up and operate message brokers on AWS.
Amazon Neptune is a fast, reliable, fully managed graph database service that helps you build and run applications that work with highly connected datasets.
AWS Network Firewall is a managed service that helps you to deploy essential network protections for your Amazon Virtual Private Cloud (Amazon VPC).
Amazon Quantum Ledger Database (Amazon QLDB) is a purpose-built ledger database that provides a complete and cryptographically verifiable history of changes made to your application data.
AWS Resource Access Manager (AWS RAM) is designed to help you securely share resources across AWS accounts, within your organization or organizational units (OUs) in AWS Organizations, and with AWS Identity and Access Management (IAM) roles and users for supported resource types.
Amazon Timestream is a fast, scalable, and serverless time series database service for AWS IoT Core and operational applications that can help you to store and analyze trillions of events per day up to 1,000 times faster and at as little as 1/10th the cost of relational databases.

These 20 services are now listed on the FedRAMP Marketplace and the AWS Services in Scope by Compliance Program page.

Service authorizations by AWS Region

The following table shows our most recent FedRAMP service authorizations by Region and authorization level:

Service	FedRAMP Moderate in the AWS US East/West Region	FedRAMP High in the AWS GovCloud (US) Region
AWS App Mesh	✓
AWS Audit Manager	✓
AWS Chatbot	✓
Amazon Chime SDK		✓
AWS Cloud9	✓
Amazon Detective		✓
EC2 Image Builder	✓	✓
Amazon FinSpace	✓
AWS Firewall Manager	✓	✓
Amazon Forecast	✓
Amazon Keyspaces (for Apache Cassandra)		✓
Amazon Kinesis Data Analytics	✓	✓
Amazon Lex		✓
Amazon Managed Streaming for Apache Kafka (Amazon MSK)	✓	✓
Amazon MQ		✓
Amazon Neptune	✓	✓
AWS Network Firewall	✓	✓
Amazon Quantum Ledger Database (Amazon QLDB)	✓
AWS Resource Access Manager (AWS RAM)	✓	✓
Amazon Timestream	✓

AWS is continually expanding the scope of our compliance programs to help customers use authorized services for sensitive and regulated workloads. AWS now offers 123 AWS services authorized in the AWS US East/West Regions under FedRAMP Moderate Authorization, and 105 services authorized in the AWS GovCloud (US) Regions under FedRAMP High Authorization.

To learn what other public sector customers are doing on AWS, see our Government, Education, and Nonprofits Case Studies and Customer Success Stories. Stay tuned for future updates on our Services in Scope by Compliance Program page. Let us know how this post will help your mission by reaching out to your AWS Account Team. Lastly, if you have feedback about this blog post, let us know in the Comments section.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Фоторепортаж на Биволъ Блокадата на Министерския съвет заради преговорите с Газпром

2022-08-30 Николай Марченко

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%B1%D0%BB%D0%BE%D0%BA%D0%B0%D0%B4%D0%B0%D1%82%D0%B0-%D0%BD%D0%B0-%D0%BC%D0%B8%D0%BD%D0%B8%D1%81%D1%82%D0%B5%D1%80%D1%81%D0%BA%D0%B8%D1%8F-%D1%81%D1%8A%D0%B2%D0%B5%D1%82-%D0%B7%D0%B0%D1%80%D0%B0.html

четвъртък 1 септември 2022

Както и бе обявено предварително, гражданското движение „България Обединена с Една Цел“ (ГД „БОЕЦ“) организира тотална блокада на Министерския съвет (МС) заради плановете на служебния кабинет да възобнови преговорите с…

How to subscribe to the new Security Hub Announcements topic for Amazon SNS

2022-08-29 Mike Saintcross

Post Syndicated from Mike Saintcross original https://aws.amazon.com/blogs/security/how-to-subscribe-to-the-new-security-hub-announcements-topic-for-amazon-sns/

With AWS Security Hub you are able to manage your security posture in AWS, perform security best practice checks, aggregate alerts, and automate remediation. Now you are able to use Amazon Simple Notification Service (Amazon SNS) to subscribe to the new Security Hub Announcements topic to receive updates about new Security Hub services and features, newly supported standards and controls, and other Security Hub changes.

Introducing the Security Hub Announcements topic

Amazon SNS follows the publish/subscribe (pub/sub) messaging model, in which notifications are delivered to you by using a push mechanism that eliminates the need for you to periodically check or poll for new information and updates. You can now use this push mechanism to receive notifications about Security Hub by subscribing to the dedicated Security Hub Announcements topic.

The Security Hub Announcements topic publishes the following types of notifications:

General notifications
Upcoming standards and controls
New AWS Regions supported
New standards and controls
Updated standards and controls
Retired standards and controls
Updates to the AWS Security Finding Format (ASFF)
New integrations
New features
Changes to existing features

How to use the Security Hub Announcements topic

You can subscribe to the SNS topic for Security Hub Announcements to receive notification messages about newly released finding types, updates to the existing finding types, and other functionality changes. By subscribing to the SNS topic, you will receive Security Hub Announcements messages as soon as they are published. The notifications are available in all protocols that Amazon SNS supports, such as email and SMS. For more information about supported protocols in Amazon SNS, see Subscribing to an Amazon SNS topic.

The Security Hub Announcements topic is available in all AWS Regions in the aws and aws-cn partitions, but is not yet available in the AWS GovCloud (US) Regions (the aws-us-gov partition). Later in this post, we’ll show you how to subscribe to the Security Hub Announcements topic in a specific AWS Region by using the topic Amazon Resource Name (ARN) for that Region. The SNS topic messages are the same across Regions in a partition, so you can choose to subscribe to only one Region in a partition to avoid receiving duplicate information.

However, if you want to invoke an AWS Lambda function in reaction to a Security Hub Announcements message, you must subscribe to the topic ARN that is in the same Region as the Lambda function. The Lambda function can receive the SNS topic message payload as an input parameter and manipulate the information in the message, publish the message to other SNS topics, or send the message to other AWS services. For more information, see Subscribing a function to a topic in the Amazon SNS Developer Guide.

The same is true if you want to subscribe an Amazon Simple Queue Service (Amazon SQS) queue to the Security Hub Announcements topic, you must use a topic ARN that is in the same Region as the SQS queue. The SQS queue can be used to persist announcement SNS topic messages in the queue for other applications to process at a later time. For more information, see Subscribing an Amazon SQS queue to an Amazon SNS topic in the Amazon SQS Developer Guide.

IAM permissions

Your user account must have sns::subscribe AWS Identity and Access Management (IAM) permissions to subscribe to an SNS topic. For more information on IAM permissions for Amazon SNS, see Using identity-based policies with Amazon SNS.

Subscribe to the Security Hub Announcements topic

The following is the list of Security Hub Announcements topic ARNs for each currently supported Region. The examples in this post use the US West (Oregon) Region (us-west-2), but you can update the procedures with one of the following ARNs to use a different supported Region.

Security Hub Announcements topic ARNs by Region

arn:aws:sns:us-east-1:088139225913:SecurityHubAnnouncements
arn:aws:sns:us-east-2:291342846459:SecurityHubAnnouncements
arn:aws:sns:us-west-1:137690824926:SecurityHubAnnouncements
arn:aws:sns:us-west-2:393883065485:SecurityHubAnnouncements
arn:aws:sns:eu-central-1:871975303681:SecurityHubAnnouncements
arn:aws:sns:eu-north-1:191971010772:SecurityHubAnnouncements
arn:aws:sns:eu-south-1:151363035580:SecurityHubAnnouncements
arn:aws:sns:eu-west-1:705756202095:SecurityHubAnnouncements
arn:aws:sns:eu-west-2:883600840440:SecurityHubAnnouncements
arn:aws:sns:eu-west-3:313420042571:SecurityHubAnnouncements
arn:aws:sns:ca-central-1:137749997395:SecurityHubAnnouncements
arn:aws:sns:sa-east-1:359811883282:SecurityHubAnnouncements
arn:aws:sns:me-south-1:585146626860:SecurityHubAnnouncements
arn:aws:sns:af-south-1:463142546776:SecurityHubAnnouncements
arn:aws:sns:ap-northeast-1:592469075483:SecurityHubAnnouncements
arn:aws:sns:ap-northeast-2:374299265323:SecurityHubAnnouncements
arn:aws:sns:ap-northeast-3:633550238216:SecurityHubAnnouncements
arn:aws:sns:ap-southeast-1:512267288502:SecurityHubAnnouncements
arn:aws:sns:ap-southeast-2:475730049140:SecurityHubAnnouncements
arn:aws:sns:ap-southeast-3:627843640627:SecurityHubAnnouncements
arn:aws:sns:ap-east-1:464812404305:SecurityHubAnnouncements
arn:aws:sns:ap-south-1:707356269775:SecurityHubAnnouncements
arn:aws-cn:sns:cn-north-1:672341567257:SecurityHubAnnouncements
arn:aws-cn:sns:cn-northwest-1:672534482217:SecurityHubAnnouncements

The two procedures that follow show you how to subscribe an email address to the Security Hub Announcements topic by using the AWS Management Console and the AWS CLI.

To subscribe an email address to the Security Hub Announcements topic (console)

Sign in to the Amazon SNS console.
In the Region list, choose the same Region as the topic ARN to which you want to subscribe. This example uses the us-west-2 Region.
In the left navigation pane, choose Subscriptions, then choose Create subscription.
In the Create subscription dialog box, do the following:
- For Topic ARN, paste the following topic ARN for the us-west-2 Region, or use one of the ARNs listed above for a different supported Region:
  arn:aws:sns:us-west-2:393883065485:SecurityHubAnnouncements
- For Protocol, choose Email.
- For Endpoint, enter an email address that you can use to receive the notification.
Choose Create subscription.
In your email application, open the message from AWS Notifications and open the link to confirm your subscription. Your web browser displays a confirmation response from Amazon SNS, similar to that shown in Figure 1.

Figure 1: SNS notification subscription confirmation

The following steps show you how to subscribe an email address to the Security Hub Announcements topic by using the AWS Command Line Interface (AWS CLI).

To subscribe an email address to the Security Hub Announcements topic (AWS CLI)

Run the following command in the AWS CLI, replacing <[email protected]> with your email address, and optionally replacing the ARN and reference to us-west-2 if you want to use a different Region:
aws sns --region us-west-2 subscribe --topic-arn arn:aws:sns:us-west-2:393883065485:SecurityHubAnnouncements --protocol email --notification-endpoint <[email protected]>
In your email application, open the message from AWS Notifications and open the link to confirm your subscription.
Your web browser displays a confirmation response from Amazon SNS, similar to that shown in Figure 1.

Example subscription responses

The following sections contain examples of a message announcing new standard controls supported by Security Hub in email and sqs protocol types.

Example message from an email subscription (protocol type: email)

{"AnnouncementType":"NEW_STANDARDS_CONTROLS", “Title”:”[New Controls] 36 new Security Hub controls added to the AWS Foundational Security Best Practices standard”, "Description":"We have added 36 new controls to the AWS Foundational Security Best Practices standard. These include controls for Amazon Auto Scaling (AutoScaling.3, AutoScaling.4, AutoScaling.6), AWS CloudFormation (CloudFormation.1), Amazon CloudFront (CloudFront.10), Amazon Elastic Compute Cloud (Amazon EC2) (EC2.23, EC2.24, EC2.27), Amazon Elastic Container Registry (Amazon ECR) (ECR.1, ECR.2), Amazon Elastic Container Service (Amazon ECS) (ECS.3, ECS.4, ECS.5, ECS.8, ECS.10, ECS.12), Amazon Elastic File System (Amazon EFS) (EFS.3, EFS.4), Amazon Elastic Kubernetes Service (Amazon EKS) (EKS.2), Elastic Load Balancing (ELB.12, ELB.13, ELB.14), Amazon Kinesis (Kinesis.1), AWS Network Firewall (NetworkFirewall.3, NetworkFirewall.4, NetworkFirewall.5), Amazon OpenSearch Service (Opensearch.7), Amazon Redshift (Redshift.9), Amazon Simple Storage Service (Amazon S3) (S3.13), Amazon Simple Notification Service (SNS.2), AWF WAF (WAF.2, WAF.3, WAF.4, WAF.6, WAF.7, WAF.8). If you enabled the AWS Foundational Security Best Practices standard in an account and configured Security Hub to automatically enable new controls, these controls are enabled by default. Availability of controls can vary by Region."}

Example message from an SQS queue subscription (protocol type: sqs)

The following message shows the additional metadata included with an SQS subscription to the Security Hub Announcements topic. For more information about the metadata included in an SNS topic message delivered to an SQS queue, see Fanout to Amazon SQS Queues.

{
  "Type" : "Notification",
  "MessageId" : "c9c03e46-69df-5c3c-84e9-6520708ac394",
  "TopicArn" : "arn:aws:sns:us-west-2:393883065485:SecurityHubAnnouncements",
  "Message" : "{\"AnnouncementType\":\"NEW_STANDARDS_CONTROLS\",\"Title\":\"[New Controls] 36 new Security Hub controls added to the AWS Foundational Security Best Practices standard\",\"Description\":\"We have added 36 new controls to the AWS Foundational Security Best Practices standard. These include controls for Amazon Auto Scaling (AutoScaling.3, AutoScaling.4, AutoScaling.6), AWS CloudFormation (CloudFormation.1), Amazon CloudFront (CloudFront.10), Amazon Elastic Compute Cloud (Amazon EC2) (EC2.23, EC2.24, EC2.27), Amazon Elastic Container Registry (Amazon ECR) (ECR.1, ECR.2), Amazon Elastic Container Service (Amazon ECS) (ECS.3, ECS.4, ECS.5, ECS.8, ECS.10, ECS.12), Amazon Elastic File System (Amazon EFS) (EFS.3, EFS.4), Amazon Elastic Kubernetes Service (Amazon EKS) (EKS.2), Elastic Load Balancing (ELB.12, ELB.13, ELB.14), Amazon Kinesis (Kinesis.1), AWS Network Firewall (NetworkFirewall.3, NetworkFirewall.4, NetworkFirewall.5), Amazon OpenSearch Service (Opensearch.7), Amazon Redshift (Redshift.9), Amazon Simple Storage Service (Amazon S3) (S3.13), Amazon Simple Notification Service (SNS.2), AWF WAF (WAF.2, WAF.3, WAF.4, WAF.6, WAF.7, WAF.8). If you enabled the AWS Foundational Security Best Practices standard in an account and configured Security Hub to automatically enable new controls, these controls are enabled by default. Availability of controls can vary by Region. \"}",
  "Timestamp" : "2022-08-04T18:59:33.319Z",
  "SignatureVersion" : "1",
  "Signature" : "GdKokPEUexpKZn5da5u/p5eZF1cE3JUyL0uPVKmPnDzd3orkk5jJ211VsOflUFi6V9lSXF/V6RBpQN/9f3+JBFBprng7BRQwT9I4jSa1xOn1L3xKXEVGvWI6nl1oDqBl21Pj3owV+NZ+Exd2W0dpgg8B1LG4bYq5T73MjHjWGtelcBa15TpIz/+rynqanXCKCvc/50V/XZLjA5M7gU6Dzs9CULIjkdEpCsw5FvSxbtkEd6Ktx4LH7Zq6FlPKNli3EaEHRKh9uYPo6sR/yvF4RWg3E9O4dVsK7A8uTdR+pwVCU1M601KMRxO1OWF8VIdvyPINJND8Nu/70GRA2L+MRA==",
  "SigningCertURL" : "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-56e67fcb41f6fec09b0196692625d385.pem",
  "UnsubscribeURL" : "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:393883065485:SecurityHubAnnouncements:1eb29a83-8726-4366-891c-293ad5e35a53"
}

Note: You need to set up the SQS access policy in order for SNS to push message to the SNS queue. For more information, see Basic examples of Amazon SQS policies.

Available now

The SNS topic for Security Hub Announcements is available today in the Regions described in this post. Subscribe now to stay informed of Security Hub updates. With Amazon SNS, there is no minimum fee, and you pay only for what you use. For more information, see the Amazon SNS pricing page.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support. You can also start a new thread on AWS Security Hub re:Post to get answers from the community.

Want more AWS Security news? Follow us on Twitter.

AWS Week in Review – August 29, 2022

2022-08-29 Antje Barth

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-week-in-review-august-29-2022/

I’ve just returned from data and machine learning (ML) conferences in Los Angeles and San Francisco, California. It’s been great to chat with customers and developers about the latest technology trends and use cases. This past week has also been packed with launches at AWS.

Last Week’s Launches
Here are some launches that got my attention during the previous week:

Amazon QuickSight announces fine-grained visual embedding. You can now embed individual visuals from QuickSight dashboards in applications and portals to provide key insights to users where they’re needed most. Check out Donnie’s blog post to learn more, and tune into this week’s The Official AWS Podcast episode.

Sample Web App with a Visual

Amazon SageMaker Automatic Model Tuning is now available in the Europe (Milan), Africa (Cape Town), Asia Pacific (Osaka), and Asia Pacific (Jakarta) Regions. In addition, SageMaker Automatic Model Tuning now reuses SageMaker Training instances to reduce start-up overheads by 20x. In scenarios where you have a large number of hyperparameter evaluations, the reuse of training instances can cumulatively save 2 hours for every 50 sequential evaluations.

Amazon RDS now supports setting up connectivity between your RDS database and EC2 compute instance in one click. Amazon RDS automatically sets up your VPC and related network settings during database creation to enable a secure connection between the EC2 instance and the RDS database.

In addition, Amazon RDS for Oracle now supports managed Oracle Data Guard Switchover and Automated Backups for replicas. With the Oracle Data Guard Switchover feature, you can reverse the roles between the primary database and one of its standby databases (replicas) with no data loss and a brief outage. You can also now create Automated Backups and manual DB snapshots of an RDS for Oracle replica, which reduces the time spent taking backups following a role transition.

Amazon Forecast now supports what-if analyses. Amazon Forecast is a fully managed service that uses ML algorithms to deliver highly accurate time series forecasts. You can now use what-if analyses to quantify the potential impact of business scenarios on your demand forecasts.

AWS Asia Pacific (Jakarta) Region now supports additional AWS services and EC2 instance types – Amazon SageMaker, AWS Application Migration Service, AWS Glue, Red Hat OpenShift Service on AWS (ROSA), and Amazon EC2 X2idn and X2iedn instances are now available in the Asia Pacific (Jakarta) Region.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news, blog posts, and fun code competitions you may find interesting:

Scaling AI and Machine Learning Workloads with Ray on AWS – This past week, I attended Ray Summit in San Francisco, California, and had great conversations with the community. Check out this blog post to learn more about AWS contributions to the scalability and operational efficiency of Ray on AWS.

New AWS Heroes – It’s great to see both new and familiar faces joining the AWS Heroes program, a worldwide initiative that acknowledges individuals who have truly gone above and beyond to share knowledge in technical communities. Get to know them in the blog post!

DFL Bundesliga Data Shootout – DFL Deutsche Fußball Liga launched a code competition, powered by AWS: the Bundesliga Data Shootout. The task: Develop a computer vision model to classify events on the pitch. Join the competition as an individual or in a team and win prizes.

Become an AWS GameDay World Champion – AWS GameDay is an interactive, team-based learning experience designed to put your AWS skills to the test by solving real-world problems in a gamified, risk-free environment. Developers of all skill levels can get in on the action, to compete for worldwide glory, as well as a chance to claim the top prize: an all-expenses-paid trip to AWS re:Invent Las Vegas 2022!

Learn more about the AWS Impact Accelerator for Black Founders from one of the inaugural members of the program in this blog post. The AWS Impact Accelerator is a series of programs designed to help high-potential, pre-seed start-ups led by underrepresented founders succeed.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS Global Summits – AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS.

Registration is open for the following in-person AWS Summits that might be close to you in August and September: Canberra (August 31), Ottawa (September 8), New Delhi (September 9), and Mexico City (September 21–22), Bogotá (October 4), and Singapore (October 6).

AWS Community Days – AWS Community Day events are community-led conferences that deliver a peer-to-peer learning experience, providing developers with a venue for them to acquire AWS knowledge in their preferred way: from one another.

In September, the AWS community will host events in the Bay Area, California (September 9) and in Arlington, Virginia (September 30). In October, you can join Community Days in Amersfoort, Netherlands (October 3), in Warsaw, Poland (October 14), and in Dresden, Germany (October 19).

That’s all for this week. Check back next Monday for another Week in Review! And maybe I’ll see you at the AWS Community Day here in the Bay Area!

– Antje

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AWS announces migration plans for NIST 800-53 Revision 5

2022-08-29 James Mueller

Post Syndicated from James Mueller original https://aws.amazon.com/blogs/security/aws-announces-migration-plans-for-nist-800-53-revision-5/

Amazon Web Services (AWS) is excited to begin migration plans for National Institute of Standards and Technology (NIST) 800-53 Revision 5.

The NIST 800-53 framework is a regulatory standard that defines the minimum baseline of security controls for U.S. federal information systems. In 2020, NIST released Revision 5 of the framework to improve security standards for industry partners and government agencies. The set of NIST 800-53 controls provides a foundation for additional laws and regulations within the U.S. government.

The Federal Information Security Modernization Act (FISMA) of 2014 is a law that requires federal agencies and contractors to meet information security standards. The Federal Risk and Authorization Management Program (FedRAMP) is a federal government program that provides a standardized approach to security assessment, authorization, and continuous monitoring of cloud services. Both FISMA and FedRAMP rely on the NIST 800-53 framework.

NIST 800-53 Revision 5

AWS meets the NIST 800-53 Revision 4 regulatory standards mandated by government authorities. NIST added numerous security enhancements, such as privacy and supply chain management, to Revision 5 to keep abreast of emerging threats to federal information systems.

In preparation for federal regulators to accept NIST 800-53 Revision 5 as the new requirement standard, AWS has begun efforts to adapt to the new security controls, processes, and procedures. AWS security compliance teams have analyzed the new requirements and launched a project to implement the updates. Although AWS is not required to migrate to the new Revision 5 standard until NIST announces the official regulatory compliance deadline, we are already taking steps to meet the deadline.

To learn more about AWS compliance programs, see the AWS Compliance Programs page.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Convert Oracle XML BLOB data to JSON using Amazon EMR and load to Amazon Redshift

2022-08-29 Abhilash Nagilla

Post Syndicated from Abhilash Nagilla original https://aws.amazon.com/blogs/big-data/convert-oracle-xml-blob-data-to-json-using-amazon-emr-and-load-to-amazon-redshift/

In legacy relational database management systems, data is stored in several complex data types, such XML, JSON, BLOB, or CLOB. This data might contain valuable information that is often difficult to transform into insights, so you might be looking for ways to load and use this data in a modern cloud data warehouse such as Amazon Redshift. One such example is migrating data from a legacy Oracle database with XML BLOB fields to Amazon Redshift, by performing preprocessing and conversion of XML to JSON using Amazon EMR. In this post, we describe a solution architecture for this use case, and show you how to implement the code to handle the XML conversion.

Solution overview

The first step in any data migration project is to capture and ingest the data from the source database. For this task, we use AWS Database Migration Service (AWS DMS), a service that helps you migrate databases to AWS quickly and securely. In this example, we use AWS DMS to extract data from an Oracle database with XML BLOB fields and stage the same data in Amazon Simple Storage Service (Amazon S3) in Apache Parquet format. Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance, and is the storage of choice for setting up data lakes on AWS.

After the data is ingested into an S3 staging bucket, we used Amazon EMR to run a Spark job to perform the conversion of XML fields to JSON fields, and the results are loaded in a curated S3 bucket. Amazon EMR runtime for Apache Spark can be over three times faster than clusters without EMR runtime, and has 100% API compatibility with standard Apache Spark. This improved performance means your workloads run faster and it saves you compute costs, without making any changes to your application.

Finally, transformed and curated data is loaded into Amazon Redshift tables using the COPY command. The Amazon Redshift table structure should match the number of columns and the column data types in the source file. Because we stored the data as a Parquet file, we specify the SERIALIZETOJSON option in the COPY command. This allows us to load complex types, such as structure and array, in a column defined as SUPER data type in the table.

The following architecture diagram shows the end-to-end workflow.

In detail, AWS DMS migrates data from the source database tables into Amazon S3, in Parquet format. Apache Spark on Amazon EMR reads the raw data, transforms the XML data type into JSON, and saves the data to the curated S3 bucket. In our code, we used an open-source library, called spark-xml, to parse and query the XML data.

In the rest of this post, we assume that the AWS DMS tasks have already run and created the source Parquet files in the S3 staging bucket. If you want to set up AWS DMS to read from an Oracle database with LOB fields, refer to Effectively migrating LOB data to Amazon S3 from Amazon RDS for Oracle with AWS DMS or watch the video Migrate Oracle to S3 Data lake via AWS DMS.

Prerequisites

If you want to follow along with the examples in this post using your AWS account, we provide an AWS CloudFormation template you can launch by choosing Launch Stack:

Provide a stack name and leave the default settings for everything else. Wait for the stack to display Create Complete (this should only take a few minutes) before moving on to the other sections.

The template creates the following resources:

A virtual private cloud (VPC) with two private subnets that have routes to an Amazon S3 VPC endpoint
The S3 bucket {stackname}-s3bucket-{xxx}, which contains the following folders:
- libs – Contains the JAR file to add to the notebook
- notebooks – Contains the notebook to interactively test the code
- data – Contains the sample data
An Amazon Redshift cluster, in one of the two private subnets, with a database named rs_xml_db and a schema named rs_xml
A secret (rs_xml_db) in AWS Secrets Manager
An EMR cluster

The CloudFormation template shared in this post is purely for demonstration purposes only. Please conduct your own security review and incorporate best practices prior to any production deployment using artifacts from the post.

Finally, some basic knowledge of Python and Spark DataFrames can help you review the transformation code, but isn’t mandatory to complete the example.

Understanding the sample data

In this post, we use college students’ course and subjects sample data that we created. In the source system, data consists of flat structure fields, like course_id and course_name, and an XML field that includes all the course material and subjects involved in the respective course. The following screenshot is an example of the source data, which is staged in an S3 bucket as a prerequisite step.

We can observe that the column study_material_info is an XML type field and contains nested XML tags in it. Let’s see how to convert this nested XML field to JSON in the subsequent steps.

Run a Spark job in Amazon EMR to transform the XML fields in the raw data to JSON

In this step, we use an Amazon EMR notebook, which is a managed environment to create and open Jupyter Notebook and JupyterLab interfaces. It enables you to interactively analyze and visualize data, collaborate with peers, and build applications using Apache Spark on EMR clusters. To open the notebook, follow these steps:

On the Amazon S3 console, navigate to the bucket you created as a prerequisite step.
Download the file in the notebooks folder.
On the Amazon EMR console, choose Notebooks in the navigation pane.
Choose Create notebook.
For Notebook name, enter a name.
For Cluster, select Choose an existing cluster.
Select the cluster you created as a prerequisite.
For Security Groups, choose BDB1909-EMR-LIVY-SG and BDB1909-EMR-Notebook-SG
For AWS Service Role, choose the role bdb1909-emrNotebookRole-{xxx}.
For Notebook location, specify the S3 path in the notebooks folder (s3://{stackname}-s3bucket-xxx}/notebooks/).
Choose Create notebook.
When the notebook is created, choose Open in JupyterLab.
Upload the file you downloaded earlier.
Open the new notebook.

The notebook should look as shown in the following screenshot, and it contains a script written in Scala.
Run the first two cells to configure Apache Spark with the open-source spark-xml library and import the needed modules.The spark-xml package allows reading XML files in local or distributed file systems as Spark DataFrames. Although primarily used to convert (portions of) large XML documents into a DataFrame, spark-xml can also parse XML in a string-valued column in an existing DataFrame with the from_xml function, in order to add it as a new column with parsed results as a struct.
To do so, in the third cell, we load the data from the Parquet file generated by AWS DMS into a DataFrame, then we extract the attribute that contains the XML code (STUDY_MATERIAL_INFO) and map it to a string variable name payloadSchema.
We can now use the payloadSchema in the from_xml function to convert the field STUDY_MATERIAL_INFO into a struct data type and added it as a column named course_material in a new DataFrame parsed.
Finally, we can drop the original field and write the parsed DataFrame to our curated zone in Amazon S3.

Due to the structure differences between DataFrame and XML, there are some conversion rules from XML data to DataFrame and from DataFrame to XML data. More details and documentation are available XML Data Source for Apache Spark.

When we convert from XML to DataFrame, attributes are converted as fields with the heading prefix attributePrefix (underscore (_) is the default). For example, see the following code:

  <book category="undergraduate">
    <title lang="en">Introduction to Biology</title>
    <author>Demo Author 1</author>
    <year>2005</year>
    <price>30.00</price>
  </book>

It produces the following schema:

root
 |-- category: string (nullable = true)
 |-- title: struct (nullable = true)
 |    |-- _VALUE: string (nullable = true)
 |    |-- _lang: string (nullable = true)
 |-- author: string (nullable = true)
 |-- year: string (nullable = true)
 |-- price: string (nullable = true)

Next, we have a value in an element that has no child elements but attributes. The value is put in a separate field, valueTag. See the following code:

<title lang="en">Introduction to Biology</title>

It produces the following schema, and the tag lang is converted into the _lang field inside the DataFrame:

|-- title: struct (nullable = true)
 |    |-- _VALUE: string (nullable = true)
 |    |-- _lang: string (nullable = true)

Copy curated data into Amazon Redshift and query tables seamlessly

Because our semi-structured nested dataset is already written in the S3 bucket as Apache Parquet formatted files, we can use the COPY command with the SERIALIZETOJSON option to ingest data into Amazon Redshift. The Amazon Redshift table structure should match the metadata of the Parquet files. Amazon Redshift can replace any Parquet columns, including structure and array types, with SUPER data columns.

The following code demonstrates CREATE TABLE example to create a staging table.

create table rs_xml_db.public.stg_edw_course_catalog 
(
course_id bigint,
course_name character varying(5000),
course_material super
);

The following code uses the COPY example to load from Parquet format:

COPY rs_xml_db.public.stg_edw_course_catalog FROM 's3://<<your Amazon S3 Bucket for curated data>>/data/target/<<your output parquet file>>' 
IAM_ROLE '<<your IAM role>>' 
FORMAT PARQUET SERIALIZETOJSON;

By using semistructured data support in Amazon Redshift, you can ingest and store semistructured data in your Amazon Redshift data warehouses. With the SUPER data type and PartiQL language, Amazon Redshift expands the data warehouse capability to integrate with both SQL and NoSQL data sources. The SUPER data type only supports up to 1 MB of data for an individual SUPER field or object. Note, the JSON object may be stored in a SUPER data type, but reading this data using JSON functions currently has a VARCHAR (65535 byte) limit. See Limitations for more details.

The following example shows how nested JSON can be easily accessed using SELECT statements:

SELECT DISTINCT bk._category
	,bk.author
	,bk.price
	,bk.year
	,bk.title._lang
FROM rs_xml_db.public.stg_edw_course_catalog main
INNER JOIN main.course_material.book bk ON true;

The following screenshot shows our results.

Clean up

To avoid incurring future charges, first delete the notebook and the related files on Amazon S3 bucket as explained in this EMR documentation page then the CloudFormation stack.

Conclusion

This post demonstrated how to use AWS services like AWS DMS, Amazon S3, Amazon EMR, and Amazon Redshift to seamlessly work with complex data types like XML and perform historical migrations when building a cloud data lake house on AWS. We encourage you to try this solution and take advantage of all the benefits of these purpose-built services.

If you have questions or suggestions, please leave a comment.

About the authors

Abhilash Nagilla is a Sr. Specialist Solutions Architect at AWS, helping public sector customers on their cloud journey with a focus on AWS analytics services. Outside of work, Abhilash enjoys learning new technologies, watching movies, and visiting new places.

Avinash Makey is a Specialist Solutions Architect at AWS. He helps customers with data and analytics solutions in AWS. Outside of work he plays cricket, tennis and volleyball in free time.

Fabrizio Napolitano is a Senior Specialist SA for DB and Analytics. He has worked in the analytics space for the last 20 years, and has recently and quite by surprise become a Hockey Dad after moving to Canada.

Enable federation to Amazon QuickSight accounts with Ping One

2022-08-29 Srikanth Baheti

Post Syndicated from Srikanth Baheti original https://aws.amazon.com/blogs/big-data/enable-federation-to-amazon-quicksight-accounts-with-ping-one/

Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML)-powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Organizations are working towards centralizing their identity and access strategy across all of their applications, including on-premises, third-party, and applications on AWS. Many organizations use Ping One to control and manage user authentication and authorization centrally. If your organization uses Ping One for cloud applications, you can enable federation to all of your QuickSight accounts without needing to create and manage users in QuickSight. This authorizes users to access QuickSight assets—analyses, dashboards, folders, and datasets—through centrally managed Ping One.

In this post, we go through the steps to configure federated single sign-on (SSO) between a Ping One instance and a QuickSight account. We demonstrate registering an SSO application in Ping One, creating groups, and mapping to an AWS Identity and Access Management (IAM) role that translates to QuickSight user license types (admin, author, and reader). These QuickSight roles represent three different personas supported in QuickSight. Administrators can publish the QuickSight app in Ping One to enable users to perform SSO to QuickSight using their Ping credentials.

Prerequisites

To complete this walkthrough, you must have the following prerequisites:

A Ping One subscription
One or more QuickSight account subscriptions

Solution overview

The walkthrough includes the following steps:

Create groups in Ping One for each of the QuickSight user license types.
Register an AWS application in Ping One.
Add Ping One as your SAML identity provider (IdP) in AWS.
Configure an IAM policy.
Configure an IAM role.
Configure your AWS application in Ping One.
Test the application from Ping One.

Create groups in Ping One for each of the QuickSight roles

To create groups in Ping One, complete the following steps:

Sign in to the Ping One portal using an administrator account.
Under Identities, choose Groups.
Choose the plus sign to add a group.
For Group Name, enter QuickSightReaders.
Choose Save.
Repeat these steps to create the groups QuickSightAdmins and QuickSightAuthors.

Register an AWS application in Ping One

To configure the integration of an AWS application in Ping One, you need to add AWS to your list of managed software as a service (SaaS) apps.

Sign in to the Ping One portal using an administrator account.
Under Connections, choose Application Catalog.
In the search box, enter amazon web services.
Choose Amazon Web Services – AWS from the results to add the application.
For Name, enter Amazon QuickSight.
Choose Next.
Under Map Attributes, there should be four attributes.
Delete the attribute related to SessionDuration.
Choose Username as the value for all the remaining attributes for now.
We update these values in later steps.
Choose Next.
In the Select Groups section, add the QuickSightAdmins, QuickSightAuthors, and QuickSightReaders groups you created.
Choose Save.
After the application is created, choose the application again and download the federation metadata XML.

You use this in the next step.
BDB-2210-Ping-AWS-Metadata

Add Ping One as your SAML IdP in AWS

To configure Ping One as your SAML IdP, complete the following steps:

Open a new tab in your browser.
Sign in to the IAM console in your AWS account with admin permissions.
On the IAM console, under Access Management in the navigation pane, choose Identity providers.
Choose Add provider.
For Provider name, enter PingOne.
Choose file to upload the metadata document you downloaded earlier.
Choose Add provider.
In the banner message that appears, choose View provider.
Copy the IdP ARN to use in a later step.

Configure an IAM policy

In this step, you create an IAM policy to map three different roles with permissions in QuickSight.

Use the following steps to set up QuickSightUserCreationPolicy. This policy grants privileges in QuickSight to the federated user based on the assigned groups in Ping One.

On the IAM console, choose Policies.
Choose Create policy.

On the JSON tab, replace the existing text with the following code:

{
   "Version": "2012-10-17",
    "Statement": [ 
         {  
            "Sid": "VisualEditor0", 
             "Effect": "Allow", 
             "Action": "quicksight:CreateAdmin", 
             "Resource": "*", 
             "Condition": { 
                 "StringEquals": { 
                     "aws:PrincipalTag/user-role": "QuickSightAdmins" 
 
                } 
             } 
         }, 
         { 
             "Sid": "VisualEditor1", 
             "Effect": "Allow", 
             "Action": "quicksight:CreateUser", 
             "Resource": "*", 
             "Condition": { 
                 "StringEquals": { 
                     "aws:PrincipalTag/user-role": "QuickSightAuthors" 
                 } 
             } 
         }, 
         { 
             "Sid": "VisualEditor2", 
             "Effect": "Allow", 
             "Action": "quicksight:CreateReader", 
             "Resource": "*", 
             "Condition": { 
                 "StringEquals": { 
                     "aws:PrincipalTag/user-role": "QuickSightReaders" 
                 } 
             } 
         } 
     ] 
 }

Choose Review policy.
For Name, enter QuickSightUserCreationPolicy.
Choose Create policy.

Configure an IAM role

Next, create the role that Ping One users assume when federating into QuickSight. Use the following steps to set up the federated role:

On the IAM console, choose Roles.
Choose Create role.
For Trusted entity type, select SAML 2.0 federation.
For SAML 2.0-based provider, choose the provider you created earlier (PingOne).
Select Allow programmatic and AWS Management Console access.
For Attribute, choose SAML:aud.
For Value, enter https://signin.aws.amazon.com/saml.
Choose Next.
Under Permissions policies, select the QuickSightUserCreationPolicy IAM policy you created in the previous step.
Choose Next.
For Role name, enter QSPingOneFederationRole.
Choose Create role.
On the IAM console, in the navigation pane, choose Roles.
Choose the QSPingOneFederationRole role you created to open the role’s properties.
Copy the role ARN to use in later steps.
On the Trust relationships tab, under Trusted entities, verify that the IdP you created is listed.
Under Condition in the policy code, verify that SAML:aud with a value of https://signin.aws.amazon.com/saml is present.
Choose Edit trust policy to add an additional condition.

Under Condition, add the following code:

"StringLike": {
"aws:RequestTag/user-role": "*"
}

Under Action, add the following code:
```
  "sts:TagSession"
```
Choose Update policy to save changes.

Configure an AWS application in Ping One

To configure your AWS application, complete the following steps:

Sign in to the Ping One portal using a Ping One administrator account.
Under Connections, choose Application.
Choose the Amazon QuickSight application you created earlier.
On the Profile tab, choose Enable Advanced Configuration.
Choose Enable in the pop-up window.
On the Configuration tab, choose the pencil icon to edit the configuration.
Under SIGNING KEY, select Sign Assertion & Response.
Under SLO BINDING, for Assertion Validity Duration In Seconds, enter a duration, such as 900.
For Target Application URL, enter https://quicksight.aws.amazon.com/.
Choose Save.
On the Attribute Mappings tab, you now add or update the attributes as in the following table.

Attribute Name	Value
`saml_subject`	Username
`https://aws.amazon.com/SAML/Attributes/RoleSessionName`	Username
`https://aws.amazon.com/SAML/Attributes/Role`	‘arn:aws:iam::xxxxxxxxxx:role/QSPingOneFederationRole, arn:aws:iam::xxxxxxxxxx:saml-provider/PingOne’
`https://aws.amazon.com/SAML/Attributes/PrincipalTag:user-role`	user.memberOfGroupNames[0]

Enter https://aws.amazon.com/SAML/Attributes/PrincipalTag:user-role for the attribute name and use the corresponding value from the table for the expression.
Choose Save.

If you have more than one QuickSight user role (for this post, QuickSightAdmins, QuicksightAuthors, and QuickSightReaders), you can add all the appropriate role names as follows:

#data.containsAny(user.memberOfGroupNames,{'QuickSightAdmins'})? 'QuickSightAdmins' : 

#data.containsAny(user.memberOfGroupNames,{'QuickSightAuthorss'}) ? 'QuickSightAuthors' : 

#data.containsAny(user.memberOfGroupNames,{'QuickSightReaders'}) ?'QuickSightReaders' : null

To edit the role attribute, choose the gear icon next to the role.
Populate the corresponding expression from the table and choose Save.

The format of the expression is the role ARN (copied in the role creation step) followed by the IdP ARN (copied in the IdP creation step) separated by a comma.

Test the application

In this section, you test your Ping One SSO configuration by using a Microsoft application.

In the Ping One portal, under Identities, choose Groups.
Choose a group and choose Add Users Individually.
From the list of users, add the appropriate users to the group by choosing the plus sign.
Choose Save.
To test the connectivity, under Environment, choose Properties, then copy the URL under APPLICATION PORTAL URL.
Browse to the URL in a private browsing window.
Enter your user credentials and choose Sign On.
Upon a successful sign-in, you’re redirected to the All Applications page with a new application called Amazon QuickSight.
Choose the Amazon QuickSight application to be redirected to the QuickSight console.

Note in the following screenshot that the user name at the top of the page shows as the Ping One federated user.

Summary

This post provided step-by-step instructions to configure federated SSO between Ping One and the QuickSight console. We also discussed how to create policies and roles in IAM and map groups in Ping One to IAM roles for secure access to the QuickSight console.

For additional discussions and help getting answers to your questions, check out the QuickSight Community.

About the authors

Srikanth Baheti is a Specialized World Wide Sr. Solution Architect for Amazon QuickSight. He started his career as a consultant and worked for multiple private and government organizations. Later he worked for PerkinElmer Health and Sciences & eResearch Technology Inc, where he was responsible for designing and developing high traffic web applications, highly scalable and maintainable data pipelines for reporting platforms using AWS services and Serverless computing.

Raji Sivasubramaniam is a Sr. Solutions Architect at AWS, focusing on Analytics. Raji is specialized in architecting end-to-end Enterprise Data Management, Business Intelligence and Analytics solutions for Fortune 500 and Fortune 100 companies across the globe. She has in-depth experience in integrated healthcare data and analytics with wide variety of healthcare datasets including managed market, physician targeting and patient analytics.

Raj Jayaraman is a Senior Specialist Solutions Architect for Amazon QuickSight. Raj focuses on helping customers develop sample dashboards, embed analytics and adopt BI design patterns and best practices.

Deploying AWS Lambda functions using AWS Controllers for Kubernetes (ACK)

2022-08-29 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/deploying-aws-lambda-functions-using-aws-controllers-for-kubernetes-ack/

This post is written by Rajdeep Saha, Sr. SSA, Containers/Serverless.

AWS Controllers for Kubernetes (ACK) allows you to manage AWS services directly from Kubernetes. With the ACK service controller for AWS Lambda, you can provision and manage Lambda functions with kubectl and custom resources. With ACK, you can have a single consolidated approach to managing container workloads and other AWS services, such as Lambda, directly from Kubernetes without needing additional infrastructure automation tools.

This post walks you through deploying a sample Lambda function from a Kubernetes cluster provided by Amazon EKS.

Use cases

Some of the use cases for provisioning Lambda functions from ACK include:

Your organization already has a DevOps process to deploy resources into the Amazon EKS cluster using Kubernetes declarative YAMLs (known as manifest files). With ACK for AWS Lambda, you can now use manifest files to provision Lambda functions without creating separate infrastructure as a code template.
Your project has implemented GitOps with Kubernetes. With GitOps, git becomes the single source of truth, and all the changes are done via git repo. In this model, Kubernetes continuously reconciles the git repo (desired state) with the resources running inside the cluster (current state). If any differences are found, the GitOps process automatically implements changes to the cluster from the git repo. Using ACK for AWS Lambda, since you are creating the Lambda function using Kubernetes custom resource, the GitOps model is applied for Lambda.
Your organization has established permissions boundaries for different users and groups using role-based access control (RBAC) and IAM roles for service accounts (IRSA). You can reuse this security model for Lambda without having to create new users and policies.

How ACK for AWS Lambda works

The ‘Ops’ team deploys the ACK service controller for Lambda. This controller runs as a pod within the Amazon EKS cluster.
The controller pod needs permission to read the Lambda function code and create the Lambda function. The Lambda function code is stored as a zip file in an S3 bucket for this example. The permissions are granted to the pod using IRSA.
Each AWS service has separate ACK service controllers. This specific controller for AWS Lambda can act on the custom resource type ‘Function’.
The ‘Dev’ team deploys Kubernetes manifest file with custom resource type ‘Function’. This manifest file defines the necessary fields required to create the function, such as S3 bucket name, zip file name, Lambda function IAM role, etc.
The ACK service controller creates the Lambda function using the values from the manifest file.

Prerequisites

You need a few tools before deploying the sample application. Ensure that you have each of the following in your working environment:

This post uses shell variables to make it easier to substitute the actual names for your deployment. When you see placeholders like NAME=<your xyz name>, substitute in the name for your environment.

Setting up the Amazon EKS cluster

Run the following to create an Amazon EKS cluster. The following single command creates a two-node Amazon EKS cluster with a unique name.
```
eksctl create cluster
```
It may take 15–30 minutes to provision the Amazon EKS cluster. When the cluster is ready, run:
```
kubectl get nodes
```
The output shows the following:

To get the Amazon EKS cluster name to use throughout the walkthrough, run:

eksctl get cluster

export EKS_CLUSTER_NAME=<provide the name from the previous command>

Setting up the ACK Controller for Lambda

To set up the ACK Controller for Lambda:

Install an ACK Controller with Helm by following these instructions:
– Change ‘export SERVICE=s3’ to ‘export SERVICE=lambda’.
– Change ‘export AWS_REGION=us-west-2’ to reflect your Region appropriately.
To configure IAM permissions for the pod running the Lambda ACK Controller to permit it to create Lambda functions, follow these instructions.
– Replace ‘SERVICE=”s3”’ with ‘SERVICE=”lambda”’.
Validate that the ACK Lambda controller is running:
```
kubectl get pods -n ack-system
```
The output shows the running ACK Lambda controller pod:

Provisioning a Lambda function from the Kubernetes cluster

In this section, you write a sample “Hello world” Lambda function. You zip up the code and upload the zip file to an S3 bucket. Finally, you deploy that zip file to a Lambda function using the ACK Controller from the EKS cluster you created earlier. For this example, use Python3.9 as your language runtime.

To provision the Lambda function:

Run the following to create the sample “Hello world” Lambda function code, and then zip it up:

mkdir my-helloworld-function
cd my-helloworld-function
cat << EOF > lambda_function.py 
import json

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
EOF
zip my-deployment-package.zip lambda_function.py

Create an S3 bucket following the instructions here. Alternatively, you can use an existing S3 bucket in the same Region of the Amazon EKS cluster.

Run the following to upload the zip file into the S3 bucket from the previous step:

export BUCKET_NAME=<provide the bucket name from step 2>
aws s3 cp  my-deployment-package.zip s3://${BUCKET_NAME}

The output shows:
upload: ./my-deployment-package.zip to s3://<BUCKET_NAME>/my-deployment-package.zip
Create your Lambda function using the ACK Controller. The full spec with all the available fields is listed here. First, provide a name for the function:
```
export FUNCTION_NAME=hello-world-s3-ack
```

Create and deploy the Kubernetes manifest file. The command at the end, kubectl create -f function.yaml submits the manifest file, with kind as ‘Function’. The ACK Controller for Lambda identifies this custom ‘Function’ object and deploys the Lambda function based on the manifest file.

export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export LAMBDA_ROLE="arn:aws:iam::${AWS_ACCOUNT_ID}:role/lambda_basic_execution"

cat << EOF > lambdamanifest.yaml 
apiVersion: lambda.services.k8s.aws/v1alpha1
kind: Function
metadata:
 name: $FUNCTION_NAME
 annotations:
   services.k8s.aws/region: $AWS_REGION
spec:
 name: $FUNCTION_NAME
 code:
   s3Bucket: $BUCKET_NAME
   s3Key: my-deployment-package.zip
 role: $LAMBDA_ROLE
 runtime: python3.9
 handler: lambda_function.lambda_handler
 description: function created by ACK lambda-controller e2e tests
EOF
kubectl create -f lambdamanifest.yaml

The output shows:
function.lambda.services.k8s.aws/< FUNCTION_NAME> created
To retrieve the details of the function using a Kubernetes command, run:
```
kubectl describe function/$FUNCTION_NAME
```
This Lambda function returns a “Hello world” message. To invoke the function, run:
```
aws lambda invoke --function-name $FUNCTION_NAME  response.json
cat response.json
```
The Lambda function returns the following output:
{"statusCode": 200, "body": "\"Hello from Lambda!\""}

Congratulations! You created a Lambda function from your Kubernetes cluster.

To learn how to provision the Lambda function using the ACK controller from an OCI container image instead of a zip file in an S3 bucket, follow these instructions.

Cleaning up

This section cleans up all the resources that you have created. To clean up:

Delete the Lambda function:
```
kubectl delete function $FUNCTION_NAME
```

If you have created a new S3 bucket, delete it by running:

aws s3 rm s3://${BUCKET_NAME} --recursive
aws s3api delete-bucket --bucket ${BUCKET_NAME}

Delete the EKS cluster:

eksctl delete cluster --name $EKS_CLUSTER_NAME

Delete the IAM role created for the ACK Controller. Get the IAM role name by running the following command, then delete the role from the IAM console:
```
echo $ACK_CONTROLLER_IAM_ROLE
```

Conclusion

This blog post shows how AWS Controllers for Kubernetes enables you to deploy a Lambda function directly from your Amazon EKS environment. AWS Controllers for Kubernetes provides a convenient way to connect your Kubernetes applications to AWS services directly from Kubernetes.

ACK is open source: you can request new features and report issues on the ACK community GitHub repository.

For more serverless learning resources, visit Serverless Land.

[$] Crash recovery for user-space block drivers

2022-08-29

Post Syndicated from original https://lwn.net/Articles/906097/

A new user-space block driver mechanism
entered the kernel during the 6.0 merge window. This subsystem, called
“ublk”, uses io_uring to communicate with
user-space drivers, resulting in some impressive performance numbers. Ublk
has a lot of interesting potential, but the current use cases for it are
not entirely clear. The recently posted crash-recovery
mechanism for ublk makes it clear, though, that those use cases do
exist.

Levels of Assurance for DoD Microelectronics

2022-08-29 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/08/levels-of-assurance-for-dod-microelectronics.html

The NSA has has published criteria for evaluating levels of assurance required for DoD microelectronics.

The introductory report in a DoD microelectronics series outlines the process for determining levels of hardware assurance for systems and custom microelectronic components, which include application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) and other devices containing reprogrammable digital logic.

The levels of hardware assurance are determined by the national impact caused by failure or subversion of the top-level system and the criticality of the component to that top-level system. The guidance helps programs acquire a better understanding of their system and components so that they can effectively mitigate against threats.

The report was published last month, but I only just noticed it.

Two stable kernels

2022-08-29

Post Syndicated from original https://lwn.net/Articles/906356/

Greg Kroah-Hartman has released the 5.19.5
and 5.10.139 stable kernels to fix a problem
stemming
from an incorrect merge of a patch
to the dummy-tools used for building kernels.

Security updates for Monday

2022-08-29

Post Syndicated from original https://lwn.net/Articles/906355/

Security updates have been issued by Debian (curl, exim4, maven-shared-utils, ndpi, puma, webkit2gtk, and wpewebkit), Fedora (dotnet3.1, firefox, and webkit2gtk3), Mageia (clamav, mariadb, net-snmp, postgresql, python-ldap, and thunderbird), SUSE (freeciv, gnutls, keepalived, libyang, nim, python-Django, and varnish), and Ubuntu (schroot).

Git’s database internals I: packed object store

2022-08-29 Derrick Stolee

Post Syndicated from Derrick Stolee original https://github.blog/2022-08-29-gits-database-internals-i-packed-object-store/

Developers collaborate using Git. It is the medium that allows us to share code, work independently on our own machines, and then finally combine our efforts into a common understanding. For many, this is done by following some well-worn steps and sticking to that pattern. This works in the vast majority of use cases, but what happens when we need to do something new with Git? Knowing more about Git’s internals helps when exploring those new solutions.

In this five-part blog post series, we will illuminate Git’s internals to help you collaborate via Git, especially at scale.

It might also be interesting because you love data structures and algorithms. That’s what drives me to be interested in and contribute to Git.

Git’s architecture follows patterns that may be familiar to developers, except the patterns come from a different context. Almost all applications use a database to persist and query data. When building software based on an application database system, it’s easy to get started without knowing any of the internals. However, when it’s time to scale your solution, you’ll have to dive into more advanced features like indexes and query plans.

The core idea I want to convey is this:

Git is the distributed database at the core of your engineering system.

Here are some very basic concepts that Git shares with application databases:

Data is persisted to disk.
Queries allow users to request information based on that data.
The data storage is optimized for these queries.
The query algorithms are optimized to take advantage of these structures.
Distributed nodes need to synchronize and agree on some common state.

While these concepts are common to all databases, Git is particularly specialized. Git was built to store plain-text source code files, where most change are small enough to read in a single sitting, even if the codebase contains millions of lines. People use Git to store many other kinds of data, such as documentation, web pages, or configuration files.

While many application databases use long-running processes with significant amounts of in-memory caching, Git uses short-lived processes and uses the filesystem to persist data between executions. Git’s data types are more restrictive than a typical application database. These aspects lead to very specialized data storage and access patterns.

Today, let’s dig into the basics of what data Git stores and how it accesses that data. Specifically, we will learn about Git’s object store and how it uses packfiles to compress data that would otherwise contain redundant information.

Git’s object store

The most fundamental concepts in Git are Git objects. These are the “atoms” of your Git repository. They combine in interesting ways to create the larger structure. Let’s start with a quick overview of the important Git objects. Feel free to skip ahead if you know this, or you can dig deep into Git’s object model if you’re interested.

In your local Git repositories, your data is stored in the .git directory. Inside, there is a .git/objects directory that contains your Git objects.

$ ls .git/objects/
01  34  9a  df  info    pack

$ ls .git/objects/01/
12010547a8990673acf08117134bdc181bd735

$ ls .git/objects/pack/
multi-pack-index
pack-7017e6ce443801478cf19006fc5499ba1c4d2960.idx
pack-7017e6ce443801478cf19006fc5499ba1c4d2960.pack
pack-9f9258a8ffe4187f08a93bcba47784e07985d999.idx
pack-9f9258a8ffe4187f08a93bcba47784e07985d999.pack

The .git/objects directory is called the object store. It is a content-addressable data store, meaning that we can retrieve the contents of an object by providing a hash of those contents.

In this way, the object store is like a database table with two columns: the object ID and the object content. The object ID is the hash of the object content and acts like a primary key.

Table with columns labeled Object ID and Object Data

Upon first encountering content-addressable data stores, it is natural to ask, “How can we access an object by hash if we don’t already know its content?” We first need to have some starting points to navigate into the object store, and from there we can follow links between objects that exist in the structure of the object data.

First, Git has references that allow you to create named pointers to keys in the object database. The reference store mainly exists in the .git/refs/ directory and has its own advanced way of storing and querying references efficiently. For now, think of the reference store as a two-column table with columns for the reference name and the object ID. In the reference store, the reference name is the primary key.

Image showing how the Object ID table relates to the Object Store

Now that we have a reference store, we can navigate into the object store from some human-readable names. In addition to specifying a reference by its full name, such as refs/tags/v2.37.0, we can sometimes use short names, such as v2.37.0 where appropriate.

In the Git codebase, we can start from the v2.37.0 reference and follow the links to each kind of Git object.

The refs/tags/v2.37.0 reference points to an annotated tag object. An annotated tag contains a reference to another object (by object ID) and a plain-text message.
That tag’s object references a commit object. A commit is a snapshot of the worktree at a point in time, along with connections to previous versions. It contains links to parent commits, a root tree, as well as metadata, such as commit time and commit message.
That commit’s root tree references a tree object. A tree is similar to a directory in that it contains entries that link a path name to an object ID.
From that tree, we can follow the entry for README.md to find a blob object. Blobs store file contents. They get their name from the tree that points to them.

Image displaying hops through the object database in response to a user request.

From this example, we navigated from a ref to the contents of the README.md file at that position in the history. This very simple request of “give me the README at this tag” required several hops through the object database, linking an object ID to that object’s contents.

These hops are critical to many interesting Git algorithms. We will explore how the graph structure of the object store is used by Git’s algorithms in parts two through four. For now, let’s focus on the critical operation of linking an object ID to the object contents.

Object store queries

To store and access information in an application database, developers interact with the database using a query language such as SQL. Git has its own type of query language: the command-line interface. Git commands are how we interact with the Git object store. Since Git has its own structure, we do not get the full flexibility of a relational database. However, there are some parallels.

To select object contents by object ID, the git cat-file command will do the object lookup and provide the necessary information. We’ve already been using git cat-file -p to present “pretty” versions of the Git object data by object ID. The raw content is not always fit for human readers, with object IDs stored as raw hashes and not hexadecimal digits, among other things like null bytes. We can also use git cat-file -t to show the type of an object, which is discoverable from the initial few bytes of the object data.

To insert an object into the object store, we can write directly to a blob using git hash-object. This command takes file content and writes it into a blob in the object store. After the input is complete, Git reports the object ID of the written blob.

$ git hash-object -w --stdin
Hello, world!
af5626b4a114abcb82d63db7c8082c3c4756e51b

$ git cat-file -t af5626b4a114abcb82d63db7c8082c3c4756e51b
blob

$ git cat-file -p af5626b4a114abcb82d63db7c8082c3c4756e51b
Hello, world!

More commonly, we not only add a file’s contents to the object store, but also prepare to create new commit and tree objects to reference that new content. The git add command hashes new changes in the worktree and stores their blobs in the object store then writes the list of objects to a staging area known as the Git index. The git commit command takes those staged changes and creates trees pointing to all of the new blobs, then creates a new commit object pointing to the new root tree. Finally, git commit also updates the current branch to point to the new commit.

The figure below shows the process of creating several Git objects and finally updating a reference that happens when running git commit -a -m "Update README.md" when the only local edit is a change to the README.md file.

Image showing the process of creating several Git objects and updating references

We can do slightly more complicated queries based on object data. Using git log --pretty=format:<format-string>, we can make custom queries into the commits by pulling out “columns” such as the object ID and message, and even the committer and author names, emails, and dates. See the git log documentation for a full column list.

There are also some prebuilt formats ready for immediate use. For example, we can get a simple summary of a commit using git log --pretty=reference -1 <ref>. This query parses the commit at <ref> and provides the following information:

An abbreviated object ID.
The first sentence of the commit message.
The commit date in short form.

$ git log --pretty=reference -1 378b51993aa022c432b23b7f1bafd921b7c43835
378b51993aa0 (gc: simplify --cruft description, 2022-06-19)

Now that we’ve explored some of the queries we can make in Git, let’s dig into the actual storage of this data.

Compressed object storage: packfiles

Looking into the .git/objects directory again, we might see several directories with two-digit names. These directories then contain files with long hexadecimal names. These files are called loose objects, and the filename corresponds to the object ID of an object: the first two hexadecimal characters form the directory name while the rest form the filename. While the files themselves are compressed, there is not much interesting about querying these files, since Git relies on filesystem queries to satisfy most of these needs.

However, it does not take many objects before it is infeasible to store an entire Git repository using only loose objects. Not only does it strain the filesystem to have so many files, it is also inefficient when storing many versions of the same text file. Thus, Git’s packed object store in the .git/objects/pack/ directory forms a more efficient way to store Git objects.

Packfiles and pack-indexes

Each *.pack file in .git/objects/pack/ is called a packfile. Packfiles store multiple objects in compressed forms. Not only is each object compressed individually, they can also be compressed against each other to take advantage of common data.

At its simplest, a packfile contains a concatenated list of objects. It only stores the object data, not the object ID. It is possible to read a packfile to find objects by object ID, but it requires decompressing and hashing each object to compare it to the input hash. Instead, each packfile is paired with a pack-index file ending with .idx. The pack-index file stores the list of object IDs in lexicographical order so a quick binary search is sufficient to discover if an object ID is in the packfile, then an offset value points to where the object’s data begins within the packfile. The pack-index operates like a query index that speeds up read queries that rely on the primary key (object ID).

One small optimization is that a fanout table of 256 entries provides boundaries within the full list of object IDs based on their first byte. This reduces the time spent by the binary search, specifically by focusing the search on a smaller number of memory pages. This works particularly well because object IDs are uniformly distributed so the fanout ranges are well-balanced.

If we have a number of packfiles, then we could ask each pack-index in sequence to look up the object. A further enhancement to packfiles is to put several pack-indexes together in a single multi-pack-index, which stores the same offset data plus which packfile the object is in.

Lookups and prefixes work the same as in pack-indexes, except now we can skip the linear issue with many packs. You can read more about the multi-pack-index file and how it helps scale monorepo maintenance at GitHub.

Diffable object content

Packfiles also have a hyper-specialized version of row compression called deltification. Since read queries are only indexed by the object ID, we can perform extra compression on the object data part.

Git was built to store source code, which consists of plain-text files that are used as input to a compiler or interpreter to create applications. Git was also built to store many versions of this source code as it is changed by humans. This provides additional context about the kind of data typically stored in Git: diffable files with significant portions in common. If you’ve ever wondered why you shouldn’t store large binary files in Git repositories, this is the reason.

The field of software engineering has made it clear that it is difficult to understand applications in their entirety. Humans can grasp a very high-level view of an architecture and can parse small sections of code, but we cannot store enough information in our brains to grasp huge amounts of concrete code at once. You can read more about this in the excellent book, The Programmer’s Brain by Dr. Felienne Hermans.

Because of the limited size of our working memory, it is best to change code in small, well-documented iterations. This helps the code author, any code reviewers, and future developers looking at the code history. Between iterations, a significant majority of the code remains fixed while only small portions change. This allows Git to use difference algorithms to identify small diffs between the content of blob objects.

There are many ways to compute a difference between two blobs. Git has several difference algorithms implemented which can have drastically different results. Instead of focusing on unstructured differences, I want to focus on differences between structured object data. Specifically, tree objects usually change in small ways that are easy to compress.

Tree diffs

Git’s tree objects can also be compared using a difference algorithm that is aware of the structure of tree entries. Each tree entry stores a mode (think Unix file permissions), an object type, a name, and an object ID. Object IDs are for all intents and purposes random, but most edits will change a file without changing its mode, type, or name. Further, large trees are likely to have only a few entries change at a time.

For example, the tip commit at any major Git release only changes one file: the GIT-VERSION-GEN file. This means also that the root tree only has one entry different from the previous root tree:

$ git diff v2.37.0~1 v2.37.0
diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index 120af376c1..b210b306b7 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh

 GVF=GIT-VERSION-FILE
-DEF_VER=v2.37.0-rc2
+DEF_VER=v2.37.0

 LF='
 '

$ git cat-file -p v2.37.0~1^{tree} >old
$ git cat-file -p v2.37.0^{tree} >new

$ diff old new
13c13
< 100755 blob 120af376c147799e6c0069bac1f61709a0286cd6  GIT-VERSION-GEN
---
> 100755 blob b210b306b7554f28dc687d1c503517d2a5f87082  GIT-VERSION-GEN

Once we have an algorithm that can compute diffs for Git objects, the packfile format can take advantage of that.

Delta compression

The packfile format begins with some simple header information, but then it contains Git object data concatenated together. Each object’s data starts with a type and a length. The type could be the object type, in which case the content in the packfile is the full object content (subject to DEFLATE compression). The object’s type could instead be an offset delta, in which case the data is based on the content of a previous object in the packfile.

An offset delta begins with an integer offset value pointing to the relative position of a previous object in the packfile. The remaining data specifies a list of instructions which either instruct how to copy data from the base object or to write new data chunks.

Thinking back to our example of the root tree for Git’s v2.37.0 tag, we can store that tree as an offset delta to the previous root tree by copying the tree up until the object ID 120af37..., then write the new object ID b210b30..., and finally copy the rest of the previous root tree.

Keep in mind that these instructions are also DEFLATE compressed, so the new data chunks can also be compressed similarly to the base object. For the example above, we can see that the root tree for v2.37.0 is around 19KB uncompressed, 14KB compressed, but can be represented as an offset delta in only 50 bytes.

$ git rev-parse v2.37.0^{tree}
a4a2aa60ab45e767b52a26fc80a0a576aef2a010

$ git cat-file -s v2.37.0^{tree}
19388

$ ls -al .git/objects/a4/a2aa60ab45e767b52a26fc80a0a576aef2a010
-r--r--r--   1 ... ... 13966 Aug  1 13:24 a2aa60ab45e767b52a26fc80a0a576aef2a010

$ git rev-parse v2.37.0^{tree} | git cat-file --batch-check="%(objectsize:disk)"
50

Also, an offset delta can be based on another object that is also an offset delta. This creates a delta chain that requires computing the object data for each object in the list. In fact, we need to traverse the delta links in order to even determine the object type.

For this reason, there is a cost to storing objects efficiently this way. At read time, we need to do a bit extra work to materialize the raw object content Git needs to parse to satisfy its queries. There are multiple ways that Git tries to optimize this trade-off.

One way Git minimizes the extra work when parsing delta chains is by keeping the delta-chains short. The pack.depth config value specifies an upper limit on how long delta chains can be while creating a packfile. The default limit is 50.

When writing a packfile, Git attempts to use a recent object as the base and order the delta chain in reverse-chronological order. This allows the queries that involve recent objects to have minimum overhead, while the queries that involve older objects have slightly more overhead.

However, while thinking about the overhead of computing object contents from a delta chain, it is important to think about what kind of resources are being used. For example, to compute the diff between v2.37.0 and its parent, we need to load both root trees. If these root trees are in the same delta chain, then that chain’s data on disk is smaller than if they were stored in raw form. Since the packfile also places delta chains in adjacent locations in the packfile, the cost of reading the base object and its delta from disk is almost identical to reading just the base object. The extra overhead of some CPU during the parse is very small compared to the disk read. In this way, reading multiple objects in the same delta chain is faster than reading multiple objects across different chains.

In addition, some Git commands query the object store in such a way that we are very likely to parse multiple objects in the same delta chain. We will cover this more in part III when discussing file history queries.

In addition to persisting data efficiently to disk, the packfile format is also critical to how Git synchronizes Git object data across distributed copies of the repository during git fetch and git push. We will learn more about this in part IV when discussing distributed synchronization.

Packfile maintenance

In order to take advantage of packfiles and their compressed representation of Git objects, Git needs to actually write these packfiles. It is too expensive to create a packfile for every object write, so Git batches the packfile write into certain commands.

You could roll your own packfile using git pack-objects and create a pack-index for it using git index-pack. However, you instead might want to recompute a new packfile containing your entire object store using git repack -a or git gc.

As your repository grows, it becomes more difficult to replace your entire object store with a new packfile. For starters, you need enough space to store two copies of your Git object data. In addition, the computation effort to find good delta compression is very expensive and demanding. An optimal way to do delta compression takes quadratic time over the number objects, which is quickly infeasible. Git uses several heuristics to help with this, but still the cost of repacking everything all at once can be more than we are willing to spend, especially if we are just a client repository and not responsible for serving our Git data to multiple users.

There are two primary ways to update your object store for efficient reads without rewriting the entire object store into a new packfile. One is the geometric repacking option where you can run git repack --geometric to repack only a portion of packfiles until the resulting packfiles form a geometric sequence. That is, each packfile is some fixed multiple smaller than the next largest one. This uses the multi-pack-index to keep logarithmic performance for object lookups, but will occasionally tip over to repack all of the object data. That “tip over” moment only happens when the repository doubles in size, which does not happen very often.

Another approach to reducing the amount of work spent repacking is the incremental repack task in the git maintenance command. This task collects packfiles below a fixed size threshold and groups them together, at least until their total size is above that threshold. The default threshold is two gigabytes. This task is used by default when you enable background maintenance with the git maintenance start command. This also uses the multi-pack-index to keep fast lookups, but also will not rewrite the entire object store for large repositories since once a packfile is larger than the threshold it is not considered for repacking. The storage is slightly inefficient here, since objects in newer packfiles could be stored as deltas to objects in those fixed packs, but the simplicity in avoiding expensive repository maintenance is worth that slight overhead.

If you’re interested in keeping your repositories well maintained, then think about these options. You can always perform a full repack that recomputes all delta chains using git repack -adf at any time you are willing to spend that upfront maintenance cost.

What could Git learn from other databases?

Now that we have some understanding about how Git stores and accesses packed object data, let’s think about features that exist in application database systems that might be helpful here.

One thing to note is that there are no B-trees to be found! Almost every database introduction talks about how B-trees are used to efficiently index data in a database table. Why are they not present here in Git?

The main reason Git does not use B-trees is because it doesn’t do “live updating” of packfiles and pack-indexes. Once a packfile is written, it is static until it is replaced by another packfile containing its objects. That packfile is also not accessed by Git processes until its pack-index is completely written.

In this world, objects are dynamically added to the object store by adding new loose object files (such as in git add or git commit) or by adding new packfiles (such as in git fetch). If a packfile has fixed content, then we can do the most space-and-time efficient index: a binary search tree. Specifically, performing binary search on the list of object IDs in a pack-index is very efficient. It’s not an exact binary search because there is an initial fan-out table for the first byte of the object ID. It’s kind of like a rooted binary tree, except the root node has 256 children instead of only two.

B-trees excel when data is being inserted or removed from the tree. Being able to track those modifications with minimal modifications to the overall tree structure is critical for an application database serving many concurrent requests.

Git does not currently have the capability to update a packfile in real time without shutting down concurrent reads from that file. Such a change could be possible, but it would require updating Git’s storage significantly. I think this is one area where a database expert could contribute to the Git project in really interesting ways.

Another difference between Git and most database systems is that Git runs as short-lived processes. Typically, we think of the database as a process that has data cached in memory. We send queries to the existing process and it returns results and keeps running. Instead, Git starts a new process with every “query” and relies on the filesystem for persisted state. Git also relies on the operating system to cache the disk pages during and between the processes. Expert database systems tell the kernel to stop managing disk pages and instead the database manages the page cache since it knows its usage needs better than a general purpose operating system could predict.

What if Git had a long-running daemon that could satisfy queries on-demand, but also keep that in-memory representation of data instead of needing to parse objects from disk every time? Although the current architecture of Git is not well-suited to this, I believe it is an idea worth exploring in the future.

Come back tomorrow for more!

In the next part of this blog series, we will explore how Git commit history queries use the structure of Git commits to present interesting information to the user. We’ll also explore the commit-graph file and how it acts as a specialized query index for these commands.

I’ll also be speaking at Git Merge 2022 covering all five parts of this blog series, so I look forward to seeing you there!

Best of The History Guy: Outlaws of the Wild West

2022-08-29 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=F4Qp_twkcJo

AI Images: Last Week Tonight with John Oliver (HBO)

2022-08-29 LastWeekTonight

Post Syndicated from LastWeekTonight original https://www.youtube.com/watch?v=3YNku5FKWjw

America Songs

2022-08-29

Post Syndicated from original https://xkcd.com/2665/

Juraaaassic Park, Juraaaassic Park, God shed his grace on theeeee

Kernel prepatch 6.0-rc3

2022-08-29

Post Syndicated from original https://lwn.net/Articles/906315/

The 6.0-rc3 kernel prepatch is out for
testing.

So as some people already noticed, last week was an anniversary
week – 31 years since the original Linux development
announcement. How time flies.

But this is not that kind of historic email – it’s just the regular
weekly RC release announcement, and things look pretty normal.

Noise level – Nuki vs SwitchBot lock #shorts

2022-08-28 BeardedTinker

Post Syndicated from BeardedTinker original https://www.youtube.com/watch?v=ywTzA0a9COI

Service authorizations by AWS Region

Introducing the Security Hub Announcements topic

How to use the Security Hub Announcements topic

IAM permissions

Subscribe to the Security Hub Announcements topic

Security Hub Announcements topic ARNs by Region

To subscribe an email address to the Security Hub Announcements topic (console)

To subscribe an email address to the Security Hub Announcements topic (AWS CLI)

Example subscription responses

Example message from an email subscription (protocol type: email)

Example message from an SQS queue subscription (protocol type: sqs)

Available now

Solution overview

Prerequisites

Understanding the sample data

Run a Spark job in Amazon EMR to transform the XML fields in the raw data to JSON

Copy curated data into Amazon Redshift and query tables seamlessly

Clean up

Conclusion

About the authors

Prerequisites

Solution overview

Create groups in Ping One for each of the QuickSight roles

Register an AWS application in Ping One

Add Ping One as your SAML IdP in AWS

Configure an IAM policy

Configure an IAM role

Configure an AWS application in Ping One

Test the application

Summary

About the authors

Use cases

How ACK for AWS Lambda works

Prerequisites

Setting up the Amazon EKS cluster

Setting up the ACK Controller for Lambda

Provisioning a Lambda function from the Kubernetes cluster

Cleaning up

Conclusion

Git’s object store

Object store queries

Compressed object storage: packfiles

Packfiles and pack-indexes

Diffable object content

Tree diffs

Delta compression

Packfile maintenance

What could Git learn from other databases?

Come back tomorrow for more!

The collective thoughts of the interwebz