Tag Archives: Uncategorized

Political Disinformation and AI

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/10/political-disinformation-and-ai.html

Elections around the world are facing an evolving threat from foreign actors, one that involves artificial intelligence.

Countries trying to influence each other’s elections entered a new era in 2016, when the Russians launched a series of social media disinformation campaigns targeting the US presidential election. Over the next seven years, a number of countries—most prominently China and Iran—used social media to influence foreign elections, both in the US and elsewhere in the world. There’s no reason to expect 2023 and 2024 to be any different.

But there is a new element: generative AI and large language models. These have the ability to quickly and easily produce endless reams of text on any topic in any tone from any perspective. As a security expert, I believe it’s a tool uniquely suited to Internet-era propaganda.

This is all very new. ChatGPT was introduced in November 2022. The more powerful GPT-4 was released in March 2023. Other language and image production AIs are around the same age. It’s not clear how these technologies will change disinformation, how effective they will be or what effects they will have. But we are about to find out.

Election season will soon be in full swing in much of the democratic world. Seventy-one percent of people living in democracies will vote in a national election between now and the end of next year. Among them: Argentina and Poland in October, Taiwan in January, Indonesia in February, India in April, the European Union and Mexico in June, and the US in November. Nine African democracies, including South Africa, will have elections in 2024. Australia and the UK don’t have fixed dates, but elections are likely to occur in 2024.

Many of those elections matter a lot to the countries that have run social media influence operations in the past. China cares a great deal about Taiwan, Indonesia, India, and many African countries. Russia cares about the UK, Poland, Germany, and the EU in general. Everyone cares about the United States.

And that’s only considering the largest players. Every US national election from 2016 has brought with it an additional country attempting to influence the outcome. First it was just Russia, then Russia and China, and most recently those two plus Iran. As the financial cost of foreign influence decreases, more countries can get in on the action. Tools like ChatGPT significantly reduce the price of producing and distributing propaganda, bringing that capability within the budget of many more countries.

A couple of months ago, I attended a conference with representatives from all of the cybersecurity agencies in the US. They talked about their expectations regarding election interference in 2024. They expected the usual players—Russia, China, and Iran—and a significant new one: “domestic actors.” That is a direct result of this reduced cost.

Of course, there’s a lot more to running a disinformation campaign than generating content. The hard part is distribution. A propagandist needs a series of fake accounts on which to post, and others to boost it into the mainstream where it can go viral. Companies like Meta have gotten much better at identifying these accounts and taking them down. Just last month, Meta announced that it had removed 7,704 Facebook accounts, 954 Facebook pages, 15 Facebook groups, and 15 Instagram accounts associated with a Chinese influence campaign, and identified hundreds more accounts on TikTok, X (formerly Twitter), LiveJournal, and Blogspot. But that was a campaign that began four years ago, producing pre-AI disinformation.

Disinformation is an arms race. Both the attackers and defenders have improved, but also the world of social media is different. Four years ago, Twitter was a direct line to the media, and propaganda on that platform was a way to tilt the political narrative. A Columbia Journalism Review study found that most major news outlets used Russian tweets as sources for partisan opinion. That Twitter, with virtually every news editor reading it and everyone who was anyone posting there, is no more.

Many propaganda outlets moved from Facebook to messaging platforms such as Telegram and WhatsApp, which makes them harder to identify and remove. TikTok is a newer platform that is controlled by China and more suitable for short, provocative videos—ones that AI makes much easier to produce. And the current crop of generative AIs are being connected to tools that will make content distribution easier as well.

Generative AI tools also allow for new techniques of production and distribution, such as low-level propaganda at scale. Imagine a new AI-powered personal account on social media. For the most part, it behaves normally. It posts about its fake everyday life, joins interest groups and comments on others’ posts, and generally behaves like a normal user. And once in a while, not very often, it says—or amplifies—something political. These persona bots, as computer scientist Latanya Sweeney calls them, have negligible influence on their own. But replicated by the thousands or millions, they would have a lot more.

That’s just one scenario. The military officers in Russia, China, and elsewhere in charge of election interference are likely to have their best people thinking of others. And their tactics are likely to be much more sophisticated than they were in 2016.

Countries like Russia and China have a history of testing both cyberattacks and information operations on smaller countries before rolling them out at scale. When that happens, it’s important to be able to fingerprint these tactics. Countering new disinformation campaigns requires being able to recognize them, and recognizing them requires looking for and cataloging them now.

In the computer security world, researchers recognize that sharing methods of attack and their effectiveness is the only way to build strong defensive systems. The same kind of thinking also applies to these information campaigns: The more that researchers study what techniques are being employed in distant countries, the better they can defend their own countries.

Disinformation campaigns in the AI era are likely to be much more sophisticated than they were in 2016. I believe the US needs to have efforts in place to fingerprint and identify AI-produced propaganda in Taiwan, where a presidential candidate claims a deepfake audio recording has defamed him, and other places. Otherwise, we’re not going to see them when they arrive here. Unfortunately, researchers are instead being targeted and harassed.

Maybe this will all turn out okay. There have been some important democratic elections in the generative AI era with no significant disinformation issues: primaries in Argentina, first-round elections in Ecuador, and national elections in Thailand, Turkey, Spain, and Greece. But the sooner we know what to expect, the better we can deal with what comes.

This essay previously appeared in The Conversation.

Hacking Gas Pumps via Bluetooth

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/10/hacking-gas-pumps-via-bluetooth.html

Turns out pumps at gas stations are controlled via Bluetooth, and that the connections are insecure. No details in the article, but it seems that it’s easy to take control of the pump and have it dispense gas without requiring payment.

It’s a complicated crime to monetize, though. You need to sell access to the gas pump to others.

EDITED TO ADD (10/13): Reader Jeff Hall says that story is not accurate, and that the gas pumps do not have a Bluetooth connection.

NSA AI Security Center

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/10/nsa-ai-security-center.html

The NSA is starting a new artificial intelligence security center:

The AI security center’s establishment follows an NSA study that identified securing AI models from theft and sabotage as a major national security challenge, especially as generative AI technologies emerge with immense transformative potential for both good and evil.

Nakasone said it would become “NSA’s focal point for leveraging foreign intelligence insights, contributing to the development of best practices guidelines, principles, evaluation, methodology and risk frameworks” for both AI security and the goal of promoting the secure development and adoption of AI within “our national security systems and our defense industrial base.”

He said it would work closely with U.S. industry, national labs, academia and the Department of Defense as well as international partners.

Build and deploy to Amazon EKS with Amazon CodeCatalyst

Post Syndicated from Vineeth Nair original https://aws.amazon.com/blogs/devops/build-and-deploy-to-amazon-eks-with-amazon-codecatalyst/

Amazon CodeCatalyst is an integrated service for software development teams adopting continuous integration and deployment (CI/CD) practices into their software development process. CodeCatalyst puts all of the tools that development teams need in one place, allowing for a unified experience for collaborating on, building, and releasing software. You can also integrate AWS resources with your projects by connecting your AWS accounts to your CodeCatalyst space. By managing all of the stages and aspects of your application lifecycle in one tool, you can deliver software quickly and confidently.

Introduction

Containerization has revolutionized the way we develop, deploy, and scale applications. With the rise of managed container services like Amazon Elastic Kubernetes Service (EKS), developers can leverage the power of Kubernetes without worrying about the underlying infrastructure. In this post, we will focus on how DevOps teams can use CodeCatalyst to build and deploy applications to EKS clusters.

CodeCatalyst offers a collection of pre-built actions that encapsulate common container-related tasks such as building and pushing a container image to an ECR and deploying a Kubernetes manifest. In this walkthrough, we will leverage two actions that can greatly simplify the container build and deployment process. We start by building a simple container image with the ‘Push to Amazon ECR’ action from CodeCatalyst labs. This action simplifies the process of building, tagging and pushing an image to an Amazon Elastic Container Registry (ECR). We will also utilize the ‘Deploy to Kubernetes cluster’ action from AWS for pushing our Kubernetes manifests with our updated image.

Architecture diagram demonstrating how a developer uses Cloud9 and a repository to store code, then pushes the image to Amazon ECR and deploys it to Amazon EKS.
Figure 1: Architectural Diagram.

Prerequisites

To follow along with the post, you will need the following items:

Walkthrough

In this walkthrough, we will build a simple Nginx-based container image and push it to an ECR repository. We will then deploy this image to an EKS cluster. The emphasis of this post is on how to translate a fairly common pattern for microservices applications into a CodeCatalyst workflow. At the end of the post, our workflow will look like so:

The image shows how the CodeCatalyst workflow is configured. The first stage pushes the image to Amazon ECR, followed by the deployment to Amazon EKS.
Figure 2: CodeCatalyst workflow.

Create the base workflow

To begin, we will create our workflow. In the CodeCatalyst project, select CI/CD → Workflows → Create workflow:

Image shows how to create workflow which has Source repository and Branch to be selected from drop down option
Figure 3: Create workflow.

Leave the defaults for the Source Repository and Branch, select Create. We will have an empty workflow:

Image shows what an empty workflow looks like. It doesn't have any steps configured; these need to be added to this file based on our requirements.
Figure 4: Empty workflow.

We can edit the workflow from within the CodeCatalyst console, or use a Dev Environment. We will create an initial commit of this workflow file. Ignore any validation errors at this stage:

Image shows how to create a Dev environment. The user needs to provide a workflow name and a commit message, along with the repository name and branch name.
Figure 5: Creating Dev environment.

Connect to CodeCatalyst Dev Environment

For this post, we will use an AWS Cloud9 Dev Environment. Our first step is to connect to the Dev Environment. Select Code → Dev Environments. If you do not already have a Dev Environment, you can create one by selecting Create Dev Environment.

Image shows what a Dev environment looks like. The environment we created in the previous step will be listed here.
Figure 6: My Dev environment.

Create a CodeCatalyst secret

Prior to adding the code, we will add a CodeCatalyst secret that will be consumed by our workflow. Using CodeCatalyst secrets ensures that we do not store sensitive data in plaintext in our workflow file. To create the secret in the CodeCatalyst console, browse to CI/CD → Secrets. Select Create Secret with the following details:

The image shows how we can create a secret. We need to provide a Name and a Value, along with an optional description, to create a secret.
Figure 7: Adding secrets.

Name: eks_cluster_name

Value: <Your EKS Cluster name>

Connect to the CodeCatalyst Dev Environment

We already have a Dev Environment, so we will select Resume Instance. A new browser tab opens for the IDE, which will be available in less than a minute. Once the IDE is ready, we can go ahead and start creating the Dockerfile and Kubernetes manifest that make up our application:

mkdir WebApp
cat <<EOF > WebApp/Dockerfile
FROM nginx
RUN apt-get update && apt-get install -y curl
EOF

The previous command block creates our Dockerfile, which starts from an Nginx base image and installs cURL. We will build this image in our CodeCatalyst workflow. Next, we will add our Kubernetes manifest file to create a Kubernetes deployment and service for our application:

Create a directory called Manifests and a file inside the directory called demo-app.yaml. Update the file with code for deployment and Kubernetes Service.

The image shows the code structure of demo-app.yaml file
Figure 8: demo-app.yaml file.
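Because the figure is not reproduced in text form, here is a minimal sketch of what a manifest of this kind might contain, rendered by a short Python helper that uses only the standard library. The resource name demo-app, the replica count, port 80, the LoadBalancer service type, and the image URI passed in are all assumptions chosen for illustration, not values taken from the original post.

```python
from pathlib import Path
from string import Template

# Hypothetical manifest: names, replica count, ports, and service type
# are illustrative only and are not taken from the original figure.
MANIFEST = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: $image_uri
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  type: LoadBalancer
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 80
""")


def render_manifest(image_uri: str) -> str:
    """Return the Deployment and Service manifest with the image URI filled in."""
    return MANIFEST.substitute(image_uri=image_uri)


def write_manifest(image_uri: str, directory: str = "Manifests") -> Path:
    """Write demo-app.yaml under the given directory and return its path."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "demo-app.yaml"
    path.write_text(render_manifest(image_uri))
    return path
```

Calling write_manifest with the URI of your own ECR repository would produce Manifests/demo-app.yaml; the image value is the only part that has to be unique to your account.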

The manifest in Figure 8 defines a Kubernetes deployment for our application, along with a Kubernetes service. We modify the image value to include the URI of our ECR repository, as this value is unique to each account. Once we have created our Dockerfile and Kubernetes manifest, we pull the latest changes to our repository, including the workflow file that we just created. In our environment, our repository is called eks-demo-app:

cd eks-demo-app && git pull

We can now edit this file in our IDE. In our example, the workflow is named Workflow_df84, so we locate Workflow_df84.yaml in the .codecatalyst/workflows directory of our repository. From here, we can double-click the file to open it in the IDE for editing:

Image shows empty workflow yaml file.
Figure 9: workflow file in yaml format.

Add the build steps to workflow

We can assign our workflow a name and configure the action for our build phase. The code outlined in the following diagram is our CodeCatalyst workflow definition.

The image shows updated workflow file which has triggers and actions filled.
Figure 10: Workflow updated with build phase.

Kustomize starts from here

The image shows Kustomize steps added in the workflow file.
Figure 11: Workflow updated with Kustomize.

Deployment starts from here

The image shows deployment stage added in the workflow file.
Figure 12: Workflow updated with Deployment phase.

The workflow now contains two CodeCatalyst actions: PushtoAmazonECR, which builds and pushes our container image to the ECR repository, and a dependent action, DeploytoKubernetesCluster, which deploys our Kubernetes manifest.

To save our changes, we select File -> Save. We can then commit them to our Git repository by typing the following at the terminal:

git add . && git commit -m 'adding workflow' && git push

The previous command commits and pushes our changes to the CodeCatalyst source repository. Because we have a branch trigger defined for main, this triggers a run of the workflow. We can monitor the status of the workflow in the CodeCatalyst console by selecting CI/CD → Workflows. Locate your workflow and click on Runs to view the status.

Both stages will now be available, as depicted at the beginning of this walkthrough. We will have a container image in our ECR repository, and the newly built image will be deployed to our EKS cluster.

Cleaning up

If you have been following along with this workflow, you should delete the resources that you have deployed to avoid further charges. First, delete the Amazon ECR repository and Amazon EKS cluster (along with associated IAM roles) using the AWS console. Second, delete the CodeCatalyst project by navigating to the project settings and choosing Delete Project.

Conclusion

In this post, we explained how teams can easily get started building and deploying a microservice application to an EKS cluster using CodeCatalyst. We outlined the stages in our workflow that enabled us to achieve the end-to-end build and release cycle. We also demonstrated how to enhance the developer experience by integrating CodeCatalyst with our Cloud9 Dev Environment.

Call to Action

Learn more about CodeCatalyst here.

Vineeth Nair

Vineeth Nair is a DevOps Architect at Amazon Web Services (AWS), Professional Services. He collaborates closely with AWS customers to support and accelerate their journeys to the cloud and within the cloud ecosystem by building performant, resilient, scalable, secure and cost efficient solutions.

Richard Merritt

Richard Merritt is a DevOps Consultant at Amazon Web Services (AWS), Professional Services. He works with AWS customers to accelerate their journeys to the cloud by providing scalable, secure and robust DevOps solutions.

Friday Squid Blogging: Protecting Cephalopods in Medical Research

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/friday-squid-blogging-protecting-cephalopods-in-medical-research.html

From Nature:

Cephalopods such as octopuses and squid could soon receive the same legal protection as mice and monkeys do when they are used in research. On 7 September, the US National Institutes of Health (NIH) asked for feedback on proposed guidelines that, for the first time in the United States, would require research projects involving cephalopods to be approved by an ethics board before receiving federal funding.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Why Do the Councilors and Administration of Rodopi Municipality Not Want Online Broadcasts of the Sessions?

Post Syndicated from VassilKendov original http://kendov.com/%D0%B7%D0%B0%D1%89%D0%BE-%D0%B2-%D0%BE%D0%B1%D1%89%D0%B8%D0%BD%D0%B0-%D1%80%D0%BE%D0%B4%D0%BE%D0%BF%D0%B8-%D1%81%D1%8A%D0%B2%D0%B5%D1%82%D0%BD%D0%B8%D1%86%D0%B8-%D0%B8-%D0%B0%D0%B4%D0%BC%D0%B8%D0%BD/

When I moved from Sofia to the village of Boykovo in 2020, I decided I would try to do something for this municipality. I chair the Foundation for the Revival of Bulgarian Villages, and we have donated computer labs to dozens of small villages (including in Ukraine), so I wondered whether I should not also do something for the municipality I was going to live in.

I approached it naively and asked for a meeting with the mayor. He scheduled a meeting with us in the village of Parvenets, but... he did not show up. There was nothing more to expect from him, so I began contacting the community centers directly: Krumovo, Sitovo, Lilkovo, Boykovo, Belashtitsa, Zlatitrap, and the Youth Club in Markovo. That worked. The reception was very warm everywhere, and the results were very good. Boykovo set up a summer study club, Krumovo arranged free access to Ucha.se... In Markovo, though, it went best. The others may take offense, but the mayor there, Ms. Terzieva, is the most active and the most wholehearted. Or rather, to be honest, what sets her apart from the other active mayors is vision: a vision for the future, which is unfortunately missing at the municipal level. At the Youth Club in Markovo we even held a meeting with several prominent visionaries from the IT sector in Sofia and talked about the professions of the future. This could not have happened without Ms. Terzieva's support, and that is the plain truth! If I were from Markovo, I would cherish her and look after her, because they are unlikely to find a better mayor.

Buoyed by the good results, I decided to take a look at the budget of Rodopi Municipality. After all, that is my job: I work in finance and I think I understand these things. Since I had watched sessions of the municipal council in Plovdiv, I assumed I could watch the sessions online here as well. "Yes, but no," as Petko Bocharov used to say. The municipality supposedly did not have the technological capability to broadcast online.

Here the younger readers will exclaim: wtf? Any 10-year-old kid knows how to stream on YouTube or FB. Besides, since the Covid period all schools have used platforms for online lessons. There is no way Rodopi Municipality is unable to broadcast online. At the very least, it is free. All it takes is the will.

It turned out there is no will. From the first session I attended in 2020, where I drew attention to this problem, to this day, Rodopi Municipality has not made any such broadcasts.
To be fair, Mr. Tsankov (the chairman of the municipal council and a current candidate for mayor) committed to "check how things stand" (I have a recording of it), but in the end it came to one big NOTHING!
Unfortunately the mayor, Mr. Mihaylov, was evidently fine with this as well and did not raise the issue. I cannot accept that this comes from a lack of understanding, because, as we said, the easiest thing is to ask any 10-year-old kid how it is done and they will show you.

I have no other reasonable explanation for why they resist online broadcasting of the sessions so strongly, other than the fear that the sessions would become accessible to all citizens. It would be very easy now, in the pre-election period, to pull up a recording and see who has been up to what, but the municipal officials have taken that opportunity away.

Since the favorite refrain at the moment is "give us a proposal for solving a 'given' problem," I want to tell you that there have been proposals, but nobody has heard or seen them. And that is the real problem. For anything to change, there must first be transparency and an informed population. Everything else follows from that.

So this is my proposal for the start of change in Rodopi Municipality: broadcast the sessions of the municipal council online. It is free and can be set up in one day. They only need the will.

For questions from voters


The post Why Do the Councilors and Administration of Rodopi Municipality Not Want Online Broadcasts of the Sessions? appeared first on Kendov.com.

Critical Vulnerability in libwebp Library

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/critical-vulnerability-in-libwebp-library.html

Both Apple and Google have recently reported critical vulnerabilities in their systems—iOS and Chrome, respectively—that are ultimately the result of the same vulnerability in the libwebp library:

On Thursday, researchers from security firm Rezillion published evidence that they said made it “highly likely” both indeed stemmed from the same bug, specifically in libwebp, the code library that apps, operating systems, and other code libraries incorporate to process WebP images.

Rather than Apple, Google, and Citizen Lab coordinating and accurately reporting the common origin of the vulnerability, they chose to use a separate CVE designation, the researchers said. The researchers concluded that “millions of different applications” would remain vulnerable until they, too, incorporated the libwebp fix. That, in turn, they said, was preventing automated systems that developers use to track known vulnerabilities in their offerings from detecting a critical vulnerability that’s under active exploitation.

Signal Will Leave the UK Rather Than Add a Backdoor

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/signal-will-leave-the-uk-rather-than-add-a-backdoor.html

Totally expected, but still good to hear:

Onstage at TechCrunch Disrupt 2023, Meredith Whittaker, the president of the Signal Foundation, which maintains the nonprofit Signal messaging app, reaffirmed that Signal would leave the U.K. if the country’s recently passed Online Safety Bill forced Signal to build “backdoors” into its end-to-end encryption.

“We would leave the U.K. or any jurisdiction if it came down to the choice between backdooring our encryption and betraying the people who count on us for privacy, or leaving,” Whittaker said. “And that’s never not true.”

Manage your workloads better using Amazon Redshift Workload Management

Post Syndicated from Rohit Vashishtha original https://aws.amazon.com/blogs/big-data/manage-your-workloads-better-using-amazon-redshift-workload-management/

With Amazon Redshift, you can run a complex mix of workloads on your data warehouse, such as frequent data loads running alongside business-critical dashboard queries and complex transformation jobs. We also see more and more data science and machine learning (ML) workloads. Each workload type has different resource needs and different service-level agreements (SLAs).

Amazon Redshift workload management (WLM) helps you maximize query throughput and get consistent performance for the most demanding analytics workloads by optimally using the resources of your existing data warehouse.

In Amazon Redshift, you implement WLM to define the number of query queues that are available and how queries are routed to those queues for processing. WLM queues are configured based on Redshift user groups, user roles, or query groups. When users belonging to a user group or role run queries in the database, their queries are routed to a queue as depicted in the following flowchart.

Role-based access control (RBAC) is a new enhancement that helps you simplify the management of security privileges in Amazon Redshift. You can use RBAC to control end-user access to data at a broad or granular level based on their job role. We have introduced support for Redshift roles in WLM queues, so you will now find User roles alongside User groups and Query groups as a query routing mechanism.

This post provides examples of analytics workloads for an enterprise, and shares common challenges and ways to mitigate those challenges using WLM. We guide you through common WLM patterns and how they can be associated with your data warehouse configurations. We also show how to assign user roles to WLM queues and how to use WLM query insights to optimize configuration.

Use case overview

ExampleCorp is an enterprise using Amazon Redshift to modernize its data platform and analytics. It has a variety of workloads with users from various departments and personas. The service-level performance requirements vary by the nature of the workload and the user personas accessing the datasets. ExampleCorp would like to manage resources and priorities on Amazon Redshift using WLM queues. For this multitenant architecture by department, ExampleCorp can achieve read/write isolation using the Amazon Redshift data sharing feature and meet its unpredictable compute scaling requirements using concurrency scaling.

The following figure illustrates the user personas and access in ExampleCorp.

ExampleCorp has multiple Redshift clusters. For this post, we focus on the following:

  • Enterprise data warehouse (EDW) platform – This has all write workloads, along with some of the applications running reads via the Redshift Data API. The enterprise standardized data from the EDW cluster is accessed by multiple consumer clusters using the Redshift data sharing feature to run downstream reports, dashboards, and other analytics workloads.
  • Marketing data mart – This has predictable extract, transform, and load (ETL) and business intelligence (BI) workloads at specific times of day. The cluster admin understands the exact resource requirements by workload type.
  • Auditor data mart – This is only used for a few hours a day to run scheduled reports.

ExampleCorp would like to better manage their workloads using WLM.

Solution overview

As we discussed in the previous section, ExampleCorp has multiple Redshift data warehouses: one enterprise data warehouse and two downstream Redshift data warehouses. Each data warehouse has different workloads, SLAs, and concurrency requirements.

A database administrator (DBA) will implement appropriate WLM strategies on each Redshift data warehouse based on their use case. For this post, we use the following examples:

  • The enterprise data warehouse demonstrates Auto WLM with query priorities
  • The marketing data mart cluster demonstrates manual WLM
  • The auditors team uses their data mart infrequently for sporadic workloads; they use Amazon Redshift Serverless, which doesn’t require workload management

The following diagram illustrates the solution architecture.

Prerequisites

Before beginning this solution, you need the following:

  • An AWS account
  • Administrative access to Amazon Redshift

Let’s start by understanding some foundational concepts before solving the problem statement for ExampleCorp. First, how to choose between auto vs. manual WLM.

Auto vs. manual WLM

Amazon Redshift WLM enables you to flexibly manage priorities within workloads to meet your SLAs. Amazon Redshift supports Auto WLM or manual WLM for your provisioned Redshift data warehouse. The following diagram illustrates queues for each option.

Auto WLM determines the amount of resources that queries need and adjusts the concurrency based on the workload. When queries requiring large amounts of resources are in the system (for example, hash joins between large tables), the concurrency is lower. For additional information, refer to Implementing automatic WLM. You should use Auto WLM when your workload is highly unpredictable.

With manual WLM, you manage query concurrency and memory allocation, as opposed to auto WLM, where it’s managed by Amazon Redshift automatically. You configure separate WLM queues for different workloads like ETL, BI, and ad hoc and customize resource allocation. For additional information, refer to Tutorial: Configuring manual workload management (WLM) queues.

Use manual WLM when your workload pattern is predictable or if you need to throttle certain types of queries depending on the time of day, such as throttling down ingestion during business hours. If you need to guarantee that multiple workloads are able to run at the same time, you can define slots for each workload.
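To make the difference concrete, the following sketch builds the two kinds of WLM configuration as Python data and serializes them to JSON. The key names (auto_wlm, user_group, query_concurrency, memory_percent_to_use) follow our understanding of the wlm_json_configuration parameter and should be checked against the Amazon Redshift documentation; the group names and the numbers are hypothetical.

```python
import json

# Auto WLM: a single flag lets Amazon Redshift manage concurrency and memory.
auto_wlm_config = [{"auto_wlm": True}]

# Manual WLM: hypothetical ETL and BI queues with explicit slots and memory.
manual_wlm_config = [
    {"user_group": ["etl_users"], "query_concurrency": 3, "memory_percent_to_use": 40},
    {"user_group": ["bi_users"], "query_concurrency": 10, "memory_percent_to_use": 40},
    # Default queue: no routing keys, so it takes whatever did not match above.
    {"query_concurrency": 5, "memory_percent_to_use": 20},
]


def validate_manual(config):
    """Return the total memory percentage, raising if it exceeds 100."""
    total = sum(queue.get("memory_percent_to_use", 0) for queue in config)
    if total > 100:
        raise ValueError(f"memory allocation is {total}%, which exceeds 100%")
    return total


def to_parameter_value(config):
    """Serialize a queue list to the JSON string a parameter group would store."""
    return json.dumps(config)
```

Here query_concurrency plays the role of the slots mentioned above: each manual queue reserves a fixed number of concurrent queries, whereas the auto configuration leaves that decision to Amazon Redshift.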

Now that you have chosen automatic or manual WLM, let’s explore WLM parameters and properties.

Static vs. dynamic properties

The WLM configuration for a Redshift data warehouse is set using a parameter group under the database configuration properties.

The parameter group WLM settings are either dynamic or static. You can apply dynamic properties to the database without a cluster reboot, but static properties require a cluster reboot for changes to take effect. The following table summarizes the static vs. dynamic requirements for different WLM properties.

| WLM Property | Automatic WLM | Manual WLM |
|---|---|---|
| Query groups | Dynamic | Static |
| Query group wildcard | Dynamic | Static |
| User groups | Dynamic | Static |
| User group wildcard | Dynamic | Static |
| User roles | Dynamic | Static |
| User role wildcard | Dynamic | Static |
| Concurrency on main | Not applicable | Dynamic |
| Concurrency Scaling mode | Dynamic | Dynamic |
| Enable short query acceleration | Not applicable | Dynamic |
| Maximum runtime for short queries | Dynamic | Dynamic |
| Percent of memory to use | Not applicable | Dynamic |
| Timeout | Not applicable | Dynamic |
| Priority | Dynamic | Not applicable |
| Adding or removing queues | Dynamic | Static |

Note the following:

  • The parameter group parameters and the WLM switch from manual to auto (or vice versa) are static properties, and therefore require a cluster reboot.
  • For the WLM properties Concurrency on main, Percent of memory to use, and Timeout, which are dynamic for manual WLM, a change only applies to new queries submitted after the value has changed, not to currently running queries.
  • The query monitoring rules, which we discuss later in this post, are dynamic and don’t require a cluster reboot.
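As a rough illustration, the table above can be turned into a small lookup that tells you whether a planned change needs a cluster reboot. This is only a sketch: the dictionary is transcribed from the table, while the function name `requires_reboot` and the mode keys `auto` and `manual` are hypothetical names chosen for this example.

```python
# (automatic WLM, manual WLM) status per property, transcribed from the table above.
# NA marks a property that is not applicable in that mode.
D, S, NA = "dynamic", "static", None
WLM_PROPERTIES = {
    "Query groups": (D, S),
    "Query group wildcard": (D, S),
    "User groups": (D, S),
    "User group wildcard": (D, S),
    "User roles": (D, S),
    "User role wildcard": (D, S),
    "Concurrency on main": (NA, D),
    "Concurrency Scaling mode": (D, D),
    "Enable short query acceleration": (NA, D),
    "Maximum runtime for short queries": (D, D),
    "Percent of memory to use": (NA, D),
    "Timeout": (NA, D),
    "Priority": (D, NA),
    "Adding or removing queues": (D, S),
}
MODES = {"auto": 0, "manual": 1}  # hypothetical keys for the two table columns


def requires_reboot(mode, changed_properties):
    """Return True if any changed property is static in the given WLM mode."""
    column = MODES[mode]
    for name in changed_properties:
        status = WLM_PROPERTIES[name][column]
        if status is NA:
            raise ValueError(f"{name!r} is not applicable to {mode} WLM")
        if status == S:
            return True
    return False


print(requires_reboot("manual", ["Concurrency on main", "Timeout"]))  # False
print(requires_reboot("manual", ["User groups"]))  # True
```

Switching between automatic and manual WLM is not modeled here; as noted above, that switch always requires a reboot.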

In the next section, we discuss the concept of service class, that is, which queue a query is submitted to and why.

Service class

Whether you use Auto or manual WLM, the user queries submitted go to the intended WLM queue via one of the following mechanisms:

  • User_Groups – The WLM queue directly maps to Redshift groups that would appear in the pg_group table.
  • Query_Groups – Queue assignment is based on the query_group label. For example, a dashboard submitted from the same reporting user can have separate priorities by designation or department.
  • User_Roles (latest addition) – The queue is assigned based on the Redshift roles.

WLM queues from a metadata perspective are defined as service class configuration. The following table lists common service class identifiers for your reference.

ID | Service class
1–4 | Reserved for system use.
5 | Used by the superuser queue.
6–13 | Used by manual WLM queues that are defined in the WLM configuration.
14 | Used by short query acceleration.
15 | Reserved for maintenance activities run by Amazon Redshift.
100–107 | Used by automatic WLM queue when auto_wlm is true.

The WLM queues you define based on user_groups, query_groups, or user_roles fall in service class ID 6–13 for manual WLM and service class ID 100–107 for automatic WLM.
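When reading service class values out of WLM system tables, a small helper can translate an ID into the categories listed above. The following is a minimal sketch; the ranges come from the table, while the function name and the short labels are made up for illustration.

```python
# ID ranges and their meaning, taken from the service class table above.
SERVICE_CLASS_RANGES = [
    (1, 4, "system"),
    (5, 5, "superuser queue"),
    (6, 13, "manual WLM queue"),
    (14, 14, "short query acceleration"),
    (15, 15, "maintenance"),
    (100, 107, "automatic WLM queue"),
]


def describe_service_class(service_class_id):
    """Map a service class ID to a short label, or 'unknown' if unlisted."""
    for low, high, label in SERVICE_CLASS_RANGES:
        if low <= service_class_id <= high:
            return label
    return "unknown"


print(describe_service_class(5))    # superuser queue
print(describe_service_class(102))  # automatic WLM queue
```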

Using Query_group, you can force a query to go to service class 5 and run in the superuser queue (provided you are an authorized superuser) as shown in the following code:

set query_group to 'superuser';
analyze table_xyz;
vacuum full table_xyz;
reset query_group;

For more details on how to assign a query to a particular service class, refer to Assigning queries to queues.

The short query acceleration (SQA) queue (service class 14) prioritizes short-running queries ahead of longer-running queries. If you enable SQA, you can reduce WLM queues that are dedicated to running short queries. In addition, long-running queries don’t need to contend with short queries for slots in a queue, so you can configure your WLM queues to use fewer query slots (a term used for available concurrency). Amazon Redshift uses an ML algorithm to analyze each eligible query and predict the query’s runtime. Auto WLM dynamically assigns a value for the SQA maximum runtime based on analysis of your cluster’s workload. Alternatively, you can specify a fixed value of 1–20 seconds when using manual WLM.

SQA is enabled by default in the default parameter group and for all new parameter groups. SQA can have a maximum concurrency of six queries.

Now that you understand how queries get submitted to a service class, it’s important to understand ways to avoid runaway queries and initiate an action for an unintended event.

Query monitoring rules

You can use Amazon Redshift query monitoring rules (QMRs) to set metrics-based performance boundaries for WLM queues and specify what action to take when a query goes beyond those boundaries.

The Redshift cluster automatically collects query monitoring metrics. You can query the system view SVL_QUERY_METRICS_SUMMARY as an aid to determine threshold values for defining the QMR. Then create the QMR based on the following attributes:

  • Query runtime, in seconds
  • Query return row count
  • The CPU time for a SQL statement

For a complete list of QMRs, refer to WLM query monitoring rules.
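To make the shape of a rule concrete, here is a hedged sketch that builds one QMR as a Python dictionary in the same layout as the rules entry of the wlm_json_configuration shown later in this post. The helper name `make_rule` is hypothetical. The set of actions and the limit of three predicates per rule reflect our reading of the Redshift documentation, so treat them as assumptions and check them against the current docs.

```python
import json

# Assumption: actions for manual WLM queues; automatic WLM also offers a
# priority-change action that is not modeled here.
ALLOWED_ACTIONS = {"log", "hop", "abort"}
MAX_PREDICATES = 3  # assumption: a rule holds at most three predicates


def make_rule(rule_name, predicates, action):
    """Build one QMR entry from (metric_name, operator, value) tuples."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not 1 <= len(predicates) <= MAX_PREDICATES:
        raise ValueError("a rule needs between 1 and 3 predicates")
    return {
        "rule_name": rule_name,
        "predicate": [
            {"metric_name": metric, "operator": operator, "value": value}
            for metric, operator, value in predicates
        ],
        "action": action,
    }


# Same rule as in the CLI example later in this post: abort queries whose
# query_execution_time exceeds 10800 seconds.
rule = make_rule(
    "user_query_duration_threshold",
    [("query_execution_time", ">", 10800)],
    "abort",
)
print(json.dumps(rule, indent=2))
```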

Create sample parameter groups

For our ExampleCorp use case, we demonstrate automatic and manual WLM for a provisioned Redshift data warehouse and share a serverless perspective of WLM.

The following AWS CloudFormation template provides an automated way to create sample parameter groups that you can attach to your Redshift data warehouse for workload management.

Enterprise data warehouse Redshift cluster using automatic WLM

For the EDW cluster, we use Auto WLM. To configure the service class, we look at all three options: user_roles, user_groups, and query_groups.

Here’s a glimpse of how this can be set up in WLM queues and then used in your queries.

On the Amazon Redshift console, under Configurations in the navigation pane, choose Workload Management. You can create a new parameter group or modify an existing one created by you. Select the parameter group to edit its queues. There’s always a default queue (the last one in case of multiple queues defined), which is a catch-all for queries that don’t get routed to any specific queue.

User roles in WLM

With the introduction of user roles in WLM queues, now you can manage your workload by adding different roles to different queues. This can help you prioritize the queries based on the roles a user has. When a user runs a query, WLM will check if this user’s roles were added in any workload queues and assign the query to the first matching queue. To add roles into the WLM queue, you can go to the WLM page, create or modify an existing workload queue, add a user’s roles in the queue, and select Matching wildcards to add roles that get matched as wildcards.

For more information about how to convert from groups to roles, refer to Amazon Redshift Roles (RBAC), which walks you through a stored procedure to convert groups to roles.

In the following example, we have created the WLM queue EDW_Admins, which uses edw_admin_role created in Amazon Redshift to submit the workloads in this queue. The EDW_Admins queue is created with a high priority and automatic concurrency scaling mode.

User groups

Groups are collections of users who are all granted permissions associated with the group. You can use groups to simplify permission management by granting privileges just one time. If the members of a group get added or removed, you don’t need to manage them at a user level. For example, you can create different groups for sales, administration, and support and give the users in each group the appropriate access to the data they need for their work.

You can grant or revoke permissions at the user group level, and those changes will apply to all members of the group.

ETL, data analysts, or BI or decision support systems can use user groups to better manage and isolate their workloads. For our example, ETL WLM queue queries will be run with the user group etl. The data analyst group (BI) WLM queue queries will run using the bi user group.

Choose Add queue to add a new queue that you will use for user_groups, in this case ETL. If you would like these to be matched as wildcards (strings containing those keywords), select Matching wildcards. You can customize other options like query priority and concurrency scaling, explained earlier in this post. Choose Save to complete this queue setup.

In the following example, we have created two different WLM queues for ETL and BI. The ETL queue has a high priority and concurrency scaling mode is off, whereas the BI queue has a low priority and concurrency scaling mode is off.

Use the following code to create a group with multiple users:

-- Example of create group with multiple users
CREATE GROUP etl WITH USER etl_user1, etl_user2;
CREATE GROUP bi WITH USER bi_user1, bi_user2;

Query groups

Query_Groups are labels used for queries that are run within the same session. Think of these as tags that you may want to use to identify queries for a uniquely identifiable use case. In our example use case, the data analysts or BI or decision support systems can use query_groups to better manage and isolate their workloads. For our example, weekly business reports can run with the query_group label wbr. Queries from the marketing department can be run with a query_group of marketing.

The benefit of using query_groups is that you can use it to constrain results from the STL_QUERY and STV_INFLIGHT tables and the SVL_QLOG view. You can apply a separate label to every query that you run to uniquely identify queries without having to look up their IDs.

Choose Add queue to add a new queue that you will use for query_groups, in this case wbr or weekly_business_report. If you would like these to be matched as wildcards (strings containing those keywords), select Matching wildcards. You can customize other options like query priority and concurrency scaling options as explained earlier in this post. Choose Save to save this queue setup.

Now let’s see how you can force a query to use the query_groups queue just created.

You can assign a query to a queue at runtime by assigning your query to the appropriate query group. Use the SET command to begin a query group:

SET query_group TO 'wbr';
-- or
SET query_group TO 'weekly_business_report';

Queries following the SET command go to the WLM queue Query_Group_WBR until you either reset the query group or end your current login session. For information about setting and resetting server configuration parameters, see SET and RESET, respectively.

The query group labels that you specify must be included in the current WLM configuration; otherwise, the SET query_group command has no effect on query queues.

For more query_groups examples, refer to WLM queue assignment rules.
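The routing behavior described in this section (first matching queue wins, unmatched labels fall through to the default queue) can be modeled with a few lines of Python. This is a simplified sketch, not the Redshift implementation: it ignores the superuser queue and user roles, the function names are hypothetical, and wildcard matching is approximated with Python's `fnmatchcase`, which may differ from Redshift's wildcard rules.

```python
from fnmatch import fnmatchcase


def matches(label, patterns, wildcard):
    """Check one label against a queue's list of groups or patterns."""
    if label is None:
        return False
    if wildcard:
        return any(fnmatchcase(label, pattern) for pattern in patterns)
    return label in patterns


def assign_queue(queues, user_groups=(), query_group=None):
    """Return the name of the first matching queue, else the default queue."""
    for queue in queues:
        by_user = any(
            matches(group, queue.get("user_group", []),
                    queue.get("user_group_wild_card", 0))
            for group in user_groups
        )
        by_label = matches(query_group, queue.get("query_group", []),
                           queue.get("query_group_wild_card", 0))
        if by_user or by_label:
            return queue["name"]
    return "Default queue"


queues = [
    {"name": "ETL", "user_group": ["etl"], "query_group": []},
    {"name": "Query_Group_WBR", "user_group": [],
     "query_group": ["wbr", "weekly_business_report"]},
]

print(assign_queue(queues, query_group="wbr"))         # Query_Group_WBR
print(assign_queue(queues, user_groups=["etl"]))       # ETL
print(assign_queue(queues, query_group="not_listed"))  # Default queue
```

The last call mirrors the note above: a label that is not part of the WLM configuration has no effect on routing, so the query lands in the default queue.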

Marketing Redshift cluster using manual WLM

Expanding on the marketing Redshift cluster use case of ExampleCorp, this cluster serves two types of workloads:

  • Running ETL for a period of 2 hours between 7:00 AM and 9:00 AM
  • Running BI reports and dashboards for the remaining time during the day

When you have such clarity in the workloads, and your scope of usage is customizable by design, you may want to consider using manual WLM, where you can control the memory and concurrency resource allocation. Auto WLM will still be applicable, but manual WLM can also be a choice.

Let’s set up manual WLM in this case, with two WLM queues: ETL and BI.

To best utilize the resources, we use an AWS Command Line Interface (AWS CLI) command at the start of our ETL, which will make our WLM queues ETL-friendly, providing higher concurrency to the ETL queue. At the end of our ETL, we use an AWS CLI command to change the WLM queue to have BI-friendly resource settings. Modifying the WLM queues doesn’t require a reboot of your cluster; however, modifying the parameters or parameter group does.

If you were to use Auto WLM, this could have been achieved by dynamically changing the query priority of the ETL and BI queues.

By default, when you choose Create, the WLM created will be Auto WLM. You can switch to manual WLM by choosing Switch WLM mode. After switching WLM mode, choose Edit workload queues.

This will open the Modify workload queues page, where you can create your ETL and BI WLM queues.

After you add your ETL and BI queues, choose Save. You should have configured the following:

  • An ETL queue with 60% memory allocation and query concurrency of 9
  • A BI queue with 30% memory allocation and query concurrency of 4
  • A default queue with 10% memory allocation and query concurrency of 2

Your WLM queues should appear with settings as shown in the following screenshot.

Enterprises may prefer to complete these steps in an automated way. For the marketing data mart use case, the ETL starts at 7:00 AM. An ideal start to the ETL flow would be to have a job that makes your WLM settings ETL queue friendly. Here’s how you would modify concurrency and memory (both dynamic properties in manual WLM queues) to an ETL-friendly configuration:

aws redshift --region 'us-east-1' modify-cluster-parameter-group --parameter-group-name manual-wlm-demo --parameters '{"ParameterName": "wlm_json_configuration","ParameterValue": "[{\"query_group\": [], \"user_group\": [\"etl\"],\"query_group_wild_card\": 0,\"user_group_wild_card\": 0, \"query_concurrency\": 9, \"max_execution_time\": 0, \"memory_percent_to_use\": 60, \"name\": \"ETL\" }, {\"query_group\": [], \"user_group\": [\"bi\"],\"query_group_wild_card\": 0,\"user_group_wild_card\": 0, \"query_concurrency\": 3, \"max_execution_time\": 0, \"memory_percent_to_use\": 20, \"name\": \"BI\" }, { \"query_group\": [], \"user_group\": [], \"query_group_wild_card\": 0, \"user_group_wild_card\": 0, \"query_concurrency\": 3, \"max_execution_time\": 5400000, \"memory_percent_to_use\": 20, \"name\": \"Default queue\", \"rules\": [ { \"rule_name\": \"user_query_duration_threshold\", \"predicate\": [ { \"metric_name\": \"query_execution_time\", \"operator\": \">\", \"value\": 10800 } ], \"action\": \"abort\" } ] }, { \"short_query_queue\": \"true\" } ]","Description": "ETL Start, ETL Friendly"}';

The preceding AWS CLI command programmatically sets the configuration of your WLM queues without requiring a reboot of the cluster because the queue settings changed were all dynamic settings.
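Hand-escaping the nested JSON in the command above is error prone, so you may prefer to generate it. The following sketch rebuilds the same ETL-friendly value with Python's `json` module and adds a sanity check that the memory percentages do not exceed 100. The helper names `queue` and `build_parameters` are hypothetical; the keys and values are copied from the command above.

```python
import json


def queue(name, user_groups, concurrency, memory_percent,
          max_execution_time=0, rules=None):
    """Build one manual WLM queue entry using the keys from the CLI example."""
    entry = {
        "query_group": [],
        "user_group": list(user_groups),
        "query_group_wild_card": 0,
        "user_group_wild_card": 0,
        "query_concurrency": concurrency,
        "max_execution_time": max_execution_time,
        "memory_percent_to_use": memory_percent,
        "name": name,
    }
    if rules:
        entry["rules"] = rules
    return entry


def build_parameters(queues, description):
    """Return the JSON string to pass as the value of --parameters."""
    total_memory = sum(q["memory_percent_to_use"] for q in queues)
    if total_memory > 100:
        raise ValueError(f"memory adds up to {total_memory}%, above 100%")
    wlm_config = queues + [{"short_query_queue": "true"}]
    return json.dumps({
        "ParameterName": "wlm_json_configuration",
        # The WLM configuration is itself a JSON string nested in the parameter.
        "ParameterValue": json.dumps(wlm_config),
        "Description": description,
    })


duration_rule = {
    "rule_name": "user_query_duration_threshold",
    "predicate": [{"metric_name": "query_execution_time",
                   "operator": ">", "value": 10800}],
    "action": "abort",
}
etl_friendly = [
    queue("ETL", ["etl"], concurrency=9, memory_percent=60),
    queue("BI", ["bi"], concurrency=3, memory_percent=20),
    queue("Default queue", [], concurrency=3, memory_percent=20,
          max_execution_time=5400000, rules=[duration_rule]),
]
params = build_parameters(etl_friendly, "ETL Start, ETL Friendly")
print(params)
```

The printed string can be wrapped in single quotes and supplied to modify-cluster-parameter-group in place of the hand-written value.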

For the marketing data mart use case, at 9:00 AM or when the ETL is finished, you can have a job run an AWS CLI command to modify the WLM queue resource settings to a BI-friendly configuration as shown in the following code:

aws redshift --region 'us-east-1' modify-cluster-parameter-group --parameter-group-name manual-wlm-demo --parameters '{"ParameterName": "wlm_json_configuration","ParameterValue": "[{\"query_group\": [], \"user_group\": [\"etl\"],\"query_group_wild_card\": 0,\"user_group_wild_card\": 0, \"query_concurrency\": 1, \"max_execution_time\": 0, \"memory_percent_to_use\": 5, \"name\": \"ETL\" }, {\"query_group\": [], \"user_group\": [\"bi\"],\"query_group_wild_card\": 0,\"user_group_wild_card\": 0, \"query_concurrency\": 12, \"max_execution_time\": 0, \"memory_percent_to_use\": 80, \"name\": \"BI\" }, { \"query_group\": [], \"user_group\": [], \"query_group_wild_card\": 0, \"user_group_wild_card\": 0, \"query_concurrency\": 2, \"max_execution_time\": 5400000, \"memory_percent_to_use\": 15, \"name\": \"Default queue\", \"rules\": [ { \"rule_name\": \"user_query_duration_threshold\", \"predicate\": [ { \"metric_name\": \"query_execution_time\", \"operator\": \">\", \"value\": 10800 } ], \"action\": \"abort\" } ] }, { \"short_query_queue\": \"true\" } ]","Description": "ETL End, BI Friendly"}';

Note that in a manual WLM configuration, the maximum number of slots you can allocate to a queue is 50. However, this doesn’t mean that in an automatic WLM configuration, a Redshift cluster always runs 50 queries concurrently. This can change based on the memory needs or other types of resource allocation on the cluster. We recommend configuring your manual WLM query queues with a total of 15 or fewer query slots. For more information, see Concurrency level.

In case of WLM timeout or a QMR hop action within manual WLM, a query can attempt to hop to the next matching queue based on WLM queue assignment rules. This action in manual WLM is called query queue hopping.

Auditor Redshift data warehouse using WLM in Redshift Serverless

The auditor data warehouse workload runs at month end and quarter end. For this periodic workload, Redshift Serverless is well suited, both from a cost and ease of administration perspective. Redshift Serverless uses ML to learn from your workload to automatically manage workload and auto scaling of compute needed for your workload.

In Redshift Serverless, you can set up usage and query limits. The query limits let you set up the QMR. You can choose Manage query limits to automatically trigger the default abort action when queries go beyond performance boundaries. For more information, refer to Query monitoring metrics for Amazon Redshift Serverless.

For other detailed limits in Redshift Serverless, refer to Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable.

Monitor using system views for operational metrics

The system views in Amazon Redshift are used to monitor the workload performance. You can view the status of queries, queues, and service classes by using WLM-specific system tables. You can query system tables to explore the following details:

  • View which queries are being tracked and what resources are allocated by the workload manager
  • See which queue a query has been assigned to
  • View the status of a query that is currently being tracked by the workload manager

You can download the sample SQL notebook system queries. You can import this in Query Editor V2.0. The queries in the sample notebook can help you explore your workloads being managed by WLM queues.

Conclusion

In this post, we covered real-world examples for Auto WLM and manual WLM patterns. We introduced user roles assignment to WLM queues, and shared queries on system views and tables to gather operational insights on your WLM configuration. We encourage you to explore using Redshift user roles with workload management. Use the script provided on AWS re:Post to convert groups to roles, and start using user roles for your WLM queues.


About the Authors

Rohit Vashishtha is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, Texas. He has over 17 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.

Harshida Patel is a Principal specialist SA with AWS.

Nita Shah is an Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.

Yanzhu Ji is a Product Manager in the Amazon Redshift team. She has experience in product vision and strategy in industry-leading data products and platforms. She has outstanding skill in building substantial software products using web development, system design, database, and distributed programming techniques. In her personal life, Yanzhu likes painting, photography, and playing tennis.

Deploying container application using Amazon CodeCatalyst and AWS App Runner

Post Syndicated from Jacky Wong original https://aws.amazon.com/blogs/devops/deploying-container-application-using-amazon-codecatalyst-and-aws-app-runner/

Introduction

Amazon CodeCatalyst provides a collaboration space where developers easily build applications and streamline the deployment process. It integrates with container application services such as AWS App Runner, making containerized application deployments efficient and straightforward. Using CodeCatalyst’s native CI/CD workflow actions, development teams swiftly roll out applications and mitigate operational challenges. The workflow’s fluidity from coding to deploying ensures that developers move between stages effortlessly, promoting an agile development experience.

In this post, we’ll guide you through deploying a containerized application to App Runner using CodeCatalyst’s native CI/CD workflow actions for a seamless deployment experience.

Solution Overview

In this walkthrough, you will set up a basic “Hello World” static webpage using the Apache HTTP Server. The primary focus of this blog is to navigate the essential features of the CodeCatalyst workflow and its specific actions for App Runner service. Within this workflow, the “Push to Amazon ECR” action builds the container image and then pushes the image to Amazon Elastic Container Registry (ECR). Subsequently, the “Deploy to AWS App Runner” action initiates an AWS CloudFormation template that launches the App Runner service.

The “Push to Amazon ECR” action builds and pushes the container image to Amazon ECR, and the “Deploy to AWS App Runner” action pulls the container image from ECR and deploys the application to AWS App Runner.

Figure 1 – Solution Architecture

The solution described in the post covers the following steps:

  1. Create a sample application
  2. Set up CodeCatalyst environment
  3. Create a CodeCatalyst CI/CD workflow

Create a sample application

For this demonstration, GitHub serves as the source repository. Alternatively, you can create a source repository directly within CodeCatalyst or you can choose to link an existing source repository hosted by another service if that service is supported by an installed extension on CodeCatalyst.

The first step is to create a simple index.html file with the following content:

<html>
  <head>
     <title>Hello, World!</title>
  </head>
  <body>
     <h1>Hello, World!</h1>
     <p>Welcome to my demo of CodeCatalyst CI/CD Workflow. Watch how changes here magically appear live through the power of automation!</p>
  </body>
</html>

Next, create a Dockerfile in addition to the HTML file. This file contains two commands: the first instructs Docker to build an image from the httpd image, while the second places the index.html file into this image. Here’s a snippet of the Dockerfile:

FROM httpd
COPY ./index.html /usr/local/apache2/htdocs/

Once finalized, you can push these files to the GitHub repository. The following image displays the GitHub repository layout.

The GitHub repository consists of a Dockerfile, a README.md and an index.html file.

Figure 2 – GitHub repository layout

Set up CodeCatalyst environment

Follow these steps to set up the CodeCatalyst environment:

Create CodeCatalyst CI/CD workflow

To create a new workflow, select CI/CD from the navigation pane on the left and then select Workflows. Then, select Create workflow, leave the default options, and select Create.

Figure 3 – Create CodeCatalyst workflow

If the workflow editor opens in YAML mode, select Visual to open the visual designer. Now, you can start adding actions to the workflow.

Step 1: Add “Push to Amazon ECR” action

Add the “Push to Amazon ECR” action to the workflow as shown in Figure 4. Refer to adding an action to learn how to use the CodeCatalyst action.

Figure 4 – Add “Push to Amazon ECR” Action

Choose the Configuration tab and specify the configurations as shown in Figure 5.

The Configuration section of the “Push to Amazon ECR” action displays the values used in configuring the action.

Figure 5 – Configure “Push to Amazon ECR” action

Step 2: Add “Deploy to AWS App Runner” action

Add the “Deploy to AWS App Runner” action to the workflow for deploying the latest image in ECR to App Runner as shown in Figure 6. In the Inputs tab, under Depends on, choose the “PushToAmazonECR_xx” action.

Figure 6 – Add “Deploy to AWS App Runner” action

Choose the Configuration tab and specify the configurations of the App Runner action, as shown in Figure 7.

For Image Identifier, use the format “<account number>.dkr.ecr.<region>.amazonaws.com/<repository name>:<image tag>”. This matches the URI of the ECR image once the workflow generates it. For this guide, you are going to create a new App Runner service. When you provide a Stack Name, the action determines whether the App Runner service already exists. If not, it creates a new App Runner service using default configurations.
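As a small illustration of the Image Identifier format, the following sketch assembles the value from its parts. The function name is hypothetical, and the sample values are the placeholders used elsewhere in this post.

```python
def image_identifier(account_id, region, repository, tag="latest"):
    """Assemble an ECR image identifier in the format described above."""
    if not account_id.isdigit():
        raise ValueError("account_id must contain digits only")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(image_identifier("0123456789", "us-west-2", "hello-world"))
# 0123456789.dkr.ecr.us-west-2.amazonaws.com/hello-world:latest
```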

Figure 7 – Configure “Deploy to AWS App Runner” Action

You may notice that the action does not support configuring all of the options available in App Runner. If you find the options limiting, you can deploy your App Runner service manually through the AWS console or using Infrastructure-as-Code (IaC) methods. Once set up, you can simply use the ARN of your existing App Runner service in the action.

If you switch back to the YAML view, you can see that the following actions are added to the workflow definition.

Name: Workflow_371a
SchemaVersion: "1.0"

Triggers:
  - Type: Push
    Branches:
      - main

Actions:
  PushToAmazonECR_fa:
    Identifier: codecatalyst-labs/[email protected]
    Environment:
      Connections:
        - Role: CodeCatalystWorkflowDevelopmentRole
          Name: "0123456789"
      Name: Staging
    Inputs:
      Sources:
        - WorkflowSource
    Compute:
      Type: EC2
    Configuration:
      RepositoryName: hello-world
      AWSRegion: us-west-2
      ImageTag: latest
      PublicRepository: false
      ImageScanningConfiguration: false
  DeployToAWSAppRunner_72:
    Identifier: codecatalyst-labs/[email protected]
    Environment:
      Connections:
        - Role: CodeCatalystWorkflowDevelopmentRole-mymymy
          Name: "0123456789"
      Name: Staging
    Compute:
      Type: EC2
    Configuration:
      Tags: "[]"
      StackName: hello-world
      Wait: true
      AWSRegion: us-west-2
      Port: 80
      ImageIdentifier: 0123456789.dkr.ecr.us-west-2.amazonaws.com/hello-world:latest
    DependsOn:
      - PushToAmazonECR_fa

The workflow is now ready and you can validate it by choosing Validate. After that, save the changes by choosing Commit (Figure 8).

Figure 8 – Validate and commit workflow

When saving the workflow with commit, the workflow runs automatically. CodeCatalyst builds the container image of the application, pushes the image to ECR, and deploys the application to App Runner (Figure 9).

Figure 9 – Workflow run succeeded

Test the deployment

Once the “Deploy To AWS App Runner” action finishes, you can check the URL to the website of your service. To check the App Runner domain value, click on the Variables tab. Under Output variables, copy the value of the “AppUrl”.

Figure 10 – App Runner domain value

Visit the URL to verify the App Runner service is running.

Visit the URL to verify the App Runner service is running, which shows the content of the index.html page.

Figure 11 – Verify the App Runner service is running

You can go ahead and test the CI/CD workflow by committing an update to the index.html content in the source GitHub repo.

<html>
  <head>
    <title>Hello, Universe!</title>
  </head>
  <body>
    <h1>Hello, Universe!</h1>
    <p>Ta-da! Noticed that change? That's the magic of our CI/CD workflow in action with CodeCatalyst</p>
  </body>
</html>

Another workflow run starts automatically on a commit to source. When the workflow completes, refresh the browser tab to verify that the update is successful.

Verify the App Runner service is updated via browser, showing the updated index.html content.

Figure 12 – Verify the App Runner service is updated

Cleanup

To avoid incurring extra charges after following this post’s workflow, consider cleaning up the resources that you set up.

1. Delete the ECR repository via AWS Management Console.

2. Delete the App Runner service by deleting the CloudFormation stack with the name identical to the “StackName” specified for the App Runner action.

3. Delete the CodeCatalyst workflow.

4. Delete the CodeCatalyst project by navigating to Project settings and choosing Delete project.

Conclusion

In summary, this post shows how to configure CodeCatalyst workflow to build and deploy containerized applications using CodeCatalyst’s native CI/CD workflow actions. By leveraging these actions, developers can simplify the deployment process to App Runner. Learn more and embark on your Amazon CodeCatalyst journey.

Jacky Wong

Jacky Wong is a Technical Account Manager at AWS. His main focus includes collaborating with customers to provide technical guidance on AWS, understanding their technical requirements, and designing solutions to meet their specific needs. His areas of expertise cover automation, CI/CD pipelines, and DevOps. Outside of work, he finds enjoyment in playing badminton, table tennis, and cycling.

Bonnie Ng

Bonnie is a Cloud Support Engineer at AWS. She has gained knowledge in multiple services, being a subject matter expert in Amazon Elastic Container Service (ECS). Outside of work, she values time with family and friends, takes pleasure in crafting art, and loves listening to music.

Friday Squid Blogging: New Squid Species

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/friday-squid-blogging-new-squid-species-2.html

An ancient squid:

New research on fossils has revealed that a vampire-like ancient squid haunted Earth’s oceans 165 million years ago. The study, published in June edition of the journal Papers in Palaeontology, says the creature had a bullet-shaped body with luminous organs, eight arms and sucker attachments. The discovery was made by scientists in France, who used modern imaging technique to analyse the previously discovered fossils. The ancient squid has been named Vampyrofugiens atramentum, which stands for the “fleeing vampire”. The researchers said that these features have never been recorded before.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Use the new SQL commands MERGE and QUALIFY to implement and validate change data capture in Amazon Redshift

Post Syndicated from Yanzhu Ji original https://aws.amazon.com/blogs/big-data/use-the-new-sql-commands-merge-and-qualify-to-implement-and-validate-change-data-capture-in-amazon-redshift/

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads.

Amazon Redshift has added many features to enhance analytical processing like ROLLUP, CUBE and GROUPING SETS, which were demonstrated in the post Simplify Online Analytical Processing (OLAP) queries in Amazon Redshift using new SQL constructs such as ROLLUP, CUBE, and GROUPING SETS. Amazon Redshift has recently added many SQL commands and expressions. In this post, we talk about two new SQL features, the MERGE command and QUALIFY clause, which simplify data ingestion and data filtering.

One familiar task in most downstream applications is change data capture (CDC) and applying it to its target tables. This task requires examining the source data to determine if it is an update or an insert to existing target data. Without the MERGE command, you needed to test the new dataset against the existing dataset using a business key. When that didn’t match, you inserted new rows in the existing dataset; otherwise, you updated existing dataset rows with new dataset values.

The MERGE command conditionally merges rows from a source table into a target table. Traditionally, this could only be achieved by using multiple insert, update, or delete statements separately. When using multiple statements to update or insert data, there is a risk of inconsistencies between the different operations. The MERGE operation reduces this risk by ensuring that all operations are performed together in a single transaction.
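The matching logic behind a merge can be sketched in plain Python to show what the command does conceptually: rows are matched on a business key, matched rows are updated, and unmatched rows are inserted. This is an illustration only, not how Amazon Redshift implements MERGE; the function name and the `insert_ts` and `update_ts` fields are hypothetical.

```python
def merge(target, source, now):
    """Upsert source rows into target; both are dicts keyed by business key.

    Changes are staged on a copy and returned as a whole, loosely mirroring
    the all-or-nothing behavior of a single transaction.
    """
    staged = {key: dict(row) for key, row in target.items()}
    for key, row in source.items():
        if key in staged:
            # Matched on the business key: update values, stamp the update.
            staged[key].update(row)
            staged[key]["update_ts"] = now
        else:
            # No match: insert as a new row, stamp the insert.
            staged[key] = {**row, "insert_ts": now, "update_ts": None}
    return staged


target = {101: {"c_first_name": "Ann", "insert_ts": "2023-01-01", "update_ts": None}}
source = {101: {"c_first_name": "Anne"}, 102: {"c_first_name": "Bob"}}
merged = merge(target, source, now="2023-10-01")
print(merged[101])
print(merged[102])
```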

The QUALIFY clause filters the results of a previously computed window function according to user‑specified search conditions. You can use the clause to apply filtering conditions to the result of a window function without using a subquery. This is similar to the HAVING clause, which applies a condition to further filter rows from a WHERE clause. The difference between QUALIFY and HAVING is that filtered results from the QUALIFY clause could be based on the result of running window functions on the data. You can use both the QUALIFY and HAVING clauses in one query.
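To illustrate the idea of filtering on a window function result, the sketch below mimics in plain Python what a query with QUALIFY ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) = 1 expresses: number the rows within each partition, then keep only the rows whose number passes the filter. The function name and sample columns are hypothetical, and this is a conceptual model rather than Redshift behavior in every detail.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_key(rows, key, order_by):
    """Keep the newest row per key, like filtering on ROW_NUMBER() = 1."""
    ranked = []
    # PARTITION BY key: sort so that groupby sees each partition contiguously.
    for _, partition in groupby(sorted(rows, key=itemgetter(key)),
                                key=itemgetter(key)):
        # ORDER BY order_by DESC within the partition, then number the rows.
        ordered = sorted(partition, key=itemgetter(order_by), reverse=True)
        for row_number, row in enumerate(ordered, start=1):
            ranked.append((row_number, row))
    # The QUALIFY step: filter on the computed window function value.
    return [row for row_number, row in ranked if row_number == 1]


rows = [
    {"id": 1, "ts": 1, "value": "a"},
    {"id": 1, "ts": 2, "value": "b"},
    {"id": 2, "ts": 5, "value": "c"},
]
print(latest_per_key(rows, key="id", order_by="ts"))
```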

In this post, we demonstrate how to use the MERGE command to implement CDC and how to use QUALIFY to simplify validation of those changes.

Solution overview

In this use case, we have a data warehouse, in which we have a customer dimension table that needs to always get the latest data from the source system. This data must also reflect the initial creation time and last update time for auditing and tracking purposes.

A simple way to solve this is to override the customer dimension fully every day; however, that won’t achieve the update tracking, which is an audit mandate, and it might not be feasible to do for bigger tables.

You can load sample data from Amazon S3 by following the instruction here. Using the existing customer table under sample_data_dev.tpcds, we create a customer dimension table and a source table that will contain both updates for existing customers and inserts for new customers. We use the MERGE command to merge the source table data with the target table (customer dimension). We also show how to use the QUALIFY clause to simplify validating the changes in the target table.

To follow along with the steps in this post, we recommend downloading the accompanying notebook, which contains all the scripts to run for this post. To learn about authoring and running notebooks, refer to Authoring and running notebooks.

Prerequisites

You should have the following prerequisites:

Create and populate the dimension table

We use the existing customer table under sample_data_dev.tpcds to create a customer_dimension table. Complete the following steps:

  1. Create a table using a few selected fields, including the business key, and add a couple of maintenance fields for insert and update timestamps:
    -- create the customer dimension table
    DROP TABLE IF EXISTS customer_dim CASCADE;
    CREATE TABLE customer_dim ( 
    customer_dim_id     bigint GENERATED BY DEFAULT AS IDENTITY(1, 1), 
    c_customer_sk integer NOT NULL ENCODE az64 distkey,
    c_first_name character(20) ENCODE lzo,
    c_last_name character(30) ENCODE lzo,
    c_current_addr_sk integer ENCODE az64,
    c_birth_country character varying(20) ENCODE lzo,
    c_email_address character(50) ENCODE lzo,
    record_insert_ts    timestamp WITHOUT time ZONE DEFAULT current_timestamp ,
    record_upd_ts       timestamp WITHOUT time ZONE DEFAULT NULL
    )
    SORTKEY (c_customer_sk);

  2. Populate the dimension table:
    -- populate dimension 
    insert into customer_dim 
           (c_customer_sk, c_first_name,c_last_name, c_current_addr_sk, c_birth_country, c_email_address) 
    select  c_customer_sk, c_first_name,c_last_name, c_current_addr_sk, c_birth_country, c_email_address
    from "sample_data_dev"."tpcds"."customer";

  3. Validate the row count and the contents of the table:
    -- check customers count and look at sample data
    select count(1) from customer_dim; 
    select * from customer_dim limit 10;

Simulate customer table changes

Use the following code to simulate changes made to the table:

-- create a source table with some updates and some inserts
-- Update- Email has changed for 100 customers 
drop table if exists src_customer;
create table src_customer distkey(c_customer_sk) as 
select c_customer_sk , c_first_name , c_last_name, c_current_addr_sk, c_birth_country, 'x'+c_email_address as c_email_address, getdate() as effective_dt
from   customer_dim 
where  c_email_address is not null
limit 100;


-- also let’s add three completely new customers
insert into src_customer values 
(15000001, 'Customer#15','000001', 10001 ,'USA'    , 'Customer#[email protected]', getdate() ),
(15000002, 'Customer#15','000002', 10002 ,'MEXICO' , 'Customer#[email protected]', getdate() ),
(15000003, 'Customer#15','000003', 10003 ,'CANADA' , 'Customer#[email protected]', getdate() );

-- check source count
select count(1) from src_customer;
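
Before merging, it can help to preview how the business key comparison will classify each source row. The following is a minimal sketch, not part of the original notebook; it assumes customer_dim and src_customer exist exactly as created above, and the column aliases expected_action and row_count are illustrative names only:

```sql
-- Preview which source rows would be treated as updates and which as inserts.
-- A source row with no matching business key in the dimension is a new customer.
select case when cd.c_customer_sk is null then 'insert' else 'update' end as expected_action,
       count(*) as row_count
from   src_customer as src
left join customer_dim as cd
       on cd.c_customer_sk = src.c_customer_sk
group by 1
order by 1;
```

If the sample data matches the description in this post, the preview should report three rows to insert and 100 rows to update.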

Merge the source table into the target table

Now you have a source table with some changes you need to merge with the customer dimension table.

Before the MERGE command, this type of task needed two separate UPDATE and INSERT commands to implement:

-- merge changes to dim customer
BEGIN TRANSACTION;
-- update current records
UPDATE customer_dim
SET    c_first_name      = src.c_first_name      ,
       c_last_name       = src.c_last_name       , 
       c_current_addr_sk = src.c_current_addr_sk , 
       c_birth_country   = src.c_birth_country   , 
       c_email_address   = src.c_email_address   ,
       record_upd_ts     = current_timestamp
from   src_customer AS src
where  customer_dim.c_customer_sk = src.c_customer_sk ;
-- Insert new records
INSERT INTO customer_dim (c_customer_sk, c_first_name,c_last_name, c_current_addr_sk, c_birth_country, c_email_address) 
select src.c_customer_sk, src.c_first_name,src.c_last_name, src.c_current_addr_sk, src.c_birth_country, src.c_email_address 
from   src_customer AS src
where  src.c_customer_sk NOT IN (select c_customer_sk from customer_dim);
-- end merge operation
COMMIT TRANSACTION;

The MERGE command uses a more straightforward syntax, in which we use the key comparison result to decide if we perform an update DML operation (when matched) or an insert DML operation (when not matched):

MERGE INTO customer_dim using src_customer AS src ON customer_dim.c_customer_sk = src.c_customer_sk
WHEN MATCHED THEN UPDATE 
SET c_first_name      = src.c_first_name      ,
    c_last_name       = src.c_last_name       , 
    c_current_addr_sk = src.c_current_addr_sk , 
    c_birth_country   = src.c_birth_country   , 
    c_email_address   = src.c_email_address   ,
    record_upd_ts     = current_timestamp
WHEN NOT MATCHED THEN INSERT (c_customer_sk, c_first_name,c_last_name, c_current_addr_sk, c_birth_country, c_email_address) 
                      VALUES (src.c_customer_sk, src.c_first_name,src.c_last_name, src.c_current_addr_sk, src.c_birth_country, src.c_email_address );

Validate the data changes in the target table

Now we need to validate the data has made it correctly to the target table. We can first check the updated data using the update timestamp. Because this was our first update, we can examine all rows where the update timestamp is not null:

-- Check the changes
-- to get updates
select * 
from customer_dim
where record_upd_ts is not null;
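
A count-based check is a quick complement to inspecting the rows. This is a sketch under the assumption that the merge was run exactly once against a freshly populated customer_dim; the aliases updated_rows and copies are illustrative:

```sql
-- Rows touched by the update branch; expected to match the number of
-- simulated email changes in src_customer (100 in this walkthrough).
select count(*) as updated_rows
from   customer_dim
where  record_upd_ts is not null;

-- Each of the three simulated new customers should appear exactly once.
select c_customer_sk, count(*) as copies
from   customer_dim
where  c_customer_sk in (15000001, 15000002, 15000003)
group by c_customer_sk
order by c_customer_sk;
```

If the earlier UPDATE and INSERT transaction was also run, the three new customers would already exist when MERGE runs and would be counted as updates instead.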

Use QUALIFY to simplify validation of the data changes

We need to examine the data inserted in this table most recently. One way to do that is to rank the data by its insert timestamp and get those with the first rank. This requires using the window function rank() and also requires a subquery to get the results.

Before the availability of QUALIFY, we needed to build that using a subquery like the following:

select customer_dim_id,c_customer_sk ,c_first_name ,c_last_name ,c_current_addr_sk,c_birth_country ,c_email_address ,record_insert_ts ,record_upd_ts 
from 
( select rank() OVER (ORDER BY DATE_TRUNC('second',record_insert_ts) desc) AS rnk, 
         customer_dim_id,c_customer_sk ,c_first_name ,c_last_name ,c_current_addr_sk,c_birth_country ,c_email_address ,record_insert_ts ,record_upd_ts 
  from customer_dim
  where record_upd_ts is null)
where rnk = 1;

The QUALIFY clause eliminates the need for the subquery, as in the following code snippet:

-- to get the newly inserted rows we can make use of Qualify feature
select * 
from customer_dim
where record_upd_ts is null
qualify rank() OVER (ORDER BY DATE_TRUNC('second',record_insert_ts) desc) = 1;
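
The same pattern applies to other window functions. As a hedged illustration that goes beyond the original walkthrough, the following sketch uses row_number() with QUALIFY to keep only the most recent source row per business key, which may be useful if a source table could carry more than one change for the same customer. It assumes the src_customer table and its effective_dt column as created earlier:

```sql
-- Keep the latest change per customer, without a subquery.
-- row_number() gives a unique ranking within each business key,
-- so exactly one row per c_customer_sk survives the filter.
select *
from   src_customer
qualify row_number() OVER (PARTITION BY c_customer_sk ORDER BY effective_dt desc) = 1;
```

In this post's sample data each customer appears only once in src_customer, so the query returns every source row unchanged.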

Validate all data changes

We can union the results of both queries to get all the inserts and update changes:

-- To get all changes
select *
from (
select 'Updates' as operations, cd.* 
from   customer_dim as cd
where  cd.record_upd_ts is not null
union 
select 'Inserts' as operations, cd.* 
from customer_dim cd
where cd.record_upd_ts is null
qualify rank() OVER (ORDER BY DATE_TRUNC('second',cd.record_insert_ts) desc) = 1 
) order by 1;

Clean up

To clean up the resources used in the post, delete the Redshift provisioned cluster or Redshift Serverless workgroup and namespace you created for this post (this will also drop all the objects created).

If you used an existing Redshift provisioned cluster or Redshift Serverless workgroup and namespace, use the following code to drop these objects:

DROP TABLE IF EXISTS customer_dim CASCADE;
DROP TABLE IF EXISTS src_customer CASCADE;

Conclusion

When using multiple statements to update or insert data, there is a risk of inconsistencies between the different operations. The MERGE operation reduces this risk by ensuring that all operations are performed together in a single transaction. For Amazon Redshift customers who are migrating from other data warehouse systems or who regularly need to ingest fast-changing data into their Redshift warehouse, the MERGE command is a straightforward way to conditionally insert, update, and delete data from target tables based on existing and new source data.

In most analytic queries that use window functions, you may need to use those window functions in your WHERE clause as well. However, this is not permitted, and to do so, you have to build a subquery that contains the required window function and then use the results in the parent query in the WHERE clause. Using the QUALIFY clause eliminates the need for a subquery and therefore simplifies the SQL statement and makes it less difficult to write and read.

We encourage you to start using those new features and give us your feedback. For more details, refer to MERGE and QUALIFY clause.


About the authors

Yanzhu Ji is a Product Manager in the Amazon Redshift team. She has experience in product vision and strategy in industry-leading data products and platforms. She has outstanding skill in building substantial software products using web development, system design, database, and distributed programming techniques. In her personal life, Yanzhu likes painting, photography, and playing tennis.

Ahmed Shehata is a Senior Analytics Specialist Solutions Architect at AWS based in Toronto. He has more than two decades of experience helping customers modernize their data platforms. Ahmed is passionate about helping customers build efficient, performant, and scalable analytic solutions.

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 16 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.

New Revelations from the Snowden Documents

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/new-revelations-from-the-snowden-documents.html

Jake Appelbaum’s PhD thesis contains several new revelations from the classified NSA documents provided to journalists by Edward Snowden. Nothing major, but a few more tidbits.

Kind of amazing that that all happened ten years ago. At this point, those documents are more historical than anything else.

And it’s unclear who has those archives anymore. According to Appelbaum, The Intercept destroyed their copy.

I recently published an essay about my experiences ten years ago.

On the Cybersecurity Jobs Shortage

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/on-the-cybersecurity-jobs-shortage.html

In April, Cybersecurity Ventures reported on an extreme cybersecurity job shortage:

Global cybersecurity job vacancies grew by 350 percent, from one million openings in 2013 to 3.5 million in 2021, according to Cybersecurity Ventures. The number of unfilled jobs leveled off in 2022, and remains at 3.5 million in 2023, with more than 750,000 of those positions in the U.S. Industry efforts to source new talent and tackle burnout continues, but we predict that the disparity between demand and supply will remain through at least 2025.

The numbers never made sense to me, and Ben Rothke has dug in and explained the reality:

…there is not a shortage of security generalists, middle managers, and people who claim to be competent CISOs. Nor is there a shortage of thought leaders, advisors, or self-proclaimed cyber subject matter experts. What there is a shortage of are computer scientists, developers, engineers, and information security professionals who can code, understand technical security architecture, product security and application security specialists, analysts with threat hunting and incident response skills. And this is nothing that can be fixed by a newbie taking a six-month information security boot camp.

[…]

Most entry-level roles tend to be quite specific, focused on one part of the profession, and are not generalist roles. For example, hiring managers will want a network security engineer with knowledge of networks or an identity management analyst with experience in identity systems. They are not looking for someone interested in security.

In fact, security roles are often not considered entry-level at all. Hiring managers assume you have some other background, usually technical before you are ready for an entry-level security job. Without those specific skills, it is difficult for a candidate to break into the profession. Job seekers learn that entry-level often means at least two to three years of work experience in a related field.

That makes a lot more sense, and matches what I experience.

Detecting AI-Generated Text

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/detecting-ai-generated-text.html

There are no reliable ways to distinguish text written by a human from text written by a large language model. OpenAI writes:

Do AI detectors work?

  • In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.
  • Additionally, ChatGPT has no “knowledge” of what content could be AI-generated. It will sometimes make up responses to questions like “did you write this [essay]?” or “could this have been written by AI?” These responses are random and have no basis in fact.
  • To elaborate on our research into the shortcomings of detectors, one of our key findings was that these tools sometimes suggest that human-written content was generated by AI.
    • When we at OpenAI tried to train an AI-generated content detector, we found that it labeled human-written text like Shakespeare and the Declaration of Independence as AI-generated.
    • There were also indications that it could disproportionately impact students who had learned or were learning English as a second language and students whose writing was particularly formulaic or concise.
  • Even if these tools could accurately identify AI-generated content (which they cannot yet), students can make small edits to evade detection.

There is some good research in watermarking LLM-generated text, but the watermarks are not generally robust.

I don’t think the detectors are going to win this arms race.

Using Hacked LastPass Keys to Steal Cryptocurrency

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/using-hacked-lastpass-keys-to-steal-cryptocurrency.html

Remember last November, when hackers broke into the network for LastPass—a password database—and stole password vaults with both encrypted and plaintext data for over 25 million users?

Well, they’re now using that data to break into crypto wallets and drain them: $35 million and counting, all going into a single wallet.

That’s a really profitable hack. (It’s also bad opsec. The hackers need to move and launder all that money quickly.)

Look, I know that online password databases are more convenient. But they’re also risky. This is why my Password Safe is local only. (I know this sounds like a commercial, but Password Safe is not a commercial product.)

LLM Summary of My Book Beyond Fear

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/llm-summary-of-my-book-beyond-fear.html

Claude (Anthropic’s LLM) was given this prompt: Please summarize the themes and arguments of Bruce Schneier’s book Beyond Fear. I’m particularly interested in a taxonomy of his ethical arguments—please expand on that. Then lay out the most salient criticisms of the book.

Claude’s reply:

Here’s a brief summary of the key themes and arguments made in security expert Bruce Schneier’s book Beyond Fear: Thinking Sensibly About Security in an Uncertain World:

Central Themes:

  • Effective security requires realistic risk assessment, not fear-driven responses. We need to rationally determine actual threats versus perceived ones.
  • Security trade-offs need to be carefully balanced against costs to other values like privacy, freedom, and social/economic impacts.
  • Human psychology and biases lead to exaggerated fears that politicians/marketers can exploit, resulting in wasteful and theatrical “security theater.”
  • Security is a process that requires ongoing flexibility, not total prevention. We can’t anticipate every threat but can build resilience.

Key Arguments:

  • Analyze risks empirically using evidence, statistics, and probability rather than intuition. Avoid underestimating everyday risks while overestimating novel or sensational ones.
  • Reject the temptation to implement oppressive security restrictions after tragic but unlikely events. Focus only on proportional responses.
  • Where possible, favor openness and transparency over aggressive data collection or restrictions which erode civil liberties.
  • Seek security policies and technologies that enhance society broadly rather than benefit special interests.

Schneier argues for rational, ethical approaches to security aimed at genuine protection while avoiding violations of human rights in the name of safety.

Here are some of the key ethical arguments Bruce Schneier makes in Beyond Fear regarding security policy and practices:

  • Utilitarian Ethics – Security measures should aim to provide the greatest societal benefit and well-being for the most people, not just special interests.
  • Individual Autonomy – Restrictions should be minimized to avoid excessive infringements on civil liberties and individual freedoms. Surveillance creep risks violating autonomy.
  • Privacy Rights – Pervasive monitoring and data collection erode privacy rights and dignity. Citizens are entitled to some degree of opacity.
  • Proportional Responses – Reactive policies should weigh costs versus statistically likely benefits. Draconian rules beyond proportionality are unethical.
  • Justice and Fairness – Security should not create racial, religious or other profiling biases that punish groups rather than conduct.
  • Accountability and Transparency – Ethical security requires checks against excessive power and oversight for responsibility. Secrecy enables abuses.
  • Honest Representation – Officials and pundits have an ethical duty not to stoke disproportionate public fears. Security theater manufactures consent.
  • Integrity and Character – Leaders should have the wisdom and courage to make difficult trade-offs rather than exploit fears for authority and control.

Overall Schneier argues security policies grounded in ethics rather than fear promote human well-being while recognizing inherent trade-offs. Prudence, transparency, and proportionality are essential.

Here are some of the most salient criticisms made against the arguments put forth in Bruce Schneier’s book Beyond Fear:

  • Downplays real threats – Some argue Schneier dangerously downplays serious risks like terrorism and cybercrime by calling fears overblown.
  • Too theoretical – His risk analysis approach may rely too much on statistics versus real-world uncertainties that defy probabilistic reasoning.
  • No easy answers – While citing misguided policies, critics say Schneier provides few actionable solutions or concrete policy guidance.
  • Conflicting values – His balancing of security, privacy and liberty involves inherent trade-offs and tough choices he gives little direction on.
  • Politically naive – Suggesting special interests drive policy may under-appreciate legitimate national security concerns of elected officials.
  • Hindsight bias – Assessing past overreactions easier than predicting future threats that justify caution today.
  • No accounting for psychology – Factoring human irrationality into policy is essential, not dismissed as “security theater.”
  • Technical focus – Focuses narrowly on certain technologies while ignoring broader organizational and sociological dimensions.

In summary, critics argue Schneier’s risk analysis orientation is too clinical, abstract, and devoid of political and human realities to transform concrete policy in the ways he intends. His framework may therefore lack sufficient nuance or guidance.

It’s interesting. The summary is pretty accurate, and so are the criticisms. The book was a product of my thinking twenty years ago, and my own thinking has evolved to address those criticisms.

Of course, this only works with older books that the LLM has ingested, and probably works better with books that have been written about.

On Technologies for Automatic Facial Recognition

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/on-technologies-for-automatic-facial-recognition.html

Interesting article on technologies that will automatically identify people:

With technology like that on Mr. Leyvand’s head, Facebook could prevent users from ever forgetting a colleague’s name, give a reminder at a cocktail party that an acquaintance had kids to ask about or help find someone at a crowded conference. However, six years later, the company now known as Meta has not released a version of that product and Mr. Leyvand has departed for Apple to work on its Vision Pro augmented reality glasses.

The technology is here. Maybe the implementation is still dorky, but that will change. The social implications will be enormous.