All posts by Marcilio Mendonca

Using Generative AI, Amazon Bedrock and Amazon CodeGuru to Improve Code Quality and Security

Post Syndicated from Marcilio Mendonca original https://aws.amazon.com/blogs/devops/using-generative-ai-amazon-bedrock-and-amazon-codeguru-to-improve-code-quality-and-security/

Automated code analysis plays a key role in improving code quality and compliance. Amazon CodeGuru Reviewer provides automated recommendations that can assist developers in identifying defects and deviation from coding best practices. For instance, CodeGuru Security automatically flags potential security vulnerabilities such as SQL injection, hardcoded AWS credentials and cross-site request forgery, to name a few. After becoming aware of these findings, developers can take decisive action to remediate their code.

On the other hand, determining what the best course of action is to address a particular automated recommendation might not always be obvious. For instance, an apprentice developer may not fully grasp what a SQL injection attack means or what makes the code at hand particularly vulnerable. In another situation, the developer reviewing a CodeGuru recommendation might not be the same developer who wrote the initial code. In these cases, the developer will first need to get familiarized with the code and the recommendation in order to take proper corrective action.

By using Generative AI, developers can leverage pre-trained foundation models to gain insights on their code’s structure, the CodeGuru Reviewer recommendation and the potential corrective actions. For example, Generative AI models can generate text content, e.g., to explain a technical concept such as SQL injection attacks or the correct use of a given library. Once the recommendation is well understood, the Generative AI model can be used to refactor the original code so that it complies with the recommendation. The possibilities opened up by Generative AI are numerous when it comes to improving code quality and security.

In this post, we will show how you can use CodeGuru Reviewer and Bedrock to improve the quality and security of your code. While CodeGuru Reviewer can provide automated code analysis and recommendations, Bedrock offers a low-friction environment that enables you to gain insights on the CodeGuru recommendations and to find creative ways to remediate your code.

Solution Overview

The diagram below depicts our approach and the AWS services involved. It works as follows:

1. The developer pushes code to an AWS CodeCommit repository.
2. The repository is associated with CodeGuru Reviewer, so an automated code review is initiated.
3. Upon completion, the CodeGuru Reviewer console displays a list of recommendations for the code base, if applicable.
4. Once aware of the recommendation and the affected code, the developer navigates to the Bedrock console, chooses a foundation model and builds a prompt (we will give examples of prompts in the next session).
5. Bedrock generates content as a response to the prompt, including code generation.
6. The developer might optionally refine the prompt, for example, to gain further insights on the CodeGuru Reviewer recommendation or to request for alternatives to remediate the code.
7. The model can respond with generated code that addresses the issue which can then be pushed back into the repository.

CodeCommit, CodeGuru and Bedrock used together

CodeCommit, CodeGuru and Bedrock used together

Note that we use CodeCommit in our walkthrough but readers can use any Git sources supported by CodeGuru Reviewer.

Using Generative AI to Improve Code Quality and Security

Next, we’re going to walk you through a scenario where a developer needs to improve the quality of her code after CodeGuru Reviewer has provided recommendations. But before getting there, let’s choose a code repository and set the Bedrock inference parameters.

A good reference of source repository for exploring CodeGuru Reviewer recommendations is the Amazon CodeGuru Reviewer Python Detector repository. The repository contains a comprehensive list of compliant and non-compliant code which fits well in the context of our discussion.

In terms of Bedrock model, we use Anthropic Claude V1 (v1.3) in our analysis which is specialized in content generation including text and code. We set the required model parameters as follows: temperature=0.5, top_p=0.9, top_k=500, max_tokens=2048. We set temperature and top_p parameters so as to give the model a bit more flexibility to generate responses for the same question. Please check the inference parameter definitions on Bedrock’s user guide for further details on these parameters. Given the randomness level specified by our inference parameters, readers experimenting with the prompts provided in this post might observe slightly different answers than the ones presented.

Requirements

  • An AWS account with access to CodeCommit, CodeGuru and Bedrock
  • Bedrock access enabled in the account. On-demand access should be fine (check pricing here).
  • Download and install the AWS CLI and Git (to push code to CodeCommit)

Walkthrough

Follow the steps below to run CodeGuru Reviewer analysis on a repository and to build and run Bedrock prompts.

  • Clone the from GitHub to your local workstation
git clone https://github.com/aws-samples/amazon-codeguru-reviewer-python-detectors.git
  • Create a CodeCommit repository and add a new Git remote
aws codecommit create-repository --repository-name amazon-codeguru-reviewer-python-detectors

cd amazon-codeguru-reviewer-python-detectors/

git remote add codecommit https://git-codecommit.us-east-1.amazonaws.com/v1/repos/amazon-codeguru-reviewer-python-detectors
  • Associate CodeGuru Reviewer with the repository to enable repository analysis
aws codeguru-reviewer associate-repository --repository 'CodeCommit={Name=amazon-codeguru-reviewer-python-detectors}'

Save the association ARN value returned after the command is executed (e.g., arn:aws:codeguru-reviewer:xx-xxxx-x:111111111111:association:e85aa20c-41d76-03b-f788-cefd0d2a3590).

  • Push code to the CodeCommit repository using the codecommit git remote
git push codecommit main:main
  • Trigger CodeGuru Reviewer to run a repository analysis on the repository’s main branch. Use the repository association ARN you noted in a previous step here.
aws codeguru-reviewer create-code-review \
 --name codereview001 \
 --type '{"RepositoryAnalysis": {"RepositoryHead": {"BranchName": "main"}}}' \
 --repository-association-arn arn:aws:codeguru-reviewer:xx-xxxx-x:111111111111:association:e85aa20c-41d76-03b-f788-cefd0d2a3590

Navigate to the CodeGuru Reviewer Console to see the various recommendations provided (you might have to wait a few minutes for the code analysis to run).

Amazon CodeGuru reviewer

Amazon CodeGuru Reviewer

  • On the CodeGuru Reviewer console (see screenshot above), we select the first recommendation on file hashlib_contructor.py, line 12, and take note of the recommendation content: The constructors for the hashlib module are faster than new(). We recommend using hashlib.sha256() instead.
  • Now let’s extract the affected code. Click on the file name link (hashlib_contructor.py in the figure above) to open the corresponding code in the CodeCommit console.
AWS CodeCommit Repository

AWS CodeCommit Repository

  • The blue arrow in the CodeCommit console above indicates the non-compliant code highlighting the specific line (line 12). We select the wrapping python function from lines 5 through 15 to build our prompt. You may want to experiment reducing the scope to a single line or a given block of lines and check if it yields better responses.
Amazon Bedrock Playground Console

Amazon Bedrock Playground Console

  • We then navigate to the Bedrock console (see screenshot above).
    • Search for keyword Bedrock in the AWS console
    • Select the Bedrock service to navigate to the service console
    • Choose Playgrounds, then choose Text
    • Choose model Anthropic Claude V1 (1.3). If you don’t see this model available, please make sure to enable model access.
  • Set the Inference configuration as shown in the screenshot below including temperature, Top P and the other parameters. Please check the inference parameter definitions on Bedrock’s user guide for further details on these parameters.
  • Build a Bedrock prompt using three elements, as illustrated in the screenshot below:
    • The source code copied from CodeCommit
    • The CodeGuru Reviewer recommendation
    • A request to refactor the code to address the code analysis finding
A Prompt in the Amazon Bedrock Playground Console

A Prompt in the Amazon Bedrock Playground Console

  • Press the Run button. Notice that Bedrock will automatically add the words Human (at the top) and Assistant (at the bottom) to the prompt.  Wait a few seconds and a response is generated (in green). The response includes the refactored code and an explanation on how the code was fixed (see screenshot below).
A Prompt Response (or completion) in the Amazon Bedrock Playground Console

A Prompt Response (or completion) in the Amazon Bedrock Playground Console

Note that the original code was refactored to use ashlib.sha256() instead of  using new in the constructor: hashlib.new(‘sha256’, …). Note that the prompt also asks for an explanation on how the refactored code fixes the issue, so the response includes such details. If we were interested in the refactored code only, we could change the prompt and ask that it returns only the refactored code.

Gaining Insights and Improving Code Quality

So far, our walkthrough showed how to navigate through the CodeCommit, CodeGuru Reviewer and Bedrock consoles to gather relevant content that can be used to build and run a Bedrock prompt. We are now going to explore further prompts to gain insights on the code and the recommendations in order to build more specific prompt to refactor the code. We omit screenshots for brevity. Readers can follow the same process outlined in the previous session.

Learning about the code

Let’s now examine another CodeGuru Reviewer recommendation. Specifically, we want to find the SQL Injection vulnerability raised for file, line 12, as shown below.

Sample non-compliant code

Sample non-compliant code

CodeGuru Reviewer provided the following recommendation:

CodeGuru Reviewer recommendation

For the sake of the walkthrough, let’s assume the developer reviewing the code above was not the original developer who built the code. We start by asking Bedrock to explain the code, line by line. Note that we have not included the CodeGuru Reviewer recommendation in the prompt this time as our initial goal is to understand the code.

Prompt:

Source Code:
def execute_query_noncompliant(request):
    import sqlite3
    name = request.GET.get("name")
    query = "SELECT * FROM Users WHERE name = " + name + ";" 
    with sqlite3.connect("example.db") as connection:
        cursor = connection.cursor()
        # Noncompliant: user input is used without sanitization.
        cursor.execute(query)
        connection.commit()
        connection.close()
Request:
Explain the code logic above line by line.

Response:

Sure, here is a line by line explanation of the code:
def execute_query_noncompliant(request):
This defines a function called execute_query_noncompliant that takes a request as an argument.
import sqlite3
This imports the sqlite3 library to connect to a SQLite database.
name = request.GET.get("name")
This gets a "name" query parameter from the request GET parameters.
query = "SELECT * FROM Users WHERE name = " + name + ";"
This constructs an SQL query string by concatenating the name variable into the query. 
So if name was "John", the query would be:
SELECT * FROM Users WHERE name = John;
with sqlite3.connect("example.db") as connection:
This opens a connection to the "example.db" SQLite database.
   cursor = connection.cursor()
This gets a cursor to execute queries on the connection.
    # Noncompliant: user input is used without sanitization.
    cursor.execute(query)
This executes the query on the database. The comment notes that this is noncompliant 
because it uses unsanitized user input in the query.
    connection.commit()
    connection.close()

Bedrock provides a line-by-line explanation of the code including the library being used (sqllite3) and a comment about the code using an unsanitized [MOU1] user input. The explanation is detailed helping the developer to understand the code logic and structure.

The developer now has a good understanding of the code. Let’s now gain insights on why and how the code is vulnerable to SQL Injection.

Prompt:

Source Code:
def execute_query_noncompliant(request):
    import sqlite3
    name = request.GET.get("name")
    query = "SELECT * FROM Users WHERE name = " + name + ";" 
    with sqlite3.connect("example.db") as connection:
        cursor = connection.cursor()
        # Noncompliant: user input is used without sanitization.
        cursor.execute(query)
        connection.commit()
        connection.close()
Request:
Why is the code above vulnerable to SQL injection?

Response:

The code is vulnerable to SQL injection because it directly concatenates 
user input (the name parameter) into an SQL query string. This allows a 
malicious user to inject SQL code into the query.
For example, if the name parameter is passed as:

'; DROP TABLE Users; -- 
Then the resulting SQL query would be:

SELECT * FROM Users WHERE name = ''; DROP TABLE Users; -- ;
This would drop the Users table, causing damage to the database.
To fix this, the user input should be sanitized before being added to the 
SQL query string. This can be done by:

Escaping single quotes in the input:
name = request.GET.get("name").replace("'", "''")
query = f"SELECT * FROM Users WHERE name = '{name}';"

Using parameterized queries:
query = "SELECT * FROM Users WHERE name = ?"
cursor.execute(query, (name,))

Again, the response is detailed. It explains that the user input is being concatenated directly into a query string. It gives an example of a SQL injection attack that could cause the Users table to be dropped. It also provides two options to fix the code including escaping single quotes and using parameterized queries.

We continue to dive deep into SQL injection. Let’s build a prompt to gather further examples of possible SQL injection attacks.

Prompt:

Source Code:
def execute_query_noncompliant(request):
    import sqlite3
    name = request.GET.get("name")
    query = "SELECT * FROM Users WHERE name = " + name + ";" 
    with sqlite3.connect("example.db") as connection:
        cursor = connection.cursor()
        # Noncompliant: user input is used without sanitization.
        cursor.execute(query)
        connection.commit()
        connection.close()
Request:
Provide examples of SQL injection for the code above

Response:

Here are some examples of SQL injection for that code:

If the name parameter is passed as:
'; DROP TABLE Users; -- 

Then the executed query would be:
SELECT * FROM Users WHERE name = ''; DROP TABLE Users; -- ;
This would drop the Users table.

If the name parameter is passed as:
OR 1=1; -- 

Then the executed query would be:
SELECT * FROM Users WHERE name = OR 1=1; -- ;
This would select all rows from the Users table.

If the name parameter is passed as:
'; UPDATE Users SET password = 'hacked' WHERE 1=1; -- 

Then the executed query would be:
SELECT * FROM Users WHERE name = ''; UPDATE Users SET password = 'hacked' WHERE 1=1; -- ;

This would update all passwords in the Users table to 'hacked'.
So in summary, without sanitizing the input, this code is vulnerable to SQL injection attacks 
that can delete data, access unauthorized data, modify data, etc. The input should be sanitized 
by escaping special characters before using in the SQL query.

Bedrock responds by including two new forms of SQL injection attacks, one that lists all records from the Users table and another one that updates all passwords in the table.

At this point, the developer has learned a lot about the code, SQL injection attacks and the CodeGuru Reviewer recommendation. It’s now time to build a prompt to fix the code. We leverage what we learned in previous prompts including the term “parameterized queries”, as suggested by CodeGuru Reviewer, to build a specific request in the prompt. Note that we have also included the CodeGuru Reviewer recommendation in the prompt this time.

Prompt:

Source Code:
def execute_query_noncompliant(request):
    import sqlite3
    name = request.GET.get("name")
    query = "SELECT * FROM Users WHERE name = " + name + ";" 
    with sqlite3.connect("example.db") as connection:
        cursor = connection.cursor()
        # Noncompliant: user input is used without sanitization.
        cursor.execute(query)
        connection.commit()
        connection.close()

Code Analysis Finding: 
It seems that you are constructing an SQL query using formatted strings. This could 
make your code vulnerable to SQL injection attacks, which can lead to loss of critical 
data. We recommend that you use parameterized queries for constructing SQL queries.  
[Learn more](https://cwe.mitre.org/data/definitions/89.html)
Request:
Fix the code above using parameterized query.

Response:

Here is the fixed code using parameterized query:

def execute_query_compliant(request):
   import sqlite3
   name = request.GET.get("name")
   query = "SELECT * FROM Users WHERE name = ?" 
   with sqlite3.connect("example.db") as connection:
      cursor = connection.cursor()
      # Compliant: Parameterized query prevents SQL injection
      cursor.execute(query, (name,)) 
      connection.commit()
      connection.close()

Instead of concatenating the user input directly into the SQL string, we use a ? 
placeholder and pass the user input as a tuple argument to cursor.execute(). This 
prevents SQL injection since the input is treated as a literal value rather than 
part of the SQL syntax.

The response includes a refactored code that uses a parameterized query to prevent SQL injection attacks (cursor.execute(query, (name,)).


We’re excited with the improvements in the code and now decide to go beyond fixing the SQL injection vulnerability. The next prompt asks Bedrock to make two specific improvements to the refactored.

  • Add python type hints
  • Add exception handling

Prompt:

Source Code:
def execute_query_compliant(request):
   import sqlite3
   name = request.GET.get("name")
   query = "SELECT * FROM Users WHERE name = ?" 
   with sqlite3.connect("example.db") as connection:
      cursor = connection.cursor()
      # Compliant: Parameterized query prevents SQL injection
      cursor.execute(query, (name,)) 
      connection.commit()
      connection.close()

Request: 
Add exception handling and type hints to the code above.

Response:

Here is the code with exception handling and type hints:

def execute_query_compliant(request: Request) -> None: 
   """Executes a compliant SQL query"""
   import sqlite3
   name = request.GET.get("name")
   query = "SELECT * FROM Users WHERE name = ?" 
   try:
      with sqlite3.connect("example.db") as connection: 
          cursor: sqlite3.Cursor = connection.cursor()
          # Compliant: Parameterized query prevents SQL injection
          cursor.execute(query, (name,)) 
          connection.commit()
   except sqlite3.Error as e: 
      print(f"SQLite error: {e}")
   finally: 
      connection.close()

I added:

Type hints:
request: Request indicating the request parameter is of type Request

cursor: sqlite3.Cursor indicating cursor is of type sqlite3.Cursor
Exception handling:
A try/except block catching sqlite3.Error exceptions. 
A finally block ensuring the connection is always closed. 
Let me know if you would like me to explain anything in the updated code!

Note that the generated code now includes a try/except/finally block to handle exceptions and type hints for the function parameter and return value.

However, some variables such as name and query are missing type hints. Here, we’d like to call out the importance of human supervision on automatically generated responses. The developer tweaks the previous prompt to ask that “all variables” have type hint added to them.

Prompt:

Source Code:
def execute_query_compliant(request):
   import sqlite3
   name = request.GET.get("name")
   query = "SELECT * FROM Users WHERE name = ?" 
   with sqlite3.connect("example.db") as connection:
      cursor = connection.cursor()
      # Compliant: Parameterized query prevents SQL injection
      cursor.execute(query, (name,)) 
      connection.commit()
      connection.close()

Request: 
Add exception handling and type hints to all variables in the code above.

Response:

Here is the code with exception handling and type hints:

def execute_query_compliant(request: Request) -> None: 
   """Executes a compliant SQL query"""
   import sqlite3
   name: str = request.GET.get("name")
   query: str = "SELECT * FROM Users WHERE name = ?" 
   try:
      with sqlite3.connect("example.db") as connection: 
         cursor: sqlite3.Cursor = connection.cursor()
         # Compliant: Parameterized query prevents SQL injection
         cursor.execute(query, (name,)) 
         connection.commit()
   except sqlite3.Error as e: 
        print(f"SQLite error: {e}")
   finally: 
        connection.close()

The new response refactors the code making sure all variables now have type hint. This shows how little changes to a prompt can produce much better results. In our example, by adding “all variables” to the prompt caused the response to indeed add type hints to all variables in the code provided.

Here is a summary of the activities performed via Bedrock prompting:

  • Gain insights on the code and the CodeGuru recommendation
    • Explain the code logic above line by line.
    • Why is the code above vulnerable to SQL injection?
    • Provide examples of SQL injection for the code above
  • Refactor and Improve the Code
    • Fix the code above using parameterized query
    • Add exception handling and type hints to the code above
    • Add exception handling and type hints to all variables in the code above.

The main takeaway is that by using a static analysis and security testing tool such as CodeGuru Reviewer in combination with a Generative AI service such as Bedrock, developers can significantly improve their code towards best practices and enhanced security. In addition, prompts which are more specific normally yield better results and that’s when CodeGuru Reviewer can be really helpful as it gives developers hints and keywords that can be used to build powerful prompts.

Cleaning Up

Don’t forget to delete the CodeCommit repository created if you no longer need it.

aws codecommit delete-repository -–repository-name amazon-codeguru-reviewer-python-detectors

Conclusion and Call to Action

In this blog, we discussed how CodeGuru Reviewer and Bedrock can be used in combination to improve code quality and security. While CodeGuru Reviewer provides a rich set of recommendations through automated code reviews, Bedrock gives developers the ability to gain deeper insights on the code and the recommendations as well as to refactor the original code to meet compliance and best practices.

We encourage readers to explore new Bedrock prompts beyond the ones introduced in this post and share their feedback with us.

Here are some ideas:

For a sample Python repository we recommend using the Amazon CodeGuru Reviewer Python Detector repository on GitHub which is publicly accessible to readers.

For Java developers, there’s a CodeGuru Reviewer Python Detector for Java repository alternative available.

Note: at the time of the writing of this post, Bedrock’s Anthropic Claude 2.0 model was not yet available so we invite readers to also experiment with the prompts provided using that model.

Special thanks to my colleagues Raghvender Arni and Mahesh Yadav for support and review of this post.
Author: Marcilio Mendonca

Marcilio Mendonca

Marcilio Mendonca is a Sr. Solutions Developer in the Prototyping And Customer Engineering (PACE) team at Amazon Web Services. He is passionate about helping customers rethink and reinvent their business through the art of prototyping, primarily in the realm of modern application development, Serverless and AI/ML. Prior to joining AWS, Marcilio was a Software Development Engineer with Amazon. He also holds a PhD in Computer Science. You can find Marcilio on LinkedIn at https://www.linkedin.com/in/marcilio/. Let’s connect!