Tag Archives: AI/ML

Experimenting with DeepSeek, Backblaze B2, and Drive Stats

2025-03-04 Pat Patterson

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/experimenting-with-deepseek-backblaze-b2-and-drive-stats/

A decorative image showing buildings of many sizes.

As we explained in our recent blog post, AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1, Chinese startup DeepSeek caused a stir when it released its R1 reasoning model in January of this year. Interestingly, DeepSeek R1 has an OpenAI-compatible API, so applications written for OpenAI should work with DeepSeek R1 with just a configuration change. Since I had a suitable sample app all ready to go, I decided to put their claim to the test.

Why, and why not, use DeepSeek?

A major difference between DeepSeek and OpenAI is cost. At the time of writing, DeepSeek charges $0.55 per million input tokens and $2.19 per million output tokens for its R1 model. That’s about 3.6% of OpenAI’s $15.00 per million input tokens and $60.00 per million output tokens for its flagship o1 reasoning model, and about half of o3-mini’s $1.10 per million input tokens and $4.40 per million output tokens.

Set against this is the fact that, in using the DeepSeek platform’s API, you are sending your data to a startup located in China that has been accused by OpenAI of “inappropriately” basing its work on the output of OpenAI’s models. It’s up to you, and your organizations’ data governance policy, whether the trade-off is worthwhile.

Another consideration is the ability to run DeepSeek’s models locally, on your own infrastructure, or, more likely, your chosen provider’s infrastructure, rather than sending requests to the DeepSeek platform. Spinning up my own DeepSeek instance was out of scope for this blog post, but I’ll likely return to it in a future blog post.

Swapping OpenAI for DeepSeek

Last month, I explained how you can build an AI agent with Backblaze B2, LangChain, and Drive Stats, walking you through a simple chatbot that can answer questions based on our Drive Stats data set—11 years of metrics gathered from the Backblaze B2 Cloud Storage platform’s fleet of hard drives. In that example, the chatbot accepted a natural language question, used OpenAI’s GPT‑4o mini large language model (LLM) to generate a SQL query that might help provide an answer, executed the query against the Drive Stats data set via the Trino SQL engine, and then used OpenAI again to interpret the result set and either repeat the query-interpret cycle, or generate a natural language answer.

I copied the Jupyter notebook from that example and used it as the basis for investigating the feasibility of swapping out OpenAI for DeepSeek. The DeepSeek version of the notebook contains the full source code of my experiments; I’ll include relevant extracts here, edited for clarity.

Since I used the LangChain AI framework, which provides a layer above a range of AI models, the only place that OpenAI surfaced in my code was in creating an instance of LangChain’s ChatOpenAI wrapper:

# OPENAI_API_KEY must be defined in the .env file
load_dotenv()
llm = ChatOpenAI(model="gpt-4o-mini")

The ChatOpenAI class contains all the code required to communicate with OpenAI via its API.

According to the DeepSeek documentation, all you should need to do is:

Provide your DeepSeek API key in the same OPENAI_API_KEY environment variable.
Set the API base URL to https://api.deepseek.com.
Provide a DeepSeek model name in place of the OpenAI one.

If this reminds you of the steps for using Backblaze B2’s S3-compatible API, you’re not alone. The OpenAI API has become a de facto standard for integrating with LLMs in much the same way as Amazon’s S3 API allows an ecosystem of apps and tools to interoperate with object storage systems from a variety of vendors.

Looking at the DeepSeek documentation, you can use one of two models, deepseek-reasoner (aka DeepSeek R1) or deepseek-chat. Let’s see what the much-talked-about DeepSeek R1 came up with.

Using DeepSeek R1 in the AI agent

To make it easy to use both the OpenAI and DeepSeek notebooks, I created a second entry in the .env file for the DeepSeek API key, and copied it to the OpenAI environment variable in the notebook code:

# The .env file needs at least DEEPSEEK_API_KEY, and may also contain
# OPENAI_API_KEY. Move the DeepSeek API key to the OpenAI environment
# variable
load_dotenv()

os.environ["OPENAI_API_KEY"] = os.environ.pop("DEEPSEEK_API_KEY")

llm = ChatOpenAI(model="deepseek-reasoner", base_url='https://api.deepseek.com')

As I set about repeating the steps from the Jupyter notebook that supported my previous blog post, I was disappointed to see DeepSeek fall at the very first hurdle: generating a SQL query for a simple natural language question. Here is the code:

question = {"question": "How many drives are there?"}

write_query(question)

Looking back at the original notebook, OpenAI’s response was valid SQL, although it didn’t have enough information to construct the correct query:

{'query': 'SELECT COUNT(*) AS drive_count FROM drivestats'}

DeepSeek, on the other hand, responded with a Python stack trace and this error:

openai.UnprocessableEntityError: Failed to deserialize the JSON body into the target type: response_format: response_format.type `json_schema` is unavailable now at line 1 column 13827

What went wrong? Searching for the error turns up a comment from a LangChain engineer explaining that we should use BaseChatOpenAI rather than ChatOpenAI since it “[…] accommodates many APIs that are similar to OpenAI. It uses tool calling for structured output by default.”

So, we can redefine llm accordingly, and try generating a query again:

llm = BaseChatOpenAI(model="deepseek-reasoner", base_url='https://api.deepseek.com')

write_query(question)

Unfortunately, DeepSeek returns another error:

BadRequestError: Error code: 400 - {'error': {'message': 'The last message of deepseek-reasoner must be a user message, or an assistant message with prefix mode on (refer to https://api-docs.deepseek.com/guides/chat_prefix_completion).', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_request_error'}}

Looking back at the AI agent code, we can see that we used an off-the-shelf prompt from the LangChain Prompt Hub that provides the model with a single, system, message:

================================ System Message ================================

Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Only use the following tables:
{table_info}

Question: {input}

Does this mean that DeepSeek is not, in fact, API-compatible with OpenAI? I would argue that it does not. DeepSeek implements the same API request/response syntax as OpenAI, but it is a different platform. Some variation in semantics is to be expected. We see similar variations between Backblaze B2 and Amazon S3; for example, the S3 PutObjectAcl operation sets the access control list (ACL) for an object in a bucket. Amazon S3’s access management model allows you to manipulate an object’s ACL independently of its bucket—for example, you can put a private object in a public bucket, and vice versa.

This flexibility comes with a cost: It becomes difficult to reason about the visibility of data. In fact, AWS now recommends “that you keep ACLs disabled, except in unusual circumstances where you need to control access for each object individually.”

Backblaze B2’s model is much simpler: You control access at the bucket level, and all objects have the same ACL as their bucket. Backblaze B2 implements the PutObjectAcl operation, but, if you try to set an object’s ACL to any other value than its bucket’s ACL, the service responds with an error.

Returning to the AI agent code, we can replace the single-system-message prompt with one that combines a system message with a user message:

import textwrap
from langchain_core.prompts import ChatPromptTemplate

query_prompt_template = ChatPromptTemplate([
    ("system", textwrap.dedent("""Given an input question, create a
    syntactically correct {dialect} query to run to help find the answer.
    Unless the user specifies in his question a specific number of examples
    they wish to obtain, always limit your query to at most {top_k} results.
    You can order the results by a relevant column to return the most
    interesting examples in the database.

    Never query for all the columns from a specific table, only ask for a the
    few relevant columns given the question.

    Pay attention to use only the column names that you can see in the schema
    description. Be careful to not query for columns that do not exist. Also,
    pay attention to which column is in which table.

    Only use the following tables:
    {table_info}""")),
    ("human", "Question: {input}"),
])

Trying the write_query() call for a third time, this is the response:

BadRequestError: Error code: 400 - {'error': {'message': 'deepseek-reasoner does not support Function Calling', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_request_error'}}

A third error! What is this “function calling” that deepseek-reasoner does not support? A helpful article on the topic at the Hugging Face AI community explains:

Function calling is a powerful capability that enables Large Language Models (LLMs) to interact with your code and external systems in a structured way. Instead of just generating text responses, LLMs can understand when to call specific functions and provide the necessary parameters to execute real-world actions.

Unfortunately, that is exactly our use case. It’s becoming clear that DeepSeek R1 is not the correct tool for implementing an AI agent—we’ve been trying to use a chisel as a screwdriver!

DeepSeek-V3: A better fit

As its name suggests, the deepseek-chat model is more appropriate for this application. The DeepSeek documentation tells us that it is based on DeepSeek-V3, released in December 2024. DeepSeek-V3 is priced at $0.27 per million input tokens and $1.10 per million output tokens; this is actually more expensive than the GPT-4o mini model I used for the OpenAI agent example ($0.15 per million input tokens, $0.600 per million output tokens), but how does it compare? Let’s take a look.

First, we need to edit the LLM creation code again to set the model name:

llm = BaseChatOpenAI(model="deepseek-chat", base_url='https://api.deepseek.com')

Now we can run write_query() again. It’s immediately clear that it’s a better fit than its “big brother:”

{'query': 'SELECT COUNT(*) AS total_drives FROM drivestats LIMIT 10'}

As with the OpenAI agent, this query is well-formed SQL, but it’s not answering the question we set—it’s giving us the total number of rows in the dataset, rather than the number of drives. Also, it’s a little odd to have a LIMIT clause in a SELECT COUNT(*) query, but it’s legal SQL, and the agent is following its instructions very literally: always limit your query to at most {top_k} results, where we set top_k to 10.

question = {"question": "Each drive has its own serial number. How many drives are there?"}

query = write_query(question)

{'query': 'SELECT COUNT(DISTINCT serial_number) AS total_drives FROM drivestats'}

So far, so good!

I’ll skip some intermediate steps here—they are all in the Jupyter notebook if you want to review them, or run them for yourself—and look at how a simple LangChain graph, built on the DeepSeek LLM, answered the question: “Each drive has its own serial number. How many drives did each data center have on 9/1/2024?”

The OpenAI version generated an invalid query, comparing the date column with the string ’2024-09-01’ without using the required DATE type identifier, but DeepSeek generates a correct SQL query and provides a useful natural language response:

/SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = DATE ‘2024-09-01’ GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10

[(‘phx1’, 89477), (‘sac0’, 78444), (‘sac2’, 60775), (”, 24080), (‘iad1’, 22800), (‘ams5’, 16139)]

On September 1, 2024, the data centers had the following number of drives:

phx1: 89,477 drives
sac0: 78,444 drives
sac2: 60,775 drives
(empty datacenter): 24,080 drives
iad1: 22,800 drives
ams5: 16,139 drives

These are the top data centers with the highest drive counts on that date.

DeepSeek scores a point!

Moving on to the ReAct AI Agent, which allows the LLM to perform multiple SQL queries in generating an answer to a question, DeepSeek performs similarly to OpenAI. Given the question, “Each drive has its own serial number. What is the annualized failure rate of the ST4000DM000 drive model?”, the DeepSeek agent provides the overall failure rate rather than the annualized failure rate (AFR).

When we provide explicit instructions for calculating AFR in its prompt, the DeepSeek agent provides the correct result, identical, in fact, to the OpenAI agent’s response:

The annual failure rate (AFR) for the ST4000DM000 drive model is approximately 2.63%.

However, when given the question, “What was the annual failure rate of the ST8000NM000A drive model in Q3 2024?”, the DeepSeek agent gives us:

[(1.6100573445081607,)]

While OpenAI responds:

The annual failure rate (AFR) of the ST8000NM000A drive model in Q3 2024 is approximately 1.61%.

Wrapping up the investigation, the final question from the OpenAI notebook is more complex:

Considering only drive models which had at least 100 drives in service at the end of the quarter and which accumulated 10,000 or more drive days during the quarter, which drive had the most failures in Q3 2024, and what was its failure rate?

Impressively, the OpenAI agent constructed a well-formed SQL query and provided the correct response:

The drive model with the most failures in Q3 2024 is the TOSHIBA MG08ACA16TA, which had 181 failures. Its failure rate during this period was approximately 1.84%.

BadRequestError: Error code: 400 - {'error': {'message': "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. (insufficient tool messages following tool_calls message)", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_request_error'}}
During task with name 'agent' and id '0aa26ba6-a3ee-ced1-de4d-b60ed7fbca99'

The phrase “insufficient tool messages” suggested that the DeepSeek LLM might need to be reconfigured to allow more tokens. According to the documentation on models and pricing, the deepseek-chat model supports a maximum of 8K output tokens, but defaults to 4K if max_tokens is not specified.

Recreating the DeepSeek wrapper object and agent accordingly, I gave it the last question again:

llm = BaseChatOpenAI(model="deepseek-chat", base_url='https://api.deepseek.com', max_tokens=8192, **extra_kwargs)

agent_executor = create_react_agent(llm, tools, state_modifier=system_message)

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "Considering only drive models which had at least 100 drives in service at the end of the quarter and which accumulated 10,000 or more drive days during the quarter, which drive had the most failures in Q3 2024, and what was its failure rate?"}]}
)

# Show the SQL query sent to the database
print(response['messages'][-3].tool_calls[0]['args']['query'])

# Show the final response message
display_markdown(response['messages'][-1].content, raw=True)

This time, DeepSeek was able to generate a similar SQL query to OpenAI:

WITH drive_counts AS (
    SELECT model, COUNT(DISTINCT serial_number) AS drive_count
    FROM drivestats
    WHERE date >= DATE '2024-07-01' AND date <= DATE '2024-09-30'
    GROUP BY model
    HAVING COUNT(DISTINCT serial_number) >= 100
), drive_days AS (
    SELECT model, COUNT(*) AS total_drive_days
    FROM drivestats
    WHERE date >= DATE '2024-07-01' AND date <= DATE '2024-09-30'
    GROUP BY model
    HAVING COUNT(*) >= 10000
), failures AS (
    SELECT model, COUNT(*) AS failure_count
    FROM drivestats
    WHERE date >= DATE '2024-07-01' AND date <= DATE '2024-09-30' AND failure = 1
    GROUP BY model
)
SELECT d.model,
       f.failure_count,
       100 * (CAST(f.failure_count AS DOUBLE) / (CAST(d.total_drive_days AS DOUBLE) / 365)) AS annual_failure_rate
FROM drive_days d
JOIN failures f ON d.model = f.model
JOIN drive_counts dc ON d.model = dc.model
ORDER BY f.failure_count DESC
LIMIT 1

With a correct response:

To answer the question:

The drive model with the most failures in Q3 2024 is TOSHIBA MG08ACA16TA, which had 181 failures. The annualized failure rate (AFR) for this model during that quarter was 1.84%.

Success! But, unfortunately, this isn’t the whole story.

DeepSeek Reliability

A screenshot of a DeepSeek error message.

I originally set out to write this blog post at the end of January, but the DeepSeek platform website had gone offline by January 30, so I couldn’t even start until I was able to sign up for an API key on February 5.

A screenshot of DeepSeek availability from December 2024 to Feburary 2025.

Given my shiny new API key, and DeepSeek’s claims of OpenAI API compatibility, I naïvely expected to be able to work through my earlier OpenAI notebook and write up the results in a couple of days. The reality was more like two weeks.

In this blog post I’ve detailed some of the error messages I encountered along the way, but I saw many more that pointed to the DeepSeek API simply being overwhelmed with traffic. For example, for over a day, when the status page reported no issues, most API requests to DeepSeek terminated after a minute with the error message:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

A time-consuming investigation revealed that this was caused by the DeepSeek API returning the 200 status code and headers as if the request was successful, then hanging for a minute before terminating the connection without returning any actual data. The calling code saw the 200 as success and tried to decode the non-existent API response body, resulting in the error.

I saw several more instances of intermittent errors that all seemed to point in the same direction: DeepSeek needs to add capacity to its API platform. Notably, the platform seemed faster and more stable on a Saturday morning, U.S. Pacific time, the early hours of Sunday morning in China.

Final thoughts

At present, I would have to classify the DeepSeek-V3 API as “promising, but somewhat flaky.” An agent invocation that succeeds one minute could fail the next with any of a range of error messages. That’s a shame, since when it does work, for instance, in creating the SQL query for the final question above, it tends to work very well.

One final caveat: This is a dynamic field; frameworks and services are literally being updated on a daily basis. For example, since yesterday, as I write this, four of the notebook’s module dependencies have been updated. I encourage you to experiment for yourself as your mileage will almost certainly vary, hopefully in a positive direction.

The post Experimenting with DeepSeek, Backblaze B2, and Drive Stats appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

How to debug code with GitHub Copilot

2025-02-21 Jeimy Ruiz

Post Syndicated from Jeimy Ruiz original https://github.blog/ai-and-ml/github-copilot/how-to-debug-code-with-github-copilot/

Debugging is an essential part of a developer’s workflow—but it’s also one of the most time consuming. What if AI could streamline the process, helping you analyze, fix, and document code faster? Enter GitHub Copilot, your AI-powered coding assistant.

GitHub Copilot isn’t just for writing code—it’s also a powerful tool for debugging. Whether you’re troubleshooting in your IDE, using Copilot Chat’s slash commands like /fix, or reviewing pull requests (PR) on github.com, GitHub Copilot offers flexible, intelligent solutions to speed up your debugging process. And with the free version of GitHub Copilot, available to all personal GitHub accounts, you can start exploring these features today.

In this guide, we’ll explore how to debug code with GitHub Copilot, where to use it in your workflow, and best practices to get the most out of its capabilities. Whether you’re new to GitHub Copilot or looking to deepen your skills, this guide has something for you.

Debugging code with GitHub Copilot: surfaces and workflows

Debugging code with GitHub Copilot can help you tackle issues faster while enhancing your understanding of the codebase. Whether you’re fixing syntax errors, refactoring inefficient code, or troubleshooting unexpected behavior, GitHub Copilot can provide valuable insights in your debugging journey.

So, how exactly does this work? “GitHub Copilot is recognizing patterns and suggesting solutions based on what it has learned,” says Christopher Harrison, Senior Developer Advocate. “Once you’ve identified the problem area, you can turn to GitHub Copilot and ask, ‘I’m giving this input but getting this output—what’s wrong?’ That’s where GitHub Copilot really shines.”

Let’s explore how GitHub Copilot can help you debug your code across different surfaces, from your IDE to github.com and even pull requests.

1. In Copilot Chat

Copilot Chat acts as an interactive AI assistant, helping you debug issues with natural language queries. And with Copilot Free, you get 50 chat messages per month. With Copilot Chat, you can:

Get real-time explanations: Ask “Why is this function throwing an error?” and Copilot Chat will analyze the code and provide insights.
Use slash commands for debugging: Try /fix to generate a potential solution or /explain for a step-by-step breakdown of a complex function. (More on this later!)
Refactor code for efficiency: If your implementation is messy or inefficient, Copilot Chat can suggest cleaner alternatives. Christopher explains, “Refactoring improves the readability of code, making it easier for both developers and GitHub Copilot to understand. And if code is easier to understand, it’s easier to debug and spot problems.”
Walk through errors interactively: Describe your issue in chat and get tailored guidance without ever having to leave your IDE.

2. In your IDE

When working in popular IDEs like VS Code or JetBrains, GitHub Copilot offers real-time suggestions as you type. It helps by:

Flagging issues: For example, if you declare a variable but forget to initialize it, GitHub Copilot can suggest a correction.
Code fixes: Encounter a syntax error? GitHub Copilot can suggest a fix in seconds, ensuring your code stays error-free.
Contextual assistance: By analyzing your workspace, GitHub Copilot provides solutions tailored to your codebase and project structure.

3. On github.com

GitHub Copilot extends beyond your IDE, offering debugging assistance directly on github.com via Copilot Chat, particularly in repositories and discussions. With this feature, you can:

Troubleshoot code in repositories: Open a file, highlight a problematic section, and use Copilot Chat to analyze it.
Generate test cases: If you’re unsure how to verify a function, GitHub Copilot can suggest test cases based on existing code.
Understand unfamiliar code: Reviewing an open-source project or a teammate’s PR? Ask GitHub Copilot to summarize a function or explain its logic.

4. For pull request assistance

GitHub Copilot can also streamline debugging within PRs, ensuring code quality before merging.

Suggest improvements in PR comments: GitHub Copilot can review PRs and propose fixes directly in the conversation.
Generate PR summaries: Struggling to describe your changes? Greg Larkin, Senior Service Delivery Engineer, says, “I use GitHub Copilot in the PR creation process to generate a summary of the changes in my feature branch compared to the branch I’m merging into. That can be really helpful when I’m struggling to figure out a good description, so that other people understand what I did.”
Explain diffs: Not sure why a change was made? Ask GitHub Copilot to summarize what’s different between commits.
Catch edge cases before merging: Use /analyze to identify potential issues and /tests to generate missing test cases.
Refactor on the fly: If a PR contains redundant or inefficient code, GitHub Copilot can suggest optimized alternatives.

By integrating Copilot into your PR workflow, you can speed up code reviews while maintaining high-quality standards. Just be sure to pair it with peer expertise for the best results.

5 slash commands in GitHub Copilot for debugging code

Slash commands turn GitHub Copilot into an on-demand debugging assistant, helping you solve issues faster, get more insights, and improve your code quality. Here are some of the most useful slash commands for debugging:

1. Use /help to get guidance on using GitHub Copilot effectively

The /help slash command provides guidance on how to interact with GitHub Copilot effectively, offering tips on structuring prompts, using slash commands, and maximizing GitHub Copilot’s capabilities.

How it works: Type /help in Copilot Chat to receive suggestions on your current task, whether it’s debugging, explaining code, or generating test cases.
Example: Need a refresher on what GitHub Copilot can do? Use /help to access a quick guide to slash commands like /fix and /explain.

2. Use /fix to suggest and apply fixes

The /fix command is a go-to tool for resolving code issues by allowing you to highlight a block of problematic code or describe an error.

How it works: Select the code causing issues, type /fix, and let Copilot Chat generate suggestions.
Example: If you have a broken API call, use /fix to get a corrected version with appropriate headers or parameters.

3. Use /explain to understand code and errors

The /explain command breaks down complex code or cryptic error messages into simpler, more digestible terms.

How it works: Highlight the code or error message you want clarified, type /explain, and Copilot Chat will provide an explanation. It will explain the function’s purpose, how it processes the data, potential edge cases, and any possible bugs or issues.
Example: Encounter a “NullPointerException”? Use /explain to understand why it occurred and how to prevent it.

4. Use /tests to generate tests

Testing is key to identifying bugs, and the /tests command helps by generating test cases based on your code.

How it works: Use /tests on a function or snippet, and Copilot Chat will generate relevant test cases.
Example: Apply /tests to a sorting function, and Copilot Chat might generate unit tests for edge cases like empty arrays or null inputs.

5. Use /doc to generate or improve documentation

There are long-term benefits to having good text documentation—for developers and GitHub Copilot, which can draw context from it—because it makes your codebase that much more searchable. By using the /doc command with Copilot Free, you can even ask GitHub Copilot to write a summary of specific code blocks within your IDE.

The /doc command helps you create or refine documentation for your code, which is critical when debugging or collaborating with others. Clear documentation provides context for troubleshooting, speeds up issue resolution, and helps fellow developers understand your code faster.

How it works: Highlight a function, class, or file, type /doc and right-click to see the context menu, and Copilot Chat will generate comprehensive comments or documentation.
Example: Apply /doc to a function, and Copilot Chat will generate inline comments detailing its purpose, parameters, and expected output.

By mastering these commands, you can streamline your debugging workflow and resolve issues faster without switching between tools or wasting time on manual tasks.

Best practices for debugging code with GitHub Copilot

Provide clear context for better results

Providing the right context helps GitHub Copilot generate even more relevant debugging suggestions. As Christopher explains, “The better that Copilot is able to understand what you’re trying to do and how you’re trying to do it, the better the responses are that it’s able to give to you.”

Since GitHub Copilot analyzes your code within the surrounding scope, ensure your files are well structured and that relevant dependencies are included. If you’re using Copilot Chat, reference specific functions, error messages, or logs to get precise answers instead of generic suggestions.

💡 Pro tip: Working across multiple files? Use the @workspace command to point GitHub Copilot in the right direction and give it more context for your prompt and intended goal.

Ask, refine, and optimize in real time

Instead of treating GitHub Copilot as a one-and-done solution, refine its suggestions by engaging in a back-and-forth process. Greg says, “I find it useful to ask GitHub Copilot for three or four different options on how to fix a problem or to analyze for performance. The more detail you provide about what you’re after—whether it’s speed, memory efficiency, or another constraint—the better the result.”

This iterative approach can help you explore alternative solutions you might not have considered, leading to more robust and efficient code.

Master the art of specific prompts

The more specific your prompt, the better GitHub Copilot’s response. Instead of asking “What’s wrong with this function?” try “Why is this function returning undefined when the input is valid?” GitHub Copilot performs best when given clear, detailed queries—this applies whether you’re requesting a fix, asking for an explanation, or looking for test cases to verify your changes.

By crafting precise prompts and testing edge cases, you can use GitHub Copilot to surface potential issues before they become production problems.

Try a structured approach with progressive debugging

Next, try a step-by-step approach to your debugging process! Instead of immediately applying fixes, use GitHub Copilot’s commands to first understand the issue, analyze potential causes, and then implement a solution. This structured workflow—known as progressive debugging—helps you gain deeper insights into your code while ensuring that fixes align with the root cause of the problem.

For example:

Start with the slash command /explain on a problematic function to understand the issue.
Use the slash command /startDebugging to help with configuring interactive debugging.
Finally, apply the slash command /fix to generate possible corrections.

📌 Use case: If a function in your React app isn’t rendering as expected, start by running /explain on the relevant JSX or state logic, then use /debug to identify mismanaged props, and finally, apply /fix for a corrected implementation.

Combine commands for a smarter workflow

Some issues require multiple levels of debugging and refinement. By combining commands, you can move from diagnosis to resolution even faster.

For example:

Use /explain + /fix to understand and resolve issues quickly.
Apply /fixTestFailure + /tests to find failing tests and generate new ones.

📌 Use case:

Fixing a broken function: Run the slash command /explain to understand why it fails, then use the slash command /fix to generate a corrected version.
Improving test coverage: Use the slash command /fixTestFailure to identify and fix failing tests, then use the slash command /tests to generate additional unit tests for the highlighted code.

Remember, slash commands are most effective when they’re used in the appropriate context, combined with clear descriptions of the problem, are part of a systematic debugging approach, and followed up with verification and testing.

Better together: AI tools with a developer in the pilot’s chair

GitHub Copilot is a powerful tool that enhances your workflow, but it doesn’t replace the need for human insight, critical thinking, and collaboration. As Greg points out, “GitHub Copilot can essentially act as another reviewer, analyzing changes and providing comments. Even so, it doesn’t replace human oversight. Having multiple perspectives on your code is crucial, as different reviewers will spot issues that others might miss.”

By combining GitHub Copilot’s suggestions with human expertise and rigorous testing, you can debug more efficiently while maintaining high-quality, reliable code.

Ready to try the free version of GitHub Copilot?
Start using GitHub Copilot today >

You can keep the learning going with these resources:
* Debug your app with GitHub Copilot in Visual Studio
* Example prompts for GitHub Copilot Chat
* GitHub Copilot and VS Code tutorials

The post How to debug code with GitHub Copilot appeared first on The GitHub Blog.

AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1

2025-02-05 Molly Clancy

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/ai-reasoning-models-openai-o3-mini-o1-mini-and-deepseek-r1/

A decorative image showing an AI chip connecting icons of representing different files.

If you haven’t been able to keep pace with the AI news cycle, you’d be forgiven. I work at a tech company, and it’s felt like bailing water with a teacup over the past few weeks. But the term that keeps rising to the top of the flotsam in the boat is this: reasoning models. The regular ol’ models that power ChatGPT, Gemini, and Claude are cool and all, but reasoning models are what you should keep an eye on as an enterprise tech leader, specifically DeepSeek and OpenAI.

In the spirit of our AI 101 series, I’ll do my level best to recap the finer points and decode some of the more esoteric terms you’re likely to encounter (Like: WTH is a “mixture of experts”? That sounds like a party I want to be invited to, but will definitely skip at the last minute.)

The reasoning model releases: OpenAI o1-mini, DeepSeek R1, and OpenAI o3-mini

The last few weeks and months have seen a flurry of activity in the AI space, with reasoning models taking center stage. The TL/DR is that reasoning models are LLMs that can self-correct before delivering a response to a prompt, though their turn time is a little longer than your standard LLM.

Here are the releases that you should know about.

OpenAI o1-mini: September 12, 2024

It seems like a lifetime ago, but OpenAI released its o1-mini model back in September. o1-mini wasn’t the first reasoning model to go to market (models from Google, DeepMind, Anthropic, and Meta dabbled in reasoning for specific tasks). But, it was more cost-efficient at inference—80% cheaper than the o1-preview model. What you need to know:

Yes, o1-preview and o1-mini were released at the same time—it’s confusing. Without getting into the weeds, here’s the difference: pricing. o1-preview was the most expensive OpenAI model on offer at $15/1M input tokens and $60/1M output tokens versus mini’s $3/1M input and $12/1M output. (You can think of tokens as units of data, like a prompt or a response, that are processed by the ML model.)
o1-preview (the expensive one) was purported at the time to perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.”
o1-mini (the 80% cheaper one) was designed to be particularly well-suited for coding tasks.

DeepSeek R1: January 20, 2025

Unless you’ve been under a rock, you’ve heard about this one. DeepSeek rattled the AI industry and financial markets with its release of R1, challenging OpenAI’s models on performance, pricing, and open-source availability. (We love a good open-source release.) What you need to know:

DeepSeek R1 delivers comparable results to OpenAI’s o1 models, both preview and mini, on math and coding benchmarks, while being trained on fewer GPUs—orders of magnitude fewer. Best guess estimates put it at around 60,000 GPUs, while industry leaders like OpenAI and Anthropic exceed 500k each.
This makes R1 much cheaper at $0.14/1M input tokens and $2.19/1M output tokens.
These efficiency claims could have far-reaching impacts for enterprises looking to build AI at a fraction of the cost. (The DeepSeek platform page has been down since we tasked one of our favorite tech evangelists with testing it, but stay tuned for a deep dive on how it works.)

OpenAI o3-mini: January 31, 2025

OpenAI previewed o3 in December, and brought it to GA just 11 days after DeepSeek joined the party. What you need to know:

o3-mini is intended for programming and STEM use cases.
At $1.10/1M input tokens and $4.40/1M output tokens, o3-mini beats o1-mini on pricing, but doesn’t challenge DeepSeek R1.
Also, OpenAI alleges that DeepSeek stole its data for training purposes.

I’m admittedly cherry picking these releases a bit to keep things simple. Suffice it to say, there are a lot of models, even within OpenAI’s o-series, but these are the ones of note at least as it pertains to recent events.

What is reasoning anyway?

You might see reasoning described as “thinking” before it delivers an answer, but do not be fooled. AI cannot yet “think” or, to be fair, “reason” in the ways that we apply those terms to humans. To describe what they actually do, I need to use a word salad of jargon. I’m sorry—definitions follow. Reasoning models leverage chain-of-thought prompting to guide decision-making, incorporating self-improvement mechanisms and using test-time thinking to make real-time adjustments.

Chain-of-thought (CoT) prompting: Models break problems into logical steps (e.g., solving math problems via intermediate equations)
Self-improvement mechanisms: Techniques like the Self-Taught Reasoner (STaR) enable iterative refinement of reasoning through automated feedback loops.
Test-time thinking: Models can make decisions during deployment based on real-time inputs, rather than relying solely on pre-trained models or fixed strategies.

Here are a few more terms you might come across for good measure:

Inference compute: The computational power needed to run a reasoning model and generate predictions or outputs based on new data after the model has been trained.
Mixture of experts approach: Using multiple specialized models (“experts”) that handle different tasks, and applying a gating mechanism to select the most relevant expert to use to make predictions based on the input data. Of note: DeepSeek used this approach to create efficiencies.
Distillation: Using inputs and outputs from one model to train another model. Of note: OpenAI alleges this is how DeepSeek “stole” its IP.

This is all pretty cool, if linguistically painful, stuff, and it means that reasoning models are shifting perceptions of model capabilities. But they’re not without persistent challenges. Like other LLMs, they still struggle with complex reasoning failures, lack of training transparency, and cognitive biases.

Why should you care?

If the past two weeks (and, really, the past two years) are any indication, AI innovation will continue its blistering pace. Reasoning models, and LLMs in general, will become diverse and specialized for narrower tasks as the core technology is increasingly commoditized and cheapened. And, it’s worth noting that this is a totally normal—and expected—lifecycle when it comes to new technology.

What does it all mean for enterprises looking to build AI into their operations? Two key takeaways:

Don’t overcommit on any one toolset or investment: Test out OpenAI, DeepSeek, Gemini, Alibaba’s Qwen, and others. And, stay ahead of the changing landscape and new models—stay nimble, and keep experimenting.
Take care of your data: What makes these models valuable for your company isn’t so much their capabilities, but your data. You need to retain it in storage that’s reliable, easy to access, and doesn’t lock you out of AI experimentation with exorbitant egress fees.

Even as AI models get better, having those fundamentals in place can only help your business and set you up to better leverage AI when it’s right for your operations.

The post AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Building an AI Agent with Backblaze B2, LangChain, and Drive Stats

2025-01-22 Pat Patterson

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/building-an-ai-agent-with-backblaze-b2-langchain-and-drive-stats/

A decorative image showing multiple computer windows folding into the cloud.

Last August, I explained how you can use a Jupyter Notebook to explore AI development; specifically, building a chatbot that answers questions based on custom context downloaded from a private bucket in Backblaze B2 Cloud Storage.

In this post, I’ll look at another AI technology, agents, and show you how I built an AI agent that answers questions about hard drive reliability based on over 11 years of raw data from our Drive Stats franchise.

The Drive Stats dataset is ideal for this kind of work. It’s a real-world dataset, but, it only weighs in at around 500 million records consuming about 20GB of storage in Parquet format (“only” being a relative term), so you can use it with big data and AI tools on a laptop in a reasonable amount of time rather than spinning up an expensive virtual machine (VM) and/or spending days waiting for an operation to complete. As an example, converting the entire Drive Stats data set from CSV to Parquet using a Python app on my MacBook Pro takes a couple of hours. On the same hardware, converting a terabyte-scale data set would take about four days.

Speaking of Drive Stats

The Drive Stats 2024 report comes out February 11, and we’re hosting a LinkedIn Live event where Andy Klein, resident Drive Stats guru, will share highlights. Register today to save your spot.

You can use these same techniques with any large dataset, from healthcare to ecommerce to financial services. In this example, we’re working with a single table, but you could adapt the sample code to a data lake comprising any number of tables.

What is an AI agent?

In the spirit of the times, I posed this question to ChatGPT. Its answer:

An AI agent is a software system designed to autonomously perform tasks or make decisions based on its environment and goals. It leverages artificial intelligence techniques—such as machine learning, reasoning, and natural language processing—to process information, make decisions, and take actions to achieve specific objectives.

Key components of an AI agent include:

Perception: The ability to sense and understand its environment. This could be through sensors, input data, or other means of gathering information.
Reasoning/decision-making: The core processing mechanism that helps the agent interpret its environment, make decisions, and plan actions. It could use various algorithms, such as decision trees, reinforcement learning, or neural networks.
Action: Once the agent has analyzed the environment and made a decision, it takes action to achieve its goal, whether it’s performing an operation, giving a recommendation, or interacting with another system.
Learning: Some AI agents can adapt over time, improving their decision-making and actions based on experience (via reinforcement learning, supervised learning, etc.).

AI agents can range from simple systems, like chatbots or virtual assistants, to more complex systems like autonomous vehicles, robots, or financial trading algorithms.

In general, the term “agent” emphasizes the idea of autonomy—the agent operates independently, often with the ability to learn, adapt, and make decisions based on changing conditions without direct human intervention.

In this example, the agent’s environment is a database containing the Drive Stats data (more on that below), and I want it to perform the following tasks:

Based on a natural language question, such as “Which drive has the lowest annual failure rate?”, generate a SQL query that retrieves data that will help answer the question.
Execute that query against the Drive Stats dataset.
Based on the query results, either create a new query that better answers the question, or generate a natural language answer.

As in my previous post, I’m using the open source LangChain framework. This tutorial on building a question/answering system over SQL data was my starting point. I’ll explain key points of the integration in this blog post; the full source code is available as a Jupyter notebook in the ai-agent-demo repository.

Querying the Drive Stats dataset

Now I’ve established that my agent will be writing a SQL query, the next question is, “What will it be querying?” I’ve written about querying the Drive Stats dataset before; in that blog post I explained how I wrote a Python script to convert the Drive Stats data from the CSV format in which we publish it to Apache Parquet, a column-oriented file format particularly well-suited for storing tabular data for use in analytical queries, and upload it to a Backblaze B2 Bucket using the Apache Hive table format. There’s a broad ecosystem of tools and platforms that can manipulate Parquet data in object storage (for example, Apache Spark and Snowflake) and I chose Trino, the open source distributed SQL engine that forms the basis for Amazon Athena, to execute queries against the data.

I could have used the same technologies for this exercise, but I decided to add Apache Iceberg to the mix. While Parquet is a file format that specifies how tabular data is stored in files, Iceberg is a table format that governs how those files can be combined and interpreted as a database table. Iceberg provides a number of advantages over Hive as a table format, including better performance and much more flexible data partitioning.

What is partitioning?

Partitioning splits a dataset on one or more column values, easing data management and improving performance when a query includes a partition column.

Partitioning by year and month makes sense for the Drive Stats dataset—the resulting Parquet files are in the hundreds of megabytes, the sweet spot for Parquet data. To apply this partitioning to the Drive Stats data using the Hive table format, I had to create otherwise redundant month and year columns from the existing date column, complicating the schema.

Iceberg, by contrast, supports hidden partitioning, allowing you to apply a transformation to a column value to produce a partition value without adding any new columns. With the Drive Stats data, that meant I could simply define the partitioning as month(date) (the resulting value being the number of months since 1/1/1970, rather than an integer between 1 and 12), with no need to create any additional columns.

LangChain’s SQLDatabase class provides access to databases via the SQLAlchemy open-source Python library. The demo code obtains a SQLDatabase instance by providing a URI containing the trino scheme, a username and the location of the database node:

db = SQLDatabase.from_uri('trino://admin@localhost:8080/iceberg/drivestats')

Note: In this and other code excerpts in this blog post, I’ve omitted extraneous “boilerplate” code. As mentioned above, the full source code is available in the ai-agent-demo repository.

As you can infer from the localhost domain name, I’m running Trino on my laptop. I’m actually running it in Docker, using the Iceberg/Hive Docker Compose script from the trino-getting-started-b2 repository. I’ll dive into that example in a future blog post.

A simple query confirms that we have a successful database connection:

db.run("SELECT COUNT(*) FROM drivestats")

'[(537220724,)]'

As the result conveys, there are over 537 million rows in the Drive Stats dataset.

Each row contains the metrics collected from a single drive in the Backblaze fleet on a specific day. The schema has evolved over time, but, currently, the following columns are included:

date: The date of collection.
serial_number: The unique serial number of the drive.
model: The manufacturer’s model number of the drive.
capacity_bytes: The drive’s capacity in bytes.
failure: 1 if this was the last day that the drive was operational before failing, 0 if all is well.
pod_slot_num: The physical location of a drive within a storage server, as an integer from 0 to 59. The specific slot differs based on the storage server type and capacity: Backblaze (45 or 60 drives), Dell (26 drives), or Supermicro (60 drives).
pod_id: There are 20 storage servers in each Backblaze Vault. The pod_id is a numeric field with values from 0 to 19 assigned to each of the 20 storage servers.
vault_id: All data drives are members of a Backblaze Vault. Each Vault consists of either 900 or 1,200 hard drives divided evenly across 20 storage servers. The Vault is a numeric value starting at 1,000.
cluster_id: The name of a given collection of storage servers logically grouped together to optimize system performance, formatted as a numeric field with up to two digits. Note: At this time the cluster_id is not always correct; we are working on fixing that.
datacenter: The Backblaze data center where the drive is installed, currently one of ams5 (Amsterdam, Netherlands), iad1 (Reston, Virginia), phx1 (Phoenix, Arizona), sac0 (Sacramento, California), sac2 (Stockton, California) or, now live, yyz1, our new Toronto, Ontario, data center.
is_legacy_format: Currently 0, but may change in future as more fields are added.
A collection of SMART attributes. The number of attributes collected has risen over time; currently we store 93 SMART attributes in each record, each one in both raw and normalized form, with field names of the form smart_n_normalized and smart_n_raw, where n is between 1 and 255.

Using OpenAI to generate a SQL query

For this project, I decided to use the OpenAI API, rather than running a large language model (LLM) directly on my laptop. LangChain has a chat model integration for OpenAI, as well as many other providers, so you could use, for example, a local Llama model (via ChatOllama) or one of the Claude models (via ChatAnthropic) if you prefer.

To use the OpenAI API, you must sign up for an OpenAI account and create an OpenAI API key. This code loads the API key from a .env file and creates a chat model instance using OpenAI’s GPT-4o mini model:

# OPENAI_API_KEY must be defined in the .env file
load_dotenv()
llm = ChatOpenAI(model="gpt-4o-mini")

Now we need a system prompt template. We’ll combine this with the database schema and a natural language question to form the prompt that we send to OpenAI. As in the LangChain tutorial, I’m using a prompt from the LangChain Prompt Hub:

query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
query_prompt_template.messages[0].pretty_print()

This is the prompt template text, with the placeholders shown in {braces}:

================================ System Message ================================

Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Only use the following tables:
{table_info}

Question: {input}

Notice how the template requires you to specify the correct SQL dialect, constrains the number of results returned, and encourages the model to not hallucinate column names that do not exist in the schema.

A helper function populates the prompt template, sends it to the model, and returns the generated SQL query:

def write_query(state: State):
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": state["question"],
        }
    )
    structured_llm = llm.with_structured_output(QueryOutput)
    result = structured_llm.invoke(prompt)
    return {"query": result["query"].rstrip(';')}

We can test the helper function by calling it directly with a Python dictionary containing a simple question:

question = {"question": "How many drives are there?"}
query = write_query(question)

The resulting query dictionary does indeed contain a valid SQL query, but it won’t give us the answer we are looking for.

{'query': 'SELECT COUNT(*) AS drive_count FROM drivestats'}

That query will tell us how many rows there are in the dataset, rather than how many drives. We supplied the database schema to the model, but we haven’t given it any information on the semantics of the columns in the drivestats table. We can provide a bit more detail to obtain the correct query:

question = {"question": "Each drive has its own serial number. How many drives are there?"}
query = write_query(question)

This time, the generated SQL query is correct:

{'query': 'SELECT COUNT(DISTINCT serial_number) AS total_drives FROM drivestats'}

As you can see, it’s important to check the output of AI models—they can and do generate unexpected results.

A second helper function executes the query against the database:

def execute_query(state: State):
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return {"result": execute_query_tool.invoke(state["query"])}

We can test it using the (correct) generated query:

result = execute_query(query)

{'result': '[(430464,)]'}

We need one more helper function, to pass the result set to the model and have it generate a natural language response. This time, we define our own prompt:

def generate_answer(state: State):
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f'Question: {state["question"]}\n'
        f'SQL Query: {state["query"]}\n'
        f'SQL Result: {state["result"]}'
    )
    response = llm.invoke(prompt)
    return {"answer": response.content}

Again, we can test it in isolation. Notice that we have to provide the question and query, as well as the result so that the model has the context it needs:

answer = generate_answer(question | query | result)
answer['answer']

'There are 430,464 drives.'

Success! At the present time, there are indeed 430,464 drives in the Drive Stats dataset.

LangChain’s LangGraph orchestration framework allows us to compile our three helper functions into a single graph object:

graph_builder = StateGraph(State).add_sequence(
    [write_query, execute_query, generate_answer]
)
graph_builder.add_edge(START, "write_query")
graph = graph_builder.compile()

We can visualize the flow in the notebook:

display(Image(graph.get_graph().draw_mermaid_png()))

A diagram showing a query workflow. The workflow is defined as start, write_query, execute_query, generate_answer.

We’ve combined the write_query and execute_query steps into a graph object that can run agent-generated queries. I’ll quote the security note from the LangChain tutorial on the inherent risks in doing so:

Building Q&A systems of SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent’s needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, see here.

In this example, we are querying a public dataset, and I followed best practice by configuring Trino’s Iceberg connector with a read-only application key scoped to the bucket containing the Drive Stats Iceberg tables.

Now let’s stream a new question through the flow. This mode of operation displays the output of each step as it is executed, essential for understanding the flow’s behavior, particularly when it is behaving unexpectedly. The model returns structured text in Markdown format. With a couple of lines of code to extract the message from the step variable, we can use the display_markdown function to render each step’s output:

for step in graph.stream(
    {"question": "Each drive has its own serial number. How many drives did each data center have on 9/1/2024"}, stream_mode="updates"
):
    # unwrap the step value to get the markdown message
    state = one(step.values())
    message = one(state.values())
    display_markdown(message, raw=True)

This is the model’s output, and it gives us three different messages. I’ve separated them with a horizontal line for clarity:

SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = ‘2024-09-01’ GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10

Error: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=TYPE_MISMATCH, message=”line 3:12: Cannot apply operator: date = varchar(10)”, query_id=20250113_221649_00214_bsut5) [SQL: SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = ‘2024-09-01’ GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10] (Background on this error at: https://sqlalche.me/e/20/f405)

The SQL query encountered an error because it attempted to compare a date column with a string in the format ‘YYYY-MM-DD’. Specifically, the error message indicates that the date column is of a different type (likely not a string), which is causing the type mismatch.

To answer the user question about how many drives each data center had on 9/1/2024, the SQL query needs to be corrected. Here’s the revised query:

SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count 
FROM drivestats 
WHERE date = DATE '2024-09-01' 
GROUP BY datacenter 
ORDER BY drive_count DESC 
LIMIT 10

This corrected query uses the DATE keyword to ensure that the date string is properly interpreted as a date type, which should resolve the type mismatch error. If executed successfully, this query will provide the count of distinct drives in each data center for the specified date. However, without executing the corrected query against the database, we cannot provide the exact counts.

As you can see in the output from the first step, the model generated an invalid query, comparing a date to a string, despite the database schema being included in the prompt. The output of the second step contains the resulting error message from the database, while the third step contains the model’s diagnosis of the error.

This exchange highlights a limitation of a flow that is simply a linear series of steps, such as write_query, execute_query, and generate_answer. We cannot rely on the model to generate a valid SQL query, although it is able to point the way towards resolving its error.

Creating a ReAct AI agent with LangGraph

The LangGraph framework gives you the capability to create AI agents based on arbitrarily complex logic. In this article, I’ve used its prebuilt ReAct (Reason+Act) agent, since it neatly demonstrates the agent concept, rewriting the SQL query repeatedly in response to database errors.

There are three steps to creating the agent. The first is to create an instance of LangChain’s SQLDatabaseToolkit, passing it the database and model, and obtain its list of tools:

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

The tools list contains tools that execute queries, retrieve the names, schemas and content of database tables, and check SQL query syntax.

The next step is to retrieve a suitable prompt template from the Prompt Hub and populate the template placeholders:

prompt_template = hub.pull("langchain-ai/sql-agent-system-prompt")
system_message = prompt_template.format(dialect=db.dialect, top_k=10)

Here is the prompt template’s text:

================================ System Message ================================

You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

To start you should ALWAYS look at the tables in the database to see what you can query.
Do NOT skip this step.
Then you should query the schema of the most relevant tables.

Now we can create an instance of the prebuilt agent:

agent_executor = create_react_agent(llm, tools, 
state_modifier=system_message)

Note how the agent must select the next step, and how the flow can cycle between the agent and tools steps:

display(Image(agent_executor.get_graph().draw_mermaid_png()))

A diagram showing the workflow between tools and agent. The workflow is as follows: start, agent, then a split option to access tools (a recursive step), or to end. The diagram shows that after agent, you can optionally select tools or end, indicating that you can end without choosing tools.

Again, we can stream the agent’s execution to show us each step of its operation.

for step in agent_executor.stream(
    {"messages": [{"role": "user", "content": "Each drive has its own serial number. How many drives did each data center have on 9/1/2024?"}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

The output from this flow is over 300 lines long; I posted it in its entirety as a Gist, but I’ll summarize the steps here:

Question: Each drive has its own serial number. How many drives did each data center have on 9/1/2024?
The model calls the “list tables” tool.
The list tables tool responds with a single table name, drivestats.
The model calls the “get schema” tool, passing it the table name.
The get schema tool responds with the schema and three sample rows from the drivestats table.
The model submits a query to the “query checker” tool:
SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10;
The query checker responds with the checked query, which is the same as its input. Note that the query checker only checks the SQL query’s syntax. The query contains the same data type mismatch as the query we generated earlier, as well as another error, as we’re about to discover.
The model submits the query to the “query executor” tool.
The query executor responds with a syntax error—Trino does not allow a trailing semi-colon on the query.
The model submits a modified query to the query checker tool:
SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10
The query checker responds with the checked query, which is the same as its input.
The model submits the query to the “query executor” tool.
The query executor responds with a type mismatch error since the query tries to compare a string value with a date column.
The model submits a query with the necessary DATE type identifier to the query checker tool:
SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = DATE '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10
The query checker responds with the checked query, which is the same as its input.
The model submits the query to the “query executor” tool.
The query executor responds with a result set:
[ ('phx1', 89477), ('sac0', 78444), ('sac2', 60775), ('', 24080), ('iad1', 22800), ('ams5', 16139) ]
The model returns a message containing the answer:

On September 1, 2024, the following datacenters had the specified number of drives:

1. phx1: 89,477 drives
2. sac0: 78,444 drives
3. sac2: 60,775 drives
4. (unknown datacenter): 24,080 drives
5. iad1: 22,800 drives
6. ams5: 16,139 drives

These results show the datacenters with their respective drive counts.

Now let’s see if the model can calculate the annualized failure rate of a drive model. We’ll use the Seagate ST4000DM000, just because that is the drive model with the most days of operation in the dataset.

for step in agent_executor.stream(
        {"messages": [{"role": "user", "content": "Each drive has its own serial number. What is the annualized failure rate of the ST4000DM000 drive model?"}]},
        stream_mode="values",
):
    step["messages"][-1].pretty_print()

The agent’s response mixes Markdown and LaTex notation. I used QuickLaTeX to render the LaTex to images:

The annualized failure rate (AFR) for the ST4000DM000 drive model can be calculated using the following information:

– Total failures: 5,791

– Total drives: 37,040

– Time period: from May 10, 2013, to September 30, 2024, which is approximately 11.35 years.

The formula for calculating the annualized failure rate is:

The calculation for the annualized failure rate. It's total failures divided by total drives, multiplied by one over the total years, multiplied by 100.

Plugging in the numbers:

Real number for the annualize failure rate calculations. In this instance, the text reads 5791 divided by 37040, multiplied by one over 11.35, multiplied by 100, which equals approximately 13.77 percent.

Therefore, the annualized failure rate (AFR) of the ST4000DM000 drive model is approximately 13.77%.

It’s impressive that the agent shows its working so comprehensively, but, unfortunately, it arrives at the wrong answer. Those drives were not all running for the entire span of the Drive Stats dataset. The correct calculation involves determining the number of days with data for those drives and dividing it by 365 to get the correct number of years’ operation.

It’s clear that the model is not able to answer questions on drive reliability given the data available to it so far. The solution lies in prompt engineering—providing more context on the semantics of the data in the system prompt.

We can extend the default AI agent system prompt template to include specific instructions on working with the Drive Stats dataset:

prompt_template.messages[0].prompt.template += """
Each row of the drivestats table records one day of a drive’s operation, and contains the serial number of a drive, its model name, capacity in bytes, whether it failed on that day, SMART attributes and identifiers for the slot, pod, vault, cluster and data center in which it is located.

Use this calculation for the annualized failure rate (AFR) for a drive model over a given time period:

1. **drive_days** is the number of rows for that model during the time period.
2. **failures** is the number of rows for that model during the time period where **failure** is equal to 1.
3. **annual failure rate** is 100 * (**failures** / (**drive_days** / 365)).

Use double precision arithmetic in the calculation to avoid truncation errors. To convert an integer **i** to a double, use CAST(**i** AS DOUBLE)

Note that the date column is a DATE type, not a string. Use the DATE type identifier when comparing the date column to a string.

Do not add a semi-colon suffix to SQL queries."""

Now, when we ask the same question on the annual failure rate of the ST4000DM000 drive model, the AI agent generates a correct SQL query and a more concise, and correct, final response (you can inspect the full output here).

SELECT 100 * (CAST(COUNT(CASE WHEN failure = 1 THEN 1 END) AS DOUBLE) / (COUNT(*) / 365)) AS annual_failure_rate
FROM drivestats
WHERE model = 'ST4000DM000'

The annual failure rate (AFR) for the ST4000DM000 drive model is approximately 2.63%.

Let’s ask the AI agent for a statistic that we can corroborate from the Backblaze Drive Stats for Q3 2024 blog post.

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "What was the annual failure rate of the ST8000NM000A drive model in Q3 2024?"}]}
)
response['messages'][-3].pretty_print()
display_markdown(response['messages'][-1].content, raw=True)

The query makes sense, and the response agrees with the table in the blog post:

SELECT 100 * (CAST(SUM(failure) AS DOUBLE) / (COUNT(*) / 365)) AS annual_failure_rate
FROM drivestats
WHERE model = 'ST8000NM000A' AND date >= DATE '2024-07-01' AND date < DATE '2024-10-01'

The annual failure rate (AFR) of the ST8000NM000A drive model in Q3 2024 is approximately 1.61%.

Interestingly, this time the SQL query used SUM(failure) to count the number of failures, rather than the equivalent, but rather long-winded COUNT(CASE WHEN failure = 1 THEN 1 END) it used in the previous query. Also, looking at the full response, we can see that, as directed by the custom prompt, the agent generated the correct syntax for comparing dates, so it didn’t need to correct and retry any queries.

Finally, let’s ask a more convoluted question, including the constraints given in the blog post:

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "Considering only drive models which had at least 100 drives in service at the end of the quarter and which accumulated 10,000 or more drive days during the quarter, which drive had the most failures in Q3 2024, and what was its failure rate?"}]}
)
response['messages'][-3].pretty_print()
display_markdown(response['messages'][-1].content, raw=True)

Again, the AI agent is able to generate a valid SQL query, this time including a subquery, and its response matches the data from the blog post exactly:

WITH drive_stats AS (
    SELECT model,
           COUNT(DISTINCT serial_number) AS drive_count,
           COUNT(*) AS drive_days,
           COUNT(CASE WHEN failure = 1 THEN 1 END) AS failures
    FROM drivestats
    WHERE date >= DATE '2024-07-01' AND date < DATE '2024-10-01'
    GROUP BY model
    HAVING COUNT(DISTINCT serial_number) >= 100 AND COUNT(*) >= 10000
)
SELECT model,
       failures,
       100 * (CAST(failures AS DOUBLE) / (CAST(drive_days AS DOUBLE) / 365)) AS failure_rate
FROM drive_stats
ORDER BY failures DESC
LIMIT 10

The drive model with the most failures in Q3 2024 is the TOSHIBA MG08ACA16TA, which had 181 failures. Its failure rate during this period was approximately 1.84%.

Closing thoughts

My experience building an AI agent was astonishment at its ability to correctly generate quite complex SQL queries based on natural language instructions, tempered with frustration at its limitations, particularly the way that it would confidently generate an incorrect response, rather than saying “I’m sorry, but I don’t know how to do that.” Your AI agent development process should include generous testing time, as well as ongoing monitoring to ensure that it is coming up with the right answers.

The post Building an AI Agent with Backblaze B2, LangChain, and Drive Stats appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI for Enterprise: Getting Started

2025-01-09 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-for-enterprise-getting-started/

A decorative image showing various cloud storage and business related icons.

AI is here to stay, and the question on everyone’s mind is how to implement it successfully. If you’re ready to implement AI in your business, consider this article a good jumping off point. I’ll talk about different options for integrating it into your operations and how to make it truly custom, based on your own data, and useful for your business.

More from AI 101

Want to read more about AI? We’ve got you covered in our AI 101 series. And, here’s a sampling that might be useful when you’re thinking about building AI into your business.

How many companies use AI today?

How many businesses are using AI, you ask? Well, let’s ask Google. According to their AI overview (yes, we appreciate the irony), anywhere between 55% and 83% of companies are using or exploring AI in some way.

A screenshot of the Google AI overview that results from the query "how many businesses use AI"?

It’s not lost on me that the above results illustrate some of the big limitations of AI—namely that it’s only as good as the data it’s trained on, it’s far from infallible, and it can’t replace humans wholesale especially when someone needs to fact check those results. Google’s AI overviews have been criticized for providing inaccurate information, hallucinating (with sometimes hilarious results), providing a neat answer to complicated questions, providing information from unreliable sources, potential for bias, and so on. Nevertheless, the feature has had several updates since it was first released (which at least means it’s no longer telling us to put glue on pizza).

But, setting all that aside, this is actually a great example to consider before we dig into options for incorporating AI into your business. AI Overviews have improved enough—for example, by adding things like source transparency—that we can easily add enough human oversight to consider the above directionally accurate. The landscape of technology is changing, and, ready or not, businesses are being forced to figure out how AI should fit into their strategies.

What we’ll talk about today

Today we’ll talk about some foundational topics you need to understand when deciding how to incorporate AI into your business. We’ll define the following:

Software as a service (SaaS) AI add-ons
AI as a service (AIaaS)
Foundation models
Retrieval augmented generation (RAG)

Those definitions will lead us quickly to some practical examples that illustrate how businesses are using AI.

Software as a service (SaaS) applications, aka, AI as a feature

You may have noticed that many of the web-based applications you are using are suddenly AI-powered or have AI capabilities. While some of that is marketing hype, this could be a way to get started with AI in your organization—by simply turning on a feature in a SaaS product you’re already using. There are lots of ways to do this—Slack, for example, offers AI tools for summarizing and answering questions to help teams work faster.

Example AI use case: AI in customer support

Generative AI capabilities such as chatbots are often added to customer-facing applications like your customer support service. The chatbot is trained using your product support materials or actual questions your staff previously answered.

By providing a cache of human-based questions and answers, the chatbot can be trained to respond in your unique company voice.

A screenshot of the Backblaze chatbot live on www.backblaze.com. — Oh hey, there’s ours!

Before you activate and use a built-in AI feature of an existing service, you’ll want to determine how you can measure any changes in overall productivity and user satisfaction. In the customer service example above, that could be capturing metrics such as a customer satisfaction rating, time to first contact, time-to-resolution, escalation ratio, and so on. Then establish a baseline for the existing system before engaging the AI assistant and set specific points where you will compare that baseline to the AI powered system.

Using an AI powered service has many benefits, but there are a number of considerations to contemplate:

You are limited in functionality by what the vendor provides.
What is the expertise of the software vendor in developing, training, and implementing an AI model?

What happens when the model data changes? For example, you’ve employed AI to respond to customer queries. What happens when you add a new product to your lineup or a new feature to an existing product? Is the model retrained? What are the costs? Does it still make economic sense given any new cost?
During the model creation and operational phases, ancillary files such as checkpoints, prompts, responses, and so on are created. Do you have visibility into these files and what analysis can you perform?
Given these ancillary files are derived in part from your original data, can you download these files to your central repository or is the data locked in the vendor’s application?

Artificial intelligence as a service (AIaaS)

AIaaS is one of the many areas of AI where definitions and capabilities are a moving target. That said, we’ll offer that AIaaS is an outsourced service that a cloud-based company provides to other organizations that gives that organization access to different AI models, algorithms, and other resources directly through the vendor’s cloud computing platform via a user interface (UI), API, or SDK connection. The aim is to make a user-friendly interface that simplifies the process of training and deploying AI models accessible to non-AI experts.

AIaaS is worth considering if you’re interested in working with artificial intelligence but you don’t have the in-house resources or expertise to build and manage your own AI technology. There are a broad range of solutions offered in this space which vary by the services provided, let’s categorize the services as follows.

Walled gardens:
- What they offer: In my experience, AIaaS providers in this group usually host most or all of the model training data, checkpoints, inferences, and prompts.
- Pros and cons: This is the most straight-forward option, but in practice, this method can be cost prohibitive and lacks transparency. There are few if any options to reduce the cost or economically transfer the model, its work products, or its data elsewhere.
- Who are they: The obvious ones that come to mind for me are companies like AWS, Google, and IBM Watson.
Mix-and-match:
- What they offer: Solutions in this group vary by the services they provide as well as add-on options and support services. They typically provide hosting services which are used to train, deploy, and use the model. They can also provide data analysis and cleansing for the model input, model testing, engineering support, and general support services as you might require.
- Pros and cons: As with the walled garden approach, once data is ingested or ancillary data is created within the system it may be difficult to access and if available expensive to retrieve. Often, they also represent companies that provide specialized services—for instance, companies that solve a type of problem, like a computer vision specialist vs. a natural language processing model, or, alternatively, a company that focuses on AI in IT operations, call center operations, cybersecurity, etc.
- Who are they: This group includes companies like Twelve Labs, Proofpoint, or Amplify. Note that there’s a bit of a porous line between some of the providers in this category and the following—think of it like a gradient.
Open cloud:
- What they offer: Providers in this group offer a variety of tools and services that, when combined, allow an organization to construct, test, operate, and maintain an AI-based solution.
- Pros and cons: The open cloud approach allows you to select the best of breed providers for the various stages of your AI project. It also allows you to have control over the model and its byproducts such as checkpoint data, inferences, and prompts key to ensuring the model is performing as expected. In summary, while your level of effort for this approach will be higher, you will have more control over your model and more importantly the data, your data.
- Who are they: This includes platforms like Hugging Face and vendors like OpenAI of ChatGPT fame. Hugging Face is intentionally open source, whereas OpenAI is under pressure to monetize models—one of the bigger evolving conversations in the AI landscape. Today, anyone can purchase an API access subscription from OpenAI to access the GPT-4 Chat from their application. Such subscriptions offer quick access to organizations that want a mature model but aren’t able to or interested in building one themselves.

The AIaaS approach is a good choice for organizations that lack expertise in building and operating AI systems. The approach you take, walled garden, mix-and-match, or open cloud, will affect how much access and flexibility you have with the data used and produced by the system. This may not be of interest today, but as your organization becomes more AI savvy, being able to access and share the data within the system could become important.

Foundation models

The term “foundation model” originated with the Stanford Institute for Human-Centered Artificial Intelligence’s (HAI) Center for Research on Foundation Models (CRFM) which defines it as “any model that is trained on broad data that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks.” Most, but not all, foundation models are generative AI in form and perform tasks such as language processing, visual comprehension, code generation, and human-centered engagement.

Although foundation models are pre-trained, they can continue to learn from prompts during inference. An organization can develop tailored outputs using techniques such as prompt engineering, fine-tuning, and pipeline engineering. For example, prompt engineering requires you to enter a series of carefully curated prompts to the model such that over time the model infers more precise answers related to the subject matter of the prompts. This makes the model less generic and more specific to your organization.

When using a foundation model, you will need to capture and store all data used to fine-tune the model, for example the prompts and responses used for the prompt engineering process. This will allow you to analyze how the inference process is shifting over time.

Utilizing a foundation model as a starting point is a good choice, but techniques such as prompt engineering are far from being an exact science. Often such training can exacerbate a subtle bias in the existing model or introduce a new bias. This is especially true if the model is public facing.

Retrieval augmented generation (RAG)

Retrieval augmented generation (RAG) is a relatively new technique that allows AI models to link to external sources. These models are, in most cases, a generative AI model, such as a large language model (LLM). By using RAG techniques, external resources, often rich in technical content, can be leveraged as part of the model during inference to be part of the response to the user. One commonly cited example is having medical journals indexed via this technique so their content is reviewed when the model is generating a response. The same could be done with financial data, legal case law, and so on.

RAG works by adding code to the original generative AI model to continuously review defined external resources and convert them into machine-readable indices (vector databases) so they are available for inference. This means the core generative model does not have to be retrained, instead it can use new or updated sources on the fly. This allows you to use your data to make the model your own and lets you update the data sources to keep the model current.

This technique is extremely powerful, but it does require you to store the original model, the testing or validation data used, the external resources you are using to augment the model, their vector databases, and any prompts and inferred responses. Given the tools and utilities you will use to monitor and analyze how your RAG infused AI model is performing, a central cloud storage repository is a good choice for storing this data.

It’s all about the data—Your data

AI, at least in its current form, is not deus ex machina. Yes, ChatGPT and its ilk can create wonderful stories of fact or fiction and amazing, never before seen imagery, but without your data, they are marvelously generic. In other words, you and more precisely your data are the key to the value your organization will achieve in using AI.

As we have seen, there are a multitude of options. On one hand, we can hand off our data to a company, pay them handsomely, and let them build and run our AI models—the walled garden approach. While this is enticing, the reality is that AI is still a moving target with few rules and regulations in place and your visibility to what is happening to your data is limited as is your ability to do something if there is a problem.

At the other end is the open cloud approach. This allows you to choose the best-of-breed cloud based applications and cloud compute services to create and run your model. These applications and services can interact freely with your cloud storage platform to leverage your organization’s data while providing you complete visibility and control. Yes, it will require more investment on your part, but given the maturity of AI in general, it makes sense for you to keep a watchful eye on how AI is used in your organization and more importantly how well it is performing.

In short, AI requires your data to be truly useful to your organization. AI in its current form is still a young science, one that requires watching to ensure it does what is expected. That’s not paranoia, that’s just good business. To do this you will need unfettered affordable access to your data, the AI model, and its work products.

The post AI for Enterprise: Getting Started appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI 101: Building and Deploying an AI Model

2024-12-11 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-building-and-deploying-an-ai-model/

A decorative image showing a computer, a cloud, and a building.

Should you build your own AI model? Or use other services to help you accelerate the process?

Once you’ve defined the problem you’re trying to solve and the AI model type that best fits your needs, these are the questions you’re faced with next—where to deploy an AI model and how to go about doing it. In most cases, there is very little reason for you to build, train, and deploy your AI model from scratch, particularly as more and more vendors are stepping in to help companies with all or some of the process. It’s fundamentally complex, takes tons of resources and requires specialized knowledge to do correctly.

Still, you should have a basic understanding of the AI model training and deployment processes, as these learnings will be useful as later on as you explore various predefined tools, applications, and services you can use to expedite or enhance your ability to use AI within your organization. That’s what I’m digging into today.

How AI model training works

There are several steps in training an AI model which include identification and gathering the data required, data cleansing and assembly, training the model, checkpointing, and, finally, model serving where the model is deployed into the production environment. Here’s an overview of the process.

A diagram that explains the AI model training process.

Let’s take a minute to explore each of the steps in a little more detail.

Step 1: Review

The organizational data needed to help educate your model will either be structured or unstructured. Structured data is found in databases, tables, and so on. Unstructured data is basically everything else. Some unstructured data is easy to process, such as text files, while other data is harder to extract, such as PDFs and images.

In general, the more data you can provide, the better your trained model can be. But, remember to include data that is not what you want as well—this helps models to hone in on the specific piece of information when things are similar. Take this example scenario, for instance:

You are monitoring hundreds of thousands of wooded acres to determine if there is a fire on the land. As part of training the model, you need to provide images of the legitimate flora and fauna along with images of fire. But you should also provide images of what is not fire, for example reflections of the sun or moon on a lake, a group of lightning bugs at night, car headlights, and so on.

Step 2: Clean

As the data is collected, it will need to be pre-processed, which involves several techniques such as cleaning the data to handle missing values, removing outliers, scaling features, encoding categorical variables, and splitting the data into training and testing sets. The data needs to be arranged in a manner acceptable to the model itself. This sounds relatively simple, but some studies show that this can take up to 80% of the total model development process time.

Step 3: Stage

This is a collection point for all of the clean, ready to be processed, data. This data will arrive as it is processed (cleaned) which can occur over several days or even weeks. Having this data on hand will be useful if the model is not generated correctly or in the future as a starting point to retrain the model.

Typically large amounts of your data will be cleaned and staged as it is readied to train the AI model. But, there are no special storage requirements for this data. It just needs to be readily available to be uploaded to the AI training environment when the time comes.

Step 4: Train

Model training is a resource intensive process where data is copied from staging to high-performance storage located in close proximity to whatever high-powered processor you’re rocking, usually a graphical processing unit (GPU). The GPUs then run the algorithms developed specifically for training the model, and the data is iteratively read and processed an indeterminate number of times until training is complete. Minimizing the time spent utilizing these expensive, high-powered storage and processing resources is critical in managing the overall cost of building the model. In other words: get in, process, and get out.

Step 5: Checkpoint

During the building of the model, the programming will often create snapshots of the status of the training process. This will include various variables, state changes, and so on. These snapshots are referred to as checkpoints. They initially will be written to local storage within the model training system, and are used to restart the training process from a known good state if something goes wrong.

Once the model training process is complete, checkpoints should be written to the same centralized data storage location as your staged data. The checkpoint data will become part of the documentation of the model and may be used for forensic purposes should the model not behave appropriately once it is deployed.

Step 6: Serve

Once the training process is complete, the model can be exported to your central storage location. This will once again help document the system, and from there the model can then be uploaded to the local or cloud compute environment where it will be used.

At this point you have a clean version of the source data, the checkpoints of the model created, and a copy of the model itself, all stored in your centralized location under your control and readily available should they be needed in the future.

AI model inference

The term inference is derived from the AI model’s perspective. At a high level, when given a prompt, the model infers its response from the trained model and its data. In simple terms, you’ve trained your model to recognize cats, and then you bring it new data (a picture of a family reunion) and ask your model if it sees any cats in the photo (I’m hoping the answer is yes).

In AI, the prompt is viewed as new data which is compared to the model’s existing data to determine a response typically in the form of a decision, prediction, or new content as is the case with generative AI models.

An overview of the inference process is below:

In some AI systems, the inference process flow includes some additional code to help improve your model. These types of filters can have a range of uses and can happen on either the input or the output stage. For example, if you want to filter inappropriate queries or information, you could include something like keyword filtering when data (the prompt) is input. Or, you could introduce a toxicity detection filter on the output side, which reviews responses and prevents harmful or offensive content to be presented to the user.

A perhaps better understood problem that filters like this can address is how to get accurate and up-to-date information out of your queried response. On the input flow side of things, retrieval-augmented generation (RAG) directs a trained model to incorporate and weight more heavily information from trusted sources that the user designates. On the output side, you might add a hallucination prevention filter, which would stop the model from presenting false or misleading information.

More broadly, you’ll notice that both the prompt and response are saved. It is important to review this information on a periodic basis. This is especially true if the model is public facing, if you are using a model which can change over time such as a foundation model, or if you are using a model which utilizes RAG techniques to include new or external content.

In all of those examples, your model can drift as new information is introduced, and, as we noted above, getting the right information and cleaning it properly is likely the most time-intensive and important stage of this process. Not for nothing is the phrase “knowledge is power” a truism—in the age of AI, knowledge is power and good data is king.

The post AI 101: Building and Deploying an AI Model appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Leverage powerful generative-AI capabilities for Java development in the Eclipse IDE public preview

2024-11-27 Vinicius Senger

Post Syndicated from Vinicius Senger original https://aws.amazon.com/blogs/devops/amazon-q-developer-eclipse-preview/

Today marks an exciting milestone for Eclipse developers everywhere: we’re thrilled to announce the public preview of Amazon Q Developer in the Eclipse IDE. This integration brings the power of AI-driven development directly into one of the most popular development environments. In this blog post, we’ll explore some of its game-changing features, and show you how this fusion of traditional IDE and cutting-edge AI can supercharge your development tasks across the software development lifecycle.

Background

As I sit down to write this announcement, I can’t help but feel a wave of nostalgia mixed with excitement. This is one of the most requested IDEs for Amazon Q Developer and I can see why. Like many developers of my generation, Eclipse was where I cut my teeth in Java programming. I remember downloading that bulky IDE, waiting for what felt like an eternity as it installed, and then staring at the workspace, both intimidated and thrilled by the possibilities it presented.

Eclipse has been a stalwart in the world of software development for over two decades now. It’s been there through the evolution of Java, from the early days of J2SE to the modern Java Platform. For countless developers, it’s been more than just an IDE – it’s been a trusty companion on our coding journeys.

But times have changed. The landscape of software development is evolving at a rapid pace, and at the heart of this revolution is Generative AI. We’re witnessing a paradigm shift in how we approach coding, testing, and deploying applications. And today, I’m thrilled to announce a game-changing integration that brings together the familiar comfort of Eclipse with the cutting-edge capabilities of Amazon Q Developer.

Introducing Amazon Q Developer plugin for Eclipse IDE

Amazon Q Developer is the most capable AI-powered assistant for software development that reimagines the experience across the entire software development lifecycle, making it easier and faster to build, secure, manage, and optimize applications on AWS. By bringing this powerhouse directly into Eclipse, we’re not just adding a feature – we’re opening up a new world of possibilities for Java developers. Whether you’re a seasoned Java veteran or just starting your development journey, Amazon Q Developer in Eclipse is set to become your indispensable generative AI-assistant that accelerates tasks across the software development lifecycle, including coding.

During the public preview, Eclipse developers will be able to chat with Amazon Q Developer about their project and code faster with inline code suggestions. By leveraging Amazon Q Developer customizations they’ll be able to receive tailored responses that conform to their team’s internal tools and services, helping developers build faster while enhancing productivity across the entire software development lifecycle. Let’s take a look at some of the features that will be available to you during public preview.

Inline suggestions

Inline code suggestions is an excellent starting point for experiencing Amazon Q Developer AI-powered capabilities. As you type, Amazon Q Developer analyzes your code, comments, and naming conventions to provide context-aware suggestions. Note, that the more comprehensive and well-organized your code documentation is, the more accurate and helpful Amazon Q Developer’s suggestions will be.

Amazon Q is using a class with pre-existing methods as context for the inline code suggestions.

Chat

The Amazon Q Developer chat interface serves as a versatile tool for various development needs. You can request code snippet suggestions, ask questions about your project, or seek guidance on implementing specific functionalities. For example, you could ask for sample code to calculate a Fast Fourier Transform in Java or seek assistance in enhancing a database class with additional fields using UUID.

Seeking guidance from Amazon Q Developer on how to apply Fast Fourier transform with Java to detect frequencies.

You can also seamlessly integrate code snippets into your chat interactions with Amazon Q Developer. By selecting a code fragment and sending it to the chat window (by right-clicking in the editor and selecting Amazon Q > Send To Prompt), you can ask specific questions about the code or request modifications, enabling a more interactive and context-aware coding experience.

Integrate code snippets by using the right-click menu to ask Amazon Q Developer to integrate it into your chat experience.

You can also use the right-click menu to ask Amazon Q Developer to explain, refactor, fix, or optimize a selected fragment of code.

Customization

With customizations, Amazon Q Developer can assist with software development in ways that conform to your team’s internal libraries, proprietary algorithmic techniques, and enterprise code style. Customizations must first be configured by an administrator; they can then be selected in the IDE using the menu in the Amazon Q Developer panel. For more information, please refer to the user guide.

Conclusion

The Amazon Q Developer plugin for Eclipse IDE preview represents a significant step forward in enhancing the development experience within this trusted platform. By integrating AI-powered tools such as inline suggestions and chat, Amazon Q Developer empowers developers to work more efficiently across different programming tasks. Whether you’re maintaining legacy code, building new features, or troubleshooting complex issues, Amazon Q Developer streamlines your workflow, allowing you to focus on what matters most — writing great code.

To get started, install the Amazon Q Developer plugin into your Eclipse IDE.

Analyzing your AWS Cost Explorer data with Amazon Q Developer: Now Generally Available

2024-11-26 Riya Dani

Post Syndicated from Riya Dani original https://aws.amazon.com/blogs/devops/analyzing-your-aws-cost-explorer-data-with-amazon-q-developer-now-generally-available/

We are excited to announce the general availability of the cost analysis capability in Amazon Q Developer. This powerful feature integrates Q Developer’s natural language processing capabilities with AWS Cost Explorer, revolutionizing how you analyze and understand your AWS costs. Initially launched in preview on April 30, 2024, the Amazon Q cost analysis capability now offers enhanced functionality, allowing users to gain deeper insights into their cloud spending through simple, conversational interactions.

In this blog, we will highlight the key features and capabilities of analyzing your Cost Explorer data with Amazon Q Developer, including its ability to handle complex cost queries, provide context-aware responses, and offer actionable insights into your AWS spending.

Simplifying Cost Analysis with Natural Language Queries

At the heart of Amazon Q for AWS cost management is its ability to understand and respond to natural language queries. This feature reduces the learning curve to get valuable cost insights from Cost Explorer.

Users can now simply type their questions in plain English, such as:

“What were my top 5 most expensive services last month?”
“How much did my S3 costs increase between Q1 and Q2?”
“Did we receive any credits last quarter, and if so, how much?”

Q Developer interprets these questions, processes the relevant data, and provides detailed, actionable insights.

For our first example, imagine you’re a Cloud Architect who wants to understand the cost implications of recent architectural changes. You could open Amazon Q in the AWS Management Console and enter a prompt such as: “Show me the breakdown of EC2 costs by instance type for the last 30 days”

User entering prompt in Amazon Q Developer chat in the AWS Management Console about breakdown of EC2 instance types for a specific time period, and Amazon Q listing the results.

Figure 1: Q Developer listing breakdown of EC2 instance types for a specific time period

As shown in Figure 1, Q Developer provides a detailed breakdown of the EC2 instance types for the last 30 days.

Let’s consider another scenario. A FinOps professional responsible for reporting on cloud costs across multiple departments could ask “What were last month’s costs broken down by Cost Category “Cost Center”?

Amazon Q Developer in the AWS Management Console providing a detailed cost breakdown by cost category in response to a user's natural language query.

Figure 2: Q Developer delivers a comprehensive cost analysis breakdown by cost category for the previous month.

Figure 2 showcases Q Developer’s capability to instantly generate detailed cost insights based on custom categories. This feature empowers users to make data-driven decisions for more effective cost allocation and budget planning.

Finally, let’s say you are an IT professional who wants to understand what your future costs will look like, based on current workloads and recent trends. You could ask “What are my forecasted costs for Q1 of next year?”

Amazon Q Developer in the AWS Management Console providing a cost forecast for Q1 of 2025.

Figure 3: Q Developer provides forecasted cost data from AWS Cost Explorer.

Figure 3 shows Q Developer ’s ability to provide both historical and forecasted costs, helping customers plan and predict their spending.

Here are some other examples of questions you can now explore when analyzing your Cost Explorer data with Amazon Q:

What percentage of our total costs last month were attributed to tag key = “Environment”, value = “Dev”?
Which services had the highest month-over-month cost increase in September?“
Which linked accounts spent the most last month?
What is our forecasted spend for the next three months?
What were my costs broken down by tag key “Project”?”

Verifying data and diving deeper

Q Developer provides transparency on the specific AWS Cost Explorer parameters that were used to retrieve the data to answer your questions. This transparency allows you to verify that the data presented is exactly what you were looking for. Additionally, each response includes a link to a matching view in AWS Cost Explorer, so you can dive deeper and visualize your data.

Amazon Q Developer in the AWS Management Console providing a link to visualize the data in AWS Cost Explorer.

Figure 4: Q Developer provides a link to a matching view in AWS Cost Explorer.

Figure 4 demonstrates how Q Developer bridges natural language queries with AWS Cost Explorer’s powerful visualization capabilities. This integration allows users to effortlessly transition from conversational insights to comprehensive graphical representations of their cost data, facilitating more thorough analysis and informed decision-making.

Conclusion

The general availability of Q Developer for AWS Cost Management marks a significant milestone in simplifying cloud financial management. By leveraging natural language processing and context-aware responses, Amazon Q makes it easier than ever for users across various roles – from FinOps professionals to application developers– to gain valuable insights into their AWS spending.

These new features streamline the process of cost analysis and forecasting, improving efficiency and enabling data-driven decision-making for AWS users. We encourage you to explore Amazon Q for cost analysis and experience firsthand how it can transform your approach to cloud cost management.

To get started with cost analysis in Q Developer, simply log in to the AWS Management Console and click on Amazon Q icon on the right side of the console. For more information on pricing and availability, please visit our Cost analysis in Amazon Q Developer documentation.

Expanded resource awareness in Amazon Q Developer

2024-11-20 Brendan Jenkins

Post Syndicated from Brendan Jenkins original https://aws.amazon.com/blogs/devops/expanded-resource-awareness-in-amazon-q-developer/

Recently, Amazon Q Developer announced expanded support for account resource awareness with Amazon Q in the AWS Management Console along with the general availability of Amazon Q Developer in AWS Chatbot, enabling you to ask questions from Microsoft Teams or Slack. Additionally, Amazon Q will now provide context-aware assistance for your questions about resources in your account depending on where you are in the console. Amazon Q in the console gives you the ability to use natural language with the Amazon Q Developer chat capability to list resources in your AWS account, get specific resource details, and ask about related resources, launched in preview on April 30, 2024.

In this blog, I will highlight the new expanded functionality of this feature in Amazon Q Developer including understanding relationships between account resources, context-awareness, and the general availability of the AWS Chatbot integration with Microsoft Teams and Slack.

Expanded account resource awareness with Amazon Q Developer

Prior to the launch of the expanded support, you could ask Amazon Q Developer to list resources in your AWS Account with prompts such as “List all my EC2 instances in us-east-1” and the service would list all your Amazon Elastic Compute Cloud (Amazon EC2) instances. Now, with the expanded support, you can ask more complex questions about your AWS account resources. I will show a few examples in this section of this post.

For our first example, imagine that you’re a developer who is responsible for maintaining code as a part of the software development lifecycle (SDLC) and you frequently use AWS Lambda for development and Amazon Relational Database Service (RDS) in the backend as a part of your development process. With this new update, a developer could open a new Q chat in the AWS Management Console, and enter a prompt such as: “Which RDS clusters are due for an update?”

User entering prompt Amazon Q Developer chat in the AWS management console about listing all RDS clusters that need updates in their account and Amazon Q listing those Databases.

Figure 1: Amazon Q Developer listing RDS clusters needing an update

As a result, the Amazon Q Developer console chat will return a list of all your Amazon RDS clusters that have available updates as shown in Figure 1 above.

Now, for another example, you want to update any Lambda functions in your AWS account that had a Simple Notification Service (SNS) topic as a trigger due to moving to a new SNS topic you recently created. To identify which SNS topics are still being used, you could enter a prompt such as “List all the SNS topics that trigger a lambda function.”

User entering prompt Amazon Q Developer chat in the AWS management console about listing all SNS topics that trigger a lambda function and Amazon Q listing the SNS topics as an output.

Figure 2: Amazon Q listing SNS topics that are lambda triggers

As shown in the prior example, Amazon Q Developer was able to identify any SNS topics in the form of Amazon resource name (ARN) that was set to trigger a lambda function in the AWS account as intended.

Additionally, you can ask a follow up question in the same chat to investigate more. You can send a prompt such as “Which lambda function uses the arn:aws:sns:us-east-1:76859XXXX:FailoverHealthcheck SNS topic?”

User entering prompting Amazon Q Developer chat with a follow up question in the AWS management console about which Lambda is associated with an SNS topic.

Figure 3: Asking Q Developer a follow up question about a resource

From Figure 3 above, you can see that there is a Lambda function/endpoint associated with the SNS topic resource that Amazon Q Developer was able to identify.

Outside of the examples above, here are some other prompts/examples that can be explored for the expanded support:

– “Do I have any ECS clusters with pending tasks?”

– “Are there any ECS clusters in my account with services in DRAINING status?”

Amazon Q Developer understands where you are in the console

Amazon Q Developer in the AWS Management Console now provides context-aware assistance for your questions about resources in your account. This feature allows you to ask questions directly related to the console page you’re viewing, eliminating the need to specify the service or resource in your query. Q Developer uses the current page as additional context to provide more accurate and relevant responses, streamlining your interaction with AWS services and resources.

Prior to the update, a user would have to prompt, “What is the public IPv4 address of my instance i-08ccXXXXXX?” Now, if you are viewing an EC2 instance in the console and prompt Amazon Q, “What is the public IPv4 address of my instance?” you will not need to specify the instance you are referring to.

User entering prompt Amazon Q Developer chat in the AWS management console about what the IP address is of the instance on the page.

Figure 4: Asking Amazon Q about an EC2 instance being viewed

In figure 4 above, Amazon Q’s console chat was able to use its context-awareness to pick up on what the IPv4 address was on the console page where I was currently working, despite me not specifying which instance I was referring to.

AWS ChatBot can now answer questions about AWS resources in Microsoft Teams and Slack

Recently, we announced the general availability of Amazon Q Developer in AWS Chatbot, which provides answers to customers’ AWS resource related queries in Microsoft Teams and Slack. This gives teams the ability to quickly find relevant resources to troubleshoot issues using natural language queries in the chat channels of Microsoft Teams or Slack.

For example, you could integrate the AWS Chatbot Service with Amazon Q Developer to allow you to enter a prompt in Slack such as “@aws show EC2 instances in running state in us-east-1”.

Figure 5: Amazon Q listing all EC2 resources in Slack

As shown in figure 5 above, Amazon Q was able to list all the EC2 resources and place them into a slack channel showing an example of the functionality in action.

Conclusion

Amazon Q Developer has enhanced its cloud resource management capabilities, enabling more intuitive and intelligent interactions with AWS resources. The new features allow developers to ask complex, context-aware questions about their cloud infrastructure directly through the AWS Management Console, Microsoft Teams, and Slack. Users can now easily discover new details about specific resources with natural language queries that provide precise, contextual information. These improvements represent a significant step forward in simplifying cloud resource management, making it faster and more user-friendly for development teams to understand, track, and maintain their AWS environments. To learn more about chatting with your AWS resources, check out Console documentation and AWS Chatbot documentation.

About the authors

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

2024-11-19 Karim Akhnoukh

Post Syndicated from Karim Akhnoukh original https://aws.amazon.com/blogs/big-data/manage-access-controls-in-generative-ai-powered-search-applications-using-amazon-opensearch-service-and-aws-cognito/

Organizations of all sizes and types are using generative AI to create products and solutions. A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In semantic search, documents are stored as vectors, a numeric representation of the document content, in a vector database such as Amazon OpenSearch Service, and are retrieved by performing similarity search with a vector representation of the search query.

In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. They are looking for a reliable and scalable solution to implement robust access controls to make sure these documents are only accessible to individuals who have a legitimate business need and the appropriate level of authorization. The permission mechanism has to be secure, built on top of built-in security features, and scalable for manageability when the user base scales out. Maintaining proper access controls for these sensitive assets is paramount, because unauthorized access could lead to severe consequences, such as data breaches, compliance violations, and reputational damage.

In this post, we show you how to manage user access to enterprise documents in generative AI-powered tools according to the access you assign to each persona.

Common use cases

The following are industry-specific use cases for document access management across different departments:

In R&D and engineering, access to product design documents evolves from restricted to broader as development progresses
HR maintains open access to general policies while limiting access to sensitive employee information
Finance and accounting documents require varying levels of access for auditing and executive decision-making
Sales and marketing teams carefully manage customer data and strategies, implementing tiered access for different roles and departments

These examples demonstrate the need for dynamic, role-based access control to balance information sharing with confidentiality in various business contexts.

Solution overview

By combining the powerful vector search capabilities of OpenSearch Service with the access control features provided by Amazon Cognito, this solution enables organizations to manage access controls based on custom user attributes and document metadata.

This approach simplifies the management of access rights, making sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. Following this approach, you can manage the access to your organization’s documents at scale. The following diagram depicts the solution architecture.

The solution workflow consists of the following steps:

The user accesses a smart search portal and lands on a web interface deployed on AWS Amplify.
The user authenticates through an Amazon Cognito user pool and an access token is returned to the client. This access token will be used to retrieve the key pair custom attributes assigned to the user. In our case, we created two custom attributes (custom:department and custom:access_level).
For each user query, an API is invoked on Amazon API Gateway to process the request. Each invocation includes the user access token in the header.
The API is integrated with AWS Lambda, which processes the user query and generates the answers based on available documents and user access using retrieval augmented generation (RAG). The process starts by creating a vector based on the question (embedding) by invoking the embedding model.
A query is sent to OpenSearch Service that includes the following:
1. The embedding vector generated.
2. User custom attributes retrieved by Lambda based on their access token, by calling the Amazon Cognito GetUser API.
3. The query relies on the support of an efficient k-NN filter in OpenSearch Service to perform the search.
Pre-filtered documents that relate to the user query are included in the prompt of the large language model (LLM) that summarizes the answer. Then, Lambda replies back to the web interface with the LLM completion (reply).
If the user’s access needs to be modified (assigned attributes), an API call is made through API Gateway to a Lambda function that processes the request to add or update the custom attributes’ value for a specific user.
New attributes are reflected in the user’s profile in Amazon Cognito.

Our solution is implemented and wrapped within AWS Cloud Development Kit (AWS CDK) stacks, which are available in the GitHub repo.

Our sample documents assume a fictional manufacturing company called Unicorn Robotics Factory, which develops robotic unicorns. The dataset contains over 900 documents that are a mix of engineering, roadmap, and business reporting documents. The following is an example of a document’s content:

**CONFIDENTIAL - UNICORNS ROBOTICS INTERNAL DOCUMENT**

**Project: "Galactic Unicorn"**

Unicorns Robotics is proud to announce the development of our latest project, the "Galactic Unicorn". 
This top-secret project aims to create a robotic unicorn that can travel through space and time, bringing magic and joy to children and adults alike.....

The associated metadata file for this document consists of the following:

{ "department": "research", "access_level": "confidential" }

Our solution in the GitHub repo takes care of loading the documents with associated metadata tags. For illustration purposes, we used the following mapping for the users and document access.

This solution is meant to delegate access management to the application tier, to simplify the implementation of use cases like generative AI-powered document search tools. However, if your use case requires a stricter approach to control document access, like multi-tenant environments or field-level security, you might want to use the fine-grained access control feature in OpenSearch Service. In our solution, we manage the access on the document level according to the assigned metadata.

Prerequisites

To deploy the solution, you need the following prerequisites:

An AWS account. If you don’t already have an AWS account, you can create one.
Your access to the AWS account must have AWS Identity and Access Management (IAM) permissions to launch AWS CloudFormation templates that create IAM roles.
The AWS Command Line Interface (AWS CLI) installed.
node.js and npm installed for the frontend.
Docker installed.
The AWS CDK configured. For more information, see Getting started with the AWS CDK.
In case of LLM inference based on Amazon SageMaker, a sufficient service limit to deploy an ml.g5.12xlarge instance for the SageMaker endpoint. If needed, you can initiate a quota increase request. Refer to Service Quotas for more details.

Deploy the solution

To deploy the solution to your AWS account, refer to the Readme file in our GitHub repo.

Query documents with different personas

Now let’s test the application using different personas. In this example, we use the same users with their corresponding custom attributes as illustrated in the solution overview.

To start, let’s log in using the researcher account and run the search around a confidential document.

We ask, “What is the projected profit margin of the Galactic Unicorn project?” and get the result as shown in the following screenshot.

The question invokes a query to OpenSearch Service using the custom attributes assigned to the researcher. The following code illustrates how the query is structured:

for attr, values in user_attributes.items():
        must_conditions.append(
            {
                "bool": {
                    "should": [{"term": {attr: value}} for value in values],
                    "minimum_should_match": 1,
                }
            }
        )

query = {
        "size": 5,
        "query": {
            "knn": {
                "doc_embedding": {
                    "vector": query_vector,
                    "k": 10,
                    "filter": {"bool": {"must": must_conditions}},
                }
            }
        },
    }

Let’s sign out and log in again with an engineer profile to test the same query. Based on the assigned attributes and document metadata, the result should look like that in the following screenshot.

If you tried to query some support documents, you will get the desired answer, as shown in the following screenshot.

Modify user access

As depicted in the solution diagram, we’ve added a feature in the web interface to allow you to modify user access, which you could use to perform further tests. To do so, log in as a tool admin and choose Manage Attributes. Then modify the custom attribute value for a given user, as shown in the following screenshot.

Clean up

When deleting a stack, most resources will be deleted upon stack deletion, but that’s not the case for all resources. The Amazon Simple Storage Service (Amazon S3) bucket, Amazon Cognito user pool, and OpenSearch Service domain will be retained by default. However, our AWS CDK code altered this default behavior by setting the RemovalPolicy to DESTROY for the mentioned resources. If you want to retain them, you can adjust the RemovalPolicy in the AWS CDK code for the different resources.

You can use the following command to clean up the resources deployed to your AWS account:

make destroy

Conclusion

This post illustrated how to build a document search RAG solution that makes sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. It combines OpenSearch Service and Amazon Cognito custom attributes to make a tag-based access control mechanism that makes it straightforward to manage at scale.

For demonstration purposes, the following points weren’t included in the AWS CDK code. However, they’re still applicable and you might want to work on them before deploying for production purposes:

OpenSearch Service best practices, such as instance sizing and using primary nodes
Advanced document chunking strategies for RAG implementations, such as recursive or semantic chunking

About the Authors

Karim Akhnoukh is a Solutions Architect at AWS working with manufacturing customers in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.

Ahmed Ewis is a Senior Solutions Architect at AWS GenAI Labs. He helps customers build generative AI-based solutions to solve business problems. When not collaborating with customers, he enjoys playing with his kids and cooking.

Fortune Hui is a Solutions Architect at AWS Hong Kong, working with conglomerate customers. He helps customers and partners build big data platform and generative AI applications. In his free time, he plays badminton and enjoys whisky.

Introducing the next-level of AI-powered workflows with Amazon Q Developer inline chat

2024-10-29 Jose Yapur

Post Syndicated from Jose Yapur original https://aws.amazon.com/blogs/devops/amazon-q-developer-inline-chat/

Earlier today, Amazon Q Developer announced support for inline chat. Inline chat combines the benefits of in-IDE chat with the ability to directly update code, allowing developers to describe issues or ideas directly in the code editor, and receive AI-generated responses that are seamlessly integrated into their codebase. In this post, I will introduce the new inline chat and discuss when to use this new capability to get the most value from Amazon Q Developer.

Background

I started using Q Developer (previously called Amazon CodeWhisperer) when it first launched in June 2022. This initial release included support for inline suggestions, which automatically generated code completions based on existing code and comments. Inline suggestions resulted in significant productivity gains.

Later that year, OpenAI released ChatGPT, and generative AI-powered chat became a hot topic. Personally, I found the chat experience more helpful when I was unsure how to accomplish a task. The chat interface not only generated code, but also provided explanatory context. I preferred to use inline suggestions when I knew what I was doing, and chat when I was learning something new. Therefore, I was thrilled when Amazon Q Developer added chat to the IDE in 2023, as I could use it to explain coding concepts, generate code and tests, and improve existing code. Having chat in the IDE helps me stay on task and in a state of focus and flow.

I have been using both inline suggestions and chat for the past year equally. While I love both options, I still felt there was room for improvement. For example, when fixing a bug, inline suggestions excel at generating new code, but do not easily allow me to update the existing code. Chat allows me to update existing code, but the response is provided in the chat window rather than being directly integrated into my code. This is where inline chat aims to improve the workflow.

Introducing inline chat

Today, we are excited to announce inline chat for Visual Studio Code (VS Code) and JetBrains. Inline chat allows me to provide additional context, such as a description of the bug I’m trying to fix, directly in the code editor. The AI-generated response is then seamlessly merged into my existing code, rather than requiring me to copy and paste from a separate chat window. I can easily review the suggested changes and accept, or decline, them with minimal effort. This new capability is ideal for editing an existing file to fix issues, optimize code, refactor code, add comments. And, it’s included in Amazon Q Developer’s expansive Free tier.

Inline chat is really powerful and helps me do more complex things quickly and accurately. There’s a lot that goes into building an assistant, but one important component is the underlying model, and inline chat is the first Amazon Q Developer capability powered by the latest version of Anthropic’s Claude 3.5 Sonnet, which launched on October 22nd. This new model “shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding.” As I write this, upgraded Claude 3.5 Sonnet is the top performing model on the SWE-bench, solving 49% of the verified dataset which consists of 500 real-world GitHub issues. This demonstrates the impressive capabilities of the latest Anthropic model.

Amazon Q Developer is built on Amazon Bedrock, a fully managed service for building generative AI applications that offers a choice of high-performing foundation models (FMs) from Amazon and leading AI companies. Amazon Q uses multiple FMs, including FMs from Amazon, and routes tasks to the FM that is the best fit for the job. Amazon Q is constantly getting better, and we regularly change or refine the underlying models to improve performance and take advantage of the latest technologies, as we have latest version of Anthropic’s Claude 3.5 Sonnet launching just a week ago.

By powering the new inline chat capability with this cutting-edge Anthropic model, Amazon Q Developer is delivering an AI assistant that can help you save time, while tackling your most complex coding challenges with unparalleled capabilities. And with the seamless model updates handled behind the scenes, you can be confident that your experience will only continue to improve over time. Let’s take a moment to see how inline chat works.

Refactoring code

Let’s see the inline chat in action. Imagine that I have a class that displays messages on a web page. It started simple, but over time I have added a few variants to change the color, display warning messages, and display error messages. I don’t want to continue adding more and more variants, so I will ask Amazon Q Developer to refactor them. I select all four methods, and press ⌘ + I on Mac or Ctrl + I on Windows. Then, I prompt Q Developer to “refactor these into a single method with optional parameters for the color and message type.”

Animated gif showing four similar methods in VSCode. Inline chat refactors the methods into one with optional parameters. This is displayed as a diff and then merged.

As you can see in the previous video, Amazon Q Developer refactored my code into a single method. Note that Q is showing me which lines it will add, in green, and which lines it will remove, in red. I’m happy with this recommendation, so I will hit return to accept it. Q Developer then merges the changes into my code.

While I could have done this in the chat pane, I would have to copy the response, and merge it to my code manually. Inline chat returns a diff so I can see exactly which portions will be added and removed. Alternatively, I could have used inline suggestions to generate a new method. However, I would have been left to clean up the old methods manually. The new inline chat feature excels at updating code in place.

Adding documentation

I’ll demonstrate another practical use of inline chat. Recently, I was working on a complex data processing algorithm that I had written some time ago. While the code functioned correctly, it lacked proper documentation. Recognizing that this could hinder future maintenance and comprehension by the team, I decided to add comprehensive documentation.

Animated gif showing a python function in VSCode. Inline chat is used to ask Q to add comments. This is displayed as a diff and then merged.

I selected the entire function and activated the inline chat using ⌘ + I on Mac (or Ctrl + I on Windows). In the chat interface, I entered the prompt “Add documentation including descriptive comments throughout the code.” Q Developer swiftly analyzed the code and generated appropriate documentation. The suggestions appeared with new text highlighted in green, indicating additions.

Amazon Q Developer created a detailed comment block at the beginning of the script, including parameter descriptions and return value information. It also added inline comments throughout, explaining complex logic and calculations. After a thorough review of the suggested documentation, I accepted the changes by hitting return or clicking on “Accept”. Q Developer then integrated the new documentation seamlessly into the existing code.

This feature proves particularly useful when dealing with legacy code or preparing for new team members to join a project. It helps maintain consistency in documentation style across the codebase and significantly reduces the time required compared to manual documentation. The resulting well-documented code is self-explanatory, which can streamline the development process. Inline chat has made it more efficient to keep codebases well-documented and maintainable.

Conclusion

With the introduction of inline chat, Amazon Q Developer has taken the next leap in AI-powered development, combining the best of both worlds – combining the benefits of in-IDE chat with the ability to directly update code. This new capability, powered by Anthropic’s latest Claude 3.5 Sonnet, empowers developers to tackle complex coding challenges efficiently. Whether it’s generating new features, refactoring existing code, or adding comprehensive documentation, inline chat streamlines the workflow, eliminating the need to switch between separate chat and editor windows. By continuously integrating the latest advancements in AI language models, Amazon Q Developer ensures that developers always have access to the most advanced and capable generative AI-powered assistant, handling the undifferentiated heavy lifting and allowing them to focus on what they do best – writing high-quality, innovative code.

You can try it out today by updating or installing your Amazon Q Developer extension on VS Code or JetBrains. This update will help you unleash your productivity right in your IDE.

AI 101: Classification vs. Predictive vs. Generative AI

2024-10-22 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-classification-vs-predictive-vs-generative-ai/

A decorative image showing several buildings with digital lines flowing upward into a cloud.

It may seem like generative AI is the only game in town, or at least the only AI model worth paying attention to. But folks have been using AI models to do all kinds of things for years before ChatGPT, Claude, and Gemini came on the scene.

Today, I’m talking about the three different broadly defined categories of AI—classification, predictive, and generative—and what they’re good for.

Classification vs. Predictive vs. Generative AI Models: What’s the Diff?

Classification and predictive models have been foundational to AI for decades, powering applications like spam filters, cyber security tools, big data analysis, and demand forecasting. However, with recent advances, generative models like GPT and DALL-E have taken the spotlight, bringing up interesting existential (and legal) questions about the nature of creativity and creative work going forward. Understanding the distinctions and history of these models is key to grasping how AI continues to shape industries and innovation today.

Let’s see which category best applies to your particular problem.

AI classification models

A classification model is built to recognize, understand, and group data into preset categories. The model is fully trained using the training data and then evaluated using test data before being used to respond to unseen data. In general, such models infer answers for the current moment in time, for example, deciding whether an email is spam or phishing. In that case, the decision is based on comparing the incoming email to a model trained on previously classified email messages, both ones that the user has set or ones that the platform has. (The two are related, of course, as the platform’s filters often update to include aggregate user data.)

In business, classification models drive applications like spam detection, customer segmentation, and fraud detection. Healthcare uses classification models to diagnose diseases based on medical images or patient data. In finance, they help identify high-risk transactions. Social media platforms rely on these models to filter content, detect hate speech, and recommend posts. Overall, classification models are key to organizing large datasets efficiently and making decisions based on patterns, helping automate and optimize numerous industry processes.

AI prediction models

Predictive AI models utilize historical data, patterns, and trends to train the model, so they can be used to make informed decisions about future events or outcomes. Using Drive Stats as an example, we could theoretically build a model that, when given data about a particular drive model and failure rates, predicts the chance that a given hard drive will fail in the next 90 days. Predictive AI models typically require large amounts of data to be trained and are computationally expensive to generate.

Predicting Hard Drive Failure Rates with AI

Okay, we were being coy when we said “example.” Check out Andy Klein’s Tech Day 2024 presentation, “Predicting Hard Drive Failure Rates with AI” to see how this kind of predictive model works.

AI prediction models help predict customer behavior, sales trends, and demand, aiding in decision making and resource planning. In finance, these models are crucial for stock price forecasting, risk assessment, and credit scoring. Healthcare utilizes prediction models for patient outcome predictions, disease progression, and treatment effectiveness. They are also applied in weather forecasting, supply chain optimization, and energy usage management. By analyzing past data, prediction models provide insights that help organizations anticipate trends, make proactive decisions, and optimize performance across various industries.

Generative AI models

You know this one. Generative AI is about creating (sort of) new content. It uses neural networking, deep learning, and other techniques to infer and generate content that is based on patterns it observes in existing content all while mimicking the style and structure as requested. Image generators such as DALL-E and Stable Diffusion, and large language models like ChatGPT, Claude, and Gemini are easily accessible AI applications which have brought AI into the public eye.

Generative AI is at turns the thing that will revolutionize everything, a scary specter with near-sentience that will steal your job, or a big hallucinating fluke that tells you to put glue on pizza. There are some pretty cool use cases—for one, researchers are using generative AI for new drug discovery. But you’re most likely to run into generative AI in the following use cases: customer service chatbots, coding assistants, marketing support, and general business assistants that generate transcripts and summaries.

Unlocking the power of AI

Even with all the current hype around generative AI we are still in the early stages of development when it comes to AI systems given they are most useful in responding to queries based on the subject matter with which they were trained.

For example, an AI model trained to play chess might find playing checkers to be difficult. While the board, and number of players are the same, can a chess-playing AI model infer the allowed checker moves based on its understanding of chess? Even generative AI models like ChatGPT which are trained on a wide variety of subjects are still lacking a key ingredient to be truly useful to your organization: your data.

An AI chatbot, for example, isn’t going to perform the way you want it to without being powered by your organization’s data. And, how do you build an AI powered tool while keeping your private data private? We started to explore that very question in a recent webinar, “Leveraging your Cloud Storage Data in AI/ML Apps and Services.”

Tune in to learn more about the various ways AI/ML applications use and store data and get insights from our customers who leverage Backblaze B2 Cloud Object Storage for their AI/ML needs.

The post AI 101: Classification vs. Predictive vs. Generative AI appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Is AI Right for Your Business? 4 Questions to Ask

2024-10-10 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/is-ai-right-for-your-business-4-questions-to-ask/

A decorative image showing several layers of computer screen folding into the cloud.

AI is everywhere—powering chatbots, generating images, even deciding what you binge watch next. It’s no wonder businesses of all sizes are feeling compelled to jump on the AI bandwagon. But before you get swept up in the AI hype, here’s the question you need to ask: Is AI right for your business and the problem you’re trying to solve?

Where AI truly becomes a change agent is when it is powered by your organization’s data to deliver relevant, insightful, and actionable observations to you in a timely manner. The reality is, while AI is really cool, without your unique data it provides your organization few competitive advantages. Of course, releasing proprietary, or even sensitive, information to a robot connected to the internet can be risky—and you want to make sure your (and your clients’) information doesn’t end up in surprising places.

But just because everyone’s talking about AI doesn’t mean it’s the magic bullet for every problem. Like any strategic investment, it takes careful consideration. So, before you hand over your data to a machine, let’s explore whether AI is really what your business needs—or if it’s just another shiny object in the tech landscape.

Where do I start?

Today, many organizations are somewhere along the AI/ML path. Most are experimenting with AI, some are actively building applications, and a handful have successfully deployed a solution. Like any other project, before you start trying to use AI in your organization, the first thing you should do is define the problem you are trying to solve. Only then can you determine if you really need AI as a part of the solution.

Ask yourself the following questions about the project. If you answer yes to all four items, the project is AI-worthy:

1. Do you want AI to replace tedious, repetitive tasks?

Start by identifying the business problem in specific, measurable terms. Determine the scope of the problem, its frequency, and the impact it has on your business. Is it recurring and time consuming? If the problem is complex, repetitive, or data-intensive, it might be suitable for AI.

2. Do you want to use AI because you can’t consistently apply a set of logical rules to answer the questions at hand?

If the problem involves large amounts of data that is difficult to process manually where the answer is derived by combining and weighing multiple factors, it may be a candidate for an AI-based solution. On the other hand, just because it can be automated doesn’t mean you need an AI solution—AI is expensive in terms of power and processing resources. If you’re running a simple routine task over and over, you might be just as well off using traditional programming methods. But, when you’re solving a complex task, you need a structure that is not a strict binary, and that’s when you might want to use AI.

3. Will you use AI for problems that humans can solve, but AI can solve much faster?

AI should help your organization solve problems it finds extremely difficult or nearly impossible to solve otherwise. AI excels at tackling complex problems that overwhelm traditional methods, such as processing vast amounts of data, recognizing intricate patterns, or making real-time predictions. If your business is facing challenges that manual processes or standard software can’t handle effectively, AI can step in to provide powerful, scalable solutions that would otherwise be out of reach.

But remember, AI should work with you, not against you. Understand how AI will integrate into your workflow and whether it aligns with your overall business strategy to avoid creating unnecessary complications or disrupting ongoing operations.

4. Do you intend for AI to increase productivity of a function or group?

Most AI projects are productivity based, even those that seem otherwise. Even AI projects aimed at improving customer experiences, like personalized recommendations, ultimately enhance productivity by streamlining interactions and reducing manual effort. At their core, most AI implementations are designed to automate tasks, optimize processes, or extract actionable insights, all of which drive greater efficiency and cost savings. And, that means you need to analyze the potential return on investment (ROI).

AI integration requires an investment in technology, data management, and often specialized personnel. Weigh the cost of implementing AI against the potential benefits it could bring. Will it save time or reduce costs? By how much? If the financial or productivity benefits outweigh the costs, AI may be a worthwhile investment.

Where to next?

Clearly defining the problem and deciding if it’s suitable for an AI-based solution is really just the first step. Once the problem is defined, you open up another set of questions around whether and how to implement it. Do you have the right data, resources, and expertise to support an AI solution? How will it integrate with your systems? How will you measure success? The answers to all of these questions should absolutely inform your decision-making, but understanding if you’re applying AI to the right problem is your starting point. Without that, you’re using a sledgehammer to crack a nut, so to speak.

The post Is AI Right for Your Business? 4 Questions to Ask appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

How to Build Your Own LLM with Backblaze B2 + Jupyter Notebook

2024-08-13 Pat Patterson

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/how-to-build-your-own-llm-with-backblaze-b2-jupyter-notebook/

Last month, Backblaze Principal Storage Cloud Storyteller, Andy Klein, and I presented a webinar, Leveraging your Cloud Storage Data in AI/ML Apps and Services, in which we explored the various ways AI/ML applications use and store data. In addition to sharing insights from our customers who leverage Backblaze B2 Cloud Object Storage for their AI/ML needs, we also demonstrated a simple AI use case: a retrieval-augmented generation (RAG) chatbot answering questions based on custom context downloaded from a private Backblaze B2 Bucket. After the webinar, I posted the demo source code to a GitHub repository: https://github.com/backblaze-b2-samples/ai-rag-examples.

In this blog post, I’ll recap a little of the webinar, and explain how you can use the demo source code as a basis for your own experiments with RAG and large language models (LLMs).

But why, Pat?

You might be asking yourself, why build a chatbot when multiple online services, such as ChatGPT, are just a click away? It’s a good question. The main motivation is privacy. Suppose you want your chatbot to use confidential data in generating its responses. By running an LLM on your own machine, whether on-premises or on a virtual machine (VM) under your control in a public cloud provider’s infrastructure, you eliminate the risks of that data surfacing in response to a question from a user outside your organization.

In the webinar, I showed two Python apps running on a GPU-equipped VM at Vultr, one of Backblaze’s compute partners. The first app used the GPT4All Python SDK to create a very simple conversational chatbot running a local instance of a large language model (LLM), which it used in answering general questions. Here’s an example from the webinar:

Ask me a question: What were the causes of the First World War?
The causes of the First World War are complex and varied, but some key factors include nationalism, imperialism, militarism, alliances between nations, and the assassination of Archduke Franz Ferdinand. Nationalism led to tensions between ethnic groups and their respective governments, while imperialism saw countries competing for territories and resources around the world. Militarism also played a role as many European powers built up large armies and navies in preparation for potential conflicts. The complex web of alliances between nations meant that any conflict could quickly escalate into a latIer war, and the assassination of Archduke Franz Ferdinand by a Serbian nationalist was the spark that ignited the powder keg of tensions already present in Europe at the time.

Now, I’m not a history scholar, but that looks like a pretty good answer to me! (History scholars, you are welcome to correct me.)

The second app used the Langchain framework to implement a more elaborate chatbot, again running on my own machine at Vultr, that used PDF data downloaded from a private bucket in Backblaze B2 as context for answering questions. As much as I love our webinar attendees, I didn’t want to share genuinely confidential data with them, so I used our Backblaze B2 Cloud Storage documentation as context. The chatbot was configured to use that context, and only that context, in answering questions. From the webinar:

Ask me a question about Backblaze 82: What's the difference between the master application key and a standard application key?

The master application key provides complete access to your account with all capabilities, access to all buckets, and has no file prefix restrictions or expiration. On the other hand, a standard application key is limited to the level of access that a user needs and can be specific to a bucket.

Ask me a question about Backblaze B2: What were the causes of the First World War?

The exact cause of the First World War is not mentioned in these documents.

The chatbot provides a comprehensive, accurate answer to the question on Backblaze application keys, but doesn’t answer the question on the causes of the First World War, since it was configured to use only the supplied context in generating its response.

During the webinar’s question-and-answer session, an attendee posed an excellent question: “Can you ask [the chatbot] follow-up questions where it can use previous discussions to build a proper answer based on content?” I responded, “Yes, absolutely; I’ll extend the demo to do exactly that before I post it to GitHub.” What follows are instructions for building a simple RAG chatbot, and then extending it to include message history.

Building a simple RAG chatbot

After the webinar, I rewrote both demo apps as Jupyter notebooks, which allowed me to add commentary to the code. I’ll provide you with edited highlights here, but you can find all of the details in the RAG demo notebook.

The first section of the notebook focuses on downloading PDF data from the private Backblaze B2 Bucket into a vector database, a storage mechanism particularly well suited for use with RAG. This process involves retrieving each PDF, splitting it into uniformly sized segments, and loading the segments into the database. The database stores each segment as a vector with many dimensions—we’re talking hundreds, or even thousands. The vector database can then vectorize a new piece of text—say a question from a user—and very quickly retrieve a list of matching segments.

Since this process can take significant time—about four minutes on my MacBook Pro M1 for the 225 PDF files I used, totaling 58MB of data—the notebook also shows you how to archive the resulting vector data to Backblaze B2 for safekeeping and retrieve it when running the chatbot later.

The vector database provides a “retriever” interface that takes a string as input, performs a similarity search on the vectors in the database, and outputs a list of matching documents. Given the vector database, it’s easy to obtain its retriever:

retriever = vectorstore.as_retriever()

The prompt template I used in the webinar provides the basic instructions for the LLM: use this context to answer the user’s question, and don’t go making things up!

prompt_template = """Use the following pieces of context to answer the question at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
    {context}
    
    Question: {question}
    Helpful Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

The RAG demo app creates a local instance of an LLM, using GPT4All with Nous Hermes 2 Mistral DPO, a fast chat-based model. Here’s an abbreviated version of the code:

model = GPT4All(
    model='Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf',
    max_tokens=4096,
    device='gpu'
)

LangChain, as its name suggests, allows you to combine these components into a chain that can accept the user’s question and generate a response.

chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
)

As mentioned above, the retriever takes the user’s question as input and returns a list of matching documents. The user’s question is also passed through the first step, and, in the second step, the prompt template combines the context with the user’s question to form the input to the LLM. If we were to peek inside the chain as it was processing the question about application keys, the prompt’s output would look something like this:

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

<Text of first matching document>

<Text of second matching document>

Question: What's the difference between the master application key and a standard application key?

Helpful Answer:

This is the basis of RAG: building an LLM prompt that contains the information required to generate an answer, then using the LLM to distill that prompt into an answer. The final step of the chain transforms the data structure emitted by the LLM into a simple string for display.

Now that we have a chain, we can ask it a question. Again, abbreviated from the sample code:

question = 'What is the difference between the master application key and a standard application key?'
answer = chain.invoke(question)

Adding message history to the simple RAG chatbot

The first step of extending the chatbot is to give the LLM new instructions, similar to its previous prompt template, but including the message history:

prompt_template = """Use the following pieces of context and the message history to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
Context: {context}
    
History: {history}
    
Question: {question}

Helpful Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question", "history"]
)

The chain must be modified slightly to accommodate the message history:

chain = (
    {
        "context": (
                itemgetter("question")
                | retriever
        ),
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | prompt
    | model
    | StrOutputParser()
)

Now, we define a very simple in-memory message store that uses a session_id parameter to manage multiple simultaneous conversations:

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

LangChain provides a wrapper, RunnableWithMessageHistory, that combines the message store with the above chain to create a new chain with message history capability:

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

Now we can feed a series of related questions into the new chain:

questions = [
    'What is the difference between the master application key and a standard application key?',
    'Which one would I use to work with a single bucket?',
    'Can you tell me anything more about this topic?'
]

for question in questions:
    print(f'\n{question}\n')
    answer = with_message_history.invoke(
        {"question": question},
        config={"configurable": {"session_id": "abc123"}},
    )
    print(f'{answer}\n')

I have to admit, I was pleasantly surprised by the results:

What is the difference between the master application key and a standard application key?

A master application key grants broad access privileges, while a standard application key is limited to the level of access that a user needs.

Which one would I use to work with a single bucket?

You would use a standard application key to work with a single bucket as it has limited access and only grants permissions needed for specific tasks, unlike the master application key which provides broad access privileges.

Can you tell me anything more about this topic?

Sure! The master application key is typically used by developers during development or testing phases to grant full access to all resources in a Backblaze B2 account, while the standard application key provides limited permissions and should be used for production environments where security is paramount.

Processing this series of questions on my MacBook Pro M1 with no GPU-acceleration took three minutes and 25 seconds, and just 52 seconds with its 16-core GPU. For comparison, I spun up a VM at Ori, another Backblaze partner offering GPU VM instances, with an Nvidia L4 Tensor Core GPU and 24GB of VRAM. The only code change required was to set the LLM device to ‘cuda’ to select the Nvidia GPU. The Ori VM answered those same questions in just 18 seconds.

An image of an Nvidia L4 Tensor Core GPU — The Nvidia L4 Tensor Core GPU: not much to look at, but crazy-fast AI inference!

Go forth and experiment

One of the reasons I refactored the demo apps was that notebooks allow an interactive, experimental approach. You can run the code in a cell, make a change, then re-run it to see the outcome. The RAG demo repository includes instructions for running the notebooks, and both the GPT4All and LangChain SDKs can run LLMs on machines with or without a GPU. Use the code as a starting point for your own exploration of AI, and let us know how you get on in the comments!

The post How to Build Your Own LLM with Backblaze B2 + Jupyter Notebook appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Directing ML-powered Operational Insights from Amazon DevOps Guru to your Datadog event stream

2023-07-13 Bineesh Ravindran

Post Syndicated from Bineesh Ravindran original https://aws.amazon.com/blogs/devops/directing_ml-powered_operational_insights_from_amazon_devops_guru_to_your_datadog_event_stream/

Amazon DevOps Guru is a fully managed AIOps service that uses machine learning (ML) to quickly identify when applications are behaving outside of their normal operating patterns and generates insights from its findings. These insights generated by DevOps Guru can be used to alert on-call teams to react to anomalies for business mission critical workloads. If you are already utilizing Datadog to automate infrastructure monitoring, application performance monitoring, and log management for real-time observability of your entire technology stack, then this blog is for you.

You might already be using Datadog for a consolidated view of your Datadog Events interface to search, analyze and filter events from many different sources in one place. Datadog Events are records of notable changes relevant for managing and troubleshooting IT Operations, such as code, deployments, service health, configuration changes and monitoring alerts.

Wherever DevOps Guru detects operational events in your AWS environment that could lead to outages, it generates insights and recommendations. These insights/recommendations are then pushed to a user specific Datadog endpoint using Datadog events API. Customers can then create dashboards, incidents, alarms or take corrective automated actions based on these insights and recommendations in Datadog.

Datadog collects and unifies all of the data streaming from these complex environments, with a 1-click integration for pulling in metrics and tags from over 90 AWS services. Companies can deploy the Datadog Agent directly on their hosts and compute instances to collect metrics with greater granularity—down to one-second resolution. And with Datadog’s out-of-the-box integration dashboards, companies get not only a high-level view into the health of their infrastructure and applications but also deeper visibility into individual services such as AWS Lambda and Amazon EKS.

This blogpost will show you how to utilize Amazon DevOps guru with Datadog to get real time insights and recommendations on their AWS Infrastructure. We will demonstrate how an insight generated by Amazon DevOps Guru for an anomaly can automatically be pushed to Datadog’s event streams which can then be used to create dashboards, create alarms and alerts to take corrective actions.

Solution Overview

When an Amazon DevOps Guru insight is created, an Amazon EventBridge rule is used to capture the insight as an event and routed to an AWS Lambda Function target. The lambda function interacts with Datadog using a REST API to push corresponding DevOps Guru events captured by Amazon EventBridge

The EventBridge rule can be customized to capture all DevOps Guru insights or narrowed down to specific insights. In this blog, we will be capturing all DevOps Guru insights and will be performing actions on Datadog for the below DevOps Guru events:

DevOps Guru New Insight Open
DevOps Guru New Anomaly Association
DevOps Guru Insight Severity Upgraded
DevOps Guru New Recommendation Created
DevOps Guru Insight Closed

Figure 1: Amazon DevOps Guru Integration with Datadog with Amazon EventBridge and AWS.

Solution Implementation Steps

Pre-requisites

Before you deploy the solution, complete the following steps.

- Datadog Account Setup: We will be connecting your AWS Account with Datadog. If you do not have a Datadog account, you can request a free trial developer instance through Datadog.
- Datadog Credentials: Gather the credentials of Datadog keys that will be used to connect with AWS. Follow the steps below to create an API Key and Application Key
  Add an API key or client token
  1. 1. To add a Datadog API key or client token:
    2. Navigate to Organization settings, then click the API keys or Client Tokens
    3. Click the New Key or New Client Token button, depending on which you’re creating.
    4. Enter a name for your key or token.
    5. Click Create API key or Create Client Token.
    6. Note down the newly generated API Key value. We will need this in later steps
    7. Figure 2: Create new API Key.
  Add application keys
  - To add a Datadog application key, navigate to Organization Settings > Application Keys.If you have the permission to create application keys, click New Key.Note down the newly generated Application Key. We will need this in later steps

Add Application Key and API Key to AWS Secrets Manager : Secrets Manager enables you to replace hardcoded credentials in your code, including passwords, with an API call to Secrets Manager to retrieve the secret programmatically. This helps ensure the secret can’t be compromised by someone examining your code,because the secret no longer exists in the code.
Follow below steps to create a new secret in AWS Secrets Manager.

Open the Secrets Manager console at https://console.aws.amazon.com/secretsmanager/
Choose Store a new secret.
On the Choose secret type page, do the following:
1. For Secret type, choose other type of secret.
2. In Key/value pairs, either enter your secret in Key/value
  pairs

Figure 3: Create new secret in Secret Manager.

Click next and enter “DatadogSecretManager” as the secret name followed by Review and Finish

Figure 4: Configure secret in Secret Manager.

- - Enable DevOps Guru for your applications by following these steps or you can follow this blog to deploy a sample serverless application that can be used to generate DevOps Guru insights for anomalies detected in the application.
  - AWS Cloud9 is recommended to create an environment as AWS Serverless Application Model (SAM) CLI and AWS Command Line Interface (CLI) are pre-installed and can be accessed from a bash terminal.
  - Install and set up SAM CLI – Install the SAM CLI
  - Download and set up Java. The version should be matching to the runtime that you defined in the SAM template. yaml Serverless function configuration – Install the Java SE Development Kit 11
  - Maven – Install Maven

Option 1: Deploy Datadog Connector App from AWS Serverless Repository

The DevOps Guru Datadog Connector application is available on the AWS Serverless Application Repository which is a managed repository for serverless applications. The application is packaged with an AWS Serverless Application Model (SAM) template, definition of the AWS resources used and the link to the source code. Follow the steps below to quickly deploy this serverless application in your AWS account

- - Login to the AWS management console of the account to which you plan to deploy this solution.
  - Go to the DevOps Guru Datadog Connector application in the AWS Serverless Repository and click on “Deploy”.
  - The Lambda application deployment screen will be displayed where you can enter the Datadog Application name
    
    Figure 5: DevOps Guru Datadog connector.
    
    Figure 6: Serverless Application DevOps Guru Datadog connector.
  - After successful deployment the AWS Lambda Application page will display the “Create complete” status for the serverlessrepo-DevOps-Guru-Datadog-Connector application. The CloudFormation template creates four resources,
    1. Lambda function which has the logic to integrate to the Datadog
    2. Event Bridge rule for the DevOps Guru Insights
    3. Lambda permission
    4. IAM role
  - Now skip Option 2 and follow the steps in the “Test the Solution” section to trigger some DevOps Guru insights/recommendations and validate that the events are created and updated in Datadog.

Option 2: Build and Deploy sample Datadog Connector App using AWS SAM Command Line Interface

As you have seen above, you can directly deploy the sample serverless application form the Serverless Repository with one click deployment. Alternatively, you can choose to clone the GitHub source repository and deploy using the SAM CLI from your terminal.

The Serverless Application Model Command Line Interface (SAM CLI) is an extension of the AWS CLI that adds functionality for building and testing serverless applications. The CLI provides commands that enable you to verify that AWS SAM template files are written according to the specification, invoke Lambda functions locally, step-through debug Lambda functions, package and deploy serverless applications to the AWS Cloud, and so on. For details about how to use the AWS SAM CLI, including the full AWS SAM CLI Command Reference, see AWS SAM reference – AWS Serverless Application Model.

Before you proceed, make sure you have completed the pre-requisites section in the beginning which should set up the AWS SAM CLI, Maven and Java on your local terminal. You also need to install and set up Docker to run your functions in an Amazon Linux environment that matches Lambda.

Clone the source code from the github repo

git clone https://github.com/aws-samples/amazon-devops-guru-connector-datadog.git

Build the sample application using SAM CLI

$cd DatadogFunctions

$sam build
Building codeuri: $\amazon-devops-guru-connector-datadog\DatadogFunctions\Functions runtime: java11 metadata: {} architecture: x86_64 functions: Functions
Running JavaMavenWorkflow:CopySource
Running JavaMavenWorkflow:MavenBuild
Running JavaMavenWorkflow:MavenCopyDependency
Running JavaMavenWorkflow:MavenCopyArtifacts

Build Succeeded

Built Artifacts  : .aws-sam\build
Built Template   : .aws-sam\build\template.yaml

Commands you can use next
=========================
[*] Validate SAM template: sam validate
[*] Invoke Function: sam local invoke
[*] Test Function in the Cloud: sam sync --stack-name {{stack-name}} --watch
[*] Deploy: sam deploy --guided

This command will build the source of your application by installing dependencies defined in Functions/pom.xml, create a deployment package and saves it in the. aws-sam/build folder.

Deploy the sample application using SAM CLI

$sam deploy --guided

This command will package and deploy your application to AWS, with a series of prompts that you should respond to as shown below:

- - Stack Name: The name of the stack to deploy to CloudFormation. This should be unique to your account and region, and a good starting point would be something matching your project name.
  - AWS Region: The AWS region you want to deploy your application to.
  - Confirm changes before deploy: If set to yes, any change sets will be shown to you before execution for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
  - Allow SAM CLI IAM role creation:Many AWS SAM templates, including this example, create AWS IAM roles required for the AWS Lambda function(s) included to access AWS services. By default, these are scoped down to minimum required permissions. To deploy an AWS CloudFormation stack which creates or modifies IAM roles, the CAPABILITY_IAM value for capabilities must be provided. If permission isn’t provided through this prompt, to deploy this example you must explicitly pass --capabilities CAPABILITY_IAM to the sam deploy command.
  - Disable rollback [y/N]: If set to Y, preserves the state of previously provisioned resources when an operation fails.
  - Save arguments to configuration file (samconfig.toml): If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can just re-run sam deploy without parameters to deploy changes to your application.

After you enter your parameters, you should see something like this if you have provided Y to view and confirm ChangeSets. Proceed here by providing ‘Y’ for deploying the resources.

Initiating deployment
=====================

        Uploading to sam-app-datadog/0c2b93e71210af97a8c57710d0463c8b.template  1797 / 1797  (100.00%)


Waiting for changeset to be created..

CloudFormation stack changeset
---------------------------------------------------------------------------------------------------------------------
Operation                     LogicalResourceId             ResourceType                  Replacement
---------------------------------------------------------------------------------------------------------------------
+ Add                         FunctionsDevOpsGuruPermissi   AWS::Lambda::Permission       N/A
                              on
+ Add                         FunctionsDevOpsGuru           AWS::Events::Rule             N/A
+ Add                         FunctionsRole                 AWS::IAM::Role                N/A
+ Add                         Functions                     AWS::Lambda::Function         N/A
---------------------------------------------------------------------------------------------------------------------


Changeset created successfully. arn:aws:cloudformation:us-east-1:867001007349:changeSet/samcli-deploy1680640852/bdc3039b-cdb7-4d7a-a3a0-ed9372f3cf9a


Previewing CloudFormation changeset before deployment
======================================================
Deploy this changeset? [y/N]: y

2023-04-04 15:41:06 - Waiting for stack create/update to complete

CloudFormation events from stack operations (refresh every 5.0 seconds)
---------------------------------------------------------------------------------------------------------------------
ResourceStatus                ResourceType                  LogicalResourceId             ResourceStatusReason
---------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS            AWS::IAM::Role                FunctionsRole                 -
CREATE_IN_PROGRESS            AWS::IAM::Role                FunctionsRole                 Resource creation Initiated
CREATE_COMPLETE               AWS::IAM::Role                FunctionsRole                 -
CREATE_IN_PROGRESS            AWS::Lambda::Function         Functions                     -
CREATE_IN_PROGRESS            AWS::Lambda::Function         Functions                     Resource creation Initiated
CREATE_COMPLETE               AWS::Lambda::Function         Functions                     -
CREATE_IN_PROGRESS            AWS::Events::Rule             FunctionsDevOpsGuru           -
CREATE_IN_PROGRESS            AWS::Events::Rule             FunctionsDevOpsGuru           Resource creation Initiated
CREATE_COMPLETE               AWS::Events::Rule             FunctionsDevOpsGuru           -
CREATE_IN_PROGRESS            AWS::Lambda::Permission       FunctionsDevOpsGuruPermissi   -
                                                            on
CREATE_IN_PROGRESS            AWS::Lambda::Permission       FunctionsDevOpsGuruPermissi   Resource creation Initiated
                                                            on
CREATE_COMPLETE               AWS::Lambda::Permission       FunctionsDevOpsGuruPermissi   -
                                                            on
CREATE_COMPLETE               AWS::CloudFormation::Stack    sam-app-datadog               -
---------------------------------------------------------------------------------------------------------------------


Successfully created/updated stack - sam-app-datadog in us-east-1

Once the deployment succeeds, you should be able to see the successful creation of your resources. Also, you can find your Lambda, IAM Role and EventBridge Rule in the CloudFormation stack output values.

You can also choose to test and debug your function locally with sample events using the SAM CLI local functionality.Test a single function by invoking it directly with a test event. An event is a JSON document that represents the input that the function receives from the event source. Refer the Invoking Lambda functions locally – AWS Serverless Application Model link here for more details.

$ sam local invoke Functions -e ‘event/event.json’

Once you are done with the above steps, move on to “Test the Solution” section below to trigger some DevOps Guru insights and validate that the events are created and pushed to Datadog.

Test the Solution

To test the solution, we will simulate a DevOps Guru Insight. You can also simulate an insight by following the steps in this blog. After an anomaly is detected in the application, DevOps Guru creates an insight as shown below

Figure 7: DevOps Guru insight for DynamoDB

For the DevOps Guru insight shown above, a corresponding event is automatically created and pushed to Datadog as shown below. In addition to the events creation, any new anomalies and recommendations from DevOps Guru is also associated with the events

Figure 8 : DevOps Guru Insight pushed to Datadog event stream.

Cleaning Up

To delete the sample application that you created, In your Cloud 9 environment open a new terminal. Now type in the AWS CLI command below and pass the stack name you provided in the deploy step

aws cloudformation delete-stack --stack-name <Stack Name>

Alternatively ,you could also use the AWS CloudFormation Console to delete the stack

Conclusion

This article highlights how Amazon DevOps Guru monitors resources within a specific region of your AWS account, automatically detecting operational issues, predicting potential resource exhaustion, identifying probable causes, and recommending remediation actions. It describes a bespoke solution enabling integration of DevOps Guru insights with Datadog, enhancing management and oversight of AWS services. This solution aids customers using Datadog to bolster operational efficiencies, delivering customized insights, real-time alerts, and management capabilities directly from DevOps Guru, offering a unified interface to swiftly restore services and systems.

To start gaining operational insights on your AWS Infrastructure with Datadog head over to Amazon DevOps Guru documentation page.

About the authors:

DevSecOps with Amazon CodeGuru Reviewer CLI and Bitbucket Pipelines

2023-04-28 Bineesh Ravindran

Post Syndicated from Bineesh Ravindran original https://aws.amazon.com/blogs/devops/devsecops-with-amazon-codeguru-reviewer-cli-and-bitbucket-pipelines/

DevSecOps refers to a set of best practices that integrate security controls into the continuous integration and delivery (CI/CD) workflow. One of the first controls is Static Application Security Testing (SAST). SAST tools run on every code change and search for potential security vulnerabilities before the code is executed for the first time. Catching security issues early in the development process significantly reduces the cost of fixing them and the risk of exposure.

This blog post, shows how we can set up a CI/CD using Bitbucket Pipelines and Amazon CodeGuru Reviewer . Bitbucket Pipelines is a cloud-based continuous delivery system that allows developers to automate builds, tests, and security checks with just a few lines of code. CodeGuru Reviewer is a cloud-based static analysis tool that uses machine learning and automated reasoning to generate code quality and security recommendations for Java and Python code.

We demonstrate step-by-step how to set up a pipeline with Bitbucket Pipelines, and how to call CodeGuru Reviewer from there. We then show how to view the recommendations produced by CodeGuru Reviewer in Bitbucket Code Insights, and how to triage and manage recommendations during the development process.

Bitbucket Overview

Bitbucket is a Git-based code hosting and collaboration tool built for teams. Bitbucket’s best-in-class Jira and Trello integrations are designed to bring the entire software team together to execute a project. Bitbucket provides one place for a team to collaborate on code from concept to cloud, build quality code through automated testing, and deploy code with confidence. Bitbucket makes it easy for teams to collaborate and reduce issues found during integration by providing a way to combine easily and test code frequently. Bitbucket gives teams easy access to tools needed in other parts of the feedback loop, from creating an issue to deploying on your hardware of choice. It also provides more advanced features for those customers that need them, like SAML authentication and secrets storage.

Solution Overview

Bitbucket Pipelines uses a Docker container to perform the build steps. You can specify any Docker image accessible by Bitbucket, including private images, if you specify credentials to access them. The container starts and then runs the build steps in the order specified in your configuration file. The build steps specified in the configuration file are nothing more than shell commands executed on the Docker image. Therefore, you can run scripts, in any language supported by the Docker image you choose, as part of the build steps. These scripts can be stored either directly in your repository or an Internet-accessible location. This solution demonstrates an easy way to integrate Bitbucket pipelines with AWS CodeReviewer using bitbucket-pipelines.yml file.

You can interact with your Amazon Web Services (AWS) account from your Bitbucket Pipeline using the OpenID Connect (OIDC) feature. OpenID Connect is an identity layer above the OAuth 2.0 protocol.

Now that you understand how Bitbucket and your AWS Account securely communicate with each other, let’s look into the overall summary of steps to configure this solution.

Fork the repository
Configure Bitbucket Pipelines as an IdP on AWS.
Create an IAM role.
Add repository variables needed for pipeline
Adding the CodeGuru Reviewer CLI to your pipeline
Review CodeGuru recommendations

Now let’s look into each step in detail. To configure the solution, follow steps mentioned below.

Step 1: Fork this repo

https://bitbucket.org/aws-samples/amazon-codeguru-samples

Figure 1 : Fork amazon-codeguru-samples bitbucket repository.

Step 2: Configure Bitbucket Pipelines as an Identity Provider on AWS

Configuring Bitbucket Pipelines as an IdP in IAM enables Bitbucket Pipelines to issue authentication tokens to users to connect to AWS.
In your Bitbucket repo, go to Repository Settings > OpenID Connect. Note the provider URL and the Audience variable on that screen.

The Identity Provider URL will look like this:

https://api.bitbucket.org/2.0/workspaces/YOUR_WORKSPACE/pipelines-config/identity/oidc – This is the issuer URL for authentication requests. This URL issues a token to a requester automatically as part of the workflow. See more detail about issuer URL in RFC . Here “YOUR_WORKSPACE” need to be replaced with name of your bitbucket workspace.

And the Audience will look like:

ari:cloud:bitbucket::workspace/ari:cloud:bitbucket::workspace/84c08677-e352-4a1c-a107-6df387cfeef7 – This is the recipient the token is intended for. See more detail about audience in Request For Comments (RFC) which is memorandum published by the Internet Engineering Task Force(IETF) describing methods and behavior for securely transmitting information between two parties usinf JSON Web Token ( JWT).

Figure 2 : Configure Bitbucket Pipelines as an Identity Provider on AWS

Next, navigate to the IAM dashboard > Identity Providers > Add provider, and paste in the above info. This tells AWS that Bitbucket Pipelines is a token issuer.

Step 3: Create a custom policy

You can always use the CLI with Admin credentials but if you want to have a specific role to use the CLI, your credentials must have at least the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeguru-reviewer:ListRepositoryAssociations",
                "codeguru-reviewer:AssociateRepository",
                "codeguru-reviewer:DescribeRepositoryAssociation",
                "codeguru-reviewer:CreateCodeReview",
                "codeguru-reviewer:DescribeCodeReview",
                "codeguru-reviewer:ListRecommendations",
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:CreateBucket",
                "s3:GetBucket*",
                "s3:List*",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::codeguru-reviewer-cli-<AWS ACCOUNT ID>*",
                "arn:aws:s3:::codeguru-reviewer-cli-<AWS ACCOUNT ID>*/*"
            ],
            "Effect": "Allow"
        }
    ]
}

To create an IAM policy, navigate to the IAM dashboard > Policies > Create Policy

Now then paste the above mentioned json document into the json tab as shown in screenshot below and replace <AWS ACCOUNT ID> with your own AWS Account ID

Figure 3 : Create a Policy.

Name your policy; in our example, we name it CodeGuruReviewerOIDC.

Figure 4 : Review and Create a IAM policy.

Step 4: Create an IAM Role

Once you’ve enabled Bitbucket Pipelines as a token issuer, you need to configure permissions for those tokens so they can execute actions on AWS.
To create an IAM web identity role, navigate to the IAM dashboard > Roles > Create Role, and choose the IdP and audience you just created.

Figure 5 : Create an IAM role

Next, select the “CodeGuruReviewerOIDC “ policy to attach to the role.

Figure 6 : Assign policy to role

Figure 7 : Review and Create role

Name your role; in our example, we name it CodeGuruReviewerOIDCRole.

After adding a role, copy the Amazon Resource Name (ARN) of the role created:

The Amazon Resource Name (ARN) will look like this:

arn:aws:iam::000000000000:role/CodeGuruReviewerOIDCRole

we will need this in a later step when we create AWS_OIDC_ROLE_ARN as a repository variable.

Step 5: Add repository variables needed for pipeline

Variables are configured as environment variables in the build container. You can access the variables from the bitbucket-pipelines.yml file or any script that you invoke by referring to them. Pipelines provides a set of default variables that are available for builds, and can be used in scripts .Along with default variables we need to configure few additional variables called Repository Variables which are used to pass special parameter to the pipeline.

Figure 8 : Create repository variables

Figure 8 Create repository variables

Below mentioned are the few repository variables that need to be configured for this solution.

1.AWS_DEFAULT_REGION Create a repository variableAWS_DEFAULT_REGION with value “us-east-1”

2.BB_API_TOKEN Create a new repository variable BB_API_TOKEN and paste the below created App password as the value

App passwords are user-based access tokens for scripting tasks and integrating tools (such as CI/CD tools) with Bitbucket Cloud.These access tokens have reduced user access (specified at the time of creation) and can be useful for scripting, CI/CD tools, and testing Bitbucket connected applications while they are in development.
To create an App password:

- Select your avatar (Your profile and settings) from the navigation bar at the top of the screen.
- Under Settings, select Personal settings.
- On the sidebar, select App passwords.
- Select Create app password.
- Give the App password a name, usually related to the application that will use the password.
- Select the permissions the App password needs. For detailed descriptions of each permission, see: App password permissions.
- Select the Create button. The page will display the New app password dialog.
- Copy the generated password and either record or paste it into the application you want to give access. The password is only displayed once and can’t be retrieved later.

3.BB_USERNAME Create a repository variable BB_USERNAME and add your bitbucket username as the value of this variable

4.AWS_OIDC_ROLE_ARN

After adding a role in Step 4, copy the Amazon Resource Name (ARN) of the role created:

The Amazon Resource Name (ARN) will look something like this:

arn:aws:iam::000000000000:role/CodeGuruReviewerOIDCRole

and create AWS_OIDC_ROLE_ARN as a repository variable in the target Bitbucket repository.

Step 6: Adding the CodeGuru Reviewer CLI to your pipeline

In order to add CodeGuruRevewer CLi to your pipeline update the bitbucket-pipelines.yml file as shown below

#  Template maven-build

 #  This template allows you to test and build your Java project with Maven.
 #  The workflow allows running tests, code checkstyle and security scans on the default branch.

 # Prerequisites: pom.xml and appropriate project structure should exist in the repository.

 image: docker-public.packages.atlassian.com/atlassian/bitbucket-pipelines-mvn-python3-awscli

 pipelines:
  default:
    - step:
        name: Build Source Code
        caches:
          - maven
        script:
          - cd $BITBUCKET_CLONE_DIR
          - chmod 777 ./gradlew
          - ./gradlew build
        artifacts:
          - build/**
    - step: 
        name: Download and Install CodeReviewer CLI   
        script:
          - curl -OL https://github.com/aws/aws-codeguru-cli/releases/download/0.2.3/aws-codeguru-cli.zip
          - unzip aws-codeguru-cli.zip
        artifacts:
          - aws-codeguru-cli/**
    - step:
        name: Run CodeGuruReviewer 
        oidc: true
        script:
          - export AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION
          - export AWS_ROLE_ARN=$AWS_OIDC_ROLE_ARN
          - export S3_BUCKET=$S3_BUCKET

          # Setup aws cli
          - export AWS_WEB_IDENTITY_TOKEN_FILE=$(pwd)/web-identity-token
          - echo $BITBUCKET_STEP_OIDC_TOKEN > $(pwd)/web-identity-token
          - aws configure set web_identity_token_file "${AWS_WEB_IDENTITY_TOKEN_FILE}"
          - aws configure set role_arn "${AWS_ROLE_ARN}"
          - aws sts get-caller-identity

          # setup codegurureviewercli
          - export PATH=$PATH:./aws-codeguru-cli/bin
          - chmod 777 ./aws-codeguru-cli/bin/aws-codeguru-cli

          - export SRC=$BITBUCKET_CLONE_DIR/src
          - export OUTPUT=$BITBUCKET_CLONE_DIR/test-reports
          - export CODE_INSIGHTS=$BITBUCKET_CLONE_DIR/bb-report

          # Calling Code Reviewer CLI
          - ./aws-codeguru-cli/bin/aws-codeguru-cli --region $AWS_DEFAULT_REGION  --root-dir $BITBUCKET_CLONE_DIR --build $BITBUCKET_CLONE_DIR/build/classes/java --src $SRC --output $OUTPUT --no-prompt --bitbucket-code-insights $CODE_INSIGHTS        
        artifacts:
          - test-reports/*.* 
          - target/**
          - bb-report/**
    - step: 
        name: Upload Code Insights Artifacts to Bitbucket Reports 
        script:
          - chmod 777 upload.sh
          - ./upload.sh bb-report/report.json bb-report/annotations.json
    - step:
        name: Upload Artifacts to Bitbucket Downloads       # Optional Step
        script:
          - pipe: atlassian/bitbucket-upload-file:0.3.3
            variables:
              BITBUCKET_USERNAME: $BB_USERNAME
              BITBUCKET_APP_PASSWORD: $BB_API_TOKEN
              FILENAME: '**/*.json'
    - step:
          name: Validate Findings     #Optional Step
          script:
            # Looking into CodeReviewer results and failing if there are Critical recommendations
            - grep -o "Critical" test-reports/recommendations.json | wc -l
            - count="$(grep -o "Critical" test-reports/recommendations.json | wc -l)"
            - echo $count
            - if (( $count > 0 )); then
            - echo "Critical findings discovered. Failing."
            - exit 1
            - fi
          artifacts:
            - '**/*.json'

Let’s look into the pipeline file to understand various steps defined in this pipeline

Figure 9 : Bitbucket pipeline execution steps

Step 1) Build Source Code

In this step source code is downloaded into a working directory and build using Gradle.All the build artifacts are then passed on to next step

Step 2) Download and Install Amazon CodeGuru Reviewer CLI
In this step Amazon CodeGuru Reviewer is CLI is downloaded from a public github repo and extracted into working directory. All artifacts downloaded and extracted are then passed on to next step

Step 3) Run CodeGuruReviewer

This step uses flag oidc: true which declares you are using the OIDC authentication method, while AWS_OIDC_ROLE_ARN declares the role created in the previous step that contains all of the necessary permissions to deal with AWS resources.
Further repository variables are exported, which is then used to set AWS CLI .Amazon CodeGuruReviewer CLI which was downloaded and extracted in previous step is then used to invoke CodeGuruReviewer along with some parameters .

Following are the parameters that are passed on to the CodeGuruReviewer CLI
--region $AWS_DEFAULT_REGION The AWS region in which CodeGuru Reviewer will run (in this blog we used us-east-1).

--root-dir $BITBUCKET_CLONE_DIR The root directory of the repository that CodeGuru Reviewer should analyze.

--build $BITBUCKET_CLONE_DIR/build/classes/java Points to the build artifacts. Passing the Java build artifacts allows CodeGuru Reviewer to perform more in-depth bytecode analysis, but passing the build artifacts is not required.

--src $SRC Points the source code that should be analyzed. This can be used to focus the analysis on certain source files, e.g., to exclude test files. This parameter is optional, but focusing on relevant code can shorten analysis time and cost.

--output $OUTPUT The directory where CodeGuru Reviewer will store its recommendations.

--no-prompt This ensures that CodeGuru Reviewer does run in interactive mode where it pauses for user input.

–-bitbucket-code-insights $CODE_INSIGHTS The location where recommendations in Bitbucket CodeInsights format should be written to.

Once Amazon CodeGuruReviewer scans the code based on the above parameters, it generates two json files (reports.json and annotations.json) Code Insight Reports which is then passed on as artifacts to the next step.

Step 4) Upload Code Insights Artifacts to Bitbucket Reports
In this step code Insight Report generated by Amazon CodeGuru Reviewer is then uploaded to Bitbucket Reports. This makes the report available in the reports section in the pipeline as displayed in the screenshot

Figure 10 : CodeGuru Reviewer Report

Step 5) [Optional] Upload the copy of these reports to Bitbucket Downloads
This is an Optional step where you can upload the artifacts to Bitbucket Downloads. This is especially useful because the artifacts inside a build pipeline gets deleted after 14 days of the pipeline run. Using Bitbucket Downloads, you can store these artifacts for a much longer duration.

Figure 11 : Bitbucket downloads

Step 6) [Optional] Validate Findings by looking into results and failing is there are any Critical Recommendations
This is an optional step showcasing how the results for CodeGururReviewer can be used to trigger the success and failure of a Bitbucket pipeline. In this step the pipeline fails, if a critical recommendation exists in report.

Step 7: Review CodeGuru recommendations

CodeGuru Reviewer supports different recommendation formats, including CodeGuru recommendation summaries, SARIF, and Bitbucket CodeInsights.

Keeping your Pipeline Green

Now that CodeGuru Reviewer is running in our pipeline, we need to learn how to unblock ourselves if there are recommendations. The easiest way to unblock a pipeline after is to address the CodeGuru recommendation. If we want to validate on our local machine that a change addresses a recommendation using the same CLI that we use as part of our pipeline.
Sometimes, it is not convenient to address a recommendation. E.g., because there are mitigations outside of the code that make the recommendation less relevant, or simply because the team agrees that they don’t want to block deployments on recommendations unless they are critical. For these cases, developers can add a .codeguru-ignore.yml file to their repository where they can use a variety of criteria under which a recommendation should not be reported. Below we explain all available criteria to filter recommendations. Developers can use any subset of those criteria in their .codeguru-ignore.yml file. We will give a specific example in the following sections.

version: 1.0 # The version number is mandatory. All other entries are optional.

# The CodeGuru Reviewer CLI produces a recommendations.json file which contains deterministic IDs for each
# recommendation. This ID can be excluded so that this recommendation will not be reported in future runs of the
# CLI.
 ExcludeById:
 - '4d2c43618a2dac129818bef77093730e84a4e139eef3f0166334657503ecd88d'
# We can tell the CLI to exclude all recommendations below a certain severity. This can be useful in CI/CD integration.
 ExcludeBelowSeverity: 'HIGH'
# We can exclude all recommendations that have a certain tag. Available Tags can be found here:
# https://docs.aws.amazon.com/codeguru/detector-library/java/tags/
# https://docs.aws.amazon.com/codeguru/detector-library/python/tags/
 ExcludeTags:
  - 'maintainability'
# We can also exclude recommendations by Detector ID. Detector IDs can be found here:
# https://docs.aws.amazon.com/codeguru/detector-library
 ExcludeRecommendations:
# Ignore all recommendations for a given Detector ID 
  - detectorId: 'java/[email protected]'
# Ignore all recommendations for a given Detector ID in a provided set of locations.
# Locations can be written as Unix GLOB expressions using wildcard symbols.
  - detectorId: 'java/[email protected]'
    Locations:
      - 'src/main/java/com/folder01/*.java'
# Excludes all recommendations in the provided files. Files can be provided as Unix GLOB expressions.
 ExcludeFiles:
  - tst/**

The recommendations will still be reported in the CodeGuru Reviewer console, but not by the CodeGuru Reviewer CLI and thus they will not block the pipeline anymore.

Conclusion

In this post, we outlined how you can set up a CI/CD pipeline using Bitbucket Pipelines, and Amazon CodeGuru Reviewer and we outlined how you can integrate Amazon CodeGuru Reviewer CLI with the Bitbucket cloud-based continuous delivery system that allows developers to automate builds, tests, and security checks with just a few lines of code. We showed you how to create a Bitbucket pipeline job and integrate the CodeGuru Reviewer CLI to detect issues in your Java and Python code, and access the recommendations for remediating these issues.

We presented an example where you can stop the build upon finding critical violations. Furthermore, we discussed how you could upload these artifacts to BitBucket downloads and store these artifacts for a much longer duration. The CodeGuru Reviewer CLI offers you a one-line command to scan any code on your machine and retrieve recommendations .You can use the CLI to integrate CodeGuru Reviewer into your favorite CI tool, as a pre-commit hook, in your workflow. In turn, you can combine CodeGuru Reviewer with Dynamic Application Security Testing (DAST) and Software Composition Analysis (SCA) tools to achieve a hybrid application security testing method that helps you combine the inside-out and outside-in testing approaches, cross-reference results, and detect vulnerabilities that both exist and are exploitable.

If you need hands-on keyboard support, then AWS Professional Services can help implement this solution in your enterprise, and introduce you to our AWS DevOps services and offerings.

About the authors:

10 ways to build applications faster with Amazon CodeWhisperer

2023-04-26 Kris Schultz

Post Syndicated from Kris Schultz original https://aws.amazon.com/blogs/devops/10-ways-to-build-applications-faster-with-amazon-codewhisperer/

Amazon CodeWhisperer is a powerful generative AI tool that gives me coding superpowers. Ever since I have incorporated CodeWhisperer into my workflow, I have become faster, smarter, and even more delighted when building applications. However, learning to use any generative AI tool effectively requires a beginner’s mindset and a willingness to embrace new ways of working.

Best practices for tapping into CodeWhisperer’s power are still emerging. But, as an early explorer, I’ve discovered several techniques that have allowed me to get the most out of this amazing tool. In this article, I’m excited to share these techniques with you, using practical examples to illustrate just how CodeWhisperer can enhance your programming workflow. I’ll explore:

Before we begin

If you would like to try these techniques for yourself, you will need to use a code editor with the AWS Toolkit extension installed. VS Code, AWS Cloud9, and most editors from JetBrains will work. Refer to the CodeWhisperer “Getting Started” resources for setup instructions.

CodeWhisperer will present suggestions automatically as you type. If you aren’t presented with a suggestion, you can always manually trigger a suggestion using the Option + C (Mac) or Alt + C (Windows) shortcut. CodeWhisperer will also sometimes present you with multiple suggestions to choose from. You can press the → and ← keys to cycle through all available suggestions.

The suggestions CodeWhisperer offers are non-deterministic, which means you may receive slightly different suggestions than the ones shown in this article. If you receive a suggestion that doesn’t suit your needs, try cycling through all available suggestions using the arrow keys or try rephrasing any code comment prompts slightly. Even punctuation changes can produce fresh suggestions.

The examples in this article are written in JavaScript and TypeScript. However, the techniques I’ll demonstrate are equally suited to many of the programming languages supported by CodeWhisperer. Now, onto the examples!

Typing less

At the most basic level, CodeWhisperer can be used as a supercharged auto-complete, significantly reducing your typing when writing individual lines or blocks of code. To use CodeWhisperer in this way, just write code as you normally would. CodeWhisperer will make automatic suggestions as you type. Accept suggestions when they’re helpful, and ignore suggestions when they’re not. In my experience, using CodeWhisperer in this way alone reduces my typing by 50% or more.

Generating functions

Automatically generating function bodies from signatures

Skilled programmers use clear, descriptive names for their functions and parameters. If you follow this practice in your own programming, CodeWhisperer will have all the context it needs to implement many functions for you.

After I enter the function signature below, CodeWhisperer generates the function body for me.

User input (JavaScript):

function standardDeviation(numbers)

Result:

function standardDeviation(numbers) {
  let mean = numbers.reduce((a, b) => a + b) / numbers.length;
  let variance =
    numbers.reduce((a, b) => a + Math.pow(b - mean, 2)) / numbers.length;
  return Math.sqrt(variance);
}

Generating functions from comments

Sometimes a good function signature alone isn’t enough to convey the desired behavior of a function. In those cases, you can use a detailed code comment as a prompt to CodeWhisperer. I recommend using your language’s doc comment delimiter when using this technique.

User Input (JavaScript):

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */

Result:

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */
function formatList(list) {
  return list
    .map((item) => {
      if (typeof item === "number") {
        return `#${item}`;
      } else {
        return `"${item}"`;
      }
    })
    .join(", ");
}

Generating classes

You can generate full or partial class implementations from a single code comment. The comment should describe the key properties and behaviors of the class. Simple descriptions are often all that is required.

User input (TypeScript):

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */

Result:

After I write the above block comment, CodeWhisperer provides multiple suggestions, including the following implementation, complete with documentation comments for each method!

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */
export class Folder {
  name: string;
  children: Folder[];
  parent?: Folder;

  constructor(name: string, parent?: Folder) {
    this.name = name;
    this.children = [];
    this.parent = parent;
  }

  /**
   * Add a child to the folder.
   * @param child The child to add.
   * @returns True if the child was added, false if it was already present.
   */
  addChild(child: Folder): boolean {
    if (this.children.includes(child)) {
      return false;
    }
    this.children.push(child);
    child.parent = this;
    return true;
  }

  /**
   * Remove a child from the folder.
   * @param child The child to remove.
   * @returns True if the child was removed, false if it wasn't present.
   */
  removeChild(child: Folder): boolean {
    const index = this.children.indexOf(child);
    if (index === -1) {
      return false;
    }
    this.children.splice(index, 1);
    child.parent = undefined;
    return true;
  }
}

Implementing algorithms

CodeWhisperer enables you to implement well-known algorithms with ease. Simply tell CodeWhisperer which algorithm you want to apply to a particular coding problem, and it will oblige.

User input (JavaScript):

const actors = [ ... ];

// Sort the actors by age using the merge sort algorithm.

Result:

const actors = [ ... ];

// Sort the actors by age using the merge sort algorithm.
function mergeSort(arr) {
  if (arr.length < 2) {
    return arr;
  }

  const middle = Math.floor(arr.length / 2);
  const left = arr.slice(0, middle);
  const right = arr.slice(middle);

  return merge(mergeSort(left), mergeSort(right));
}

function merge(left, right) {
  const result = [];

  while (left.length && right.length) {
    if (left[0].age < right[0].age) {
      result.push(left.shift());
    } else {
      result.push(right.shift());
    }
  }

  while (left.length) {
    result.push(left.shift());
  }

  while (right.length) {
    result.push(right.shift());
  }

  return result;
}

console.log(mergeSort(actors));

Writing unit tests

CodeWhisperer accelerates the implementation of unit tests. Not only is it able to implement the logic for most of the explicit unit tests I ask for, but it often suggests unit tests for edge cases that hadn’t even occurred to me.

When I want CodeWhisperer to implement a unit test, I start with a declaration of the condition I’m testing and the expected result as in the following example.

User input (JavaScript):

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate"

Result:

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate", () => {
    const folder = new Folder("root");
    const child = new Folder("child");
    folder.addChild(child);
    folder.addChild(child);
    expect(folder.children.length).toBe(1);
  });
});

After you’ve written a test or two, CodeWhisperer is likely to suggest more tests, even without explicit prompting.

Tip: You’ll notice that my test description includes 1) the function I’m testing, 2) the condition I’m testing, and 3) what I expect the result to be. Following this three-part pattern is a great way to ensure CodeWhisperer has the context it needs to implement high quality test logic.

Creating sample data

One of the delightful discoveries I’ve made is that CodeWhisperer can generate dummy data—or really any code that has a repeating pattern. As an illustration, for the following list I entered only the first “user” object as an example, and CodeWhisperer helpfully continued the pattern.

User input (JavaScript):

const fakeUsers = [
  { name: "User 1", id: "user1" },

Result:

const fakeUsers = [
  { name: "User 1", id: "user1" },
  { name: "User 2", id: "user2" },
  { name: "User 3", id: "user3" },  
  { name: "User 4", id: "user4" },
  { name: "User 5", id: "user5" },
  { name: "User 6", id: "user6" }
]

It even works for generating random but plausible values like the names seen below.

User input (JavaScript):

const fakeUsers = [
  { name: "Sally", id: "user1" },

Result:

const fakeUsers = [
  { name: "Sally", id: "user1" },
  { name: "Bob", id: "user2" },
  { name: "Joe", id: "user3" },
  { name: "Jane", id: "user4" },
  { name: "Jill", id: "user5" },
  { name: "Jack", id: "user6" }
]

Simplifying regular expressions

While CodeWhisperer can’t translate natural language into custom regular expressions (yet), it does know a lot of commonly used regular expressions. This saves you time by putting these regular expressions at your fingertips whenever you need them.

Let’s look at one notoriously complex but common regular expression.

User input (JavaScript):

/**
 * Function that validates whether a string is a valid email address.
 */

Result:

/**
 * Function that validates whether a string is a valid email address.
 */
function validateEmail(email: string): boolean {
  const re =
    /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(String(email).toLowerCase());
}

Learning third-party code libraries faster

CodeWhisperer helps accelerate my learning when I’m using an unfamiliar library, module, or framework. In the example below, I needed to write a utility function that would take a gzip compressed value coming from an API and translate it into a value that the rest of my code could understand. I knew the “pako” library would help me, but I had no idea how to use it. I just prompted CodeWhisperer with an import statement and a comment describing my goal.

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */

Result:

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */
export function decodeGzipJson(base64String) {
  const compressed = atob(base64String);
  const uncompressed = pako.inflate(compressed, { to: "string" });
  return JSON.parse(uncompressed);
}

Documenting code

CodeWhisperer is capable of generating docstrings and comments for the code it generates, as well as for your existing code. For example, let’s say I want CodeWhisperer to document the matches() method of this FavoritesFilter TypeScript class I’ve implemented (I’ve omitted some implementation details for brevity).

class FavoritesFilter implements IAssetFilter {
  ...
  matches(asset: Asset): boolean {
    ...
  }
}

I can just type a doc comment delimiter (/** */) immediately above the method name and CodeWhisperer will generate the body of the doc comment for me.

Note: When using CodeWhisperer in this way you may have to manually trigger a suggestion using Option + C (Mac) or Alt + C (Windows).

class FavoritesFilter implements IAssetFilter {
  ...
  /**
   * Determines whether the asset matches the filter.
   */
  matches(asset: Asset): boolean {
    ...
  }
}

Conclusion

I hope the techniques above inspire ideas for how CodeWhisperer can make you a more productive coder. Install CodeWhisperer today to start using these time-saving techniques in your own projects. These examples only scratch the surface. As additional creative minds start applying CodeWhisperer to their daily workflows, I’m sure new techniques and best practices will continue to emerge. If you discover a novel approach that you find useful, post a comment to share what you’ve discovered. Perhaps your technique will make it into a future article and help others in the CodeWhisperer community enhance their superpowers.

Automating detection of security vulnerabilities and bugs in CI/CD pipelines using Amazon CodeGuru Reviewer CLI

2022-06-01 Akash Verma

Post Syndicated from Akash Verma original https://aws.amazon.com/blogs/devops/automating-detection-of-security-vulnerabilities-and-bugs-in-ci-cd-pipelines-using-amazon-codeguru-reviewer-cli/

Watts S. Humphrey, the father of Software Quality, had famously quipped, “Every business is a software business”. Software is indeed integral to any industry. The engineers who create software are also responsible for making sure that the underlying code adheres to industry and organizational standards, are performant, and are absolved of any security vulnerabilities that could make them susceptible to attack.

Traditionally, security testing has been the forte of a specialized security testing team, who would conduct their tests toward the end of the Software Development lifecycle (SDLC). The adoption of DevSecOps practices meant that security became a shared responsibility between the development and security teams. Now, development teams can, on their own or as advised by their security team, setup and configure various code scanning tools to detect security vulnerabilities much earlier in the software delivery process (aka “Shift Left”). Meanwhile, the practice of Static code analysis and security application testing (SAST) has become a standard part of the SDLC. Furthermore, it’s imperative that the development teams expect SAST tools that are easy to set-up, seamlessly fit into their DevOps infrastructure, and can be configured without requiring assistance from security or DevOps experts.

In this post, we’ll demonstrate how you can leverage Amazon CodeGuru Reviewer Command Line Interface (CLI) to integrate CodeGuru Reviewer into your Jenkins Continuous Integration & Continuous Delivery (CI/CD) pipeline. Note that the solution isn’t limited to Jenkins, and it would be equally useful with any other build automation tool. Moreover, it can be integrated at any stage of your SDLC as part of the White-box testing. For example, you can integrate the CodeGuru Reviewer CLI as part of your software development process, as well as run it on your dev machine before committing the code.

Launched in 2020, CodeGuru Reviewer utilizes machine learning (ML) and automated reasoning to identify security vulnerabilities, inefficient uses of AWS APIs and SDKs, as well as other common coding errors. CodeGuru Reviewer employs a growing set of detectors for Java and Python to provide recommendations via the AWS Console. Customers that leverage the CodeGuru Reviewer CLI within a CI/CD pipeline also receive recommendations in a machine-readable JSON format, as well as HTML.

CodeGuru Reviewer offers native integration with Source Code Management (SCM) systems, such as GitHub, BitBucket, and AWS CodeCommit. However, it can be used with any SCM via its CLI. The CodeGuru Reviewer CLI is a shim layer on top of the AWS Command Line Interface (AWS CLI) that simplifies the interaction with the tool by handling the uploading of artifacts, triggering of the analysis, and fetching of the results, all in a single command.

Many customers, including Mastercard, are benefiting from this new CodeGuru Reviewer CLI.

“During one of our technical retrospectives, we noticed the need to integrate Amazon CodeGuru recommendations in our build pipelines hosted on Jenkins. Not all our developers can run or check CodeGuru recommendations through the AWS console. Incorporating CodeGuru CLI in our build pipelines acts as an important quality gate and ensures that our developers can immediately fix critical issues.”
Claudio Frattari, Lead DevOps at Mastercard

Solution overview

The application deployment workflow starts by placing the application code on a GitHub SCM. To automate the scenario, we have added GitHub to the Jenkins project under the “Source Code” section. We chose the GitHub option, which would clone the chosen GitHub repository in the Jenkins local workspace directory.

In the build stage of the pipeline (see Figure 1), we configure the appropriate build tool to perform the code build and security analysis. In this example, we will be using Maven as the build tool.

Figure 1: Jenkins pipeline with Amazon CodeGuru Reviewer

In the post-build stage, we configure the CodeGuru Reviewer CLI to generate the recommendations based on the review.

Lastly, in the concluding stage of the pipeline, we’ll be analyzing the JSON results using jq – a lightweight and flexible command-line JSON processor, and then failing the Jenkins job if we encounter observations that are of a “Critical” severity.

Jenkins will trigger the “CodeGuru Reviewer” (see Figure 1) based review process in the post-build stage, i.e., after the build finishes. Furthermore, you can configure other stages, such as automated testing or deployment, after this stage. Additionally, passing the location of the build artifacts to the CLI lets CodeGuru Reviewer perform a more in-depth security analysis. Build artifacts are either directories containing jar files (e.g., build/lib for Gradle or /target for Maven) or directories containing class hierarchies (e.g., build/classes/java/main for Gradle).

Walkthrough

Now that we have an overview of the workflow, let’s dive deep and walk you through the following steps in detail:

Installing the CodeGuru Reviewer CLI
Creating a Jenkins pipeline job
Reviewing the CodeGuru Reviewer recommendations
Configuring CodeGuru Reviewer CLI’s additional options

1. Installing the CodeGuru CLI Wrapper

a. Prerequisites

To run the CLI, we must have Git, Java, Maven, and the AWS CLI installed. Verify that they’re installed on our machine by running the following commands:

java -version 
mvn --version 
aws --version 
git –-version

If they aren’t installed, then download and install Java here (Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit), Maven from here, and Git from here. Instructions for installing AWS CLI are available here.

We would need to create an Amazon Simple Storage Service (Amazon S3) bucket with the prefix codeguru-reviewer-. Note that the bucket name must begin with the mentioned prefix, since we have used the name pattern in the following AWS Identity and Access Management (IAM) permissions, and CodeGuru Reviewer expects buckets to begin with this prefix. Refer to the following section 4(a) “Specifying S3 bucket name” for more details.

Furthermore, we’ll need working credentials on our machine to interact with our AWS account. Learn more about setting up credentials for AWS here. You can find the minimal permissions to run the CodeGuru Reviewer CLI as follows.

b. Required Permissions

To use the CodeGuru Reviewer CLI, we need at least the following AWS IAM permissions, attached to an AWS IAM User or an AWS IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeguru-reviewer:ListRepositoryAssociations",
                "codeguru-reviewer:AssociateRepository",
                "codeguru-reviewer:DescribeRepositoryAssociation",
                "codeguru-reviewer:CreateCodeReview",
                "codeguru-reviewer:DescribeCodeReview",
                "codeguru-reviewer:ListRecommendations",
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:CreateBucket",
                "s3:GetBucket*",
                "s3:List*",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::codeguru-reviewer-*",
                "arn:aws:s3:::codeguru-reviewer-*/*"
            ],
            "Effect": "Allow"
        }
    ]
}

c. CLI installation

Please download the latest version of the CodeGuru Reviewer CLI available at GitHub. Then, run the following commands in sequence:

curl -OL https://github.com/aws/aws-codeguru-cli/releases/download/0.0.1/aws-codeguru-cli.zip
unzip aws-codeguru-cli.zip
export PATH=$PATH:./aws-codeguru-cli/bin

d. Using the CLI

The CodeGuru Reviewer CLI only has one required parameter –root-dir (or just -r) to specify to the local directory that should be analyzed. Furthermore, the –src option can be used to specify one or more files in this directory that contain the source code that should be analyzed. In turn, for Java applications, the –build option can be used to specify one or more build directories.

For a demonstration, we’ll analyze the demo application. This will make sure that we’re all set for when we leverage the CLI in Jenkins. To proceed, first we download and install the sample application, as follows:

git clone https://github.com/aws-samples/amazon-codeguru-reviewer-sample-app
cd amazon-codeguru-reviewer-sample-app
mvn clean compile

Now that we have built our demo application, we can use the aws-codeguru-cli CLI command that we added to the path to trigger the code scan:

aws-codeguru-cli --root-dir ./ --build target/classes --src src --output ./output

For additional assistance on the CLI command, reference the readme here.

2. Creating a Jenkins Pipeline job

CodeGuru Reviewer can be integrated in a Jenkins Pipeline as well as a Freestyle project. In this example, we’re leveraging a Pipeline.

a. Pipeline Job Configuration

Log in to Jenkins, choose “New Item”, then select “Pipeline” option.
Enter a name for the project (for example, “CodeGuruPipeline”), and choose OK.

Figure 2: Creating a new Jenkins pipeline

On the “Project configuration” page, scroll down to the bottom and find your pipeline. In the pipeline script, paste the following script (or use your own Jenkinsfile). The following example is a valid Jenkinsfile to integrate CodeGuru Reviewer with a project built using Maven.

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Get code from a GitHub repository
                git clone https://github.com/aws-samples/amazon-codeguru-reviewer-java-detectors.git

                // Run Maven on a Unix agent
                sh "mvn clean compile"

                // To run Maven on a Windows agent, use following
                // bat "mvn -Dmaven.test.failure.ignore=true clean package"
            }
        }
        stage('CodeGuru Reviewer') {
            steps{
                sh 'ls -lsa *'
                sh 'pwd'
                // Here we’re setting an absolute path, but we can 
                // also use JENKINS environment variables
                sh '''
                    export BASE=/var/jenkins_home/workspace/CodeGuruPipeline/amazon-codeguru-reviewer-java-detectors
                    export SRC=${BASE}/src
                    export OUTPUT = ./output
                    /home/codeguru/aws-codeguru-cli/bin/aws-codeguru-cli --root-dir $BASE --build $BASE/target/classes --src $SRC --output $OUTPUT -c $GIT_PREVIOUS_COMMIT:$GIT_COMMIT --no-prompt
                    '''
            }
        }    
        stage('Checking findings'){
            steps{
                // In this example we are stopping our pipline on  
                // detecting Critical findings. We are using jq 
                // to count occurrences of Critical severity 
                sh '''
                CNT = $(cat ./output/recommendations.json |jq '.[] | select(.severity=="Critical")|.severity' | wc -l)'
                if (( $CNT > 0 )); then
                  echo "Critical findings discovered. Failing."
                  exit 1
                fi
                '''
            }
        }
    }
}

Save the configuration and select “Build now” on the side bar to trigger the build process (see Figure 3).

Figure 3: Jenkins pipeline in triggered state

3. Reviewing the CodeGuru Reviewer recommendations

Once the build process is finished, you can view the review results from CodeGuru Reviewer by selecting the Jenkins build history for the most recent build job. Then, browse to Workspace output. The output is available in JSON and HTML formats (Figure 4).

Figure 4: CodeGuru CLI Output

Snippets from the HTML and JSON reports are displayed in Figure 5 and 6 respectively.

In this example, our pipeline analyzes the JSON results with jq based on severity equal to critical and failing the job if there are any critical findings. Note that this output path is set with the –output option. For instance, the pipeline will fail on noticing the “critical” finding at Line 67 of the EventHandler.java class (Figure 5), flagged due to use of an insecure code. Till the time the code is remediated, the pipeline would prevent the code deployment. The vulnerability could have gone to production undetected, in absence of the tool.

Figure 5: CodeGuru HTML Report

Figure 6: CodeGuru JSON recommendations

4. Configuring CodeGuru Reviewer CLI’s additional options

a. Specifying Amazon S3 bucket name and policy

CodeGuru Reviewer needs one Amazon S3 bucket for the CLI to store the artifacts while the analysis is running. The artifacts are deleted after the analysis is completed. The same bucket will be reused for all the repositories that are analyzed in the same account and region (unless specified otherwise by the user). Note that CodeGuru Reviewer expects the S3 bucket name to begin with codeguru-reviewer-. At this time, you can’t use a different naming pattern. However, if you want to use a different bucket name, then you can use the –bucket-name option.

Select the Permissions tab of your S3 bucket. Update the Block public access and add the following S3 bucket policy.

Figure 7: S3 bucket settings

S3 bucket policy:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"PublicRead",
         "Effect":"Allow",
         "Principal":"*",
         "Action":"s3:GetObject",
         "Resource":"[Change to ARN for your S3 bucket]/*"
      }
   ]
}

Note that if you must change the bucket’s name, then you can remove the associated S3 bucket in the AWS console under CodeGuru → CI workflows and select Disassociate Workflow.

b. Analyzing a single commit

The CLI also lets us specify a specific commit range to analyze. This can lead to faster and more cost-effective scans for the incremental code changes, instead of a full repository scan. For example, if we just want to analyze the last commit, we can run:

aws-codeguru-cli -r ./ -s src/main/java -b build/libs -c HEAD^:HEAD --no-prompt

Here, we use the -c option to specify that we only want to analyze the commits between HEAD^ (the previous commit) and HEAD (the current commit). Moreover, we add the –no-prompt option to automatically answer questions by the CLI with yes. This option is useful if we plan to use the CLI in an automated way, such as in our CI/CD workflow.

c. Encrypting artifacts

CodeGuru Reviewer lets us use a customer managed key to encrypt the content of the S3 bucket that is used to store the source and build artifacts. To achieve this, create a customer owned key in AWS Key Management Service (AWS KMS) (see Figure 8).

Figure 8: KMS settings

We must grant CodeGuru Reviewer the permission to decrypt artifacts with this key by adding the following Statement to your Key policy:

{
   "Sid":"Allow CodeGuru to use the key to decrypt artifact",
   "Effect":"Allow",
   "Principal":{
      "AWS":"*"
   },
   "Action":[
      "kms:Decrypt",
      "kms:DescribeKey"
   ],
   "Resource":"*",
   "Condition":{
      "StringEquals":{
         "kms:ViaService":"codeguru-reviewer.amazonaws.com",
         "kms:CallerAccount":[
            "YOUR AWS ACCOUNT ID"
         ]
      }
   }
}

Then, enable server-side encryption for the S3 bucket that we’re using with CodeGuru Reviewer (Figure 9).

S3 bucket settings:

Figure 9: S3 bucket encryption settings

After we enable encryption on the bucket, we must delete all the CodeGuru repository associations that use this bucket, and then recreate them by analyzing the repositories while providing the key (as in the following example, Figure 10):

Figure 10: CodeGuru CI Workflow

Note that the first time you check out your repository, it will always trigger a full repository scan. Consider setting the -c option, as this will allow a commit range.

Cleaning Up

At this stage, you may choose to delete the resources created while following this blog, to avoid incurring any unwanted costs.

Delete Amazon S3 bucket.
Delete AWS KMS key.
Delete the Jenkins installation, if not required further.

Conclusion

In this post, we outlined how you can integrate Amazon CodeGuru Reviewer CLI with the Jenkins open-source build automation tool to perform code analysis as part of your code build pipeline and act as a quality gate. We showed you how to create a Jenkins pipeline job and integrate the CodeGuru Reviewer CLI to detect issues in your Java and Python code, as well as access the recommendations for remediating these issues. We presented an example where you can stop the build upon finding critical violations. Furthermore, we discussed how you can specify a commit range to avoid a full repo scan, and how the S3 bucket used by CodeGuru Reviewer to store artifacts can be encrypted using customer managed keys.

The CodeGuru Reviewer CLI offers you a one-line command to scan any code on your machine and retrieve recommendations. You can run the CLI anywhere where you can run AWS commands. In other words, you can use the CLI to integrate CodeGuru Reviewer into your favourite CI tool, as a pre-commit hook, or anywhere else in your workflow. In turn, you can combine CodeGuru Reviewer with Dynamic Application Security Testing (DAST) and Software Composition Analysis (SCA) tools to achieve a hybrid application security testing method that helps you combine the inside-out and outside-in testing approaches, cross-reference results, and detect vulnerabilities that both exist and are exploitable.

Hopefully, you have found this post informative, and the proposed solution useful. If you need helping hands, then AWS Professional Services can help implement this solution in your enterprise, as well as introduce you to our AWS DevOps services and offerings.