All posts by Jeimy Ruiz

5 tips to supercharge your developer career in 2024

2024-05-01 Jeimy Ruiz

Post Syndicated from Jeimy Ruiz original https://github.blog/2024-05-01-5-tips-to-supercharge-your-developer-career-in-2024/

The world of software development is constantly evolving. That means whether you’re a seasoned developer or just starting out on your coding journey, there’s always something new to learn.

Below, we’ll explore five actionable tips to take your career to the next level. From mastering prompt engineering to harnessing the power of AI for code security, these tips will help you learn the skills and uncover the knowledge you need to excel in today’s competitive job market.

Tip #1: Become a pro at prompt engineering

With AI tools like GitHub Copilot, you can code up to 55% faster. But like any other tool or skill, our AI pair programmer has a learning curve, and there are certain techniques that will make your work with AI even more effective. Enter prompt engineering. With prompt engineering, you provide GitHub Copilot with more context about your project, which yields better, more accurate results. Below are three best practices for crafting prompts for GitHub Copilot:

Open related files in VS Code

While you can begin using GitHub Copilot with a blank file, one easy way to introduce more context is to open related files in VS Code. This technique, known as neighboring tabs, lets Copilot process the files open in your IDE and gain a deeper understanding of your code.

This broader scope allows Copilot to identify matching code segments across your project, enhancing its suggestions and code completion capabilities.

Provide a top-level comment in your code file

Imagine being assigned a task with little to no context—that would make accomplishing it much more difficult, right? The same can be said for GitHub Copilot. When you add a brief, top-level comment in your code file, it helps Copilot understand the overarching objective before getting into the how.

Once you’ve stated your goal, you can articulate the logic and steps required to achieve it. Then, let Copilot generate code incrementally, rather than all at once. This approach enhances Copilot’s understanding and improves the quality of the generated code.
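As an illustration, here is a minimal Python sketch of a file that opens with a top-level comment and then spells out each step as a short comment before the code for it. The task, the function name `revenue_by_customer`, and the CSV layout are hypothetical, and the code was written by hand to show the shape you would hope Copilot produces step by step; it is not actual Copilot output.

```python
# Goal: read order records from CSV text and report total revenue per
# customer, highest first. (Hypothetical task used to illustrate prompting.)
import csv
import io


def revenue_by_customer(csv_text: str) -> list[tuple[str, float]]:
    # Step 1: parse the CSV rows (expected columns: customer, quantity, unit_price).
    rows = csv.DictReader(io.StringIO(csv_text))

    # Step 2: accumulate quantity * unit_price for each customer.
    totals: dict[str, float] = {}
    for row in rows:
        amount = int(row["quantity"]) * float(row["unit_price"])
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + amount

    # Step 3: sort customers by total revenue, largest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


orders = "customer,quantity,unit_price\nada,2,10.0\nbob,1,5.5\nada,1,4.0\n"
print(revenue_by_customer(orders))  # [('ada', 24.0), ('bob', 5.5)]
```

Writing the step comments first gives the assistant a small, well-defined target for each chunk of code, which is the incremental approach described above.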

Input sample code

Offer GitHub Copilot a snippet of code that closely resembles what you need. Even a brief example can further help Copilot craft suggestions tailored to your language and objectives!

Tip #2: Learn shortcuts and hacks

GitHub is full of shortcuts and hacks that make your work life easier and help you stay in the flow. Gain momentum in your projects and increase your productivity with these popular shortcuts:

Search for any file in your repositories

When you’re browsing a repository, type the letter “t” on your keyboard to activate the file finder and do away with hours of wasted time!

Link your pull requests to your issues

Did you know that GitHub also has project management tools? One of them is a handy interlinking feature that allows you to link pull requests and Git commits to relevant issues in a project. This facilitates better organization, collaboration, and project management, not just for you, but for anyone looking for more context in your issue. Gone are the days of hunting down old issues every time you create a new pull request!
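One common way to create that link is to reference the issue in the pull request description with a closing keyword, such as `Fixes #123`; when the pull request is merged into the default branch, GitHub closes the linked issue. The Python sketch below only approximates that convention to show the idea: the function name is hypothetical, and it handles just the basic `KEYWORD #NUMBER` form for issues in the same repository.

```python
import re

# Closing keywords recognized by GitHub (matched case-insensitively).
CLOSING_KEYWORDS = ("close", "closes", "closed", "fix", "fixes", "fixed",
                    "resolve", "resolves", "resolved")

# Simplified pattern: a keyword as a whole word, whitespace, then "#<number>".
PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)\b",
    re.IGNORECASE,
)


def linked_issue_numbers(pr_body: str) -> list[int]:
    """Return issue numbers referenced with a closing keyword (hypothetical helper)."""
    return [int(number) for number in PATTERN.findall(pr_body)]


body = "Refactor the parser.\n\nFixes #42 and resolves #7. See also #99."
print(linked_issue_numbers(body))  # [42, 7]
```

A plain mention like `#99` still cross-references the issue, but only the keyword form marks it to be closed on merge.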

Create custom actions

Creating custom actions on GitHub lets you reuse code, avoid repetition, and simplify maintenance across multiple workflows. All you have to do is outline the necessary steps for a particular task and package them into an action using any supported programming or scripting language, and you’re all set!
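GitHub supports JavaScript, Docker container, and composite actions, so a script in another language, such as Python, typically runs inside a Docker container action. The sketch below shows only the entrypoint script, assuming an `action.yml` (not shown) that declares a hypothetical `name` input and a `greeting` output. The runner exposes inputs as `INPUT_<NAME>` environment variables, and a step sets outputs by appending `key=value` lines to the file named by the `GITHUB_OUTPUT` environment variable.

```python
import os


def main() -> None:
    # Inputs declared in action.yml arrive as INPUT_<NAME> environment variables.
    name = os.environ.get("INPUT_NAME", "world")
    greeting = f"Hello, {name}!"
    print(greeting)

    # Outputs are set by appending "key=value" lines to the GITHUB_OUTPUT file.
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as output_file:
            output_file.write(f"greeting={greeting}\n")


if __name__ == "__main__":
    main()
```

Later steps in the same job could then read the value as the step’s `greeting` output, so the logic lives in one reusable place instead of being copied into every workflow.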

Incorporate feedback in pull requests

Ever wish there was an easier way to review code? Well, it’s possible! Add comments directly to the pull request, propose changes, and even accept and add those suggestions seamlessly to make code reviews easier than ever. You can also save your replies by heading over to the comment box in an open pull request and selecting “create new saved reply,” and then “add saved reply,” to make it official.

Tip #3: Brush up on your soft skills

AI has introduced a host of hard skills that developers need to master in order to keep up with the latest tooling. Soft skills complement your new technical expertise and can contribute to your overall success by enhancing communication, collaboration, and problem-solving. Here are a few important ones to practice:

Communication

As you know, developer work rarely happens in a vacuum. Strong communication skills can facilitate clear understanding and efficient collaboration for both humans and AI tools, whether you’re collaborating with stakeholders, communicating complex technical concepts to non-technical audiences, or working on your prompt engineering.

Problem-solving

Critical thinking enables developers to approach complex challenges creatively, break them down into manageable tasks, and find innovative solutions with the help of AI coding tools.

Adaptability

AI coding tools are evolving rapidly, and new technologies and methodologies emerge regularly. Being adaptable allows developers to stay current, learn new skills quickly, and stay nimble as things change. To cultivate resilience and embrace discomfort (in and outside of the workplace), engage in activities that challenge you to anticipate and respond to the unexpected.

Ethics

Being aware of the ethical implications associated with these tools is essential. Developers should understand both the capabilities and limitations of AI coding tools and exercise critical thinking when interpreting responses from them. By remaining conscious of ethical considerations and actively working toward ethical practices, developers can ensure that these tools are used responsibly.

Empathy

Empathy is crucial for understanding the needs, preferences, and challenges of end-users. Empathy also fosters better collaboration within teams by promoting understanding and respect for colleagues’ perspectives and experiences.

Tip #4: Use AI to secure your code

Developers can leverage AI to enhance code security in several ways. First, AI can help prevent vulnerabilities by providing context and secure code suggestions right from the start. Traditionally, “shift left” meant getting security feedback after coding (but before deployment). By using AI as a pair programmer, developers can shift even further left, addressing security concerns right where they turn their ideas into code.
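To make “secure code suggestions” concrete, compare two ways of writing the same lookup with Python’s built-in `sqlite3` module. The table and function names are hypothetical. Building SQL by string formatting lets crafted input change the query itself, while a parameterized query keeps input as data, which is the kind of pattern you would want an AI pair programmer to suggest from the start.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "viewer")])


def find_role_unsafe(name: str) -> list[tuple[str]]:
    # Vulnerable: user input is pasted into the SQL text (SQL injection).
    query = f"SELECT role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_role_safe(name: str) -> list[tuple[str]]:
    # Safer: the "?" placeholder binds input as a parameter, never as SQL.
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_role_unsafe(malicious))  # returns every row: [('admin',), ('viewer',)]
print(find_role_safe(malicious))    # []
```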

A common pain point for developers is sifting through lengthy pages of alerts, many of which turn out to be false positives—wasting valuable time and resources. With features like code scanning autofix, AI and automation can step in to provide AI-generated code fixes alongside vulnerability alerts, streamlining remediation directly into the developer workflow. Similarly, secret scanning alerts developers to potential secrets detected in the code.
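The idea behind secret scanning can be sketched as pattern matching over source text. The Python below is a deliberately simplified, hypothetical detector: it assumes the `ghp_` prefix used by GitHub personal access tokens (classic) followed by 36 alphanumeric characters, whereas real secret scanning covers many providers’ token formats and does far more than a single regular expression.

```python
import re

# Simplified, assumed pattern for a GitHub personal access token (classic):
# the "ghp_" prefix followed by 36 alphanumeric characters.
TOKEN_PATTERN = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")


def find_possible_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, redacted match) pairs for lines that look like they hold a token."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            token = match.group()
            findings.append((line_number, token[:8] + "..."))  # never echo the full secret
    return findings


code = 'API_URL = "https://api.example.com"\nTOKEN = "ghp_' + "a" * 36 + '"\n'
print(find_possible_secrets(code))  # [(2, 'ghp_aaaa...')]
```

Redacting the match in the report matters: an alert that prints the full secret would leak it a second time.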

AI also presents an opportunity to improve the modeling of a vast array of open-source frameworks and libraries. Traditionally, security teams manually model numerous packages and APIs, describing, for example, which functions accept untrusted input and which perform sensitive operations, so that analysis tools can trace vulnerable data flows through them. This is a challenging task given the volume and diversity of these components, along with frequent updates and replacements. By infusing AI in modeling efforts, developers can increase the detection of vulnerabilities.

Tip #5: Attend GitHub Universe 2024

Attending conferences is a valuable investment in a developer’s career, providing opportunities for learning, networking, skill development, and professional growth all at the same time. GitHub Universe is our flagship, global event that brings together developers, leaders, and companies for two days of exploring the latest technologies and industry trends with fun, food, and networking in between. Here are some of the highlights:

100+ sessions on AI, DevEx, and security

Learn about frameworks and best practices directly from 150+ experts in the field through keynotes, breakout sessions, product demos, and more.

Gain and practice new skills

Git official by signing up for an interactive workshop or getting GitHub certified in GitHub Actions, GitHub Advanced Security, GitHub Foundations, or GitHub Administration. It’ll certainly look great on your resume and LinkedIn. 😉

Visibility

Sharing insights, presenting research findings, or showcasing projects can help developers establish themselves as thought leaders and experts in their field. The Universe call for sessions is open from now until May 10. Submit a session proposal today!

Professional development

Show your commitment to your career and continuous learning by visiting the dedicated Career Corner for professional development.

Community engagement

Build your network and find opportunities for collaboration and mentorship by engaging with peers and participating in the Discussions Lounge.

Learn more about our content tracks and what we have in store for the 10th anniversary of our global developer event.

Navigate your career with confidence

By implementing the strategies outlined above, you’ll be well-equipped to unlock your dream career in 2024 and beyond. And remember: you can take your skills to the next level, network with industry leaders, and learn how to use the latest AI tools at GitHub Universe 2024.

Eager to get involved? Act fast to save 30% on in-person tickets with our Super Early Bird discount from now until July 8, or get notified about our free virtual event!

The post 5 tips to supercharge your developer career in 2024 appeared first on The GitHub Blog.

How AI code generation works

2024-02-22 Jeimy Ruiz

Post Syndicated from Jeimy Ruiz original https://github.blog/2024-02-22-how-ai-code-generation-works/

Generative AI coding tools are changing software production for enterprises, and not just through code generation. From vulnerability detection and helping developers understand unfamiliar codebases to streamlining documentation and pull request descriptions, they’re fundamentally reshaping how developers approach application infrastructure, deployment, and their own work experience.

We’re now witnessing a significant turning point. As AI models get better, refusing adoption would be like “asking an office worker to use a typewriter instead of a computer,” says Albert Ziegler, principal researcher and member of the GitHub Next research and development team.

In this post, we’ll dive into the inner workings of AI code generation, exploring how it functions, its capabilities and benefits, and how developers can use it to enhance their development experience while propelling their enterprise forward in today’s competitive landscape.

How to use AI to generate code

AI code generation refers to full or partial lines of code that are generated by machines instead of human developers. This emerging technology leverages advanced machine learning models, particularly large language models (LLMs), to understand and replicate the syntax, patterns, and paradigms found in human-generated code.

The AI models powering tools like ChatGPT and GitHub Copilot are trained on natural language text and source code from publicly available sources that include a diverse range of code examples. This training enables them to understand the nuances of various programming languages, coding styles, and common practices. As a result, the AI can generate code suggestions that are syntactically correct and contextually relevant based on input from developers.

Favored by 55% of developers, our AI-powered pair programmer, GitHub Copilot, provides contextualized coding assistance based on your organization’s codebase across dozens of programming languages, and targets developers of all experience levels. With GitHub Copilot, developers can use AI to generate code in three ways:

1. Type code and AI can autocomplete the code

Autocompletions are the earliest version of AI code generation. John Berryman, a senior researcher of ML on the GitHub Copilot team, explains the user experience: “I’ll be writing code and taking a pause to think. While I’m doing that, the agent itself is also thinking, looking at surrounding code and content in neighboring tabs. Then it pops up on the screen as gray ‘ghost text’ that I can reject, partially accept, or fully accept and then, if necessary, modify.”

While every developer can reap the benefits of using AI coding tools, experienced programmers often feel these gains even more. “In many cases, especially for experienced programmers in a familiar environment, this suggestion speeds us up. I would have written the same thing. It’s just faster to hit ‘tab’ (thus accepting the suggestion) than it is to write out those 20 characters by myself,” says Johan Rosenkilde, principal researcher for GitHub Next.

Whether developers are new or highly skilled, they’ll often have to work in less familiar languages, and code completion suggestions using GitHub Copilot can lend a helping hand. “Using GitHub Copilot for code completion has really helped speed up my learning experience,” says Berryman. “I will often accept the suggestion because it’s something I wouldn’t have written on my own since I don’t know the syntax.”

Using an AI coding tool has become an invaluable skill in itself. Why? Because the more developers practice coding with these tools, the faster they’ll get at using them.

2. Write explicit code comments in natural language to receive even better AI-generated code suggestions

For experienced developers in unfamiliar environments, tools like GitHub Copilot can even help jog their memories.

Let’s say a developer imports a library they haven’t used before, or one they don’t remember well. Maybe they’re trying to recall the right standard library function or the order of its arguments. In these cases, it can be helpful to make GitHub Copilot more explicitly aware of where the developer wants to go by writing a comment.

“It’s quite likely that the developer might not remember the formula, but they can recognize the formula, and GitHub Copilot can remember it by being prompted,” says Rosenkilde. This is where natural language commentary comes into play: it can be a shortcut for explaining intent when the developer is struggling with the first few characters of code that they need.
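For instance, a developer who remembers that Python can sort a list of dictionaries by one field, but not which helper or arguments to use, might write the comment below and let the completion fill in the rest. The comment is the prompt; the line under it is one reasonable completion we wrote by hand, not captured Copilot output, and the data is made up.

```python
from operator import itemgetter

people = [{"name": "Ada", "age": 36}, {"name": "Linus", "age": 21},
          {"name": "Grace", "age": 45}]

# Sort people by age, oldest first.
by_age = sorted(people, key=itemgetter("age"), reverse=True)

print([person["name"] for person in by_age])  # ['Grace', 'Ada', 'Linus']
```

The developer may not have recalled `itemgetter` or `reverse=True` unprompted, but can recognize them as correct once they appear.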

If developers give specific names to their functions and variables, and write documentation, they can get better suggestions, too. That’s because GitHub Copilot can read the variable names and use them as an indicator for what that function should do.

Suddenly, that changes how developers write code for the better, because code with good variable and function names is more maintainable. And oftentimes the main job of a programmer is to maintain code, not write it from scratch.
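Here is a small Python illustration of the difference. Both functions compute a fixed-rate loan’s monthly payment with the standard amortization formula, but only the second tells a reader, or an AI assistant reading the surrounding code, what each argument means. The names and the docstring are our own example.

```python
def calc(a, b, c):
    r = b / 12
    return a * r / (1 - (1 + r) ** -c) if r else a / c


def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Monthly payment for a fixed-rate loan.

    annual_rate is a decimal fraction (0.06 means 6%); months is the loan term.
    """
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return principal / months  # no interest: split the principal evenly
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


print(round(monthly_payment(100_000, 0.06, 360), 2))  # 599.55
```

Both functions return the same numbers; only the second makes it obvious that the rate is annual and the term is in months, which is exactly the kind of hint a reviewer, or a code completion model, can use.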

“When you push that code, someone is going to review it, and they will likely have a better time reviewing that code if it’s well named, if there’s even a hint of documentation in it, and so on,” says Rosenkilde. In this sense, the symbiotic relationship between the developer and the AI coding tool is not just beneficial for the developer, but for the entire team.

3. Chat directly with AI

With AI chatbots, code generation can be more interactive. GitHub Copilot Chat, for example, allows developers to interact with code by asking it to explain code, improve syntax, provide ideas, generate tests, and modify existing code—making it a versatile ally in managing coding tasks.

Rosenkilde describes how he moves between GitHub Copilot’s different features:

“When I want to do something and I can’t remember how to do it, I type the first few letters of it, and then I wait to see if Copilot can guess what I’m doing,” he says. “If that doesn’t work, maybe I delete those characters and I write a one liner in commentary and see whether Copilot can guess the next line. If that doesn’t work, then I go to Copilot Chat and explain in more detail what I want done.”

Typically, Copilot Chat returns with something much more verbose and complete than what you get from GitHub Copilot code completion. “Namely, it describes back to you what it is you want done and how it can be accomplished. It gives you code examples, and you can respond and say, oh, I see where you’re going. But actually I meant it like this instead,” says Rosenkilde.

But using AI chatbots doesn’t mean developers should be hands-off. Mistakes in reasoning could lead the AI down a path of further mistakes if left unchecked. Berryman recommends interacting with the chat assistant in much the same way you would when pair programming with a human. “Go back and forth with it. Tell the assistant about the task you are working on, ask it for ideas, have it help you write code, and critique and redirect the assistant’s work in order to keep it on the right track.”

The importance of code reviews

GitHub Copilot is designed to empower developers to execute their ideas. As long as there is some context for it to draw on, it will likely generate the type of code the developer wants. But this doesn’t replace code reviews between developers.

Code reviews play an important role in maintaining code quality and reliability in software projects, regardless of whether AI coding tools are involved. In fact, the earlier in the development process developers spot a bug, the cheaper it is to fix, often by orders of magnitude.

Ordinary verification asks: Does the code parse? Do the tests pass? With AI code generation, Ziegler explains that developers should “Scrutinize it in enough detail so that you can be sure the generated code is correct and bug-free. Because if you use tools like that in the wrong way and just accept everything, then the bugs that you introduce are going to cost you more time than you save.”
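As a hypothetical illustration, suppose an assistant suggests the first function below. It parses and passes a casual spot check for 2024, yet it is wrong for most century years. A few targeted assertions, the kind of scrutiny Ziegler describes, catch the bug before it ships. Both functions and the test years are our own example, not real Copilot output.

```python
def is_leap_year_suggested(year: int) -> bool:
    # Plausible-looking suggestion: correct for most years, wrong for 1900, 2100...
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Edge cases a reviewer should check, including century years.
cases = {2024: True, 2023: False, 2000: True, 1900: False, 2100: False}
failures = [y for y, expected in cases.items() if is_leap_year_suggested(y) != expected]
print(failures)  # [1900, 2100]
assert all(is_leap_year(y) == expected for y, expected in cases.items())
```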

Rosenkilde adds, “A review with another human being is not the same as that, right? It’s a conversation between two developers about whether this change fits into the kind of software they’re building in this organization. GitHub Copilot doesn’t replace that.”

The advantages of using AI to generate code

When developer teams use AI coding tools across the software development life cycle, they experience a host of benefits, including:

Faster development, more productivity

AI code generation can significantly speed up the development process by automating repetitive and time-consuming tasks. This means that developers can focus on high-level architecture and problem-solving. In fact, 88% of developers reported feeling more productive when using GitHub Copilot.

Rosenkilde reflects on his own experience with GitHub’s AI pair programmer: “95% of the time, Copilot brings me joy and makes my day a little bit easier. And this doesn’t change the code I would have written. It doesn’t change the way I would have written it. It doesn’t change the design of my code. All it does is it makes me faster at writing that same code.” And Rosenkilde isn’t alone: 60% of developers feel more fulfilled with their jobs when using GitHub Copilot.

Mental load alleviated

The benefits of faster development aren’t just about speed: they’re also about alleviating the mental effort that comes with completing tedious tasks. For example, when it comes to debugging, developers have to reverse engineer what went wrong. Detecting a bug can involve digging through an endless list of potential hiding places where it might be lurking, making it repetitive and tedious work.

Rosenkilde explains, “Sometimes when you’re debugging, you just have to resort to creating print statements that you can’t get around. Thankfully, Copilot is brilliant at print statements.”

A whopping 87% of developers reported spending less mental effort on repetitive tasks with the help of GitHub Copilot.

Less context switching

In software development, context switching is when developers move between different tasks, projects, or environments, which can disrupt their workflow and decrease productivity. They also often deal with the stress of juggling multiple tasks, remembering syntax details, and managing complex code structures.

With GitHub Copilot, developers can bypass several levels of context switching, staying in their IDE instead of searching on Google or jumping into external documentation.

“When I’m writing natural language commentary,” says Rosenkilde, “GitHub Copilot code completion can help me. Or if I use Copilot Chat, it’s a conversation in the context that I’m in, and I don’t have to explain quite as much.”

Generating code with AI helps developers offload the responsibility of recalling every detail, allowing them to focus on higher-level thinking, problem-solving, and strategic planning.

Berryman adds, “With GitHub Copilot Chat, I don’t have to restate the problem because the code never leaves my trusted environment. And I get an answer immediately. If there is a misunderstanding or follow-up questions, they are easy to communicate with.”

What to look for in enterprise-ready AI code generation tools

Before you implement any AI into your workflow, you should always review and test tools thoroughly to make sure they’re a good fit for your organization. Here are a few considerations to keep in mind.

Compliance

Regulatory compliance. Does the tool comply with relevant regulations in your industry?
Compliance certifications. Are there attestations that demonstrate the tool’s compliance with regulations?

Security

Encryption. Is the data transmission and storage encrypted to protect sensitive information?
Access controls. Are you able to implement strong authentication measures and access controls to prevent unauthorized access?
Compliance with security standards. Is the tool compliant with industry standards?
Security audits. Does the tool undergo regular security audits and updates to address vulnerabilities?

Privacy

Data handling. Are there clear policies for handling user data and does it adhere to privacy regulations like GDPR, CCPA, etc.?
Data anonymization. Does the tool support anonymization techniques to protect user privacy?

Permissioning

Role-based access control. Are you able to manage permissions based on user roles and responsibilities?
Granular permissions. Can you control access to different features and functionalities within the tool?
Opt-in/Opt-out mechanisms. Can users control the use of their data and opt out if needed?

Pricing

Understand the pricing model. Is it based on usage, number of users, features, or other metrics?
Look for transparency. Is the pricing structure clear with no hidden costs?
Scalability. Does the pricing scale with your usage and business growth?

Additionally, consider factors such as customer support, ease of integration with existing systems, performance, and user experience when evaluating AI coding tools. Lastly, it’s important to thoroughly assess how well the tool aligns with your organization’s specific requirements and priorities in each of these areas.

Visit the GitHub Copilot Trust Center to learn more about security, privacy, and other topics.

Can AI code generation be detected?

The short answer here is: maybe.

Let’s first give some context to the question. It’s never really the case that a whole code base is generated with AI, because large chunks of AI-generated code are very likely to be wrong. The standard code review process is a good way to avoid this, since large swaths of completely auto-generated code would stand out to a human developer as simply not working.

For smaller amounts of AI-generated code, there is no way at the moment to detect traces of AI in code with true confidence. There are offerings that purport to classify whether content has AI-generated text, but there are limited equivalents for code, since you’d need a dedicated model to do it. Ziegler explains, “Computer generated code is good enough that it doesn’t leave any particular traces and normally has no clear tells.”

At GitHub, the Copilot team uses a duplicate detection filter that detects exact duplicates in code. So, if a code suggestion is an exact copy of something that exists elsewhere, the filter flags it.
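Because the filter’s internals aren’t described here, the following is only a conceptual Python sketch of exact-duplicate detection, not GitHub’s implementation: split code into whitespace-separated tokens (which also normalizes spacing), slide a fixed-size window over them, and flag a suggestion if any window hashes to a value already seen in a reference corpus. The window size, function names, and tiny corpus are all assumptions for illustration.

```python
import hashlib

WINDOW = 6  # assumed window size, in tokens


def windows(code: str, size: int = WINDOW):
    """Yield hashes of every run of `size` whitespace-separated tokens."""
    tokens = code.split()  # splitting on whitespace also normalizes spacing
    for start in range(len(tokens) - size + 1):
        chunk = " ".join(tokens[start:start + size])
        yield hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def build_index(corpus: list[str]) -> set[str]:
    """Hash every window of every reference document."""
    return {digest for document in corpus for digest in windows(document)}


def is_exact_duplicate(suggestion: str, index: set[str]) -> bool:
    """True if any window of the suggestion also appears in the corpus."""
    return any(digest in index for digest in windows(suggestion))


corpus = ["def add(a, b):\n    return a + b\n\nprint(add(1, 2))"]
index = build_index(corpus)
print(is_exact_duplicate("def add(a,  b):   return a + b", index))  # True
```

Hashing fixed-size windows keeps lookups cheap, and the window size trades off false alarms on short, common snippets against missing longer copies.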

Is AI code generation secure?

AI-generated code is not inherently any less secure than human-generated code. A combination of testing, manual code reviews, scanning, monitoring, and feedback loops can bring it to the same quality as code written by humans.

When it comes to code generated by GitHub Copilot, developers can use tools like code scanning, which actively reviews your code for potential security issues in real time and seamlessly integrates the findings into the developer workflow.

Ultimately, AI-generated code will have vulnerabilities, but so does code written by human developers. As Ziegler explains, “It’s unclear whether computer generated code does particularly worse. So, the answer is not if you have GitHub Copilot, use a vulnerability checker. The answer is always use a vulnerability checker.”


Empower your enterprise with AI code generation

While the benefits to using AI code generation tools can be significant, it’s important to note that human oversight remains crucial to ensure that the generated code aligns with project goals, coding standards, and business needs.

Tech leaders should embrace the use of AI code generation—not only to streamline development, but also to empower developer teams to collaborate, drive meaningful business outcomes, and deliver exceptional value to customers.

Ready to get started with the world’s most widely adopted AI developer tool? Learn more or get started now.

The post How AI code generation works appeared first on The GitHub Blog.

Demystifying LLMs: How they can do things they weren’t trained to do

2023-10-27 Jeimy Ruiz

Post Syndicated from Jeimy Ruiz original https://github.blog/2023-10-27-demystifying-llms-how-they-can-do-things-they-werent-trained-to-do/

Large language models (LLMs) are revolutionizing the way we interact with software by combining deep learning techniques with powerful computational resources.

While this technology is exciting, many are also concerned about how LLMs can generate false, outdated, or problematic information, and how they sometimes even hallucinate (generating information that doesn’t exist) so convincingly. Thankfully, we can immediately put one rumor to rest. According to Alireza Goudarzi, senior researcher of machine learning (ML) for GitHub Copilot: “LLMs are not trained to reason. They’re not trying to understand science, literature, code, or anything else. They’re simply trained to predict the next token in the text.”
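A toy model makes “predict the next token” concrete. The Python sketch below counts which word follows which in a tiny training text, then generates by repeatedly picking the most frequent successor. Real LLMs use deep neural networks over subword tokens and far longer contexts, so treat this bigram counter only as an illustration of the training objective, not of how GitHub Copilot is built.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each token, how often each other token follows it."""
    tokens = text.split()
    successors: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        successors[current][following] += 1
    return successors


def predict_next(model: dict[str, Counter], token: str) -> Optional[str]:
    """Return the most frequent successor seen in training, or None if unseen."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]


def generate(model: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily append the predicted next token up to `length` times."""
    output = [start]
    for _ in range(length):
        next_token = predict_next(model, output[-1])
        if next_token is None:
            break
        output.append(next_token)
    return output


model = train_bigrams("the cat sat on the mat and the cat slept")
print(generate(model, "the", 4))
```

Nothing in this code checks whether the output is true; it only reproduces patterns in the training text, which is the point of Goudarzi’s remark.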

Let’s dive into how LLMs come to do the unexpected, and why. This blog post will provide comprehensive insights into LLMs, including their training methods and ethical considerations. Our goal is to help you gain a better understanding of LLM capabilities and how they’ve learned to master language, seemingly, without reasoning.

What are large language models?

LLMs are AI systems that are trained on massive amounts of text data, allowing them to generate human-like responses and understand natural language in a way that traditional ML models can’t.

“These models use advanced techniques from the field of deep learning, which involves training deep neural networks with many layers to learn complex patterns and relationships,” explains John Berryman, a senior researcher of ML on the GitHub Copilot team.

What sets LLMs apart is their proficiency at generalizing and understanding context. They’re not limited to pre-defined rules or patterns, but instead learn from large amounts of data to develop their own understanding of language. This allows them to generate coherent and contextually appropriate responses to a wide range of prompts and queries.

And while LLMs can be incredibly powerful and flexible tools because of this, the ML methods used to train them, and the quality—or limitations—of their training data, can also lead to occasional lapses in generating accurate, useful, and trustworthy information.

Deep learning

The advent of modern ML practices, such as deep learning, has been a game-changer when it comes to unlocking the potential of LLMs. Unlike the earliest language models that relied on predefined rules and patterns, deep learning allows these models to create natural language outputs in a more human-like way.

“The entire discipline of deep learning and neural networks—which underlies all of this—is ‘how simple can we make the rule and get as close to the behavior of a human brain as possible?’” says Goudarzi.

By using neural networks with many layers, deep learning enables LLMs to analyze and learn complex patterns and relationships in language data. This means that these models can generate coherent and contextually appropriate responses, even in the face of complex sentence structures, idiomatic expressions, and subtle nuances in language.

While the initial pre-training equips LLMs with a broad language understanding, fine-tuning is where they become versatile and adaptable. “When developers want these models to perform specific tasks, they provide task descriptions and examples (few-shot learning) or task descriptions alone (zero-shot learning). The model then fine-tunes its pre-trained weights based on this information,” says Goudarzi. This process helps the model adapt to a specific task while retaining the knowledge it gained from its extensive pre-training. (Strictly speaking, few-shot and zero-shot prompts steer a model at inference time without changing its weights; the weights themselves change only when the model is further trained on task-specific examples.)
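The difference between the two prompting styles is easiest to see in the prompt text itself. The Python sketch below assembles a zero-shot prompt (task description only) and a few-shot prompt (task description plus worked examples) for a hypothetical sentiment-labeling task. The template format and the `build_prompt` function are our own assumptions, and no particular model API is implied.

```python
from typing import Sequence


def build_prompt(task: str, query: str,
                 examples: Sequence[tuple[str, str]] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    lines = [task, ""]
    for text, label in examples:            # worked examples (few-shot only)
        lines += [f"Input: {text}", f"Label: {label}", ""]
    lines += [f"Input: {query}", "Label:"]  # the model continues from here
    return "\n".join(lines)


task = "Label the sentiment of each input as positive or negative."
zero_shot = build_prompt(task, "The build finally passes!")
few_shot = build_prompt(task, "The build finally passes!",
                        examples=[("I love this editor.", "positive"),
                                  ("The tests keep timing out.", "negative")])
print(zero_shot)
print("---")
print(few_shot)
```

Ending both prompts with `Label:` leaves the next token as the answer, so the task is framed as the next-token prediction the model was trained on.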

But the same multiple layers and attention mechanisms that enable LLMs to generate human-like text can also lead to overgeneralization, where the model produces responses that may not be contextually accurate or up to date.

Why LLMs aren’t always right

There are several factors that shed light on why tools built on LLMs may be inaccurate at times, even while sounding quite convincing.

Limited knowledge and outdated information

LLMs often lack an understanding of the external world or real-time context. They rely solely on the text they’ve been trained on, and they don’t possess an inherent awareness of the world’s current state. “Typically this whole training process takes a long time, and it’s not uncommon for the training data to be two years out of date for any given LLM,” says Albert Ziegler, principal researcher and member of the GitHub Next research and development team.

This limitation means they may generate inaccurate information based on outdated assumptions, since they can’t verify facts or events in real time. If a field or topic has changed since a model was trained, the model may not be aware of it and may provide outdated information. This is why it’s still important to fact-check any response you receive from an LLM, however factual it may seem.

Lack of context

One of the primary reasons LLMs sometimes provide incorrect information is the lack of context. These models rely heavily on the information given in the input text, and if the input is ambiguous or lacks detail, the model may make assumptions that can lead to inaccurate responses.

Training data biases and limitations

During pre-training, LLMs are exposed to massive, unlabeled datasets of text that are meant to be diverse and representative of the language the model should understand. Common sources of data include books, articles, websites, and even social media posts!

Because this text is written by people, it can contain biases and incorrect information, and models may inadvertently produce responses that reflect them. This is especially concerning when it comes to sensitive or controversial topics.

“Their biases tend to be worse. And that holds true for machine learning in general, not just for LLMs. What machine learning does is identify patterns, and things like stereotypes can turn into extremely convenient shorthands. They might be patterns that really exist, or in the case of LLMs, patterns that are based on human prejudices that are talked about or implicitly used,” says Ziegler.

If a model is trained on a dataset that contains biased or discriminatory language, it may generate responses that are also biased or discriminatory. This can have real-world implications, such as reinforcing harmful stereotypes or discriminatory practices.

Overconfidence

LLMs don’t have the ability to assess the correctness of the information they generate. Because they are trained to produce fluent, plausible text, they often respond with a high degree of confidence, prioritizing text that appears sensible and flows smoothly, even when the information is incorrect!

Hallucinations

LLMs can sometimes “hallucinate” information because of the way they generate text: by following patterns and associations. When they’re faced with incomplete or ambiguous queries, they may try to complete them by drawing on these patterns, generating information that isn’t accurate or factual. In other words, hallucinations are not supported by evidence or real-world data.

For example, imagine that you ask ChatGPT about a historical event in the 20th century, and it responds by describing a meeting between two famous historical figures who never actually met!

In the context of GitHub Copilot, Ziegler explains that “the typical hallucinations we encounter are when GitHub Copilot starts talking about code that’s not even there. Our mitigation is to make it give enough context to every piece of code it talks about that we can check and verify that it actually exists.”
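The check Ziegler describes, confirming that the code an answer talks about actually exists, can be illustrated with a small sketch. This is not how GitHub Copilot implements its mitigation. It simply assumes you have the source as a string and a list of names an assistant mentioned, then uses Python’s standard `ast` module to flag any names that aren’t defined. The helper names `defined_names` and `unverified_references` are hypothetical.

```python
import ast

def defined_names(source: str) -> set[str]:
    """Collect the names of all functions and classes defined anywhere in `source`."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }

def unverified_references(source: str, mentioned: list[str]) -> list[str]:
    """Return mentioned names that don't exist in the source (possible hallucinations)."""
    known = defined_names(source)
    return [name for name in mentioned if name not in known]

code = '''
class Cart:
    def total(self):
        return sum(item.price for item in self.items)

def apply_discount(cart, rate):
    return cart.total() * (1 - rate)
'''

# Hypothetical names that an assistant's answer claimed were in the code.
print(unverified_references(code, ["Cart", "apply_discount", "validate_coupon"]))
# → ['validate_coupon']
```

A real tool would also need to handle imports, attributes, and other languages. The core idea is the same, though: anything the model refers to should be traceable back to something you can verify.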

But the GitHub Copilot team is already thinking about how to use hallucinations to their advantage in a “top-down” approach to coding. Imagine that you’re tackling a backlog issue, and you’re looking for GitHub Copilot to give you suggestions. As Johan Rosenkilde, principal researcher for GitHub Next, explains, “ideally, you’d want it to come up with a sub-division of your complex problem delegated to nicely delineated helper functions, and come up with good names for those helpers. And after suggesting code that calls the (still non-existent) helpers, you’d want it to suggest the implementation of them too!”

This approach to hallucination would be like getting the blueprint and the building blocks to solve your coding challenges.
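As a rough illustration of the top-down style Rosenkilde describes, the sketch below writes a top-level function first, calling helpers that don’t exist yet, and then fills in those helpers. The task and every function name here (`summarize_sales`, `parse_rows`, `total_by_region`, `format_report`) are hypothetical and chosen only for the example. Python resolves names when a function is called, so the helpers can be defined after the function that uses them.

```python
# Step 1: the top-level function, written against helpers that don't exist yet.
def summarize_sales(csv_text: str) -> str:
    rows = parse_rows(csv_text)
    totals = total_by_region(rows)
    return format_report(totals)

# Step 2: the helper implementations, filled in afterwards.
def parse_rows(csv_text: str) -> list[tuple[str, float]]:
    """Parse 'region,amount' lines into (region, amount) pairs, skipping the header."""
    pairs = []
    for line in csv_text.strip().splitlines()[1:]:
        region, amount = line.split(",")
        pairs.append((region.strip(), float(amount)))
    return pairs

def total_by_region(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Sum the amounts for each region."""
    totals: dict[str, float] = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def format_report(totals: dict[str, float]) -> str:
    """Render one 'region: total' line per region, sorted by region name."""
    return "\n".join(f"{region}: {amount:.2f}" for region, amount in sorted(totals.items()))

report = summarize_sales("region,amount\nEU,10\nUS,5.5\nEU,2.25\n")
print(report)
# EU: 12.25
# US: 5.50
```

The helper calls in step 1 are the “blueprint,” and the implementations in step 2 are the “building blocks.”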

Ethical use and responsible advocacy of LLMs

It’s important to be aware of the ethical considerations that come along with using LLMs. That said, although LLMs can generate false information, they aren’t intentionally fabricating or deceiving. Such errors arise from the model’s attempts to generate coherent and contextually relevant text based on the patterns and information it has learned from its training data.

The GitHub Copilot team has developed a few tools to help detect harmful content. Goudarzi says “First, we have a duplicate detection filter, which helps us detect matches between generated code and all open source code that we have access to, filtering such suggestions out. Another tool we use is called Responsible AI (RAI), and it’s a classifier that can filter out abusive words. Finally, we also separately filter out known unsafe patterns.”
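The post doesn’t describe how the duplicate detection filter works internally. Purely to illustrate the general idea of matching generated code against a reference corpus, here is a toy sketch built on token n-gram overlap. It is not GitHub’s implementation. The tokenizer, the window size `N`, and all function names are assumptions made for this example.

```python
import re

def tokens(code: str) -> list[str]:
    """Split code into identifier, number, and single-symbol tokens, ignoring whitespace."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def ngrams(toks: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous windows of n tokens (empty if there are fewer than n tokens)."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def build_index(corpus: list[str], n: int) -> set[tuple[str, ...]]:
    """Index every n-token window that appears in the reference corpus."""
    index: set[tuple[str, ...]] = set()
    for snippet in corpus:
        index |= ngrams(tokens(snippet), n)
    return index

def is_duplicate(suggestion: str, index: set[tuple[str, ...]], n: int) -> bool:
    """Flag a suggestion if any of its n-token windows also appears in the corpus."""
    return not ngrams(tokens(suggestion), n).isdisjoint(index)

N = 8  # hypothetical window size; a real filter would tune this carefully
corpus = ["def add(a, b):\n    return a + b\n"]
index = build_index(corpus, N)

print(is_duplicate("def add(a, b): return a + b", index, N))  # True: same token sequence
print(is_duplicate("def mul(x, y): return x * y", index, N))  # False
```

Because the tokenizer ignores whitespace, reformatted copies still match, while code that differs in its identifiers does not.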

Understanding the deep learning processes behind LLMs can help users grasp their limitations, as well as their positive impact. To navigate those limitations effectively, it’s crucial to verify information from reliable sources, provide clear and specific input, and exercise critical thinking when interpreting LLM-generated responses.

As Berryman reminds us, “the engines themselves are amoral. Users can do whatever they want with them and that can run the gamut of moral to immoral, for sure. But by being conscious of these issues and actively working towards ethical practices, we can ensure that LLMs are used in a responsible and beneficial manner.”

Developers, researchers, and scientists continuously work to improve the accuracy and reliability of these models, making them increasingly valuable tools for the future. All of us can advocate for the responsible and ethical use of LLMs. That includes promoting transparency and accountability in the development and deployment of these models, as well as taking steps to mitigate biases and stereotypes in our own corners of the internet.

The post Demystifying LLMs: How they can do things they weren’t trained to do appeared first on The GitHub Blog.