Velociraptor collections or hunts are usually post-processed or filtered in Notebooks. This allows users to refine and post-process the data in complex ways. For example, to view only the Velociraptor service from a hunt collecting all services (Windows.System.Services), one would click on the Notebook tab and modify the query by adding a WHERE statement.
In our experience, this ability to quickly filter or sort a table is very common, and sometimes we don’t really need the full power of VQL. In 0.6.5, we introduced table transformations — simple filtering/sorting operations on every table in the GUI.
Velociraptor’s community of DFIR professionals is global! We have users from all over the world, and although most users are fluent in English, we wanted to acknowledge our truly international user base by adding internationalization into the GUI. You can now select from a number of popular languages. (Don’t see your language here? We would love additional contributions!)
Here is a screenshot showing our German translations:
New interface themes
The 0.6.5 release expanded our previous offering of 3 themes into 7, with a selection of light and dark themes. We even have a retro feel ncurses theme that looks like a familiar terminal…
Error-handling in VQL
Velociraptor is simply a VQL engine – users write VQL artifacts and run these queries on the endpoint.
Previously, it was difficult to tell when VQL encountered an error. Sometimes a missing file is expected, and other times it means something went wrong. From Velociraptor’s point of view, as long as the VQL query ran successfully on the endpoint, the collection was a success. The VQL query can generate logs to provide more information, but the user had to actually look at the logs to determine if there was a problem.
For example, in a hunt parsing a file on the endpoints, it was difficult to tell which of the thousands of machines failed to parse a file. Previously, Velociraptor marked the collection as successful if the VQL query ran – even if it returned no rows because the file failed to parse.
In 0.6.5, there is a mechanism for VQL authors to convey more nuanced information to the user by way of error levels. The VQL log() function was expanded to take a level parameter. When the level is ERROR the collection will be marked as failed in the GUI.
Custom time zone support
Timestamps are a central part of most DFIR work. Although it is best practice to always work in UTC times, it can be a real pain to have to convert from UTC to local time in your head! Since Velociraptor always uses RFC3389 to represent times unambiguously but for human consumption, it is convenient to represent these times in different local times.
You can now select a more convenient time zone in the GUI by clicking your user preferences and setting the relevant timezone.
The preferred time will be shown in most instances in the UI:
A new MUSL build target
On Linux Go binaries are mostly static but always link to Glibc, which is shipped with the Linux distribution. This means that traditionally Velociraptor had problems running on very old Linux machines (previous to Ubuntu 18.04). We used to build a more compatible version on an old Centos VM, but this was manual and did not support the latest Go compiler.
In 0.6.5, we added a new build target using MUSL – a lightweight Glibc replacement. The produced binary is completely static and should run on a much wider range of Linux versions. This is still considered experimental but should improve the experience on older Linux machines.
In this episode of Security Nation, Jen and Tod chat with Steve Micallef about SpiderFoot, the open-source intelligence tool of which he is the creator and founder. He tells us how the platform went from a passion project to a fully fledged open-source offering, with a SaaS option to boot, and how it can help security engineers automate tasks and focus on finding the major threats in their data.
Stick around for our Rapid Rundown, where Tod chats with producer Jesse about a new paper that reveals all is not as it seems with CVSS scores.
Steve Micallef is the author of SpiderFoot (www.spiderfoot.net), an open-source OSINT automation platform. You can follow him @binarypool on Twitter.
In this episode of Security Nation, Jen and Tod chat with Phillip Maddux about his project HoneyDB, a site that pulls data together from honeypots around the world in a handy, open-source format for security pros and researchers. He details how his motivations for creating HoneyDB derived from his time in application security and why he thinks open source is such a great format for this kind of project.
No Rapid Rundown this week, since RSAC 2022 has Tod tied up (and several time zones farther from Jen than usual). If you’re in San Francisco for the conference, stop by the Rapid7 booth and say hi!
Phillip Maddux is a staff engineer on the Detection and Response Engineering team at Compass. He has over 15 years of experience in information security, with the majority of that time focused on application security in the financial services sector. Throughout his career, Phillip has been a honeypot enthusiast and is the creator of HoneyDB.io.
I’ve just come back from a long (extended) holiday weekend here in the US and I’m still catching up on all the AWS launches that happened this past week. I’m particularly excited about some of the data, machine learning, and quantum computing news. Let’s have a look!
Last Week’s Launches The launches that caught my attention last week are the following:
Amazon EMR Serverless is now generally available – Amazon EMR Serverless allows you to run big data applications using open-source frameworks such as Apache Spark and Apache Hive without configuring, managing, and scaling clusters. The new serverless deployment option for Amazon EMR automatically scales resources up and down to provide just the right amount of capacity for your application, and you only pay for what you use. To learn more, check out Channy’s blog post and listen to The Official AWS Podcast episode on EMR Serverless.
AWS PrivateLink is now supported by additional AWS services – AWS PrivateLink provides private connectivity between your virtual private cloud (VPC), AWS services, and your on-premises networks without exposing your traffic to the public internet. The following AWS services just added support for PrivateLink:
Amazon S3 on Outposts has added support for PrivateLink to perform management operations on your S3 storage by using private IP addresses in your VPC. This eliminates the need to use public IPs or proxy servers. Read the June 1 What’s New post for more information.
AWS Panorama now supports PrivateLink, allowing you to access AWS Panorama from your VPC without using public endpoints. AWS Panorama is a machine learning appliance and software development kit (SDK) that allows you to add computer vision (CV) to your on-premises cameras. Read the June 2 What’s New post for more information.
AWS Backup has added PrivateLink support for VMware workloads, providing direct access to AWS Backup from your VMware environment via a private endpoint within your VPC. Read the June 3 What’s New post for more information.
Amazon SageMaker JumpStart now supports incremental model training and automatic tuning – Besides ready-to-deploy solution templates for common machine learning (ML) use cases, SageMaker JumpStart also provides access to more than 300 pre-trained, open-source ML models. You can now incrementally train all the JumpStart models with new data without training from scratch. Through this fine-tuning process, you can shorten the training time to reach a better model. SageMaker JumpStart now also supports model tuning with SageMaker Automatic Model Tuning from its pre-trained model, solution templates, and example notebooks. Automatic tuning allows you to automatically search for the best hyperparameter configuration for your model.
Amazon Transcribe now supports automatic language identification for multi-lingual audio – Amazon Transcribe converts audio input into text using automatic speech recognition (ASR) technology. If your audio recording contains more than one language, you can now enable multi-language identification, which identifies all languages spoken in the audio file and creates a transcript using each identified language. Automatic language identification for multilingual audio is supported for all 37 languages that are currently supported for batch transcriptions. Read the What’s New post from Amazon Transcribe to learn more.
Amazon Braket adds support for Borealis, the first publicly accessible quantum computer that is claimed to offer quantum advantage – If you are interested in quantum computing, you’ve likely heard the term “quantum advantage.” It refers to the technical milestone when a quantum computer outperforms the world’s fastest supercomputers on a well-defined task. Until now, none of the devices claimed to demonstrate quantum advantage have been accessible to the public. The Borealis device, a new photonic quantum processing unit (QPU) from Xanadu, is the first publicly available quantum computer that is claimed to have achieved quantum advantage. Amazon Braket, the quantum computing service from AWS, has just added support for Borealis. To learn more about how you can test a quantum advantage claim for yourself now on Amazon Braket, check out the What’s New post covering the addition of Borealis support.
Other AWS News Some other updates and news that you may have missed:
New AWS Heroes – A warm welcome to our newest AWS Heroes! The AWS Heroes program is a worldwide initiative that acknowledges individuals who have truly gone above and beyond to share knowledge in technical communities. Get to know them in the June 2022 introduction blog post!
AWS open-source news and updates – My colleague Ricardo Sueiras writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community. Read edition #115 here.
Upcoming AWS Events Join me in Las Vegas for Amazon re:MARS 2022. The conference takes place June 21–24 and is all about the latest innovations in machine learning, automation, robotics, and space. I will deliver a talk on how machine learning can help to improve disaster response. Say “Hi!” if you happen to be around and see me.
We also have more AWS Summits coming up over the next couple of months, both in-person and virtual.
You can now register for IMAGINE 2022 (August 3, Seattle). The IMAGINE 2022 conference is a no-cost event that brings together education, state, and local leaders to learn about the latest innovations and best practices in the cloud.
In this episode of Security Nation, Jen and Tod sit down with Jim O’Gorman and Ben “g0tmi1k” Wilson of Offensive Security to chat about Kali Linux. They walk our hosts through the vision behind Kali and how they understand the uses, advantages, and challenges of open-source security tools.
Stick around for our Rapid Rundown, where producer Jesse joins Tod to talk about an upcoming change in security protocols across the internet that might make passwords obsolete (eventually).
Jim O’Gorman (Elwood) began his tech career as a network administrator with a particular talent for network intrusion simulation, digital investigations, and malware analysis. Jim started teaching for OffSec in 2009 as an instructor for the Penetration Testing with Kali (PWK) course — a role he still enjoys. He went on to co-author Metasploit: The Penetration Tester’s Guide and Kali Linux: Revealed, and has developed and curated a number of OffSec courses. As the Chief Content and Strategy officer, he currently oversees the open source Kali Linux development project and participates with OffSec’s Penetration Testing Team.
Ben “g0tmi1k” Wilson
Ben “g0tmi1k” Wilson has been in the information security world for nearly two decades. Since joining Offensive Security nine years ago, he has applied his experience in a number of roles including live instructor, content developer, and security administrator. He is currently managing the day-to-day activity as well as developing Kali Linux, pushing it forward. He has worked on various vulnerabilities, which are published on Exploit-DB that he also works on. Furthermore he created and still runs VulnHub, allowing for hands-on experience.
This is the second and final post in a series describing friendly forks and alternative strategies for managing them. Make sure to check out Being friendly: friendly forks 101 for general information on friendly forks and background on the three forks on which we center this post’s discussion.
In the first post in this series, we discussed what friendly forks are and learned about three GitHub-managed friendly forks of git/git, git-for-windows/git, microsoft/git, and github/git. In this post, we deep dive into the management strategies we employ for each of these forks and provide scenarios to help you select the appropriate management strategy for your own friendly fork.
The importance of friendly fork management
While the basics of friendly forks could make for mildly interesting cocktail party conversation , it takes a deeper understanding to successfully manage a fork once you have it. Management (or lack thereof) can make or break friendly forks for the following reasons:
Contributions taken by the upstream project are also generally valuable to the fork.
The number of changes to the upstream project since the last merge is correlated with the difficulty of merging those changes into the fork.
When security patches are pushed upstream, it is critical to be able to easily apply them (without other conflicts getting in the way).
Although there is no one-size-fits-all approach to friendly fork management, our goal for the remainder of this post is to provide a solid starting point for your management journey that will help your friendly fork remain securely in the friend zone.
Our management strategies
We employ a different management strategy for each of the forks discussed in the previous post based on that fork’s unique needs. These strategies are illustrated in the graphic below.
If the above image makes you feel slightly dizzy, don’t worry! We know it’s a lot, so we’re going to break it down into detailed descriptions of how each fork works. Take a deep breath, and prepare to dive in.
Git for Windows uses a custom merging rebase strategy to take changes from upstream. A merging rebase is just what it sounds like-a combination of a merge and a rebase. Merging rebases are executed at a predictable cadence that follows the git/git release cycle.
When it is time for a new release, git/git creates a series of release candidate tags (typically rc0, rc1, rc2, and final) on its default branch. As soon as each new candidate is released, we execute a merging rebase of the main branch on top of it. The merge portion comes first, with this command:
This creates what we call a “fake merge” since it uses the “ours” strategy to discard all the changes from main. While this may seem odd, it actually provides the benefits of a clean slate for rebasing and the ability to fast forward from previous states of the branch.
After the merge is complete, the rebase commences. This portion of the process helps us resolve merge conflicts that occur when upstream changes conflict with changes that have been made in git-for-windows/git. One type of conflict in particular is worth discussing in more depth: commits that have been added to git-for-windows/git and subsequently upstreamed.
When these commits are submitted upstream, the community supporting git/git usually requests changes before they are accepted. Additionally, since git/git only accepts patches sent via mailing list (instead of pull requests), commit IDs inevitably change when applied upstream. This means there will be conflicts when git-for-windows/git is rebased on top of a new release. Running the following command helps us identify commits that have been upstreamed when we encounter conflicts:
This command compares the differences between the below ranges of commits, dropping any that are not in the first specified range:
From the commit before the commit you are checking to the commit you are checking (this will only contain the commit you are checking).
The upstream commits that are not in the commit history of <commit>.
If there is a matching commit in upstream, it will be shown in the command’s output. In this case, we use git rebase --skip to bypass the commit (and implicitly accept the upstream version).
Because git-for-windows/git begins its merging rebase immediately after the creation of each release candidate, we classify it as proactive-it ensures all new git/git features are integrated and released as soon as possible. The merging rebase is generally executed for each release candidate by one developer, but is reviewed by multiple other developers with the help of range-diff comparisons. You can find an example of such a review here. When complete, the changes are pushed directly to the main branch, and a new release is created.
Like git-for-windows/git, microsoft/git is proactive, executing rebases immediately following the creation of each new git/git release candidate. It also uses the same strategies for identifying commits that have made it upstream. However, there are a few key differences between the git-for-windows/git approach and the microsoft/git approach. These are:
Since microsoft/git is a fork of git-for-windows/git, it does not take commits directly from git/git. Instead, we wait for the git-for-windows/git merging rebase to complete for each candidate then rebase on top of the resulting tag.
Instead of repeatedly rebasing a designated main branch, we cut a brand new branch for each version with the naming scheme vfs-2.X.Y (see the current default as an example). This branch is based off the initial git-for-windows/git release candidate tag and is updated with the rebase for each new release candidate. We based this strategy on release branches in Azure DevOps to make hotfixing easier and to clarify which commits are released with each new version.
Once the rebases are complete, we designate vfs-2.X.Y, as the new default branch, and create a new release.
github/git integrates new git/git releases using a traditional merge strategy. It is cautious in its cadence for taking releases; for this fork, we prefer to allow new features to “simmer” for some time before integration. This means that github/git is typically one or two versions behind the latest release of git/git.
To ensure merges are high-quality and accurate, the merge of a new git/git version is carried out by at least two developers in parallel. For commits that began in github/git and subsequently made it upstream, we generally accept the upstream version when resolving resulting conflicts. Occasionally, however, there are reasons to take the github/git version (or, some parts of both versions). It is up to the developers executing the merge to decide the correct strategy for a particular commit. When each merge is complete, the trees at the tip of each merge are compared, and the different approaches to conflict resolution are reviewed. The outcome of this review is merged and deployed as a new release.
Note that there are tradeoffs to the decision to use a traditional merge strategy to manage this fork. Merging is more straightforward than rebasing or rebase merging. However, merges in github/git can become somewhat tricky when it has drifted far from upstream or when a sweeping change in upstream affects many parts of its custom code. Additionally, this strategy requires all commits to be preserved (as opposed to git-for-windows/git and microsoft/git, which use the autosquash feature of the rebase command to remove squash and fixup commits), which means more commits are involved in github/git merges.
Below is a side-by-side summary of the key similarities and differences in the management strategies discussed above.
# of developers executing
Proactive or cautious
Long-running main branch
Integrates release candidates
As shown in the table, git-for-windows/git and microsoft/git have a lot in common. They are both proactive, executed by one developer, and use a form of rebase to integrate new releases (and release candidates). github/git is a bit different in its choice of a merge management strategy, the number of developers that simultaneously execute this strategy, and its cautious approach to integrating new releases (and in that release candidates are not considered).
As lovely as the above table is, you may still be scratching your head, wondering which strategy is right for you or your organization. Never fear! Our final section provides a series of scenarios with the goal of making this decision easier for you.
Finding your perfect match
Well done! You’ve successfully made it through our deep dive into three alternatives for friendly fork management.
However, now comes the really important part! It’s time to take a good look at each of the above strategies to understand which one is the best fit for you. We’ve organized this section as a series of scenarios to help you frame the above in the context of your own needs. Keep reading if you’re ready to choose your own friendly fork adventure!
Scenario 1: You have many contributors working simultaneously.
Constantly creating new default branches can leave developers with open pull requests in a bad state; obviously, this is particularly problematic when there’s a healthy amount of active development in your repository. Consider a merge or merging rebase strategy to avoid changing default branches and requiring your developers to constantly rebase.
Scenario 2: You need to support multiple versions with security releases or other bug fixes.
Consider the rebase model to have an easy story for cherry-picking fixes to supported versions.
Note: it is also possible to cherry-pick features from upstream with a merge-based workflow. However, if you later merge the commits containing the cherry-picked features, you may have to resolve some trivial conflicts depending on your merge strategy.
Scenario 3: You don’t (or do!) want to take new features immediately.
You can apply a cautious or proactive approach to any of the above strategies. Work with your team/management to find a cadence everyone is comfortable with and stick to it.
Scenario 4: You’re new to the fork game and want to keep it simple.
If this is your (or, your team’s) first time managing a fork and/or you’re just learning your way around Git, the merge strategy may be the most straightforward place for you to start.
Still not sure which strategy to use? Consider getting in touch with the maintainers of one of the friendly forks listed above (for example, via the appropriate mailing list(s) or a GitHub discussion) to get input from the experts!
A friendly fork can help accelerate development and increase developer productivity and satisfaction. Additionally, managing a friendly fork carefully to stay in the friend zone can lead to successful collaboration between different communities and improved project quality for all parties involved. We hope this series has helped you understand whether a friendly fork is right for you or your organization and, if so, empowered you to create and begin managing your friendly fork successfully.
This is the first post in a two-part series describing friendly forks and alternative strategies for managing them. Stay tuned for part two coming in May!
This post covers what a friendly fork is, why they are beneficial, and how they differ from a divergent fork. We’ll also look at some examples from the wild and provide details on three of our favorite friendly forks of the git/git repository.
To fork or not to fork
Most developers are familiar with the concept of working with source code in repositories. However, what should you do when you want to make one or more major changes to a repository that you do not own? Two options are to submit feature requests or to contribute the features you need yourself. This is a very common approach in open source software, and, when it goes well, it can lead to productive collaboration and useful results for all parties.
But what if the proposed features are not accepted into the repository? What if they were never intended to be contributed back to the original project? If you (or your company) have a strong need for these features, creating a friendly fork of the repository could be the right choice.
What is a friendly fork?
A friendly fork is a long-lived fork that complements its upstream repository (i.e., the repository from which it was forked) with customizations targeted to a subset of users. Typically, features from the friendly fork are contributed back to the upstream repository through a process known as upstreaming. If that is the case, developers working in the friendly fork sustain relationships with the maintainers of the upstream repository (this is the friend zone we’re so fond of!) to facilitate this flow of features and to improve the software for both user bases. It is also possible, however, for a friendly fork to simply take regular updates from its upstream repository with no maintainer interaction. Friendly forks may or may not eventually re-merge with the upstream repository.
Below are some examples of existing friendly forks of the git/git repository (which are maintained by folks at GitHub) and their purposes.
git-for-windows/git: hosts Windows-specific features of Git. It also sometimes receives early features which are subsequently upstreamed into git/git.
microsoft/git: focuses on features to support monorepos, some of which are subsequently upstreamed into git/git.
github/git: powers GitHub’s backend. It includes GitHub-specific changes, like extra instrumentation specific to GitHub’s infrastructure, but also serves as a staging ground for new features before they are submitted upstream.
git/git is definitely not the only repository with friendly forks. Examples of friendly forks created from other repos include:
MSYS2: a fork of Cygwin that provides an environment for building, installing, and running native Windows software.
CentOS: a fork of Red Hat Enterprise Linux (RHEL) created in 2004 to offer a RHEL-like experience for the community. (Interestingly, Red Hat now owns the CentOS trademarks).
It is important to note that not all forks can be considered friendly. There are also divergent forks, which are typically created due to insurmountable disagreements in a community caused by disparate goals or personality conflicts. Divergent forks often become so different from their upstream repositories that sharing code becomes difficult, if not impossible. While it is good to know that divergent forks are a thing you may encounter in the wild, we emphatically believe in the power of the friend zone and will center our focus for the rest of this series on getting and keeping forks there.
A tale of three forks
Three of the friendly fork examples provided above are based off of git/git, git-for-windows/git, microsoft/git, and github/git. Each of these forks has a unique history that contributes to the strategy used to maintain it. We will dedicate the remainder of this post to describing the history and purpose of each fork.
git-for-windows/git is the oldest of our forks for discussion. It was created in 2007 to provide the required adjustments to Git for it to run on Windows. While it may seem odd that Windows-specific features weren’t just added to the git/git repository, forking was deemed necessary because the git/git project was (and, still is) run by experts in the Unix systems domain. And Windows, of course, falls outside the scope of that expertise.
Although this fork was originally intended to be short-lived, it soon became clear that Windows support would be an ongoing community need that would require a permanent fork. Thus, development in git-for-windows/git continues in earnest today.
Because it was created to give Windows users the option of using Git for version control, the main purposes of git-for-windows/git are:
Provide a seamless, pain-free experience for Windows users.
Separation of concerns (in other words, Windows-specific and Unix-specific features are contributed in different repositories, while shared features can easily flow between the repositories).
As with each fork we discuss in this post, there are some use cases for which it makes sense for new features to be contributed to the fork before they are added upstream. FS Monitor is an example of this in which the Request for Comments went to the git/git mailing list, but early implementations of the feature were merged into git-for-windows/git for rapid testing.
microsoft/git began as a private fork of git-for-windows/git, with the initial purpose of supporting Microsoft-internal products. However, it was open-sourced in 2017, and, as a result, its goals today are much more general and community-oriented:
Facilitate easy dogfooding/quick releases of new features for GitHub’s monorepo customers.
As with git-for-windows/git, determine which of these features make sense to contribute back upstream and submit/monitor them on the mailing list.
An example feature that was introduced to microsoft/git prior to upstreaming is the sparse index. This was done to speedily get this new feature into the hands of monorepo customers who needed it most. After that was done, the feature was gradually introduced and refined upstream.
github/git, our final fork for discussion, actually did not begin as a fork. In the early days of GitHub, we carried a handful of changes on top of new Git releases. Because there were only a few, we stored them as *.patch files that got applied on top of each new release before deployment. Over time, however, our custom changes became both more numerous and more applicable to being contributed to git/git. It became clear that a full-fledged fork would be beneficial to improve management, workflows, and our ability to contribute back to upstream. Thus, in 2012, the official github/git fork was born.
Note that github/git differs from the above forks in that it is a private friendly fork, while the other two are public. Private friendly forks can be very beneficial to organizations. For example, they can be an excellent testing ground for new features, as they allow you to be confident code has been battle-tested internally and works before submitting publicly upstream. They can also be less beholden to the upstream release cadence, which helps ensure stability for the product.
To this day, this fork serves the same purpose of powering GitHub’s infrastructure. It fulfills our need to support features specialized to this infrastructure and is also used as a “staging ground” for new features before submitting them to the open source git/git repository. Specific examples of the latter use include bitmaps and multi-pack bitmaps.
That’s all for now!
In this post, we’ve discussed what a friendly fork is and how friendly forks differ from divergent forks. We’ve learned about three different friendly forks of the git/git repository and their purposes for existing. Thanks for sticking with us this far, and be sure to keep your peeled for our second post in the “friend zone” series, in which we’ll talk about how these forks are managed and how you can adapt their strategies to your very own friendly fork!
The main focus of this release is in improving path handling in VQL to allow for more efficient path manipulation. This leads to the ability to analyze dead disk images, which depends on accurate path handling.
A path is a simple concept – it’s a string similar to /bin/ls that can be used to pass to an OS API and have it operate on the file in the filesystem (e.g. read/write it).
However, it turns out that paths are much more complex than they first seem. For one thing, paths have an OS-dependent separator (usually / or \). Some filesystems support path separators inside a filename too! To read about the details, check out Paths and Filesystem Accessors, but one of the most interesting things with the new handling is that stacking filesystem accessors is now possible. For example, it’s possible to open a docx file inside a zip file inside an ntfs drive inside a partition.
Dead disk analysis
Velociraptor offers top-notch forensic analysis capability, but it’s been primarily used as a live response agent. Many users have asked if Velociraptor can be used on dead disk images. Although dead disk images are rarely used in practice, sometimes we do encounter these in the field (e.g. in cloud investigations).
Previously, Velociraptor couldn’t be used easily on dead disk images without having to carefully tailor and modify each artifact. In the 0.6.4 release, we now have the ability to emulate a live client from dead disk images. We can use this feature to run the exact same VQL artifacts that we normally do on live systems, but against a dead disk image. If you’d like to read more about this new feature, check out Dead Disk Forensics.
When collecting artifacts from endpoints, we need to be mindful of the overall load that collection will cost on endpoints. For performance-sensitive servers, our collection can cause operational disruption. For example, running a yara scan over the entire disk would utilize a lot of IO operations and may use a lot of CPU resources. Velociraptor will then compete for these resources with the legitimate server functionality and may cause degraded performance.
Previously, Velociraptor had a setting called Ops Per Second, which could be used to run the collection “low and slow” by limiting the rate at which notional “ops” were utilized. In reality, this setting was only ever used for Yara scans because it was hard to calculate an appropriate setting: Notional ops didn’t correspond to anything measurable like CPU utilization.
In 0.6.4, we’ve implemented a feedback-based throttler that can control VQL queries to a target average CPU utilization. Since CPU utilization is easy to measure, it’s a more meaningful control. The throttler actively measures the Velociraptor process’s CPU utilization, and when the simple moving average (SMA) rises above the limit, the query is paused until the SMA drops below the limit.
The above screenshot shows the latest resource controls dialog. You can now set a target CPU utilization between 0 and 100%. The image below shows how that looks in the Windows task manager.
By reducing the allowed CPU utilization, Velociraptor will be slowed down, so collections will take longer. You may need to increase the collection timeout to correspond with the extra time it takes.
Note that the CPU limit refers to a percentage of the total CPU resources available on the endpoint. So for example, if the endpoint is a 2 core cloud instance a 50% utilization refers to 1 full core. But on a 32 core server, a 50% utilization is allowed to use 16 cores!
On some cloud resources, IO operations per second (IOPS) are more important than CPU loading since cloud platforms tend to rate limit IOPS. So if Velociraptor uses many IOPS (e.g. in Yara scanning), it may affect the legitimate workload.
Velociraptor now offers limits on IOPS which may be useful for some scenarios. See for example here and here for a discussion of these limits.
The offline collector resource controls
Many people use the Velociraptor offline collector to collect artifacts from endpoints that they’re unable to install a proper client/server architecture on. In previous versions, there was no resource control or time limit imposed on the offline collector, because it was assumed that it would be used interactively by a user.
However, experience shows that many users use automated tools to push the offline collector to the endpoint (e.g. an EDR or another endpoint agent), and therefore it would be useful to provide resource controls and timeouts to control Velociraptor acquisitions. The below screenshot shows the new resource control page in the offline collector wizard.
Version 0.6.4 brings a lot of useful GUI improvements.
Notebooks are an excellent tool for post processing and analyzing the collected results from various artifacts. Most of the time, similar post processing queries are used for the same artifacts, so it makes sense to allow notebook templates to be defined in the artifact definition. In this release, you can define an optional suggestion in the artifact yaml to allow a user to include certain cells when needed.
The following screenshot shows the default suggestion for all hunt notebooks: Hunt Progress. This cell queries all clients in a hunt and shows the ones with errors, running and completed.
Multiple OAuth2 authenticators
Velociraptor has always had SSO support to allow strong two-factor authentication for access to the GUI. Previously, however, Velociraptor only supported one OAuth2 provider at a time. Users had to choose between Google, Github, Azure, or OIDC (e.g. Okta) for the authentication provider.
This limitation is problematic for some organizations that need to share access to the Velociraptor console with third parties (e.g. consultants need to provide read-only access to customers).
In 0.6.4, Velociraptor can be configured to support multiple SSO providers at the same time. So an organization can provide access through Okta for their own team members at the same time as Azure or Google for their customers.
The Velociraptor knowledge base
Velociraptor is a very powerful tool. Its flexibility means that it can do things that you might have never realized it can! For a while now, we’ve been thinking about ways to make this knowledge more discoverable and easily available.
Many people ask questions on the Discord channel and learn new capabilities in Velociraptor. We want to try a similar format to help people discover what Velociraptor can do.
The Velociraptor Knowledge Base is a new area on the documentation site that allows anyone to submit small (1-2 paragraphs) tips about how to do a particular task. Knowledge base tips are phrased as questions to help people search for them. Provided tips and solutions are short, but they may refer users to more detailed information.
If you learned something about Velociraptor that you didn’t know before and would like to share your experience to make the next user’s journey a little bit easier, please feel free to contribute a small note to the knowledge base.
Importing previous artifacts
Updating the VQL path handling in 0.6.4 introduces a new column called OSPath (replacing the old FullPath column), which wasn’t present in previous versions. While we attempt to ensure that older artifacts should continue to work on 0.6.4 clients, it’s possible that the new VQL artifacts built into 0.6.4 won’t work correctly on older versions.
To make migration easier, 0.6.4 comes built in with the Server.Import.PreviousReleases artifact. This server artifact will load all the artifacts from a previous release into the server, allowing you to use those older versions with older clients.
To celebrate this most recent release, here’s GitHub’s look at some of the most interesting features and changes introduced since last time.
Review merge conflict resolution with –remerge-diff
Returning readers may remember our coverage of merge ort, the from-scratch rewrite of Git’s recursive merge engine.
This release brings another new feature powered by ort, which is the --remerge-diff option. To explain what --remerge-diff is and why you might be excited about it, let’s take a step back and talk about git show.
When given a commit git show will print out that commit’s log message as well as its diff. But it has slightly different behavior when given a merge commit, especially one that had merge conflicts. If you’ve ever passed a conflicted merge to git show, you might be familiar with this output:
If you look closely, you might notice that there are actually two columns of diff markers (the + and - characters to indicate lines added and removed). These come from the output of git diff-tree -cc, which is showing us the diff between each parent and the post-image of the given commit simultaneously.
In this particular example, the conflict occurs because one side has an extra argument in the dwim_ref() call, and the other includes an updated comment to use reflect renaming a variable from sha1 to oid. The left-most markers show the latter resolution, and the right-most markers show the former.
But this output can be understandably difficult to interpret. In Git 2.36, --remerge-diff takes a different approach. Instead of showing you the diffs between the merge resolution and each parent simultaneously, --remerge-diff shows you the diff between the file with merge conflicts, and the resolution.
The above shows the output of git show with --remerge-diff on the same conflicted merge commit as before. Here, we can see the diff3-style conflicts (shown in red, since the merge commit removes the conflict markers during resolution) along with the resolution. By more clearly indicating which parts of the conflict were left as-is, we can more easily see how the given commit resolved its conflicts, instead of trying to weave-together the simultaneous diff output from git diff-tree -cc.
Reconstructing these merges is made possible using ort. The ort engine is significantly faster than its predecessor, recursive, and can reconstruct all conflicted merge in linux.git in about 3 seconds (as compared to diff-tree -cc, which takes more than 30 seconds to perform the same operation
Give it a whirl in your Git repositories on 2.36 by running git show --remerge-diff on some merge conflicts in your history.
If you have ever looked around in your repository’s .git directory, you’ll notice a variety of files: objects, references, reflogs, packfiles, configuration, and the like. Git writes these objects to keep track of the state of your repository, creating new object files when you make new commits, update references, repack your repository, and so on.
Most likely, you haven’t had to think too hard about how these files are written and updated. If you’re curious about these details, then read on! When any application writes changes to your filesystem, those changes aren’t immediately persisted, since writing to the external storage medium is significantly slower than updating your filesystem’s in-memory caches.
Instead, changes are staged in memory and periodically flushed to disk at which point the changes are (usually, though disks and controllers can have their own write caches, too) written to the physical storage medium.
Aside from following standard best-practices (like writing new files to a temporary location and then atomically moving them into place), Git has had a somewhat limited set of configuration available to tune how and when it calls fsync, mostly limited to core.fsyncObjectFiles, which, when set, causes Git to call fsync() when creating new loose object files. (Git has had non-configurable fsync() calls scattered throughout its codebase for things like writing packfiles, the commit-graph, multi-pack index, and so on).
Git 2.36 introduces a significantly more flexible set of configuration options to tune how and when Git will explicitly fsync lots of different kinds of files, not just if it fsyncs loose objects.
At the heart of this new change are two new configuration variables: core.fsync and core.fsyncMethod. The former lets you pick a comma-separated list of which parts of Git’s internal data structures you want to be explicitly flushed after writing. The full list can be found in the documentation, but you can pick from things like pack (to fsync files in $GIT_DIR/objects/pack) or loose-object (to fsync loose objects), to reference (to fsync references in the $GIT_DIR/refs directory). There are also aggregate options like objects (which implies both loose-object and pack), along with others like derived-metadata, committed, and all.
You can also tune how Git ensures the durability of components included in your core.fsync configuration by setting the core.fsyncMethod to either fsync (which calls fsync(), or issues a special fcntl() on macOS), or writeout-only, which schedules the written data for flushing, though does not guarantee that metadata like directory entries are updated as part of the flush operation.
Most users won’t need to change these defaults. But for server operators who have many Git repositories living on hardware that may suddenly lose power, having these new knobs to tune will provide new opportunities to enhance the durability of written data.
If you haven’t seen our blog post from last week announcing the security patches for versions 2.35 and earlier, let me give you a brief recap.
Beginning in Git 2.35.2, Git changed its default behavior to prevent you from executing git commands in a repository owned by a different user than the current one. This is designed to prevent git invocations from unintentionally executing commands which the repository owner configured.
You can bypass this check by setting the new safe.directory configuration to include trusted repositories owned by other users. If you can’t upgrade immediately, our blog post outlines some steps you can take to mitigate your risk, though the safest thing you can do is upgrade to the latest version of Git.
Since publishing that blog post, the safe.directory option now interprets the value * to consider all Git repositories as safe, regardless of their owner. You can set this in your --global config to opt-out of the new behavior in situations where it makes sense.
Now that we have looked at some of the bigger features in detail, let’s turn to a handful of smaller topics from this release.
If you’ve ever spent time poking around in the internals of one of your Git repositories, you may have come across the git cat-file command. Reminiscent of cat, this command is useful for printing out the raw contents of Git objects in your repository. cat-file has a handful of other modes that go beyond just printing the contents of an object. Instead of printing out one object at a time, it can accept a stream of objects (via stdin) when passed the --batch or --batch-check command-line arguments. These two similarly-named options have slightly different outputs: --batch instructs cat-file to just print out each object’s contents, while --batch-check is used to print out information about the object itself, like its type and size1.
But what if you want to dynamically switch between the two? Before, the only way was to run two separate copies of the cat-file command in the same repository, one in --batch mode and the other in --batch-check mode. In Git 2.36, you no longer need to do this. You can instead run a single git cat-file command with the new --batch-command mode. This mode lets you ask for the type of output you want for each object. Its input looks either like contents <object>, or info <object>, which correspond to the output you’d get from --batch, or --batch-check, respectively.
For server operators who may have long-running cat-file commands intended to service multiple requests, --batch-command accepts a new flush command, which flushes the output buffer upon receipt.
Previously, the customizability of ls-tree‘s output was somewhat limited. You could restrict the output to just the filenames with --name-only, print absolute paths with --full-name, or abbreviate the object IDs with --abbrev, but that was about it.
In Git 2.36, you have a lot more control about how ls-tree‘s should look. There’s a new --object-only option to complement --name-only. But if you really want to customize its output, the new --format option is your best bet. You can select from any combination and order of the each entry’s mode, type, name, and size.
Here’s a fun example of where something like this might come in handy. Let’s say you’re interested in the distribution of file-sizes of blobs in your repository. Before, to get a list of object sizes, you would have had to do either:
If you’ve ever tried to track down a bug using Git, then you’re familiar with the git bisect command. If you haven’t, here’s a quick primer. git bisect takes two revisions of your repository, one corresponding to a known “good” state, and another corresponding to some broken state. The idea is then to run a binary search between those two points in history to find the first commit which transitioned the good state to the broken state.
If you aren’t a frequent bisect user, you may not have heard of the git bisect run command. Instead of requiring you to classify whether each point along the search is good or bad, you can supply a script which Git will execute for you, using its exit status to classify each revision for you.
This can be useful when trying to figure out which commit broke the build, which you can do by running:
$ git bisect start <bad> <good>
$ git bisect run make
which will run make along the binary search between <bad> and <good>, outputting the first commit which broke compilation.
But what about automating more complicated tests? It can often be useful to write a one-off shell script which runs some test for you, and then hand that off to git bisect. Here, you might do something like:
$ vi test.sh
# type type type
$ git bisect run test.sh
See the problem? We forgot to mark test.sh as executable! In previous versions of Git, git bisect would incorrectly carry on the search, classifying each revision as broken. In Git 2.36, git bisect will detect that you forgot to mark the script as executable, and halt the search early.
When you run git fetch, your Git client communicates with the remote to carry out a process called negotiation to determine which objects the server needs to send to complete your request. Roughly speaking, your client and the server mutually advertise what they have at the tips of each reference, then your client lists which objects it wants, and the server sends back all objects between the requested objects and the ones you already have.
This works well because Git always expects to maintain closure over reachable objects2, meaning that if you have some reachable object in your repository, you also have all of its ancestors.
In other words, it’s fine for the Git server to omit objects you already have, since the combination of the objects it sends along with the ones you already have should be sufficient to assemble the branches and tags your client asked for.
But if your repository is corrupt, then you may need the server to send you objects which are reachable from ones you already have, in which case it isn’t good enough for the server to just send you the objects between what you have and want. In the past, getting into a situation like this may have led you to re-clone your entire repository.
Git 2.36 ships with a new option to git fetch which makes it easier to recover from certain kinds of repository corruption. By passing the new --refetch option, you can instruct git fetch to fetch all objects from the remote, regardless of which objects you already have, which is useful when the contents of your objects directory are suspect.
Over the last handful of releases, more and more commands have become compatible with the sparse index. This release is no exception, with four more Git commands joining the pack. Git 2.36 brings sparse index support to git clean, git checkout-index, git update-index, and git read-tree.
If you haven’t used these commands, there’s no need to worry: adding support to these plumbing commands is designed to lay the ground work for building a sparse index-aware git stash. In the meantime, sparse index support already exists in the commands that you are most likely already familiar with, like git status, git commit, git checkout, and more.
As an added bonus, git sparse-checkout (which is used to enable the sparse checkout feature and dictate which parts of your repository you want checked out) gained support for the command-line completion Git ships in its contrib directory.
Returning readers may remember our previous coverage on partial clones, a relatively new feature in Git which allows you to initialize your clones by downloading just some of the objects in your repository.
If you used this feature in the past with git clone‘s --recurse-submodules flag, the partial clone filter was only applied to the top-level repository, cloning all of the objects in the submodules.
This has been fixed in the latest release, where the --filter specification you use in your top-level clone is applied recursively to any submodules your repository might contain, too.
While we’re talking about partial clones, now is a good time to mention partial bundles, which are new in Git 2.36. You may not have heard of Git bundles, which is a different way of transferring around parts of your repository.
Roughly speaking, a bundle combines the data in a packfile, along with a list of references that are contained in the bundle. This allows you to capture information about the state of your repository into a single file that you can share. For example, the Git project uses bundles to share embargoed security releases with various Linux distribution maintainers. This allows us to send all of the objects which comprise a new release, along with the tags that point at them in a single file over email.
In previous releases of Git, it was impossible to prepare a filtered bundle which you could apply to a partial clone. In Git 2.36, you can now prepare filtered bundles, whose contents are unpacked as if they arrived during a partial clone3. You can’t yet initialize a new clone from a partial bundle, but you can use it to fetch objects into a bare repository:
Lastly, let’s discuss a bug fix concerning Git’s multi-pack reachability bitmaps. If you have started to use this new feature, you may have noticed a handful of new files in your .git/objects/pack directory:
$ ls .git/objects/pack/multi-pack-index*
In order, these are: the multi-pack index (MIDX) itself, the reachability bitmap data, and the reverse-index which tells Git which bits correspond to what objects in your repository.
These are all associated back to the MIDX via the MIDX’s checksum, which is how Git knows that the three belong together. This release fixes a bug where the .rev file could fall out-of-sync with the MIDX and its bitmap, leading Git to report incorrect results when using a multi-pack bitmap. This happens when changing the object order of the MIDX without changing the set of objects tracked by the MIDX.
If your .rev file has a modification time that is significantly older than the MIDX and .bitmap, you may have been bitten by this bug4. Luckily this bug can be resolved by dropping and regenerating your bitmaps5. To prevent a MIDX bitmap and its .rev file from falling out of sync again, the contents of the .rev are now included in the MIDX itself, forcing the MIDX’s checksum to change whenever the object order changes.
You can ask for other attributes, too, like %(objectsize:disk) which shows how many bytes it takes Git to store the object on disk (which can be smaller than %(objectsize) if, for example, the object is stored as a delta against some other, similar object). ↩
This isn’t quite true, because of things like shallow and partial clones, along with grafts, but the assumption is good enough for our purposes here. What matters it that outside of scenarios where we expect to be missing objects, the only time we don’t have a reachability closure is when the repository itself is corrupt. ↩
This isn’t entirely fool-proof, since it’s possible way of detecting that this bug occurred, since it’s possible your bitmaps were rewritten after first falling out-of-sync. When this happens, it’s possible that the corrupt bitmaps are propagated forward when generating new bitmaps. You can use git rev-list --test-bitmap HEAD to check if your bitmaps are OK. ↩
By first running rm -f .git/objects/pack/multi-pack-index*, and then git repack -d --write-midx --write-bitmap-index. ↩
Performance is an essential aspect of any production application, and GitHub is no exception. In the last year, we’ve been making significant investments to improve the performance of some of our most critical workflows. There’s a lot to share, but today we’re going to focus on a little-known feature of some Rack-powered web servers called rack.after_reply that saved us 30ms p50 and 50ms+ p99 on every request across the site.
First, let me talk about how we found ourselves looking at this Rack web server functionality. When diving into performance, we often look at flamegraphs that reveal where time is being spent for a given request. Eventually, we started to notice a pattern. It turned out we were spending a significant amount of time sending statsd metrics throughout a request. In fact, it could be up to 65ms per request! While telemetry enables us to improve GitHub, it doesn’t help users directly, especially when it impacts the performance of our page loads.
There’s no need to send telemetry immediately, especially if it’s expensive, so we started discussing ways to remove telemetry network calls out of the critical path. This resulted in the question, “Can we batch all of our telemetry calls and send them at once?” There’s far too much data to use a background job, so we had to look at other potential solutions. We came across Rack::Events and spiked out a proof of concept. Unfortunately, this approach caused any work performed in Rack::Events callbacks to block the response from closing. This resulted in two issues. First, users would see the browser loading indicator until the deferred code was complete. Second, this impacted our metrics since response times still included the time it took to run our deferred code.
A potential path forward
Recognizing the potential, we kept digging. Eventually, we came across a feature that seemed to do exactly what we wanted, rack.after_reply. The Puma source code describesrack.after_reply as: “A rack extension. If the app writes #call’ables to this array, we will invoke them when the request is done.” In other words, this would allow us to collect telemetry metrics during a request and flush them in a single batch after the browser received the full response, avoiding the issue of network calls slowing down response times. There was a problem though. We don’t use Puma at GitHub, we use Unicorn.
We had a clear path forward. All we needed to do was implement rack.after_reply in Unicorn. This functionality has a lot of use cases outside of sending telemetry metrics, so we contributed rack.after_reply back to Unicorn. This allowed us to write a small wrapper class around rack.after_reply, making it easier to use in the application:
env[“rack.after_reply”] ||= 
env[“rack.after_reply”] << -> do
@to_perform = 
# Calls each callable defined via #perform
@to_perform.each do |block|
rescue Object => e
# Adds given block to the array of callables that will
# be called after the user has received the response.
def perform(name, &block)
@to_perform << block
With the hard part completed, we wrote a small wrapper around our telemetry class to buffer all of our stats data. Combining that with a middleware that performs roughly the following, users now receive responses 30ms faster P50 across all pages. P99 response times improved too, with a 50ms+ reduction across the site.
There are a few drawbacks to this approach that are worth mentioning. The primary disadvantage is that a Unicorn worker executes the rack.after_reply code, making it unable to serve another HTTP request until your callables finish running. If not kept in check, this can potentially increase HTTP queueing, the time spent waiting for a web server worker to serve a request. We mitigated this by adding timeout behavior to prevent any code from running too long.
It’s a little early to talk about it, but there is progress toward making this functionality an official part of the Rack spec. A recently released gem also implements additional functionality around this feature called maybe_later that’s worth checking out.
Overall, we’re pleased with the results. Unicorn-powered applications now have a new tool at their disposal, and our customers reap the benefits of this performance enhancement!
A developer has been caught adding malicious code to a popular open-source package that wiped files on computers located in Russia and Belarus as part of a protest that has enraged many users and raised concerns about the safety of free and open source software.
The application, node-ipc, adds remote interprocess communication and neural networking capabilities to other open source code libraries. As a dependency, node-ipc is automatically downloaded and incorporated into other libraries, including ones like Vue.js CLI, which has more than 1 million weekly downloads.
The node-ipc update is just one example of what some researchers are calling protestware. Experts have begun tracking other open source projects that are also releasing updates calling out the brutality of Russia’s war. This spreadsheet lists 21 separate packages that are affected.
One such package is es5-ext, which provides code for the ECMAScript 6 scripting language specification. A new dependency named postinstall.js, which the developer added on March 7, checks to see if the user’s computer has a Russian IP address, in which case the code broadcasts a “call for peace.”
It constantly surprises non-computer people how much critical software is dependent on the whims of random programmers who inconsistently maintain software libraries. Between log4j and this new protestware, it’s becoming a serious vulnerability. The White House tried to start addressing this problem last year, requiring a “software bill of materials” for government software:
…the term “Software Bill of Materials” or “SBOM” means a formal record containing the details and supply chain relationships of various components used in building software. Software developers and vendors often create products by assembling existing open source and commercial software components. The SBOM enumerates these components in a product. It is analogous to a list of ingredients on food packaging. An SBOM is useful to those who develop or manufacture software, those who select or purchase software, and those who operate software. Developers often use available open source and third-party software components to create a product; an SBOM allows the builder to make sure those components are up to date and to respond quickly to new vulnerabilities. Buyers can use an SBOM to perform vulnerability or license analysis, both of which can be used to evaluate risk in a product. Those who operate software can use SBOMs to quickly and easily determine whether they are at potential risk of a newly discovered vulnerability. A widely used, machine-readable SBOM format allows for greater benefits through automation and tool integration. The SBOMs gain greater value when collectively stored in a repository that can be easily queried by other applications and systems. Understanding the supply chain of software, obtaining an SBOM, and using it to analyze known vulnerabilities are crucial in managing risk.
When one mentions supply chains these days, we tend to think of microchips from China causing delays in automobile manufacturing or toilet paper disappearing from store shelves. Sure, there are some chips in the communications infrastructure, but the cyber supply chain is mostly about virtual things – the ones you can’t actually touch.
In 2018, the Cybersecurity and Infrastructure Security Agency (CISA) established the Information and Communications Technology (ICT) Supply Chain Risk Management (SCRM) Task Force as a public-private joint effort to build partnerships and enhance ICT supply chain resilience. To date, the Task Force has worked on 7 Executive Orders from the White House that underscore the importance of supply chain resilience in critical infrastructure.
The ICT-SCRM Task Force is made up of members from the following sectors:
Information Technology (IT) – Over 40 IT companies, including service providers, hardware, software, and cloud have provided input.
Communications – Nearly 25 communications associations and companies are included, with representation from the wireline, wireless, broadband, and broadcast areas.
Government – More than 30 government organizations and agencies are represented on the Task Force.
These three sector groups touch nearly every facet of critical infrastructure that businesses and government require. The Task Force is dedicated to identifying threats and developing solutions to enhance resilience by reducing the attack surface of critical infrastructure. This diverse group is poised perfectly to evaluate existing practices and elevate them to new heights by enhancing existing standards and frameworks with up-to-date practical advice.
The core of the task force is the working groups. These groups are created and disbanded as needed to address core areas of the cyber supply chain. Some of the working groups have been concentrating on areas like:
The legal risks of information sharing
Evaluating supply chain threats
Identifying criteria for building Qualified Bidder Lists and Qualified Manufacturer Lists
The impacts of the COVID-19 pandemic on supply chains
Creating a vendor supply chain risk management template
After two years of producing some great resources and rather large reports, the ICT-SCRM Task Force recognized the need to ensure organizations of all sizes can take advantage of the group’s resources, even if they don’t have a dedicated risk management professional at their disposal. This led to the creation of both a Small and Medium Business (SMB) working group, as well as one dedicated to Product Marketing.
The SMB working group chose to review and adapt the Vendor SCRM template for use by small and medium businesses, which shows the template can be a great resource for companies and organizations of all sizes.
Out of this template, the group described three cyber supply chain scenarios that an SMB (or any size organization, really) could encounter. From that, the group further simplified the process by creating an Excel spreadsheet that provides a document that is easy for SMBs to share with their prospective vendors and partners as a tool to evaluate their cybersecurity posture. Most importantly, the document does not promote a checkbox approach to cybersecurity — it allows for partial compliance, with room provided for explanations. It also allows many of the questions to be removed if the prospective partner possesses a SOC1/2 certification, thereby eliminating duplication in questions.
What the future holds
At the time of this writing, the Product Marketing and SMB working groups are hard at work making sure everyone, including the smallest businesses, are using the ICT-SCRM Task Force Resources to their fullest potential. Additional workstreams are being developed and will be announced soon, and these will likely include expansion with international partners and additional critical-infrastructure sectors.
In this episode of Security Nation, Jen and Tod chat with Matthew Kienow, Senior Software Engineer at Rapid7, about open-source security – a subject he knows a thing or two about from his work on Metasploit, AttackerKB, and most recently the Recog recognition framework. They discuss the selling points and drawbacks of open source, why seeing all the code doesn’t mean you can see all the bugs, and how open-source projects like Recog make the digital world a better place.
Stick around for our Rapid Rundown, where Matt sticks around to chat with Tod and Jen about a worrying trend in DDoS attacks that allows for amplification levels of 65x.
Matthew Kienow is a software engineer and security researcher. Matthew is currently responsible for the Recog recognition framework project at Rapid7 and previously worked on the AttackerKB project, as well as Metasploit’s MSF 5 APIs. He has also designed, built, and successfully deployed many secure software solutions; however, often he enjoys breaking them instead. He has presented his research at various security conferences including DerbyCon, Hack In Paris, and CarolinaCon. His research has been cited by CSO, Threatpost, and SC Magazine.
Today, I am happy to share two new features of CodeGuru Reviewer:
A new Detector Library describes in detail the detectors that CodeGuru Reviewer uses when looking for possible defects and includes code samples for both Java and Python.
New security detectors have been introduced for detecting log-injection flaws in Java and Python code, similar to what happened with the recent Apache Log4j vulnerability we described in this blog post.
Let’s see these new features in more detail.
Using the Detector Library To help you understand more clearly which detectors CodeGuru Reviewer uses to review your code, we are now sharing a Detector Library where you can find detailed information and code samples.
These detectors help you build secure and efficient applications on AWS. In the Detector Library, you can find detailed information about CodeGuru Reviewer’s security and code quality detectors, including descriptions, their severity and potential impact on your application, and additional information that helps you mitigate risks.
Note that each detector looks for a wide range of code defects. We include one noncompliant and compliant code example for each detector. However, CodeGuru uses machine learning and automated reasoning to identify possible issues. For this reason, each detector can find a range of defects in addition to the explicit code example shown on the detector’s description page.
Let’s have a look at a few detectors. One detector is looking for insecure cross-origin resource sharing (CORS) policies that are too permissive and may lead to loading content from untrusted or malicious sources.
Another detector checks for improper input validation that can enable attacks and lead to unwanted behavior.
Specific detectors help you use the AWS SDK for Java and the AWS SDK for Python (Boto3) in your applications. For example, there are detectors that can detect hardcoded credentials, such as passwords and access keys, or inefficient polling of AWS resources.
New Detectors for Log-Injection Flaws Following the recent Apache Log4j vulnerability, we introduced in CodeGuru Reviewer new detectors that check if you’re logging anything that is not sanitized and possibly executable. These detectors cover the issue described in CWE-117: Improper Output Neutralization for Logs.
These detectors work with Java and Python code and, for Java, are not limited to the Log4j library. They don’t work by looking at the version of the libraries you use, but check what you are actually logging. In this way, they can protect you if similar bugs happen in the future.
Following these detectors, user-provided inputs must be sanitized before they are logged. This avoids having an attacker be able to use this input to break the integrity of your logs, forge log entries, or bypass log monitors.
The Detector Library is free to browse as part of the documentation. For the new detectors looking for log-injection flaws, standard pricing applies. See the CodeGuru pricing page for more information.
A picture tells a thousand words, but up until now the only way to include pictures and diagrams in your Markdown files on GitHub has been to embed an image. We added support for embedding SVGs recently, but sometimes you want to keep your diagrams up to date with your docs and create something as easily as doing ASCII art, but a lot prettier.
Working with Knut and also the wider community at CommonMark, we’ve rolled out a change that will allow you to create graphs inline using Mermaid syntax, for example:
The raw code block above will appear as this diagram in the rendered Markdown:
How it works
When we encounter code blocks marked as mermaid, we generate an iframe that takes the raw Mermaid syntax and passes it to Mermaid.js, turning that code into a diagram in your local browser.
We achieve this through a two-stage process—GitHub’s HTML pipeline and Viewscreen, our internal file rendering service.
Rendering the charts asynchronously helps eliminate the overhead of potentially rendering several charts before sending the compiled ERB view to the client.
User-supplied content is locked away in an iframe, where it has less potential to cause mischief on the GitHub page that the chart is loaded into.
The net result is fast, easily editable, and vector-based diagrams right in your documentation where you need them.
Mermaid has been getting increasingly popular with developers and has a rich community of contributors led by the maintainer Knut Sveidqvist. We are very grateful for Knut’s support in bringing this feature to everyone on GitHub. If you’d like to learn more about the Mermaid syntax, head over to the Mermaid website or check out Knut’s first official Mermaid book.
In the spring of 2018, we launched the Open Data initiative to provide security teams and researchers with access to research data generated from Project Sonar and Project Heisenberg. Our goal for those projects is to understand how the attack surface is evolving, what exposures are most common or impactful, and how attackers are taking advantage of these opportunities. Ultimately, we want to be able to advocate for necessary remediation actions that will reduce opportunities for attackers and advance security. This is also why we publish extensive research reports highlighting key security learnings and mitigation recommendations.
Our goal for Open Data has been to enable others to participate in these efforts, increasing the positive impact across the community. Open Data was an evolution of our participation in the scans.io project, hosted by the University of Michigan. Our hope was that security professionals would apply the research data to their own environments to reduce their exposure and researchers would use the data to uncover insights to help educate the community on security best practices.
Since we first launched Open Data, we’ve been mindful that sharing large amounts of raw data may not maximize value for recipients and lead to the best security outcomes. It is efficient for us, as it can be automated, but we have constantly sought more impactful and productive ways to share the data. Where possible, we’ve developed partnerships with key nonprofit organizations and government entities that share our goals and values around advancing security and reducing exposure. We’ve looked for ways to make the information more accessible for internal security teams.
Fast forward to 2021, and wow, what a few years we’ve had. We’ve faced a global pandemic, which has really brought home our increased reliance on connected technologies, and amplified the need for privacy protections and understanding of digital threats. During the past few years, we have also seen an evolving regulatory environment for data protection. Back in 2018, GDPR was just coming into effect, and everyone was trying to figure out its implications. In 2020, we saw California join the party with the introduction of CCPA. It seems likely we will see more privacy regulations follow.
The surprising thing is not this focus on privacy, which we wholeheartedly support, but rather the inclusion and control of IP addresses as personal data or information. We believe security research supports better security outcomes, which in turn enables better privacy. It’s fundamentally challenging to maintain privacy without understanding and addressing security challenges.
Yet IP addresses make up a significant portion of the data being shared in our security research data. While we believe there is absolutely a legitimate interest in processing this kind of data to advance cybersecurity, we also recognize the need to take appropriate balancing controls to protect privacy and ensure that the processing is “necessary and proportionate” — per the language of Recital 49.
Evolving data sharing
So what does this mean? To date, Open Data included two elements:
A free sign-up service that was subject to light vetting and terms of service, and provides access to current and historical research data
Free access (no account required) to a one-month window of recent data from Project Sonar shared on the Rapid7 website
Beginning today, the latter will no longer be available. For the former, we still want to be able to provide data to help security teams and researchers identify and mitigate exposures. Our goals and values on this have not changed in any way since the inception of Open Data. What has evolved — apart from the regulatory landscape — is our thinking on the best ways to do this.
For Rapid7 customers, we launched Project Doppler, a free tool that provides insight into an organization’s external exposures and attack surface. Digging their own specific information out of our mountain of internet-wide scan data is the use case most Rapid7 customers want, so Doppler makes that much, much easier.
We are working on how we might practically extend Project Doppler more broadly to be available for other internal infosec teams, while still protecting privacy in line with regulatory requirements.
For governments, ISACs, and other nonprofits working on security advocacy to reduce opportunities for attackers, please contact us; we believe we share a mission to advance security and want to continue to support you in this. We will continue to provide free access to the data with appropriate balancing controls (such as geo-filtering) and legal agreements (such as for data retention) in place.
For legitimate public research projects, we have a new submission process so you can request access to the Project Sonar data sets for a limited time and subject to conditions for sharing your findings to advance the public good. Please email [email protected] for more information or to make a submission.
While it was not the primary goal or intention behind the Open Data initiative, we recognize that there are also entities using the data for commercial projects. We are not intentionally trying to hinder this, but per privacy regulations, we need to ensure we have more vetting and controls in place. If you are interested in discussing options for incorporating Project Sonar data into a commercial offering, please contact [email protected].
If you have a use case for Project Sonar data that does not fit into one of the categories above, please contact us. We welcome any opportunity to better understand how our data may be useful, and we want to continue to advance security and support the security community as best we can.
More advocacy, better outcomes
While these changes are being triggered by the evolving regulatory landscape, we believe that ultimately they will lead to more productive data sharing and better security outcomes. We’re still not sold on the view that IP addresses should be viewed as personal dataI, but we recognize the value of a more thoughtful and tailored approach to data sharing that both supports data protection values and also promotes more security advocacy and remediation action.
NEVER MISS A BLOG
Get the latest stories, expertise, and news about security today.
The Open Source Security Foundation announced $10 million in funding from a pool of tech and financial companies, including $5 million from Microsoft and Google, to find vulnerabilities in open source projects:
The “Alpha” side will emphasize vulnerability testing by hand in the most popular open-source projects, developing close working relationships with a handful of the top 200 projects for testing each year. “Omega” will look more at the broader landscape of open source, running automated testing on the top 10,000.
This is an excellent idea. This code ends up in all sorts of critical applications.
Log4j would be a prototypical vulnerability that the Alpha team might look for – an unknown problem in a high-impact project that automated tools would not be able to pick up before a human discovered it. The goal is not to use the personnel engaged with Alpha to replicate dependency analysis, for example.
The open source Git project just released Git 2.35, with features and bug fixes from over 93 contributors, 35 of them new. We last caught up with you on the latest in Git back when 2.34 was released. To celebrate this most recent release, here’s GitHub’s look at some of the most interesting features and changes introduced since last time.
When working on a complicated change, it can be useful to temporarily discard parts of your work in order to deal with them separately. To do this, we use the git stash tool, which stores away any changes to files tracked in your Git repository.
Using git stash this way makes it really easy to store all accumulated changes for later use. But what if you only want to store part of your changes in the stash? You could use git stash -p and interactively select hunks to stash or keep. But what if you already did that via an earlier git add -p? Perhaps when you started, you thought you were ready to commit something, but by the time you finished staging everything, you realized that you actually needed to stash it all away and work on something else.
git stash‘s new --staged mode makes it easy to stash away what you already have in the staging area, and nothing else. You can think of it like git commit (which only writes staged changes), but instead of creating a new commit, it writes a new entry to the stash. Then, when you’re ready, you can recover your changes (with git stash pop) and keep working.
git log has a rich set of --format options that you can use to customize the output of git log. These can be handy when sprucing up your terminal, but they are especially useful for making it easier to script around the output of git log.
In our blog post covering Git 2.33, we talked about a new --format specifier called %(describe). This made it possible to include the output of git describe alongside the output of git log. When it was first released, you could pass additional options down through the %(describe) specifier, like matching or excluding certain tags by writing --format=%(describe:match=<foo>,exclude=<bar>).
In 2.35, Git includes a couple of new ways to tweak the output of git describe. You can now control whether to use lightweight tags, and how many hexadecimal characters to use when abbreviating an object identifier.
You can try these out with %(describe:tags=<bool>) and %(describe:abbrev=<n>), respectively. Here’s a goofy example that gives me the git describe output for the last 8 commits in my copy of git.git, using only non-release-candidate tags, and uses 13 characters to abbreviate their hashes:
In our last post, we talked about SSH signing: a new feature in Git that allows you to use the SSH key you likely already have in order to sign certain kinds of objects in Git.
This release includes a couple of new additions to SSH signing. Suppose you use SSH keys to sign objects in a project you work on. To track which SSH keys you trust, you use the allowed signers file to store the identities and public keys of signers you trust.
Now suppose that one of your collaborators rotates their key. What do you do? You could update their entry in the allowed signers file to point at their new key, but that would make it impossible to validate objects signed with the older key. You could store both keys, but that would mean that you would accept new objects signed with the old key.
Git 2.35 lets you take advantage of OpenSSH’s valid-before and valid-after directives by making sure that the object you’re verifying was signed using a signature that was valid when it was created. This allows individuals to rotate their SSH keys by keeping track of when each key was valid without invalidating any objects previously signed using an older key.
Git 2.35 also supports new key types in the user.signingKey configuration when you include the key verbatim (instead of storing the path of a file that contains the signing key). Previously, the rule for interpreting user.signingKey was to treat its value as a literal SSH key if it began with “ssh-“, and to treat it as filepath otherwise. You can now specify literal SSH keys with keytypes that don’t begin with “ssh-” (like ECDSA keys).
If you’ve ever dealt with a merge conflict, you know that accurately resolving conflicts takes some careful thinking. You may not have heard of Git’s merge.conflictStyle setting, which makes resolving conflicts just a little bit easier.
The default value for this configuration is “merge”, which produces the merge conflict markers that you are likely familiar with. But there is a different mode, “diff3”, which shows the merge base in addition to the changes on either side.
Git 2.35 introduces a new mode, “zdiff3”, which zealously moves any lines in common at the beginning or end of a conflict outside of the conflicted area, which makes the conflict you have to resolve a little bit smaller.
For example, say I have a list with a placeholder comment, and I merge two branches that each add different content to fill in the placeholder. The usual merge conflict might look something like this:
Trying again with diff3-style conflict markers shows me the merge base (revealing a comment that I didn’t know was previously there) along with the full contents of either side, like so:
# add more here
The above gives us more detail, but notice that both sides add “foo” and, “bar” at the beginning and “baz” at the end. Trying one last time with zdiff3-style conflict markers moves the “foo” and “bar” outside of the conflicted region altogether. The result is both more accurate (since it includes the merge base) and more concise (since it handles redundant parts of the conflict for us).
# add more here
You may (or may not!) know that Git supports a handful of different algorithms for generating a diff. The usual algorithm (and the one you are likely already familiar with) is the Myers diff algorithm. Another is the --patience diff algorithm and its cousin --histogram. These can often lead to more human-readable diffs (for example, by avoiding a common issue where adding a new function starts the diff by adding a closing brace to the function immediately preceding the new one).
In Git 2.35, --histogram got a nice performance boost, which should make it faster in many cases. The details are too complicated to include in full here, but you can check out the reference below and see all of the improvements and juicy performance numbers.
If you’re a fan of performance improvements (and diff options!), here’s another one you might like. You may have heard of git diff‘s --color-moved option (if you haven’t, we talked about it back in our Highlights from Git 2.17). You may not have heard of the related --color-moved-ws, which controls how whitespace is or isn’t ignored when colorizing diffs. You can think of it like the other space-ignoring options (like --ignore-space-at-eol, --ignore-space-change, or --ignore-all-space), but specifically for when you’re running diff in the --color-moved mode.
Like the above, Git 2.35 also includes a variety of performance improvement for --color-moved-ws. If you haven’t tried --color-moved yet, give it a try! If you already use it in your workflow, it should get faster just by upgrading to Git 2.35.
In case you aren’t familiar with git jump, here’s a quick refresher. git jump populates Vim’s quickfix list with the locations of merge conflicts, grep matches, or diff hunks (by running git jump merge, git jump grep, or git jump diff, respectively).
In Git 2.35, git jump merge learned how to narrow the set of merge conflicts using a pathspec. So if you’re working on resolving a big merge conflict, but you only want to work on a specific section, you can run:
$ git jump merge -- foo
to only focus on conflicts in the foo directory. Alternatively, if you want to skip conflicts in a certain directory, you can use the special negative pathspec like so:
# Skip any conflicts in the Documentation directory for now.
$ git jump merge -- ':^Documentation'
You might have heard of Git’s “clean” and “smudge” filters, which allow users to specify how to “clean” files when staging, or “smudge” them when populating the working copy. Git LFS makes extensive use of these filters to represent large files with stand-in “pointers.” Large files are converted to pointers when staging with the clean filter, and then back to large files when populating the working copy with the smudge filter.
Git has historically used the size_t and unsigned long types relatively interchangeably. This is understandable, since Git was originally written on Linux where these two types have the same width (and therefore, the same representable range of values).
But on Windows, which uses the LLP64 data model, the unsigned long type is only 4 bytes wide, whereas size_t is 8 bytes wide. Because the clean and smudge filters had previously used unsigned long, this meant that they were unable to process files larger than 4GB in size on platforms conforming to LLP64.
The effort to standardize on the correct size_t type to represent object length continues in Git 2.35, which makes it possible for filters to handle files larger than 4GB, even on LLP64 platforms like Windows1.
If you haven’t used Git in a patch-based workflow where patches are emailed back and forth, you may be unaware of the git am command, which extracts patches from a mailbox and applies them to your repository.
Previously, if you tried to git am an email which did not contain a patch, you would get dropped into a state like this:
$ git am /path/to/mailbox
Patch is empty.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
This can often happen when you save the entire contents of a patch series, including its cover letter (the customary first email in a series, which contains a description of the patches to come but does not itself contain a patch) and try to apply it.
In Git 2.35, you can specify how git am will behave should it encounter an empty commit with --empty=<stop|drop|keep>. These options instruct am to either halt applying patches entirely, drop any empty patches, or apply them as-is (creating an empty commit, but retaining the log message). If you forgot to specify an --empty behavior but tried to apply an empty patch, you can run git am --allow-empty to apply the current patch as-is and continue.
Returning readers may remember our discussion of the sparse index, a Git features that improves performance in repositories that use sparse-checkout. The aforementioned link describes the feature in detail, but the high-level gist is that it stores a compacted form of the index that grows along with the size of your checkout rather than the size of your repository.
In 2.34, the sparse index was integrated into a handful of commands, including git status, git add, and git commit. In 2.35, command support for the sparse index grew to include integrations with git reset, git diff, git blame, git fetch, git pull, and a new mode of git ls-files.
Speaking of sparse-checkout, the git sparse-checkout builtin has deprecated the git sparse-checkout init subcommand in favor of using git sparse-checkout set. All of the options that were previously available in the init subcommand are still available in the set subcommand. For example, you can enable cone-mode sparse-checkout and include the directory foo with this command:
Git stores references (such as branches and tags) in your repository in one of two ways: either “loose” as a file inside of .git/refs (like .git/refs/heads/main) or “packed” as an entry inside of the file at .git/packed_refs.
But for repositories with truly gigantic numbers of references, it can be inefficient to store them all together in a single file. The reftable proposal outlines the alternative way that JGit stores references in a block-oriented fashion. JGit has been using reftable for many years, but Git has not had its own implementation.
Reftable promises to improve reading and writing performance for repositories with a large number of references. Work has been underway for quite some time to bring an implementation of reftable to Git, and Git 2.35 comes with an initial import of the reftable backend. This new backend isn’t yet integrated with the refs, so you can’t start using reftable just yet, but we’ll keep you posted about any new developments in the future.
In our first episode of Security Nation Season 5, Jen and Tod chat with Mike Hanley, Chief Security Officer at GitHub, all about the major vulnerability in Apache’s Log4j logging library (aka Log4Shell). Mike talks about the ins and outs of GitHub’s response to this blockbuster vulnerability and what could have helped the industry deal with an issue of this massive scope more effectively (hint: he drops the SBOM). They also touch on GitHub’s updated policy on the sharing of exploits.
Stick around for our Rapid Rundown, where Tod and Jen talk about Microsoft’s release of emergency fixes for Windows Server and VPN over Martin Luther King Day weekend.
Mike Hanley is the Chief Security Officer at GitHub. Prior to GitHub, Mike was the Vice President of Security at Duo Security, where he built and led the security research, development, and operations functions. After Duo’s acquisition by Cisco for $2.35 billion in 2018, Mike led the transformation of Cisco’s cloud security framework and later served as CISO for the company. Mike also spent several years at CERT/CC as a Senior Member of the Technical Staff and security researcher focused on applied R&D programs for the US Department of Defense and the Intelligence Community.
When he’s not talking about security at GitHub, Mike can be found enjoying Ann Arbor, MI with his wife and seven kids.
The past few weeks have shown us the importance and wide reach of open-source security. In December 2021, public disclosure of the Log4Shell vulnerability in Log4j, an open-source logging library, caused a cascade of dependency analysis by developers in organizations around the world. The incident was so wide-reaching that representatives from federal agencies and large private-sector companies gathered on January 13, 2022, at a White House meeting to discuss initiatives for securing open-source software.
A large percentage of the software we rely on today is proprietary or closed-source, meaning that the software is fully controlled by the company and closed for independent review. But in most cases, all of the code written to build the proprietary software is not entirely produced by the companies that provide the products and services; instead, they use a third-party library or a component piece of software to help them assemble their solution.
Many of those third-party components are classified as open-source software, meaning the source code is freely available for anyone to use, view, change to correct issues, or enhance to add new functionality. Open-source software projects are frequently maintained by volunteers, and a community of developers and users forms around a shared passion for the software. It’s their passion and collaboration that help projects grow and remain supported.
Finding the resources for open-source security
Yet for the majority of open-source projects that do not have a large corporate backer, the role these individuals play is frequently overlooked by software consumers, and as a result, many open-source projects face maintenance challenges.
Limited resources impose a variety of constraints on projects, but the implications are particularly wide-reaching when we look at the challenge of securing open-source software. Vulnerabilities discovered in proprietary software are the responsibility of the software vendor, frequently better funded than open-source software, with teams available to triage and resolve defects. Better — or any — funding, or broader community participation, may also increase the chance of avoiding vulnerabilities during development or discovering them during quality assurance checks. It can also help developers more quickly identify and resolve vulnerabilities discovered at a future date.
Increasing open-source project funding is a wonderful idea, and it’s in the best interest of companies using such software to build their products and services. However, funding alone won’t increase security in open-source projects, just as the greater source code visibility in open-source hasn’t necessarily resulted in fewer defects or shortened times between defect introduction and resolution.
For example, the vulnerability in Microsoft’s Server Message Block (SMB) protocol implementation (CVE-2017-0144) was around for many years before the defect was resolved in 2017. Similarly, the Log4Shell (CVE-2021-44228) vulnerability in the Log4j project was introduced in 2013, and it remained undiscovered and unresolved until December 2021. There is clearly a massive difference in both funding and available resources to those involved in these projects, and yet both were able to have vulnerable defects exist for years before resolution.
Solving the problem at the source (code)
Accidental software vulnerabilities share similar root causes whether they’re found in proprietary or open-source software. When developers create new features or modify existing ones, we need code reviews that look beyond feature functionality confirmation. We need to inspect the code changes for security issues but also perform a deeper analysis, with attention to the security implications of these changes within the greater scope of the complete project.
The challenge is that not all developers are security practitioners, and that is not a realistic expectation. The limited resources of open-source projects compound the problem, increasing the likelihood that contribution reviews focus primarily on functionality. We should encourage developer training in secure coding practices but understand that mistakes are still possible. That means we need processes and tooling to assist with secure coding.
Security in open-source software carries some other unique challenges due to the open environment. Projects tend to accept a wide variety of contributions from anyone. A new feature might not have enough of a demand to get time from the primary developers, but anyone who takes the time to develop the feature while staying within the bounds of the project’s goals and best practices may very well have their contribution accepted and merged into the project. Projects may find themselves the target of malicious contributions through covert defect introduction. The project may even be sabotaged by a project maintainer, or the original maintainer may want to retire from the project and end up handing it over to another party that — intentionally or not — introduces a defect.
It’s important for us to identify open-source projects that are critical to the software supply chain and ensure these projects are sustainably maintained for the future. These goals would benefit from increased adoption of secure coding practices and infrastructure that ensures secure distribution and verification of software build artifacts.
NEVER MISS A BLOG
Get the latest stories, expertise, and news about security today.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.