Tag Archives: cybersecurity

REvil is Off-Line

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/revil-is-off-line.html

This is an interesting development:

Just days after President Biden demanded that President Vladimir V. Putin of Russia shut down ransomware groups attacking American targets, the most aggressive of the groups suddenly went off-line early Tuesday.

[…]

Gone was the publicly available “happy blog” the group maintained, listing some of its victims and the group’s earnings from its digital extortion schemes. Internet security groups said the custom-made sites (think of them as virtual conference rooms) where victims negotiated with REvil over how much ransom they would pay to get their data unlocked also disappeared. So did the infrastructure for making payments.

Okay. So either the US took them down, Russia took them down, or they took themselves down.

Why the Robot Hackers Aren’t Here (Yet)

Post Syndicated from Erick Galinkin original https://blog.rapid7.com/2021/07/14/why-the-robot-hackers-arent-here-yet/


“Estragon: I’m like that. Either I forget right away or I never forget.” – Samuel Beckett, Waiting for Godot

Hacking and Automation

As hackers, we spend a lot of time making things easier for ourselves.

For example, you might be aware of a tool called Metasploit, which can be used to make getting into a target easier. We’ve also built internet-scale scanning tools, allowing us to easily view data about open ports across the internet. Some of our less ethical comrades-in-arms build worms and botnets to automate the process of doing whatever they want to do.

If the heart of hacking is making things do what they shouldn’t, then perhaps the lungs are automation.

Over the years, we’ve seen security in general and vulnerability discovery in particular move from a risky, shady business to massive corporate-sponsored activities with open marketplaces for bug bounties. We’ve also seen a concomitant improvement in the techniques of hacking.

If hackers had known in 1996 that we’d go from stack-based buffer overflows to chaining ROP gadgets, perhaps we’d have asserted “no free bugs” earlier on. This maturity has allowed us to find a number of bugs that would have been unbelievable in the early 2000s, and exploits for those bugs are quickly packaged into tools like Metasploit.

Now that we’ve automated the process of running our exploits once they’ve been written, why is it so hard to get machines to find the bugs for us?

This is, of course, not for lack of trying. Fuzzing is a powerful technique that turns up a ton of bugs in an automated way. In fact, fuzzing is powerful enough that loads of folks turn up 0-days while they’re learning how to do fuzzing!

However, the trouble with fuzzing is that you never know what it will turn up, and once you get a crash, a lot of work remains: understanding how and why the crash occurred, crafting an exploit, and then making that exploit reliable.

Automated bug finding, like we saw in the DARPA Cyber Grand Challenge, takes this to another level by combining fuzzing and symbolic execution with other program analysis techniques, like reachability and input dependence. But fuzzers and SMT solvers (programs that solve particular types of logic problems) haven’t found all the bugs, so what are we missing?

As with many problems in the last few years, organizations are hoping the answer lies in artificial intelligence and machine learning. The trouble with this hope is that AI is good at some tasks, and bug finding may simply not be one of them — at least not yet.

Learning to Find Bugs

Academic literature is rich with papers aiming to find bugs with machine learning. A quick Google Scholar search turns up over 140,000 articles on the topic as of this writing, and many of these articles seem to promise that, any day now, machine learning algorithms will turn up bugs in your source code.

There are a number of developer tools that suggest this could be true. Tools like Codota, Tabnine, and Kite will help auto-complete your code and are quite good. In fact, Microsoft has used GPT-3 to write code from natural language.

But creating code and finding bugs are, sadly, entirely different problems. A 2017 paper by Chappell et al. (a collaboration between Australia’s Queensland University of Technology and the technology giant Oracle) found that a variety of machine learning approaches vastly underperformed Oracle’s Parfait system, which uses more traditional symbolic analysis techniques on the intermediate representations used by the compiler.

Another paper, out of the University of Oslo in Norway, simulated SQL injection using Q-learning, a form of reinforcement learning. This paper caused a stir in the MLSec community and especially within the DEF CON AI village (full disclosure: I am an officer for the AI village and helped cause the stir). The possibility of using a Roomba-like method to find bugs was deeply enticing, and Erdodi et al. did great work.

However, their method requires a very particular environment, and although the agent learned to exploit the specific simulation, the method does not seem to generalize well. So, what’s a hacker to do?

Blaming Our Boots

“Vladimir: There’s man all over for you, blaming on his boots the faults of his feet.” – Samuel Beckett, Waiting for Godot

One of the fundamental problems with throwing machine learning at security problems is that many ML techniques have been optimized for particular types of data. This is particularly important for deep learning techniques.

Images are tensors (matrices extended with a third dimension for color channels in addition to height and width): rectangular bitmaps with a range of possible values for each pixel. Natural language is tokenized, and those tokens are mapped into a word embedding, like GloVe or Word2Vec.

This is not to downplay the tremendous accomplishments of these machine learning techniques but to demonstrate that, in order for us to repurpose them, we must understand why they were built this way.

Unfortunately, the properties we find important for computer vision — shift invariance and orientation invariance — are not properties that are important for tasks like vulnerability detection or malware analysis. There is, likewise, a heavy dependence in log analysis and similar tasks on tokens that are unlikely to be in our vocabulary — oddly encoded strings, weird file names, and unusual commands. This makes these techniques unsuitable for many of our defensive tasks and, for similar reasons, mostly useless for generating net-new exploits.

Why doesn’t this work? A few issues are at play here. First, the machine does not understand what it is learning. Machine learning algorithms are ultimately function approximators — systems that see some inputs and some outputs, and figure out what function generated them.

For example, if our dataset is:

X = {1, 3, 7, 11, 2}

Y = {3, 7, 15, 23, 5}

Our algorithm might see the first input and output: 3 = f(1) and guess that f(x) = 3x.

By the second input, it would probably be able to figure out that y = f(x) = 2x + 1.

By the fifth, there would be a lot of good evidence that f(x) = 2x + 1. But this is a simple linear model, with one weight term and one bias term. Once we have to account for a large number of dimensions and a function that turns a label like “cat” into a 32 x 32 image with 3 color channels, approximating that function becomes much harder.
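As a toy illustration (my own sketch, not from the original post), even this simple case can be “learned” mechanically: an ordinary least-squares fit over the five pairs above recovers the function exactly.

```shell
# Fit y = w*x + b by ordinary least squares over the toy dataset above.
printf '%s\n' '1 3' '3 7' '7 15' '11 23' '2 5' | awk '
{
  n++
  sx += $1; sy += $2
  sxx += $1 * $1; sxy += $1 * $2
}
END {
  # Closed-form OLS solution for a one-weight, one-bias linear model
  w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
  b = (sy - w * sx) / n
  printf "f(x) = %gx + %g\n", w, b
}'
# prints: f(x) = 2x + 1
```

The point of the contrast in the text is that this closed-form recovery only exists because the model is tiny and linear; nothing comparable exists for the code-to-vulnerability mapping.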

It stands to reason then that the function which maps a few dozen lines of code that spread across several files into a particular class of vulnerability will be harder still to approximate.

Ultimately, the problem is neither the technology on its own nor the data representation on its own. It is that we are trying to use the data we have to solve a hard problem without addressing the underlying difficulties of that problem.

In our case, the challenge is not identifying vulnerabilities that look like others we’ve found before. The challenge is in capturing the semantic meaning of the code and the code flow at a point, and using that information to generate an output that tells us whether or not a certain condition is met.

This is what SAT solvers are trying to do. It is worth noting, then, that from a purely theoretical perspective, this is Boolean satisfiability (SAT), the canonical NP-complete problem. That explains why the problem is so hard: we’re trying to solve one of the most challenging problems in computer science!

Waiting for Godot

The Samuel Beckett play Waiting for Godot centers on the characters Vladimir and Estragon. The two are, as the title suggests, waiting for a character named Godot. To spoil a roughly 70-year-old play, I’ll give away the punchline: Godot never comes.

Today, security researchers who are interested in using artificial intelligence and machine learning to move the ball forward are in a similar position. We sit or stand by the leafless tree, waiting for our AI Godot. Like Vladimir and Estragon, our Godot will never come if we wait.

If we want to see more automation and applications of machine learning to vulnerability discovery, it will not suffice to repurpose convolutional neural networks, gradient-boosted decision trees, and transformers. Instead, we need to think about the way we represent data and how to capture the relevant details of that data. Then, we need to develop algorithms that can capture, learn, and retain that information.

We cannot wait for Godot — we have to find him ourselves.

China Taking Control of Zero-Day Exploits

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/china-taking-control-of-zero-day-exploits.html

China is making sure that all newly discovered zero-day exploits are disclosed to the government.

Under the new rules, anyone in China who finds a vulnerability must tell the government, which will decide what repairs to make. No information can be given to “overseas organizations or individuals” other than the product’s manufacturer.

No one may “collect, sell or publish information on network product security vulnerabilities,” say the rules issued by the Cyberspace Administration of China and the police and industry ministries.

This just blocks the cyber-arms trade. It doesn’t prevent researchers from telling the products’ companies, even if they are outside of China.

Iranian State-Sponsored Hacking Attempts

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/iranian-state-sponsored-hacking-attempts.html

Interesting attack:

Masquerading as UK scholars with the University of London’s School of Oriental and African Studies (SOAS), the threat actor TA453 has been covertly approaching individuals since at least January 2021 to solicit sensitive information. The threat actor, an APT who we assess with high confidence supports Islamic Revolutionary Guard Corps (IRGC) intelligence collection efforts, established backstopping for their credential phishing infrastructure by compromising a legitimate site of a highly regarded academic institution to deliver personalized credential harvesting pages disguised as registration links. Identified targets included experts in Middle Eastern affairs from think tanks, senior professors from well-known academic institutions, and journalists specializing in Middle Eastern coverage.

These connection attempts were detailed and extensive, often including lengthy conversations prior to presenting the next stage in the attack chain. Once the conversation was established, TA453 delivered a “registration link” to a legitimate but compromised website belonging to the University of London’s SOAS radio. The compromised site was configured to capture a variety of credentials. Of note, TA453 also targeted the personal email accounts of at least one of their targets. In subsequent phishing emails, TA453 shifted their tactics and began delivering the registration link earlier in their engagement with the target without requiring extensive conversation. This operation, dubbed SpoofedScholars, represents one of the more sophisticated TA453 campaigns identified by Proofpoint.

The report details the tactics.

News article.

Securing the Supply Chain: Lessons Learned from the Codecov Compromise

Post Syndicated from Justin Pagano original https://blog.rapid7.com/2021/07/09/securing-the-supply-chain-lessons-learned-from-the-codecov-compromise/


Supply chain attacks are all the rage these days. While they’re not a new part of the threat landscape, they are growing in popularity among more sophisticated threat actors, and they can create significant system-wide disruption, expense, and loss of confidence across multiple organizations, sectors, or regions. The compromise of Codecov’s Bash Uploader script is one of the latest such attacks. While much is still unknown about the full impact of this incident on organizations around the world, it’s been another wake-up call that cybersecurity problems are getting more complex by the day.

This blog post is meant to provide the security community with defensive knowledge and techniques to protect against supply chain attacks involving continuous integration (CI) systems, such as Jenkins, Bamboo, etc., and version control systems, such as GitHub, GitLab, etc. It covers prevention techniques — for software suppliers and consumers — as well as detection and response techniques in the form of a playbook.

It has been co-developed by our Information Security, Security Research, and Managed Detection & Response teams. We believe one of the best ways for organizations to close their security achievement gap and outpace attackers is by openly sharing knowledge about ever-evolving security best practices.

Defending CI systems and source code repositories from similar supply chain attacks

Below are some of the security best practices defenders can use to prevent, detect, and respond to incidents like the Codecov compromise.


Figure 1: High-level overview of known Codecov supply chain compromise stages

Prevention techniques

Provide and perform integrity checks for executable code

If you’re a software consumer

Use collision-resistant hash algorithms, such as SHA-256 or SHA-512, to validate the checksums your vendor provides for all executable files or code. Likewise, verify digital signatures for all executable files or code they provide.

If either of these integrity checks fails, notify your vendor ASAP, as this could be a sign of compromised code.
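A minimal sketch of that consumer-side check (my own illustration; the file names `uploader.sh` and `uploader.sh.sha256` are hypothetical):

```shell
# Validate a downloaded artifact against a vendor-published SHA-256
# checksum file before allowing it to execute. File names are hypothetical.
verify_download() {
  checksum_file=$1
  if sha256sum -c "$checksum_file" >/dev/null 2>&1; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH: do not execute; notify the vendor" >&2
    return 1
  fi
}
```

A caller would run `verify_download uploader.sh.sha256` and refuse to execute the artifact on a nonzero return.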

If you’re a software supplier

Provide collision-resistant hashes, such as SHA-256 or SHA-512, and store checksums out-of-band from their corresponding files (i.e., make it so an attacker has to carry out two successful attacks to compromise your code: one against the system hosting your checksum data and another against your content delivery systems). Provide users with easy-to-use instructions, including sample code, for performing checksum validation.

Additionally, digitally sign all executable code using tamper-resistant code signing frameworks such as in-toto and secure software update frameworks such as The Update Framework (TUF) (see DataDog’s blog post about using these tools for reference). Simply signing code with a private key is insufficient since attackers have demonstrated ways to compromise static signing keys stored on servers to forge authentic digital signatures.

Relevant for the following Codecov compromise attack stages:

  • Customers’ CI jobs dynamically load Bash Uploader

Version control third-party software components

Store and load local copies of third-party components in a version control system to track changes over time. Only update them after comparing code differences between versions, performing checksum validation, and authenticating digital signatures.
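A rough sketch of that update gate (my own illustration; all paths are hypothetical): only replace the vendored copy when the new file’s hash matches the vendor-published value, after eyeballing the diff.

```shell
# Accept an update to a vendored third-party file only after a diff
# review and a checksum match against the value the vendor published
# out-of-band. All arguments/paths are hypothetical examples.
update_vendored() {
  new_file=$1
  expected_sha=$2
  dest=$3
  actual_sha=$(sha256sum "$new_file" | cut -d' ' -f1)
  if [ "$actual_sha" != "$expected_sha" ]; then
    echo "checksum mismatch: refusing to update $dest" >&2
    return 1
  fi
  diff -u "$dest" "$new_file" || true   # review what changed between versions
  cp "$new_file" "$dest"                # then commit this via version control
}
```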

Relevant for the following Codecov compromise attack stages:

  • Bash Uploader script modified and replaced in GCS

Implement egress filtering

Identify trusted internet-accessible systems and apply host-based or network-based firewall rules to only allow egress network traffic to those trusted systems. Use specific IP addresses and fully qualified domain names whenever possible, and fallback to using IP ranges, subdomains, or domains only when necessary.
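As a concrete sketch (my own, not from the post; 203.0.113.10 is a hypothetical trusted destination from the TEST-NET-3 documentation range), a host-based default-deny egress policy with iptables might look like:

```shell
# Host-based egress filtering sketch with iptables (requires root).
# Only traffic to the hypothetical trusted host 203.0.113.10:443 is allowed.
iptables -A OUTPUT -o lo -j ACCEPT                                  # loopback
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT   # replies to inbound
iptables -A OUTPUT -d 203.0.113.10 -p tcp --dport 443 -j ACCEPT     # trusted destination
iptables -P OUTPUT DROP                                             # default-deny egress
```

In practice you would also need to allow DNS and any package-repository hosts the system legitimately needs.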

Relevant for the following Codecov compromise attack stages:

  • Environment variables, including creds, exfiltrated

Implement IP address safelisting

While zero-trust-networking (ZTN) doubts have been cast on the effectiveness of network perimeter security controls, such as IP address safelisting, they are still one of the easiest and most effective ways to mitigate attacks targeting internet routable systems. IP address safelisting is especially useful in the context of protecting service account access to systems when ZTN controls like hardware-backed device authentication certificates aren’t feasible to implement.

Popular source code repository services, such as GitHub, provide this functionality, although it may require you to host your own server or, if using their cloud hosted option, have multiple organizations in place to host your private repositories separately from your public repositories.

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos
  • Bash Uploader script modified and replaced in GCS

Apply least privilege permissions for CI jobs using job-specific credentials

For any credentials a CI job uses, provide a credential for that specific job (i.e. do not reuse a single credential across multiple CI jobs). Only provision permissions to each credential that are needed for the CI job to execute successfully: no more, no less. This will shrink the blast radius of a credential compromise.

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Use encrypted secrets management for safe credential storage

If you absolutely cannot avoid storing credentials in source code, use cryptographic tooling such as AWS KMS and the AWS Encryption SDK to encrypt credentials before storing them in source code. Otherwise, store them in a secrets management solution, such as Vault, AWS Secrets Manager, or GitHub Actions Encrypted Secrets (if you’re using GitHub Actions as your CI service, that is).

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Block plaintext secrets from code commits

Implement pre-commit hooks with tools like git-secrets to detect and block plaintext credentials before they’re committed to your repositories.
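Tools like git-secrets do this far more thoroughly, but a minimal pre-commit hook (my own sketch; the AWS-access-key pattern is illustrative, not exhaustive) conveys the idea:

```shell
# Minimal pre-commit hook sketch: block staged changes that contain an
# AWS-style access key ID. Save as .git/hooks/pre-commit and mark executable.
if git diff --cached -U0 2>/dev/null | grep -qE 'AKIA[0-9A-Z]{16}'; then
  echo "Possible plaintext credential in staged changes; commit blocked." >&2
  exit 1
fi
```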

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Use automated frequent service account credential rotation

Rotate credentials that are used programmatically (e.g. service account passwords, keys, tokens, etc.) to ensure that they’re made unusable at some point in the future if they’re exposed or obtained by an attacker.

If you’re able to automate credential rotation, rotate them as frequently as hourly. Also, create two “credential rotator” credentials that can both rotate all service account credentials and rotate each other. This ensures that the credential that is used to rotate other credentials is also short lived.

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Detection techniques

While we strongly advocate for adopting multiple layers of prevention controls to make it harder for attackers to compromise software supply chains, we also recognize that prevention controls are imperfect by themselves. Having multiple layers of detection controls is essential for catching suspicious or malicious activity that you can’t (or in some cases shouldn’t) have prevention controls for.

Identify Dependencies

You’ll need these in place to create detection rules and investigate suspicious activity:

  1. Process execution logs, including full command line data, or CI job output logs
  2. Network logs (firewall, network flow, etc.), including source and destination IP address
  3. Authentication logs (on-premise and cloud-based applications), including source IP and identity/account name
  4. Activity audit logs (on-premise and cloud-based applications), including source IP and identity/account name
  5. Indicators of compromise (IOCs), including IPs, commands, file hashes, etc.

Ingress from atypical IP addresses or regions

Whether or not you’re able to implement IP address safelisting for accessing certain systems/environments, use an IP address safelist to detect when atypical IP addresses are accessing critical systems that should only be accessed by trusted IPs.
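A simple sketch of that detection (my own; file paths and log layout are hypothetical, with the source IP assumed to be the first field):

```shell
# Report source IPs in an authentication log that are not on a safelist.
# safelist holds one trusted IP per line; log's first field is the source IP.
flag_atypical_ips() {
  log=$1
  safelist=$2
  awk '{print $1}' "$log" | sort -u | grep -vxF -f "$safelist"
}
```

Anything this prints is an access from an IP you did not expect and is worth investigating.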

Relevant for the following Codecov compromise attack stages:

  • Bash Uploader script modified and replaced in GCS
  • Creds used to access source code repos

Egress to atypical IP addresses or regions

Whether or not you’re able to implement egress filtering for certain systems/environments, use an IP address safelist to detect when atypical IP addresses are being connected to.

Relevant for the following Codecov compromise attack stages:

  • Environment variables, including creds, exfiltrated

Environment variables being passed to network connectivity processes

It’s unusual for a system’s local environment variables to be exported and passed into processes used to communicate over a network (curl, wget, nc, etc.), regardless of the IP address or domain being connected to.
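A rough hunting sketch for this pattern over command-line logs (my own illustration; the log path and the exact patterns are hypothetical and not exhaustive):

```shell
# Scan a process command-line log for environment dumps being handed to
# network clients (curl, wget, nc). Patterns are illustrative only.
hunt_env_exfil() {
  grep -E '\b(curl|wget|nc)\b' "$1" | grep -E '\$\(|\benv\b|printenv'
}
```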

Relevant for the following Codecov compromise attack stages:

  • Environment variables, including creds, exfiltrated

Response techniques

The response techniques outlined below are, in some cases, described in the context of the IOCs that were published by Codecov. Dependencies identified in “Detection techniques” above are also dependencies for response steps outlined below.

Data exfiltration response steps: CI servers

Identify and contain affected systems and data

  1. Search CI systems’ process logs, job output, and job configuration files to identify usage of compromised third-party components (in regex form). This will identify potentially affected CI systems that have been using the third-party component that is in scope. This is useful for getting a full inventory of potentially affected systems and examining any local logs that might not be in your SIEM.

curl (-s )?https://codecov.io/bash

2. Search for known IOC IP addresses in regex form (based on RegExr community pattern)

(.*79\.135\.72\.34|178\.62\.86\.114|104\.248\.94\.23|185\.211\.156\.78|91\.194\.227\.*|5\.189\.73\.*|218\.92\.0\.247|122\.228\.19\.79|106\.107\.253\.89|185\.71\.67\.56|45\.146\.164\.164|118\.24\.150\.193|37\.203\.243\.207|185\.27\.192\.99\.*)

3. Search for known IOC command line pattern(s)

curl -sm 0.5 -d "$(git remote -v)

4. Create forensic image of affected system(s) identified in steps 1 – 3

5. Network quarantine and/or power off affected system(s)

6. Replace affected system(s) with last known good backup, image snapshot, or clean rebuild

7. Analyze forensic image and historical CI system process, job output, and/or network traffic data to identify potentially exposed sensitive data, such as credentials

Search for malicious usage of potentially exposed credentials

  1. Search authentication and activity audit logs for IP address IOCs
  2. Search authentication and activity audit logs for potentially compromised account events originating from IP addresses outside of organization’s known IP addresses
  3. This could potentially uncover new IP address IOCs
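As a quick sketch of step 1 (my own; the log path is hypothetical, and the pattern abbreviates the full IOC regex shown earlier, which you would substitute in full):

```shell
# Sweep an authentication or activity log for IOC IP addresses.
# The pattern here is an abbreviated subset of the published IOC list.
sweep_iocs() {
  grep -E '(79\.135\.72\.34|178\.62\.86\.114|104\.248\.94\.23|185\.211\.156\.78)' "$1"
}
```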

Unauthorized access response steps: source code repositories

Clone full historical source code repository content

Note: This content is based on git-based version control systems

Version control systems such as git conveniently provide forensics-grade information by virtue of tracking all changes over time. In order to fully search all data from a given repository, certain git commands must be run in sequence.

  1. Set git config to get full commit history for all references (branches, tags), including pull requests, and clone repositories that need to be analyzed (*nix shell script)

git config --global remote.origin.fetch '+refs/pull/*:refs/remotes/origin/pull/*'
# Space-delimited list of repos to clone
declare -a repos=("repo1" "repo2" "repo3")
git_url="https://myGitServer.biz/myGitOrg"
# Loop through each repo and clone it locally
for r in "${repos[@]}"; do
  echo "Cloning $git_url/$r"
  git clone "$git_url/$r"
done

2. In the same directory where repositories were cloned from the step above, export full git commit history in text format for each repository. List git committers at top of each file in case they need to be contacted to gather context (*nix shell script)

git fetch --all
for dir in */ ; do
  dir="${dir%/}"
  (
    rm -f "$dir.commits.txt"
    cd "$dir" || exit
    git fetch --all
    echo "******COMMITTERS FOR THIS REPO********" >> "../$dir.commits.txt"
    git shortlog -s -n >> "../$dir.commits.txt"
    echo "**************************************" >> "../$dir.commits.txt"
    git log --all --decorate --oneline --graph -p >> "../$dir.commits.txt"
  )
done

a. Note: the steps below can be done with tools such as Atom, Visual Studio Code, or Sublime Text, along with extensions/plugins you can install in them.

If performing manual reviews of these commit history text files, create copies of those files and use the regex below to find and replace git’s log graph formatting that prepends each line of text

^(\|\s*)*(\+|-|\\|/\||\*\|*)*\s*

b. Then, sort the text in ascending or descending order and de-duplicate/unique-ify it. This will make it easier to manually parse.
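Steps a and b together can be sketched as a single pipeline (my own illustration) using the graph-stripping regex above:

```shell
# Strip git's log-graph prefix from an exported commit-history file,
# then sort and de-duplicate it so it is easier to review manually.
clean_history() {
  sed -E 's/^(\|[[:space:]]*)*(\+|-|\\|\/\||\*\|*)*[[:space:]]*//' "$1" | sort -u
}
```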

Search for binary files and content in repositories

Exporting commit history to text files does not export data from any binary files (e.g. ZIP files, XLSX files, etc.). In order to thoroughly analyze source code repository content, binary files need to be identified and reviewed.

  1. Find binary files in folder containing all cloned git repositories based on file extension (*nix shell script)

find -E . -regex '.*\.(jpg|png|pdf|doc|docx|xls|xlsx|zip|7z|swf|atom|mp4|mkv|exe|ppt|pptx|vsd|rar|tiff|tar|rmd|md)'

2. Find binary files in folder containing all cloned git repositories based on MIME type (*nix shell script)

find . -type f -print0 | xargs -0 file --mime-type | grep -e image/jpg -e image/png -e application/pdf -e application/msword -e application/vnd.openxmlformats-officedocument.wordprocessingml.document -e application/vnd.ms-excel -e application/vnd.openxmlformats-officedocument.spreadsheetml.sheet -e application/zip -e application/x-7z-compressed -e application/x-shockwave-flash -e video/mp4 -e application/vnd.ms-powerpoint -e application/vnd.openxmlformats-officedocument.presentationml.presentation -e application/vnd.visio -e application/vnd.rar -e image/tiff -e application/x-tar

3. Find encoded binary content in commit history text files and other text-based files

grep -ir "url(data:" | cut -d\) -f1
grep -ir "base64" | cut -d\" -f1

Search for plaintext credentials: passwords, API keys, tokens, certificate private keys, etc.

  1. Search commit history text files for known credential patterns using tools such as TruffleHog and GitLeaks
  2. Search binary file contents identified in Search for binary files and content in repositories for credentials

Search logs for malicious access to discovered credentials

  1. Follow steps from Data exfiltration response (Searching for malicious usage of potentially exposed credentials) using logs from systems associated with credentials discovered in Search for plaintext credentials

New findings about attacker behavior from Project Sonar

We are fortunate to have a tremendous amount of data at our fingertips thanks to Project Sonar, which conducts internet-wide surveys across more than 70 different services and protocols to gain insights into global exposure to common vulnerabilities. We analyzed data from Project Sonar to see if we could gain any additional context about the IP address IOCs associated with Codecov’s Bash Uploader script compromise. What we found was interesting, to say the least:

  • The threat actor set up the first exfiltration server (178.62.86[.]114) on or about February 1, 2021
  • Historical DNS records from remotly[.]ru and seasonver[.]ru have pointed, and continue to point, to this server
  • The threat actor configured a simple HTTP redirect on the exfiltration server to about.codecov.io to avoid detection
    { "http_code": 301, "http_body": "", "server": "nginx", "alt-svc": "clear", "location": "http://about.codecov.io/", "via": "1.1 google" }
  • The redirect was removed from the exfiltration server on or before February 22, 2021, presumably by the server owner having detected these changes
  • The threat actor set up new infrastructure (104.248.94[.]23) that more closely mirrored Codecov’s GCP setup as their new exfiltration server on or about March 7, 2021
    { "http_code": 301, "http_body": "", "server": "envoy", "alt-svc": "clear", "location": "http://about.codecov.io/", "via": "1.1 google" }
  • The new exfiltration server was last seen on April 1, 2021

We hope the content in this blog will help defenders prevent, detect, and respond to these types of supply chain attacks going forward.

Introducing the Manual Regex Editor in IDR’s Parsing Tool: Part 2

Post Syndicated from Teresa Copple original https://blog.rapid7.com/2021/07/08/introducing-the-manual-regex-editor-in-idrs-parsing-tool-part-2/


I have logs on my mind right now, because every spring, as trees that didn’t survive the winter are chopped down, my neighbor has truckloads of them delivered to his house. All the logs are eventually burned up in his sugar house and used to make maple syrup, and it reminds me that I have some logs I’d like to burn to the ground, as well!

If you’re reading this blog, chances are you probably have some of these ugly logs that are messy, unstructured, and consequently, difficult to parse. Rather than destroy your ugly logs, however tempting, you can instead use the Custom Parsing tool in InsightIDR to break them up into usable fields.

Specifically, I will discuss here how to use Regex Editor mode, which assumes a general understanding of regular expressions. If you aren’t familiar with regular expressions or would like a short refresher, you may want to start with Introducing the Manual Regex Editor in IDR’s Parsing Tool: Part 1.

Many logs follow a standard format and can be easily parsed with the Custom Parsing tool using the default guided mode, which is quite easy to use and requires no regular expressions. You can read more about using guided mode here.

If the logs parse well with guided mode, you will likely use it to parse your logs. However, for those logs that lack good structure, common fields, or just do not parse correctly using guided mode, you may need to switch to Regex Editor mode to parse them.

Example 1: Key-Value Pair Logs Separated With Keys

Let’s start by looking at some logs in a classic format, which may not be obvious at first glance.

The first part of these logs has the fields separated by pipes (“|”). Logs that have a common field, like a pipe, colon, space, etc., separating the values are usually easy to parse and can be parsed out in guided mode.

However, the problem with these logs is that the last part contains a string that’s separated by literal strings of text rather than a set character type. For example, if you look at the “msg” field, it is an unstructured string that might contain many different characters — the end of the value is determined by the start of the next key name, “rt”.

May 26 08:25:37 SECRETSERVER CEF:0|Thycotic Software|Secret Server|10.9.000033|500|System Log|3|msg=Login Failure - ts2000.com\hfinn - AuthenticationFailed rt=May 26 2021 08:25:37 src=10.15.4.56 Raw Log/Thycotic Secret Server
May 26 08:25:37 SECRETSERVER CEF:0|Thycotic Software|Secret Server|10.9.000033|500|System Log|3|msg=Login attempt on node SECRETSERVER by user hfinn failed: Login failed. rt=May 26 2021 08:25:37 src=10.15.4.56

Let’s see how parsing these logs with the Custom Parsing Tool works. I have followed the instructions at https://docs.rapid7.com/insightidr/custom-parsing-tool/ and started parsing out my fields. Right away, you can see I’m having a problem parsing out the “msg” field in guided mode. It’s not working like I want it to!

This is where it is a good idea to switch to Regex Editor mode rather than continuing in guided mode. As such, I have selected the Regex Editor, and you can see that displayed here:

Whoa! This regex might look daunting at first, but just remember from our previous lesson that you can use an online regex cheat sheet or expression builder to make sense of any parts you find confusing or unclear.

Here is what the Regex Editor has so far, based on the two fields I added from guided mode (date and message):

^(?P<date>(?:[^:]*:){2}[^ ]+)[^=]*=(?P<message>[^d]*d)

The regular expression for the field I decided to name “message” is the part that’s not working like I want. So, I need to edit this part: (?P<message>[^d]*d).
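To see exactly how “[^d]*d” misbehaves, you can run the auto-generated rule through any engine that accepts the same “(?P&lt;name&gt;…)” syntax as RE2. Here is a quick illustration using Python’s standard re module (this is just a stand-in for the tool, not part of InsightIDR), applied to the second sample log from above:

```python
import re

# The second sample log line from above.
log = ("May 26 08:25:37 SECRETSERVER CEF:0|Thycotic Software|Secret Server|"
       "10.9.000033|500|System Log|3|msg=Login attempt on node SECRETSERVER "
       "by user hfinn failed: Login failed. rt=May 26 2021 08:25:37 src=10.15.4.56")

# "[^d]*d" reads non-"d" characters and then a single "d",
# so the match stops at the first lowercase "d" it can reach.
m = re.match(r"^(?P<date>(?:[^:]*:){2}[^ ]+)[^=]*=(?P<message>[^d]*d)", log)
print(m.group("message"))  # Login attempt on nod
```

The value is cut off at the “d” in “node” — which is exactly the truncated parsing we are seeing in the tool.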

Remember, this is a capture group, so the regular expression following the field name, “<message>”, will be used to read the logs and extract the values for each one. In other words, in Log Search, the key-value pair will be extracted out from the logs as follows:

(?P<key>regex that will extract the value)

“[^d]*d” is not working, so let’s figure out how to replace that. There are a lot of ways to go about crafting a regular expression that will extract the field. For example, one option might be to extract every character until you get to the next literal string:

^(?P<date>(?:[^:]*:){2}[^ ]+)[^=]*=(?P<message>.*)rt=

This works, but it is somewhat inefficient for parsing. It’s outside our scope here to discuss why, but in general, you should not use the dot star, “.*”, for parsing rules. It is much more efficient to define what to match rather than what to not match, so whenever possible, use a character class to define what should be matched.

Create the character class and put all possible characters that appear in the field into it:

^(?P<date>(?:[^:]*:){2}[^ ]+)[^=]*=(?P<message>[\w\s\.\\-]+)

These Thycotic logs have a lot of different characters appearing in the “msg” field, so defining a character class like I’m doing here, “[\w\s\.\\-]”, is a bit like playing pin the tail on the donkey, in that I hope I get them all! (Note that the “-” goes last in the class so it is read as a literal dash rather than as part of a range.)

Let’s look at another way to extract a text string when it does not have a standard format. Remember that “\d” will match any digit character. Its opposite is “\D”, which matches any non-digit character. Therefore, “[\d\D]+” matches any digit or non-digit character:

^(?P<date>(?:[^:]*:){2}[^ ]+)[^=]*=(?P<message>[\d\D]+)rt=

One thing to point out here is that, although defining every possible character in the class was a bit futzy for the “msg” field, this method works very well when parsing out host and user names.

In most cases, host and user names will contain only letters, numbers, and the “-” symbol, so a character class of “[\w\d-]” works well to extract them. If the names also contain an FQDN, such as “hfinn.widgets.com”, then you also need to extract the period: “[\w\d\.-]”.
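To sanity-check the finished rule outside InsightIDR, you can run it through any RE2-compatible engine. As an aside (again, a stand-in for the tool, not the product itself), Python’s re module accepts the same named-group syntax:

```python
import re

# The finished rule from above, using the [\d\D]+ character class.
rule = re.compile(r"^(?P<date>(?:[^:]*:){2}[^ ]+)[^=]*=(?P<message>[\d\D]+)rt=")

# The first sample Thycotic log line (raw string so the backslash survives).
log = (r"May 26 08:25:37 SECRETSERVER CEF:0|Thycotic Software|Secret Server|"
       r"10.9.000033|500|System Log|3|msg=Login Failure - ts2000.com\hfinn"
       r" - AuthenticationFailed rt=May 26 2021 08:25:37 src=10.15.4.56")

m = rule.match(log)
print(m.group("date"))     # May 26 08:25:37
print(m.group("message"))  # Login Failure - ... AuthenticationFailed (plus a trailing space)
```

The greedy “[\d\D]+” reads everything it can and then backs off until the literal “rt=” matches, which is why the whole unstructured message lands in the capture group.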

Example 2: Unstructured Logs

Let’s look at some logs that are challenging to parse due to the field structure.

Logs that do not have the same fields in every log and that do not have standard breaks to delineate the fields, while nicely human readable, are more difficult for a machine to parse.

<14>May 27 16:25:31 tkxr7san01.widgets.local MSWinEventLog	1	 Microsoft-Windows-PrintService/Operational	5793	Mon May 27 16:25:31 2021 	805	Microsoft-Windows-PrintService	hfinn	User	Information	 tkxr7san01.widgets.local	Print job diagnostics		Rendering job 71.	 2447840
<14>May 27 16:25:31 tkxr7san01.widgets.local MSWinEventLog	1	 Microsoft-Windows-PrintService/Operational	5794	Mon May 27 16:25:31 2021 	307	Microsoft-Windows-PrintService	hfinn	User	Information	 tkxr7san01.widgets.local	Printing a document		Document 71, Print Document owned by hfinn on lpt-hfinn01 was printed on XEROX1001 through port XEROX1001. Size in bytes: 891342. Pages printed: 2. No user action is required.	2447841

Let’s take a look at how we can parse out the fields in this log using guided mode. When you are using the Custom Parsing Tool, one of the first things you need to decide is if you want to create a filter for the rule:

With logs, the device vendor decides what fields to create and how the logs will be structured. Some vendors create their logs so all have the same fields in every log line, no matter what the event. If your logs look like this, then you will not need to use a filter.

Other vendors define different fields depending on the type of event. In this case, you will probably need to use a filter and create a separate rule for each type of event you want to parse. Another reason to use a filter is that you just want to parse out one type of event.

Looking at my Microsoft print server logs closely, you can see that the second log has quite a few more fields than the first one: document name printed, the document owner, what printer it was printed on, size, pages printed, and if any user action is required. As such, I’m going to need to use filters and create more than one rule here.

The filter should be a literal string that is part of the logs I want to parse. In other words, how can the parsing tool know which logs to apply this rule to? Let’s start with the first type of log:

<14>May 27 16:25:31 tkxr7san01.widgets.local MSWinEventLog	1	 Microsoft-Windows-PrintService/Operational	5793	Mon May 27 16:25:31 2021 	805	Microsoft-Windows-PrintService	hfinn	User	Information	 tkxr7san01.widgets.local	Print job diagnostics		Rendering job 71.	 2447840

Its type is “Print job diagnostics”, so that seems like a good string to match on. I will use that for my filter.

Still in the default guided mode, I’ll start extracting out the fields I want to parse. I don’t need to parse them all, just the ones I care about. As I am working my way through the log, I find that I am not able to extract the username like I want:

I am going to continue for the moment, however, using guided mode to define the last field I need to add.

Let’s pause for a moment. I’ve created four fields. Three of them are fine, but the “source_user” field is not parsing correctly. I am going to switch now to Regex Editor mode to fix it.

The regex created by the Custom Parsing Tool in guided mode is:

^(?:[^ ]* ){3}(?P<source_host>[^ ]+)(?:[^ ]* ){4}(?P<datetime>[^	]+)[^m]+(?P<source_user>[^ ]+)[^P]+(?P<action>[^	]+)

The only part I need to look at is the capture group for the field I called “source_user”:

(?P<source_user>[^ ]+)

With that said, the issue with the rule could be somewhere else in the parsing rules. However, let’s just start with the one capture group. Let’s interpret the character class first: “[^ ]”.  

When the hat symbol (“^”) appears in a character class, it means “not”. The hat is followed by a space, so the character class is “not a space”. Therefore, “[^ ]+” means “read in values that are not spaces” or “read until you get to a space”.  

Looking at the entire parsing rule, you can see it is counting the number of spaces to define what goes into each field. This would work out fine if spaces were the field delimiters, but that’s not how these logs work. The logs are a bit unstructured in the sense that some of the fields are defined by literal strings and others are just literal strings themselves.

Also, guided mode had a few too many beers while trying to cope with these silly logs and decided that the “source_user” field should always start with the letter “m”:

[^m]+(?P<source_user>[^ ]+)  

Oops! We don’t want that! Let’s get rid of it, plus the “[^P]+”, which reads everything up to the next literal capital “P”: this is how it rolls past everything to “Print job diagnostics”, but we can do better than that.

As humans, we know that we want the “action” field to be the literal string “Print job diagnostics”, which the Custom Parsing Tool doesn’t know. Let’s just fix these few things first. I made these changes, clicked on Apply to test them, and got an error:

This error means I’ve goofed, and the rule does not match my logs. I know I’m on the right path, though, so I’m going to continue. The problem here is with how the regex is going from “datetime” to “source_user” and then to “action”.

Let’s stop for a moment to look at this regex:

(?:[^ ]* ){3}

The “(?P<keyname>)” structure that we’ve been using is a capture group. The “(?:)” structure is a non-capture group. It means that regex should read this but not actually capture it or do anything with it. It’s also how we are going to skip past the fields we don’t want to parse. The “{3}” means “three times”. Of course, we have already seen that “[^ ]*” means “not a space” or “read until you get to a space”, and the trailing space inside the group then consumes the space itself. So, the whole non-capture group “(?:[^ ]* ){3}” means “read everything up to and including the next space, three times” or “skip past everything that is in the log until you have read past three spaces”.
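To see the skip in action, here is a minimal illustration of the start of our rule (shown in Python’s re module, which shares the same syntax, purely as a stand-in for the tool):

```python
import re

# Skip three space-terminated chunks, then capture the next token.
m = re.match(r"^(?:[^ ]* ){3}(?P<source_host>[^ ]+)",
             "<14>May 27 16:25:31 tkxr7san01.widgets.local MSWinEventLog")
print(m.group("source_host"))  # tkxr7san01.widgets.local
```

The non-capture group eats “&lt;14&gt;May ”, “27 ”, and “16:25:31 ”, leaving the hostname as the first thing the capture group sees.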

Now, let’s look at an actual log:

The last field we read in was “datetime”, and then, we need to skip over to the “source_user” field. Let’s try to do that by skipping past the three spaces until we get there.

Next, from the “source_user” to the last field, “action”, there are four spaces. Here is my regex:

^(?:[^ ]* ){3}(?P<source_host>[^ ]+)(?:[^ ]* ){4}(?P<datetime>[^ ]+)(?:[^ ]* ){3}(?P<source_user>[^ ]+)(?:[^	]* ){4}(?P<action>Print job diagnostics)

I have added “(?:[^ ]* ){3}” to skip past 3 spaces and done the same thing later to skip past 4 spaces, using a “{4}” to denote 4 spaces. Let’s see if it works:

The tool seems to be happy with this, as all the fields appear correctly, so I will go ahead with applying and saving this rule.

If you are at this spot with your logs except the tool is not happy and you are getting an error, I have a couple of tips for you to try:

  • Sometimes, the tool works better if you do parse every field, even if you do not particularly care about them. Try parsing every field to see if that works better for you.
  • Occasionally, it is easier to just parse out one (or maybe two or three) fields per rule. This is especially true if the logs are very messy and the fields have little structure. If you are really stuck, try to parse out just one field at a time. It is okay to have several parsing rules per log type, if necessary.
  • Try to proceed one field at a time, if possible. Get the one field extracted correctly, and then proceed to the next one.

When you create a parsing rule, it will apply to new logs that are collected. I have waited a bit for more logs to come in, and I can see they are now parsing as expected.

Now, I need to create a second rule for the next type of log. Here is what those logs look like:

<14>May 27 16:25:31 tkxr7san01.widgets.local MSWinEventLog	1	 Microsoft-Windows-PrintService/Operational	5794	Mon May 27 16:25:31 2021 	307	Microsoft-Windows-PrintService	hfinn	User	Information	 tkxr7san01.widgets.local	Printing a document  Document 71, Print Document owned by hfinn on lpt-hfinn01 was printed on XEROX1001 through port XEROX1001. Size in bytes: 891342. Pages printed: 2. No user action is required.	2447841

When you create a second (or third, fourth, etc.) parsing rule for the same logs, the Custom Parsing Tool does not know about the previous rules. You will need to start any additional rules just as you did the first one.

Also, just like before, as I am creating the parsing rule, I will need to apply a filter to match these logs. The type of log is “Printing a document”, so I will use that as the filter.

Again, I will start in guided mode and define the fields I want to start parsing out — it isn’t required to start in guided mode but, sometimes, that is easier. I defined a few fields, and as you can see, the parsing is not working like I need.

Now that I have the fields defined, I’ll switch to Regex Editor mode.

The regex that was generated in guided mode is:

^(?:[^ ]* ){4}(?P<datetime>[^ ]+)[^h]{36}(?P<source_user>[^ ]+)\D{18}(?P<source_host>[^	]+) (?P<action>[^ ]+)	(?P<printed_document>(?:[^ ]* ){3}[^ ]+)(?:[^ ]* ){6}(?P<owner>[^ ]+)

Just like before, I am going to clean this up a bit, starting with the first field and working from left to right, modifying the regex to parse out the keys like I want.

The first part of the regex, “^(?:[^ ]* ){4}(?P<datetime>[^ ]+)”, says to skip past the first four spaces and read the next part into the datetime key until a fifth space is read. This first part is fine and parses out the field as needed, so let’s move on to the next part, “[^h]{36}(?P<source_user>[^ ]+)”.

For simplicity and functionality, I am going to swap out the way the regex is skipping over to the username, “[^h]{36}”, with “(?:[^ ]* ){3}”.  This is using the same logic as the first rule: skip past three spaces to find the next value to read in.  

These first two fields are working, so let’s move to the next one, “source_host”. The regex for skipping over to this field and parsing it is: “\D{18}(?P<source_host>[^ ]+)”.

While this might look odd, the “\D{18}” part is how regex skips over the literal string “User Information”.  We looked at “\D” previously; the “\D” means to match “any non-digit character”, and the “{18}” means “18 times”. In other words, skip forward 18 non-digit characters. This is working well, and there is no need for any tinkering.

The next field is “(?P<action>[^ ]+)”, and just like the previous one, this field is properly parsed out. If a field is properly parsed, then there is no need to make any changes.

Now, I am getting to the messiest field in the logs, which is the name of the document that was printed. You can see that the field should contain all the text between “Printing a document” and “owned by”, and it is not being correctly parsed with the auto-generated regex.

Hey, we humans are still better at some things than machines! While this field is easy for us to make out, machine-generated regex has difficulty with these types of fields.

Let’s look at the current method that regex is using. The field is parsed with “(?P<printed_document>(?:[^ ]* ){3}[^ ]+)”, which means the parser is counting spaces to try to determine the field, and that just doesn’t work here. We need to craft a regular expression that will parse everything until we get to “owned by”, no matter what it is.

The best way to do this is to create a character class that includes all the possible characters that might be in the field to be extracted. Because a document can contain any possible character, I am going to use the class “[\d\D]”, which I used previously. To say “followed by the literal string owned by”, I need to be careful and take into account the spaces.

Spaces can be a little tricky in that, sometimes, what your eyes perceive as one space are several. To be on the safe side, instead of specifying one space with “\s” you might want to use “\s+”. You could also put an actual space there. That is what the auto-generated regex has; however, I prefer to use the “\s+” notation as it makes it clear that I want to match one or more spaces.

(?P<printed_document>[\d\D]+)\s+owned\s+by\s+

Before continuing, I am checking to make sure the “printed_document” field is parsed properly.  
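Outside the tool, you can confirm this snippet’s behavior with any RE2-compatible tester. For instance, in Python’s re module (an illustration only, using the document text from the sample log):

```python
import re

text = ("Document 71, Print Document owned by hfinn on lpt-hfinn01 "
        "was printed on XEROX1001 through port XEROX1001.")
# Greedy [\d\D]+ reads everything, then backs off until "owned by" matches.
m = re.search(r"(?P<printed_document>[\d\D]+)\s+owned\s+by\s+", text)
print(m.group("printed_document"))  # Document 71, Print Document
```

Because the document name can contain anything, anchoring on the literal “owned by” is what actually delimits the field.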

It all looks good! If it didn’t, I would fiddle with the parsing until it did. The last field, “owner”, is parsing correctly, as well.

Drat! I need to extract one more field, and I forgot to define it in guided mode. That’s okay, however, because I can go ahead and add it right now. All I need to do is to create a named capture group for it.

We looked at the syntax for a named capture group earlier:

(?P<key>regex that will extract the value)

I want to collect how many pages were printed, so let’s look at our logs again. I’m working on this part of the log:

owned by mstewart on \\L-mstewart03 was printed on Adobe PDF through port Documents\*.pdf. Size in bytes: 0. Pages printed: 13.

I need to skip forward from the “owned by” value all the way over to “Pages printed”. The fields in between might have different numbers of spaces, characters, etc., which means I can’t count spaces like I did before. I can’t count the number of characters, either.

Let’s use our old friend “[\d\D]”:    

(?:[\d\D]+)Pages printed:\s+(?P<pages_printed>\d+)

The tool seems happy with this, as there are no errors when I click on Apply, and the fields in the Log Preview appear to be correct.
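As a quick check outside the tool (again using Python’s re module as a stand-in), here is that capture group pulling the page count out of the tail of the sample log:

```python
import re

tail = (r"owned by mstewart on \\L-mstewart03 was printed on Adobe PDF "
        r"through port Documents\*.pdf. Size in bytes: 0. Pages printed: 13.")
# [\d\D]+ skips everything of any kind until the literal "Pages printed:".
m = re.search(r"(?:[\d\D]+)Pages printed:\s+(?P<pages_printed>\d+)", tail)
print(m.group("pages_printed"))  # 13
```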

When I apply and save this rule, I wait for a few more logs to come in to see if the parsing rule is working like I want. In Log Search, here is a parsed log line:

You might be wondering what the “\t” is all about. The “\t” is a tab character, and I’m not quite satisfied with my parsing rule. I didn’t spot when I saved it that there is a space being captured at the beginning of the field. If you find, after you save a rule, that it isn’t working like you want, you can either delete it and start over or just modify the existing rule.

I am going to modify my rule and fix it. Here is my final regex with the extra spaces accounted for:

^(?:[^ ]* ){4}(?P<datetime>[^ ]+)(?:[^	]* ){3}(?P<source_user>[^	]+)\D{18}(?P<source_host>[^	]+) (?P<action>[^ ]+)\s+(?P<printed_document>[\d\D]+)\s+owned\s+by\s+(?P<owner>[^ ]+)(?:[\d\D]+)Pages printed:\s+(?P<pages_printed>\d+)

And that’s it! With this guide, you have now learned how to use the Custom Parsing tool in InsightIDR to break up ugly logs into usable fields.


Introducing the Manual Regex Editor in IDR’s Parsing Tool: Part 1

Post Syndicated from Teresa Copple original https://blog.rapid7.com/2021/07/06/introducing-the-manual-regex-editor-in-idrs-parsing-tool-part-1/

New to writing regular expressions? No problem. In this two-part blog series, we’ll cover the basics of regular expressions and how to write regular expression statements (regex) to extract fields from your logs while using the custom parsing tool. As with learning any new language, getting started can be the hardest part, so we want to make it as easy as possible for you to get the most out of this new capability quickly and seamlessly.

The ability to analyze and visualize log data — regardless of whether it’s critical for security analytics or not — has been available in InsightIDR for some time. If you prefer to create custom fields from your logs in a non-technical way, you can simply head over to the custom parsing tool, navigate through the parsing tool wizard to find the “extract fields” step, and drag your cursor over the log data you’d like to extract to begin defining field names.

The following guide will give you the basic skills and knowledge you need to write parsing rules with regular expressions.

What Are Regular Expressions?

In technical applications, you occasionally need a way to search through text strings for certain patterns. For example, let’s say you have these log lines, which are text strings:

May 10 12:43:12 SECRETSERVERHOST CEF:0|Thycotic Software|Secret Server|10.9.000002|500|System Log|7|msg=The server could not be contacted. rt=May 10 2021 12:43:12
May 10 12:43:41 SECRETSERVERHOST CEF:0|Thycotic Software|Secret Server|10.9.000002|500|System Log|7|msg=The RPC Server is unavailable. rt=May 10 2021 12:43:41

You need to find the message part of the log lines, which is everything between “msg=” and “rt=”.  With these two log lines, I might hit the easy button and just copy the text manually, but clearly, this approach won’t work if I have hundreds or thousands of lines from which I need to pull the field out.

This is where regular expression, often shortened to regex, comes in. Regex gives you a way to search through text to match patterns, like “msg=”, so you can easily pull out the text you need.

How Does It Work?

I have a secret to share with you about regular expression: It’s really not that hard. If you want to learn it in great depth and understand every feature, that’s a story for another day. However, if you want to learn enough to parse out some fields and get on with your life, these simple tips will help you do just that.

Before we get started, you need to understand that regex has some rules that must be followed.  The best mindset to get the hang of regex, at least for a little while, is to follow the rules without worrying about why.

Here are some of the basic regular expression rules:

  • The entire regular expression is wrapped with forward slashes (“/”).
  • Pattern matches start with backslashes (“\”).
  • It is case-sensitive.
  • It requires you to match every character of the text you are searching.
  • It requires you to learn its special language for matching characters.

The special language of regular expression is how the text you are searching is matched. You need to start the pattern match with a backslash (“\”). After that, you should use a special character to denote what you want to match. For example, a letter or “word character” is matched with “\w” and a number or “digit character” is matched with “\d”.

If we want to match all the characters in a string like:

cat

We can use “\w”, as “\w” matches any “word character” or letter, so:

\w\w\w

This matches the three characters “c”, “a”, and “t”. In other words, the first “\w” matches the “c” character; “\w\w” matches “ca”; and “\w\w\w” matches “cat”.

As you can see, “\w” matches any single letter from “a” to “z” and also matches letters from “A” to “Z”. Remember: Regex is case sensitive.

“\w” also matches any number. However, “\w” does NOT match spaces or other special characters, like “-”, “:”, etc. To match other characters, you need to use their special regex symbols or other methods, which we’ll explore here.
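You can try these matches yourself in any regex tester. For instance, here they are in Python, whose re module uses the same tokens (shown only as an illustration):

```python
import re

assert re.fullmatch(r"\w\w\w", "cat")      # letters match
assert re.fullmatch(r"\w\w\w", "CAT")      # uppercase letters match too
assert re.fullmatch(r"\w\w\w", "c4t")      # digits count as word characters
assert not re.fullmatch(r"\w\w\w", "c-t")  # "-" is not a word character
```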

Getting Started With Regex

Before we keep going, now is a good time to take a few minutes to find a regex “cheat sheet” you like.

Rapid7 has one you can use: https://docs.rapid7.com/insightops/regular-expression-search/, or you may have a completely different one you prefer. Whatever the case is for you, these guides are helpful in keeping track of all your matching options.

While we’re at it, let’s also find a regex testing tool we can use to practice our regex. https://regex101.com/ is very popular, as it is a tool and cheat sheet in one, although you may find another tool you want to use instead.

InsightIDR supports the version of regex called RE2, so if your testing tool supports the Golang/RE2 flavor, you might want to select it to practice the specific flavor InsightIDR uses.

To follow along with me, open your preferred tool for testing regex. Enter in some text to match and some regex, and see what happens!

Let’s look at another way to match the string “cat”. You can use literals, which means you just enter the character you want to match:

This means you literally want to match the string “cat”. It matches “cat” and nothing else.  

Let’s look at another example. Say I need to match the string:

san-dc01

As we saw earlier, you can use “\w” to match the word characters. To match a number, you can use “\w” or “\d”.  “\d” will match any number or “digit”. However, how can you match the “-”?

The dash (“-”) is not a word character, so “\w” does not match it. In this case, we can tell regex we want to match the “-” literally:

\w\w\w-\w\w\d\d

There are other options, as well. The dot or period character (“.”) in regex means to match any single character. This works to parse out the string “san-dc01”, as well:

\w\w\w.\w\w\d\d

While this works, it is tedious typing all these “\w”s. This is where wildcards, sometimes called regex quantifiers, come in handy.

The two most common are:

* match 0 or more of the preceding character

+ match 1 or more of the preceding character

“\w*” means “match 0 or more word characters”, and “\w+” means “match 1 or more word characters”.

Let’s use these new wildcards to match some text. Say we have these two strings:

cat

san-dc01

I want one regex pattern that will match both strings. Let’s match “cat” first. The regex we used previously:

\w\w\w

matches the string, so you can see that using this wildcard would work, too:

\w+

Now, let’s look at matching “san-dc01”. I can use this:

\w+-\w+

This means “match as many word characters as there are, followed by a dash, and then followed by as many word characters as there are”. However, while this matches “san-dc01”, it does not match “cat”. The string “cat” has no “-” followed by characters.

The regex we added, “-\w+”, only matches a string if the “-” character is part of the string. In addition, “\w+” means “match one or more word characters”. In other words, “\w+” means “match at least one word character up to as many as there are”. As such, we need to use “\w*” here instead to specify that the “dc01” part of the string might not always exist. We also need to use “-*” to specify that the “-” might not always exist in the string we need to match, either.

Therefore, this should work to parse both strings:

\w+-*\w*

By now, you may have noticed something else important about regex: There are usually many different patterns that will match the same text.

Sometimes, I find that people get snooty about their regex, and these people might mock you if they think you could have crafted a shorter pattern or a more efficient one. A pox on their house!  Don’t worry about that right now. It’s more important that your regex pattern works than it is that it be short or impressively intricate.

Let’s look at another way to match our strings: You can use a character class for matching.

The character class is defined by using the square brackets “[“ and “]”. It simply means you want regex to match anything included in your defined class.

This is easier than it sounds! Since our strings “cat” and “san-dc01” contain characters that match either “\w” or the literal “-”, our character class is “[\w-]”.

Now, we can use the “+” to specify that our string has one or more characters from the character class:

[\w-]+
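Both the wildcard pattern and the character-class pattern really do cover both strings. Here is a quick check (illustrated with Python’s re, which understands the same syntax):

```python
import re

for s in ("cat", "san-dc01"):
    assert re.fullmatch(r"\w+-*\w*", s)  # wildcard version
    assert re.fullmatch(r"[\w-]+", s)    # character-class version
print("both patterns match both strings")
```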

Additional Regular Expressions for Log Parsing

Besides “\w” and “\d”, I have a few more regular expressions I want you to pay close attention to. The first one is “\s”, which is how you match whitespaces.

“\s” will match any whitespace character, and “\s+” will match one or more whitespace characters.

Next, remember that the dot (“.”) will match any character. The dot becomes especially powerful when you combine it with the star (“*”). Remember: The star means to match the preceding character 0 or more times. Therefore, the “dot star” (“.*”) will match any characters as many times as they appear, including matching nothing. In other words, “.*” matches anything.

Finally, let’s look at special uses of the circumflex character, which is often just called the hat (“^”).

The hat has two completely different uses, which should not be confused. First, when used by itself, the hat in regex designates where the beginning of the line starts. For example, “^\w+” means that the line must start with a word.

The second use of the hat is when it appears in a character class. If you use the hat character when defining a character class, it means “everything except” as in “match everything except these characters”. In other words, “[\d]+” would match any digit character, while “[^\d]+” means to do the opposite or match everything except for any digit character!
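Here is a quick illustration of that difference (using Python’s re as a stand-in for the tool):

```python
import re

print(re.match(r"[\d]+", "2021-05-10").group())    # 2021   (digits only)
print(re.match(r"[^\d]+", "May 10 2021").group())  # "May " (everything except digits)
```

The negated class stops matching the instant it reaches a digit, which is exactly what makes it so handy for “read until you hit X” parsing.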

Log Parsing Examples

Let’s go back to where we started, trying to parse out the msg field from our logs:

May 10 12:43:12 SECRETSERVERHOST CEF:0|Thycotic Software|Secret Server|10.9.000002|500|System Log|7|msg=The server could not be contacted. rt=May 10 2021 12:43:12
May 10 12:43:41 SECRETSERVERHOST CEF:0|Thycotic Software|Secret Server|10.9.000002|500|System Log|7|msg=The RPC Server is unavailable. rt=May 10 2021 12:43:41

Have you copied and pasted these log lines into your regex tester? If not, go ahead and do so now.

We need to parse a literal string “msg=”. These literals in log lines are often the “key” part of a key-value pair and are sometimes called anchors instead of literals, since they are the same in every log line. To parse them, you would usually specify the literal string to match on.

Next, we need to read the value that follows. You have a few different approaches you can use here. A common way to parse the value is to read everything that follows until the next literal or anchor. Remember: There are many ways to do this, but your regex might look like this:

msg=.*rt=

By the way, if you are familiar with regex, you know that the greedy “*” creates inefficient parsing rules, but let’s not worry too much about that right now. The skinny on this, however, is that you should never use the dot star (“.*”) for parsing rules. It is useful for searches and trying to figure out log structure, though.

Another way to read the value is to use a character class:

msg=[\w\s\.]+rt=

Let’s break the character class down to determine exactly what is specified. “\w” means match any word character. “\s” means to match any space. We also need to match a literal period, since that appears in the msg value, but the period or dot has a special meaning in regex. When characters have special meaning in regex, like the slashes, brackets, dot, etc, they need to be “escaped”, which you do by putting a backslash (“\”) in front of them. Therefore, to match the period, we need to use “\.” in the character class.

Remember: Defining a character class means you want to match any character defined in the class, and the “+” at the end of the class means to “match one or more of these characters”. In that case, “[\w\s\.]+” means “match any word character, any space, or a period as many times as it occurs”. The matching will stop when the next character in the sequence is not a word character, a space, or a period OR when the next part of the regex is matched. The next part of the regex is the literal string “rt=”, so the regex will extract the “[\w\s\.]+” characters until it gets to “rt=”.

Finally, there is just one more regex syntax that is helpful to understand when using regex with InsightIDR, and that is the use of capture groups. A capture group is how you define key names in regex. Capture groups are actually much more than this, but let’s narrow our focus to just what we need to know for using them with InsightIDR. To specify a named capture group for our purposes, use this syntax:

(?P<keyname>putyourregexhere)

The regex you put into the capture group is what is used to read the value that is going to match the “keyname”. Let’s see how this works with our logs.

Say we have some logs in key-value pair (KVP) format, and we want to both define and parse out the “msg” key. We know this regex works to match our logs: “msg=[\w\s\.]+rt=”. Now, we need to take this one step further and define the “msg” key and its values. We can do that with a named capture group:

msg=(?P<msg>[\w\s\.]+)rt=

Let’s break this down. We want to read in the literal string “msg=” and then place everything after it into a capture group, stopping at the literal string “rt=”. The capture group defines the key, which you can also think of as a “field name”, as “msg”: “(?P<msg>”.

If we wanted to parse the field name as something else, we could specify that in between the “<>”. For example, if we wanted the field name to be “message” instead of “msg”, we would use: “(?P<message>”.

The regex that follows is what we want to read for the value part of our key-value pair. In other words, it is what we want to extract for the “msg”. We already know from our previous work that the character class “[\w\s\.]” matches our logs, so that’s what is used for this regex.
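Putting the pieces together, here is a quick sketch in Python, whose re module uses the same “(?P<name>…)” named-group syntax (the log line below is made up for demonstration):

```python
import re

# Hypothetical KVP log line
log = "msg=User password reset. rt=Jul 14 2021"

# The named capture group defines the "msg" field and reads its value
m = re.search(r"msg=(?P<msg>[\w\s\.]+)rt=", log)
print(m.group("msg"))  # -> "User password reset. "

# Parsing the field under a different name is just a matter of
# changing the text between the "<>":
m2 = re.search(r"msg=(?P<message>[\w\s\.]+)rt=", log)
print(m2.group("message"))
```

The extracted value keeps its trailing space, since spaces are part of the character class; a real parsing rule might trim that afterward.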

In this blog, we explored the regex syntax we need for InsightIDR and used a generic tool to test our regular expressions. In the next blog, we’ll use what we covered here to write our own parsing rules in InsightIDR using the Custom Parsing Tool in Regex Editor mode.


More Russian Hacking

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/more-russian-hacking.html

Two reports this week. The first is from Microsoft, which wrote:

As part of our investigation into this ongoing activity, we also detected information-stealing malware on a machine belonging to one of our customer support agents with access to basic account information for a small number of our customers. The actor used this information in some cases to launch highly-targeted attacks as part of their broader campaign.

The second is from the NSA, CISA, FBI, and the UK’s NCSC, which wrote that the GRU is continuing to conduct brute-force password guessing attacks around the world, and is in some cases successful. From the NSA press release:

Once valid credentials were discovered, the GTsSS combined them with various publicly known vulnerabilities to gain further access into victim networks. This, along with various techniques also detailed in the advisory, allowed the actors to evade defenses and collect and exfiltrate various information in the networks, including mailboxes.

News article.

Insurance and Ransomware

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/07/insurance-and-ransomware.html

As ransomware becomes more common, I’m seeing more discussions about the ethics of paying the ransom. Here’s one more contribution to that issue: a research paper arguing that the insurance industry is hurting more than it’s helping.

However, the most pressing challenge currently facing the industry is ransomware. Although it is a societal problem, cyber insurers have received considerable criticism for facilitating ransom payments to cybercriminals. These add fuel to the fire by incentivising cybercriminals’ engagement in ransomware operations and enabling existing operators to invest in and expand their capabilities. Growing losses from ransomware attacks have also emphasised that the current reality is not sustainable for insurers either.

To overcome these challenges and champion the positive effects of cyber insurance, this paper calls for a series of interventions from government and industry. Some in the industry favour allowing the market to mature on its own, but it will not be possible to rely on changing market forces alone. To date, the UK government has taken a light-touch approach to the cyber insurance industry. With the market undergoing changes amid growing losses, more coordinated action by government and regulators is necessary to help the industry reach its full potential.

The interventions recommended here are still relatively light, and reflect the fact that cyber insurance is only a potential incentive for managing societal cyber risk. They include: developing guidance for minimum security standards for underwriting; expanding data collection and data sharing; mandating cyber insurance for government suppliers; and creating a new collaborative approach between insurers and intelligence and law enforcement agencies around ransomware.

Finally, although a well-functioning cyber insurance industry could improve cyber security practices on a societal scale, it is not a silver bullet for the cyber security challenge. It is important to remember that the primary purpose of cyber insurance is not to improve cyber security, but to transfer residual risk. As such, it should be one of many tools that governments and businesses can draw on to manage cyber risk more effectively.

Basically, the insurance industry incents companies to do the cheapest mitigation possible. Often, that’s paying the ransom.

News article.

AWS Verified episode 6: A conversation with Reeny Sondhi of Autodesk

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-verified-episode-6-a-conversation-with-reeny-sondhi-of-autodesk/

I’m happy to share the latest episode of AWS Verified, where we bring you global conversations with leaders about issues impacting cybersecurity, privacy, and the cloud. We take this opportunity to meet with leaders from various backgrounds in security, technology, and leadership.

For our latest episode of Verified, I had the opportunity to meet virtually with Reeny Sondhi, Vice President and Chief Security Officer of Autodesk. In her role, Reeny drives security-related strategy and decisions across the company. She leads the teams responsible for the security of Autodesk’s infrastructure, cloud, products, and services, as well as the teams dedicated to security governance, risk & compliance, and security incident response.

Reeny and I touched on a variety of subjects, from her career journey, to her current stewardship of Autodesk’s security strategy based on principles of trust. Reeny started her career in product management, having conceptualized, created, and brought multiple software and hardware products to market. “My passion as a product manager was to understand customer problems and come up with either innovative products or features to help solve them. I tell my team I entered the world of security by accident from product management, but staying in this profession has been my choice. I’ve been able to take the same passion I had when I was a product manager for solving real world customer problems forward as a security leader. Even today, sitting down with my customers, understanding what their problems are, and then building a security program that directly solves these problems, is core to how I operate.”

Autodesk has customers across a wide variety of industries, so Reeny and her team work to align the security program with customer experience and expectations. Reeny has also worked to drive security awareness across Autodesk, empowering employees throughout the organization to act as security owners. “One lesson is consistency in approach. And another key lesson that I’ve learned over the last few years is to demystify security as much as possible for all constituents in the organization. We have worked pretty hard to standardize security practices across the entire organization, which has helped us in scaling security throughout Autodesk.”

Reeny and Autodesk are setting a great example on how to innovate on behalf of their customers, securely. I encourage you to learn more about her perspective on this, and other aspects of how to manage and scale a modern security program, by watching the interview.

Watch my interview with Reeny, and visit the Verified webpage for previous episodes, including conversations with security leaders at Netflix, Comcast, and Vodafone. If you have any suggestions for topics you’d like to see featured in future episodes, please leave a comment below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Steve Schmidt

Steve is Vice President and Chief Information Security Officer for AWS. His duties include leading product design, management, and engineering development efforts focused on bringing the competitive, economic, and security benefits of cloud computing to business and government customers. Prior to AWS, he had an extensive career at the Federal Bureau of Investigation, where he served as a senior executive and section chief. He currently holds 11 patents in the field of cloud security architecture. Follow Steve on Twitter.

Risks of Evidentiary Software

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/06/risks-of-evidentiary-software.html

Over at Lawfare, Susan Landau has an excellent essay on the risks posed by software used to collect evidence (a Breathalyzer is probably the most obvious example).

Bugs and vulnerabilities can lead to inaccurate evidence, but the proprietary nature of software makes it hard for defendants to examine it.

The software engineers proposed a three-part test. First, the court should have access to the “Known Error Log,” which should be part of any professionally developed software project. Next the court should consider whether the evidence being presented could be materially affected by a software error. Ladkin and his co-authors noted that a chain of emails back and forth are unlikely to have such an error, but the time that a software tool logs when an application was used could easily be incorrect. Finally, the reliability experts recommended seeing whether the code adheres to an industry standard used in a non-computerized version of the task (e.g., bookkeepers always record every transaction, and thus so should bookkeeping software).

[…]

Inanimate objects have long served as evidence in courts of law: the door handle with a fingerprint, the glove found at a murder scene, the Breathalyzer result that shows a blood alcohol level three times the legal limit. But the last of those examples is substantively different from the other two. Data from a Breathalyzer is not the physical entity itself, but rather a software calculation of the level of alcohol in the breath of a potentially drunk driver. As long as the breath sample has been preserved, one can always go back and retest it on a different device.

What happens if the software makes an error and there is no sample to check or if the software itself produces the evidence? At the time of our writing the article on the use of software as evidence, there was no overriding requirement that law enforcement provide a defendant with the code so that they might examine it themselves.

[…]

Given the high rate of bugs in complex software systems, my colleagues and I concluded that when computer programs produce the evidence, courts cannot assume that the evidentiary software is reliable. Instead the prosecution must make the code available for an “adversarial audit” by the defendant’s experts. And to avoid problems in which the government doesn’t have the code, government procurement contracts must include delivery of source code — code that is more-or-less readable by people — for every version of the code or device.

The Future of Machine Learning and Cybersecurity

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/06/the-future-of-machine-learning-and-cybersecurity.html

The Center for Security and Emerging Technology has a new report: “Machine Learning and Cybersecurity: Hype and Reality.” Here’s the bottom line:

The report offers four conclusions:

  • Machine learning can help defenders more accurately detect and triage potential attacks. However, in many cases these technologies are elaborations on long-standing methods — not fundamentally new approaches — that bring new attack surfaces of their own.
  • A wide range of specific tasks could be fully or partially automated with the use of machine learning, including some forms of vulnerability discovery, deception, and attack disruption. But many of the most transformative of these possibilities still require significant machine learning breakthroughs.
  • Overall, we anticipate that machine learning will provide incremental advances to cyber defenders, but it is unlikely to fundamentally transform the industry barring additional breakthroughs. Some of the most transformative impacts may come from making previously un- or under-utilized defensive strategies available to more organizations.
  • Although machine learning will be neither predominantly offense-biased nor defense-biased, it may subtly alter the threat landscape by making certain types of strategies more appealing to attackers or defenders.

AWS Verified episode 5: A conversation with Eric Rosenbach of Harvard University’s Belfer Center

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-verified-episode-5-a-conversation-with-eric-rosenbach-of-harvard-universitys-belfer-center/

I am pleased to share the latest episode of AWS Verified, where we bring you conversations with global cybersecurity leaders about important issues, such as how to create a culture of security, cyber resiliency, Zero Trust, and other emerging security trends.

Recently, I got the opportunity to experience distance learning when I took the AWS Verified series back to school. I got a taste of life as a Harvard grad student, meeting (virtually) with Eric Rosenbach, Co-Director of the Belfer Center of Science and International Affairs at Harvard University’s John F. Kennedy School of Government. I call it, “Verified meets Veritas.” Harvard’s motto may never be the same again.

In this video, Eric shared with me the Belfer Center’s focus as the hub of the Harvard Kennedy School’s research, teaching, and training at the intersection of cutting edge and interdisciplinary topics, such as international security, environmental and resource issues, and science and technology policy. In recognition of the Belfer Center’s consistently stellar work and its six consecutive years ranked as the world’s #1 university-affiliated think tank, in 2021 it was named a center of excellence by the University of Pennsylvania’s Think Tanks and Civil Societies Program.

Eric’s deep connection to the students reflects the Belfer Center’s mission to prepare future generations of leaders to address critical areas in practical ways. Eric says, “I’m a graduate of the school, and now that I’ve been out in the real world as a policy practitioner, I love going into the classroom, teaching students about the way things work, both with cyber policy and with cybersecurity/cyber risk mitigation.”

In the interview, I talked with Eric about his varied professional background. Prior to the Belfer Center, he was the Chief of Staff to US Secretary of Defense, Ash Carter. Eric was also the Assistant Secretary of Defense for Homeland Defense and Global Security, where he was known around the US government as the Pentagon’s cyber czar. He has served as an officer in the US Army, written two books, been the Chief Security Officer for the European ISP Tiscali, and was a professional committee staff member in the US Senate.

I asked Eric to share his opinion on what the private sector and government can learn from each other. I’m excited to share Eric’s answer to this with you as well as his thoughts on other topics, because the work that Eric and his Belfer Center colleagues are doing is important for technology leaders.

Watch my interview with Eric Rosenbach, and visit the AWS Verified webpage for previous episodes, including interviews with security leaders from Netflix, Vodafone, Comcast, and Lockheed Martin. If you have an idea or a topic you’d like covered in this series, please leave a comment below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Steve Schmidt

Steve is Vice President and Chief Information Security Officer for AWS. His duties include leading product design, management, and engineering development efforts focused on bringing the competitive, economic, and security benefits of cloud computing to business and government customers. Prior to AWS, he had an extensive career at the Federal Bureau of Investigation, where he served as a senior executive and section chief. He currently holds 11 patents in the field of cloud security architecture. Follow Steve on Twitter.

National Security Risks of Late-Stage Capitalism

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/03/national-security-risks-of-late-stage-capitalism.html

Early in 2020, cyberspace attackers apparently working for the Russian government compromised a piece of widely used network management software made by a company called SolarWinds. The hack gave the attackers access to the computer networks of some 18,000 of SolarWinds’s customers, including US government agencies such as the Homeland Security Department and State Department, American nuclear research labs, government contractors, IT companies and nongovernmental agencies around the world.

It was a huge attack, with major implications for US national security. The Senate Intelligence Committee is scheduled to hold a hearing on the breach on Tuesday. Who is at fault?

The US government deserves considerable blame, of course, for its inadequate cyberdefense. But to see the problem only as a technical shortcoming is to miss the bigger picture. The modern market economy, which aggressively rewards corporations for short-term profits and aggressive cost-cutting, is also part of the problem: Its incentive structure all but ensures that successful tech companies will end up selling insecure products and services.

Like all for-profit corporations, SolarWinds aims to increase shareholder value by minimizing costs and maximizing profit. The company is owned in large part by Silver Lake and Thoma Bravo, private-equity firms known for extreme cost-cutting.

SolarWinds certainly seems to have underspent on security. The company outsourced much of its software engineering to cheaper programmers overseas, even though that typically increases the risk of security vulnerabilities. For a while, in 2019, the update server’s password for SolarWinds’s network management software was reported to be “solarwinds123.” Russian hackers were able to breach SolarWinds’s own email system and lurk there for months. Chinese hackers appear to have exploited a separate vulnerability in the company’s products to break into US government computers. A cybersecurity adviser for the company said that he quit after his recommendations to strengthen security were ignored.

There is no good reason to underspend on security other than to save money — especially when your clients include government agencies around the world and when the technology experts that you pay to advise you are telling you to do more.

As the economics writer Matt Stoller has suggested, cybersecurity is a natural area for a technology company to cut costs because its customers won’t notice unless they are hacked ­– and if they are, they will have already paid for the product. In other words, the risk of a cyberattack can be transferred to the customers. Doesn’t this strategy jeopardize the possibility of long-term, repeat customers? Sure, there’s a danger there –­ but investors are so focused on short-term gains that they’re too often willing to take that risk.

The market loves to reward corporations for risk-taking when those risks are largely borne by other parties, like taxpayers. This is known as “privatizing profits and socializing losses.” Standard examples include companies that are deemed “too big to fail,” which means that society as a whole pays for their bad luck or poor business decisions. When national security is compromised by high-flying technology companies that fob off cybersecurity risks onto their customers, something similar is at work.

Similar misaligned incentives affect your everyday cybersecurity, too. Your smartphone is vulnerable to something called SIM-swap fraud because phone companies want to make it easy for you to frequently get a new phone — and they know that the cost of fraud is largely borne by customers. Data brokers and credit bureaus that collect, use, and sell your personal data don’t spend a lot of money securing it because it’s your problem if someone hacks them and steals it. Social media companies too easily let hate speech and misinformation flourish on their platforms because it’s expensive and complicated to remove it, and they don’t suffer the immediate costs ­– indeed, they tend to profit from user engagement regardless of its nature.

There are two problems to solve. The first is information asymmetry: buyers can’t adequately judge the security of software products or company practices. The second is a perverse incentive structure: the market encourages companies to make decisions in their private interest, even if that imperils the broader interests of society. Together these two problems result in companies that save money by taking on greater risk and then pass off that risk to the rest of us, as individuals and as a nation.

The only way to force companies to provide safety and security features for customers and users is with government intervention. Companies need to pay the true costs of their insecurities, through a combination of laws, regulations, and legal liability. Governments routinely legislate safety — pollution standards, automobile seat belts, lead-free gasoline, food service regulations. We need to do the same with cybersecurity: the federal government should set minimum security standards for software and software development.

In today’s underregulated markets, it’s just too easy for software companies like SolarWinds to save money by skimping on security and to hope for the best. That’s a rational decision in today’s free-market world, and the only way to change that is to change the economic incentives.

This essay previously appeared in the New York Times.

GPS Vulnerabilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/02/gps-vulnerabilities.html

Really good op-ed in the New York Times about how vulnerable the GPS system is to interference, spoofing, and jamming — and potential alternatives.

The 2018 National Defense Authorization Act included funding for the Departments of Defense, Homeland Security and Transportation to jointly conduct demonstrations of various alternatives to GPS, which were concluded last March. Eleven potential systems were tested, including eLoran, a low-frequency, high-power timing and navigation system transmitted from terrestrial towers at Coast Guard facilities throughout the United States.

“China, Russia, Iran, South Korea and Saudi Arabia all have eLoran systems because they don’t want to be as vulnerable as we are to disruptions of signals from space,” said Dana Goward, the president of the Resilient Navigation and Timing Foundation, a nonprofit that advocates for the implementation of an eLoran backup for GPS.

Also under consideration by federal authorities are timing systems delivered via fiber optic network and satellite systems in a lower orbit than GPS, which therefore have a stronger signal, making them harder to hack. A report on the technologies was submitted to Congress last week.

GPS is a piece of our critical infrastructure that is essential to a lot of the rest of our critical infrastructure. It needs to be more secure.

Chinese Supply-Chain Attack on Computer Systems

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/02/chinese-supply-chain-attack-on-computer-systems.html

Bloomberg News has a major story about the Chinese hacking of computer motherboards made by Supermicro, Lenovo, and others. It’s been going on since at least 2008. The US government has known about it for almost as long, and has tried to keep the attack secret:

China’s exploitation of products made by Supermicro, as the U.S. company is known, has been under federal scrutiny for much of the past decade, according to 14 former law enforcement and intelligence officials familiar with the matter. That included an FBI counterintelligence investigation that began around 2012, when agents started monitoring the communications of a small group of Supermicro workers, using warrants obtained under the Foreign Intelligence Surveillance Act, or FISA, according to five of the officials.

There’s lots of detail in the article, and I recommend that you read it through.

This is a follow-on, with a lot more detail, to a story Bloomberg reported in fall 2018. I didn’t believe the story back then, writing:

I don’t think it’s real. Yes, it’s plausible. But first of all, if someone actually surreptitiously put malicious chips onto motherboards en masse, we would have seen a photo of the alleged chip already. And second, there are easier, more effective, and less obvious ways of adding backdoors to networking equipment.

I seem to have been wrong. From the current Bloomberg story:

Mike Quinn, a cybersecurity executive who served in senior roles at Cisco Systems Inc. and Microsoft Corp., said he was briefed about added chips on Supermicro motherboards by officials from the U.S. Air Force. Quinn was working for a company that was a potential bidder for Air Force contracts, and the officials wanted to ensure that any work would not include Supermicro equipment, he said. Bloomberg agreed not to specify when Quinn received the briefing or identify the company he was working for at the time.

“This wasn’t a case of a guy stealing a board and soldering a chip on in his hotel room; it was architected onto the final device,” Quinn said, recalling details provided by Air Force officials. The chip “was blended into the trace on a multilayered board,” he said.

“The attackers knew how that board was designed so it would pass” quality assurance tests, Quinn said.

Supply-chain attacks are the flavor of the moment, it seems. But they’re serious, and very hard to defend against in our deeply international IT industry. (I have repeatedly called this an “insurmountable problem.”) Here’s me in 2018:

Supply-chain security is an incredibly complex problem. US-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that. We can’t trust anyone, yet we have no choice but to trust everyone. Our phones, computers, software and cloud systems are touched by citizens of dozens of different countries, any one of whom could subvert them at the demand of their government.

We need some fundamental security research here. I wrote this in 2019:

The other solution is to build a secure system, even though any of its parts can be subverted. This is what the former Deputy Director of National Intelligence Sue Gordon meant in April when she said about 5G, “You have to presume a dirty network.” Or more precisely, can we solve this by building trustworthy systems out of untrustworthy parts?

It sounds ridiculous on its face, but the Internet itself was a solution to a similar problem: a reliable network built out of unreliable parts. This was the result of decades of research. That research continues today, and it’s how we can have highly resilient distributed systems like Google’s network even though none of the individual components are particularly good. It’s also the philosophy behind much of the cybersecurity industry today: systems watching one another, looking for vulnerabilities and signs of attack.

It seems that supply-chain attacks are constantly in the news right now. That’s good. They’ve been a serious problem for a long time, and we need to take the threat seriously. For further reading, I strongly recommend this Atlantic Council report from last summer: “Breaking trust: Shades of crisis across an insecure software supply chain.”

Attack against Florida Water Treatment Facility

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/02/attack-against-florida-water-treatment-facility.html

A water treatment plant in Oldsmar, Florida, was attacked last Friday. The attacker took control of one of the systems, and increased the amount of sodium hydroxide — that’s lye — by a factor of 100. This could have been fatal to people living downstream, if an alert operator hadn’t noticed the change and reversed it.

We don’t know who is behind this attack. Despite its similarities to a Russian attack on a Ukrainian power plant in 2015, my bet is that it’s a disgruntled insider: either a current or former employee. It just doesn’t make sense for Russia to be behind this.

ArsTechnica is reporting on the poor cybersecurity at the plant:

The Florida water treatment facility whose computer system experienced a potentially hazardous computer breach last week used an unsupported version of Windows with no firewall and shared the same TeamViewer password among its employees, government officials have reported.

Brian Krebs points out that the fact that we know about this attack is what’s rare:

Spend a few minutes searching Twitter, Reddit or any number of other social media sites and you’ll find countless examples of researchers posting proof of being able to access so-called “human-machine interfaces” — basically web pages designed to interact remotely with various complex systems, such as those that monitor and/or control things like power, water, sewage and manufacturing plants.

And yet, there have been precious few known incidents of malicious hackers abusing this access to disrupt these complex systems. That is, until this past Monday, when Florida county sheriff Bob Gualtieri held a remarkably clear-headed and fact-filled news conference about an attempt to poison the water supply of Oldsmar, a town of around 15,000 not far from Tampa.

SonicWall Zero-Day

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/02/sonicwall-zero-day.html

Hackers are exploiting a zero-day in SonicWall:

In an email, an NCC Group spokeswoman wrote: “Our team has observed signs of an attempted exploitation of a vulnerability that affects the SonicWall SMA 100 series devices. We are working closely with SonicWall to investigate this in more depth.”

In Monday’s update, SonicWall representatives said the company’s engineering team confirmed that the submission by NCC Group included a “critical zero-day” in the SMA 100 series 10.x code. SonicWall is tracking it as SNWLID-2021-0001. The SMA 100 series is a line of secure remote access appliances.

The disclosure makes SonicWall at least the fifth large company to report in recent weeks that it was targeted by sophisticated hackers. Other companies include network management tool provider SolarWinds, Microsoft, FireEye, and Malwarebytes. CrowdStrike also reported being targeted but said the attack wasn’t successful.

Neither SonicWall nor NCC Group said that the hack involving the SonicWall zero-day was linked to the larger hack campaign involving SolarWinds. Based on the timing of the disclosure and some of the details in it, however, there is widespread speculation that the two are connected.

The speculation is just that — speculation. I have no opinion in the matter. This could easily be part of the SolarWinds campaign, which targeted other security companies. But there are a lot of “highly sophisticated threat actors” — that’s how NCC Group described them — out there, and this could easily be a coincidence.

Were I working for a national intelligence organization, I would try to disguise my operations as being part of the SolarWinds attack.

EDITED TO ADD (2/9): SonicWall has patched the vulnerability.

Presidential Cybersecurity and Pelotons

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/02/presidential-cybersecurity-and-pelotons.html

President Biden wants his Peloton in the White House. For those who have missed the hype, it’s an Internet-connected stationary bicycle. It has a screen, a camera, and a microphone. You can take live classes online, work out with your friends, or join the exercise social network. And all of that is a security risk, especially if you are the president of the United States.

Any computer brings with it the risk of hacking. This is true of our computers and phones, and it’s also true about all of the Internet-of-Things devices that are increasingly part of our lives. These large and small appliances, cars, medical devices, toys and — yes — exercise machines are all computers at their core, and they’re all just as vulnerable. Presidents face special risks when it comes to the IoT, but Biden has the NSA to help him handle them.

Not everyone is so lucky, and the rest of us need something more structural.

US presidents have long tussled with their security advisers over tech. The NSA often customizes devices, but that means eliminating features. In 2010, President Barack Obama complained that his presidential BlackBerry device was “no fun” because only ten people were allowed to contact him on it. In 2013, security prevented him from getting an iPhone. When he finally got an upgrade to his BlackBerry in 2016, he complained that his new “secure” phone couldn’t take pictures, send texts, or play music. His “hardened” iPad to read daily intelligence briefings was presumably similarly handicapped. We don’t know what the NSA did to these devices, but they certainly modified the software and physically removed the cameras and microphones — and possibly the wireless Internet connection.

President Donald Trump resisted efforts to secure his phones. We don’t know the details, only that they were regularly replaced, with the government effectively treating them as burner phones.

The risks are serious. We know that the Russians and the Chinese were eavesdropping on Trump’s phones. Hackers can remotely turn on microphones and cameras, listening in on conversations. They can grab copies of any documents on the device. They can also use those devices to further infiltrate government networks, maybe even jumping onto classified networks that the devices connect to. If the devices have physical capabilities, those can be hacked as well. In 2007, the wireless features of Vice President Richard B. Cheney’s pacemaker were disabled out of fears that it could be hacked to assassinate him. In 1999, the NSA banned Furbies from its offices, mistakenly believing that they could listen and learn.

Physically removing features and components works, but the results are increasingly unacceptable. The NSA could take Biden’s Peloton and rip out the camera, microphone, and Internet connection, and that would make it secure — but then it would just be a normal (albeit expensive) stationary bike. Maybe Biden wouldn’t accept that, and he’d demand that the NSA do even more work to customize and secure his Peloton. Maybe Biden’s security agents could isolate his Peloton in a specially shielded room where it couldn’t infect other computers, and warn him not to discuss national security in its presence.

This might work, but it certainly doesn’t scale. As president, Biden can direct substantial resources to solving his cybersecurity problems. The real issue is what everyone else should do. The president of the United States is a singular espionage target, but so are members of his staff and other administration officials.

Members of Congress are targets, as are governors and mayors, police officers and judges, CEOs and directors of human rights organizations, nuclear power plant operators, and election officials. All of these people have smartphones, tablets, and laptops. Many have Internet-connected cars and appliances, vacuums, bikes, and doorbells. Every one of those devices is a potential security risk, and all of those people are potential national security targets. But none of those people will get their Internet-connected devices customized by the NSA.

That is the real cybersecurity issue. Internet connectivity brings with it features we like. In our cars, it means real-time navigation, entertainment options, automatic diagnostics, and more. In a Peloton, it means everything that makes it more than a stationary bike. In a pacemaker, it means continuous monitoring by your doctor — and possibly your life saved as a result. In an iPhone or iPad, it means…well, everything. We can search for older, non-networked versions of some of these devices, or the NSA can disable connectivity for the privileged few of us. But the result is the same: in Obama’s words, “no fun.”

And unconnected options are increasingly hard to find. In 2016, I tried to find a new car that didn’t come with Internet connectivity, but I had to give up: there were no options to omit that in the class of car I wanted. Similarly, it’s getting harder to find major appliances without a wireless connection. As the price of connectivity continues to drop, more and more things will only be available Internet-enabled.

Internet security is national security — not because the president is personally vulnerable but because we are all part of a single network. Depending on who we are and what we do, we will make different trade-offs between security and fun. But we all deserve better options.

Regulations that force manufacturers to provide better security for all of us are the only way to do that. We need minimum security standards for computers of all kinds. We need transparency laws that give all of us, from the president on down, sufficient information to make our own security trade-offs. And we need liability laws that hold companies liable when they misrepresent the security of their products and services.

I’m not worried about Biden. He and his staff will figure out how to balance his exercise needs with the national security needs of the country. Sometimes the solutions are weirdly customized, such as the anti-eavesdropping tent that Obama used while traveling. I am much more worried about the political activists, journalists, human rights workers, and oppressed minorities around the world who don’t have the money or expertise to secure their technology, or the information that would give them the ability to make informed decisions on which technologies to choose.

This essay previously appeared in the Washington Post.

New iMessage Security Features

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/01/new-imessage-security-features.html

Apple has added security features to mitigate the risk of zero-click iMessage attacks.

Apple did not document the changes but Groß said he fiddled around with the newest iOS 14 and found that Apple shipped a “significant refactoring of iMessage processing” that severely cripples the usual ways exploits are chained together for zero-click attacks.

Groß notes that memory-corruption-based zero-click exploits typically require chaining multiple vulnerabilities together. In most observed attacks, these include a memory corruption vulnerability that is reachable without user interaction and ideally without triggering any user notifications; a way to break ASLR remotely; a way to turn the vulnerability into remote code execution; and a way to break out of any sandbox, typically by exploiting a separate vulnerability in another operating system component (e.g., a userspace service or the kernel).