Friday Squid Blogging—18th Anniversary Post: New Species of Pygmy Squid Discovered

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/01/friday-squid-blogging-new-species-of-pygmy-squid-discovered.html

They’re Ryukyuan pygmy squid (Idiosepius kijimuna) and Hannan’s pygmy squid (Kodama jujutsu). The second one represents an entire new genus.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

And, yes, this is the eighteenth anniversary of Friday Squid Blogging. The first squid post is from January 6, 2006, and I have been posting them weekly since then. Never did I believe there would be so much to write about squid—but the links never seem to end.

Read my blog posting guidelines here.

Metasploit Weekly Wrap-Up 1/05/2024

Post Syndicated from Jacquie Harris original https://blog.rapid7.com/2024/01/05/metasploit-weekly-wrap-up-40/

New module content (2)

Splunk __raw Server Info Disclosure

Metasploit Weekly Wrap-Up 1/05/2024

Authors: KOF2002, h00die, and n00bhaxor
Type: Auxiliary
Pull request: #18635 contributed by n00bhaxor
Path: gather/splunk_raw_server_info

Description: This PR adds a module for an authenticated Splunk information disclosure vulnerability. This module gathers information about the host machine and the Splunk install including OS version, build, CPU arch, Splunk license keys, etc.

[msf](Jobs:0 Agents:0) > use auxiliary/gather/splunk_raw_server_info 
[msf](Jobs:0 Agents:0) auxiliary(gather/splunk_raw_server_info) > set rhosts 127.0.0.1
rhosts => 127.0.0.1
[msf](Jobs:0 Agents:0) auxiliary(gather/splunk_raw_server_info) > set username admin
username => admin
[msf](Jobs:0 Agents:0) auxiliary(gather/splunk_raw_server_info) > set password splunksplunk
password => splunksplunk
[msf](Jobs:0 Agents:0) auxiliary(gather/splunk_raw_server_info) > set verbose true
verbose => true
[msf](Jobs:0 Agents:0) auxiliary(gather/splunk_raw_server_info) > run
[*] Running module against 127.0.0.1
[+] Output saved to /root/.msf4/loot/20231220204049_default_127.0.0.1_splunk.system.st_943292.json
[+] Hostname: 523a845e8652
[+] CPU Architecture: x86_64
[+] Operating System: Linux
[+] OS Build: #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1kali1 (2023-10-09)
[+] OS Version: 6.5.0-kali3-amd64
[+] Splunk Version: 7.1.0
[+] Trial Version?: false
[+] Splunk Forwarder?: false
[+] Splunk Product Type: splunk
[+] License State: OK
[+] License Key(s): ["FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"]
[+] Splunk Server Roles: ["indexer", "license_master"]
[+] Splunk Server Startup Time: 2023-12-21 01:40:02
[*] Auxiliary module execution completed

Craft CMS unauthenticated Remote Code Execution (RCE)

Authors: Thanh, chybeta, and h00die-gr3y [email protected]
Type: Exploit
Pull request: #18612 contributed by h00die-gr3y
Path: linux/http/craftcms_unauth_rce_cve_2023_41892

Description: This adds an exploit module that leverages a remote code execution vulnerability in CraftCMS versions between 4.0.0-RC1 and 4.4.14. This vulnerability is identified as CVE-2023-41892 and allows an unauthenticated attacker to execute arbitrary code remotely.

Enhancements and features (2)

  • #18610 from sjanusz-r7 – This PR enables the Metasploit Payload Warnings feature by default. When enabled Metasploit will output warnings about missing Metasploit payloads, for instance if they were removed by antivirus.
  • #18632 from jvoisin – This PR adds improvements to the Glibc Tunables Privilege Escalation module. In the event the file command is not present on the target the module will try to use the readelf command to get the ld.so build ID and determine whether or not the target is compatible with the exploit.

Documentation

You can find the latest Metasploit documentation on our docsite at docs.metasploit.com.

Get it

As always, you can update to the latest Metasploit Framework with msfupdate
and you can get more details on the changes since the last blog post from
GitHub:

If you are a git user, you can clone the Metasploit Framework repo (master branch) for the latest.
To install fresh without using git, you can use the open-source-only Nightly Installers or the
commercial edition Metasploit Pro

New Open Source Tool for Consistency in Cassandra Migrations

Post Syndicated from Elliott Sims original https://www.backblaze.com/blog/new-open-source-tool-for-consistency-in-cassandra-migrations/

A decorative image showing the Cassandra logo with a function represented by two servers on either side of the logo.

Sometimes you find a problem because something breaks, and other times you find a problem the good way—by thinking it through before things break. This is a story about one of those bright, shining, lightbulb moments when you find a problem the good way.

On the Backblaze Site Reliability Engineering (SRE) team, we were thinking through an upcoming datacenter migration in Cassandra. We were running through all of the various types of queries we would have to do when we had the proverbial “aha” moment. We discovered an inconsistency in the way Cassandra handles lightweight transactions (LWTs).

If you’ve ever tried to do a datacenter migration in Cassandra and something got corrupted in the process but you couldn’t figure out why or how—this might be why. I’m going to walk through a short intro on Cassandra, how we use it, and the issue we ran into. Then, I’ll explain the workaround, which we open sourced. 

Get the Open Source Code

You can download the open source code from our Git repository. We’d love to know how you’re using it and how it’s working for you—let us know in the comments.

How We Use Cassandra

First, if you’re not a Cassandra dev, I should mention that when we say “datacenter migration” it means something slightly different in Cassandra than what it sounds like. It doesn’t mean a data center migration in the physical sense (although you can use datacenter migrations in Cassandra when you’re moving data from one physical data center to another). In the simplest terms, it involves moving data between two Cassandra or Cassandra-compatible database replica sets within a cluster.

And, if you’re not familiar with Cassandra at all, it’s an open-source, NoSQL, distributed database management system. It was created to handle large amounts of data across many commodity servers, so it fits our use case—lots of data, lots of servers. 

At Backblaze, we use Cassandra to index filename to location for data stored in Backblaze B2, for example. Because it’s customer data and not just analytics, we care more about durability and consistency than some other applications of Cassandra. We run with three replicas in a single datacenter and “batch” mode to require writes to be committed to disk before acknowledgement rather than the default “periodic.”

Datacenter migrations are an important aspect of running Cassandra, especially on bare metal. We do a few datacenter migrations per year either for physical data moves, hardware refresh, or to change certain cluster layout parameters like tokens per host that are otherwise static. 

What Are LWTs and Why Do They Matter for Datacenter Migrations in Cassandra?

First of all, LWTs are neither lightweight nor transactions, but that’s neither here nor there. They are an important feature in Cassandra. Here’s why. 

Cassandra is great at scaling. In something like a replicated SQL cluster, you can add additional replicas for read throughput, but not writes. Cassandra scales writes (as well as reads) nearly linearly with the number of hosts—into the hundreds. Adding nodes is a fairly straightforward and “automagic” process as well, with no need to do something like manual token range splits. It also handles individual down nodes with little to no impact on queries. Unfortunately, these properties come with a trade-off: a complex and often nonintuitive consistency model that engineers and operators need to understand well.

In a distributed database like Cassandra, data is replicated across multiple nodes for durability and availability. 

Although databases generally allow multiple reads and writes to be submitted at once, they make it look to the outside world like all the operations are happening in order, one at a time. This property is known as serializability, and Cassandra is not serializable. Although it does have a “last write wins” system, there’s no transaction isolation and timestamps can be identical. 

It’s possible, for example, to have a row that has some columns from one write and other columns from another write. It’s safe if you’re only appending additional rows, but mutating existing rows safely requires careful design. Put another way, you can have two transactions with different data that, to the system, appear to have equal priority. 

How Do LWTs Solve This Problem?

As a solution for cases where stronger consistency is needed, Cassandra has a feature called “Lightweight Transactions” or LWTs. These are not really identical to traditional database transactions, but provide a sort of “compare and set” operation that also guarantees pending writes are completed before answering a read. This means if you’re trying to change a row’s value from “A” to “B”, a simultaneous attempt to change that row from “A” to “C” will return a failure. This is accomplished by doing a full—not at all lightweight—Paxos round complete with multiple round trips and slow expensive retries in the event of a conflict.

In Cassandra, the minimum consistency level for read and write operations is ONE, meaning that only a single replica needs to acknowledge the operation for it to be considered successful. This is fast, but in a situation where you have one down host, it could mean data loss, and later reads may or may not show the newest write depending on which replicas are involved and whether they’re received the previous write. For better durability and consistency, Cassandra also provides various quorum levels that require a response from multiple replicas, as well as an ALL consistency that requires responses from every replica.

Cassandra Is My Type of Database

Curious to know more about consistency limitations and LWTs in Cassandra? Christopher Batey’s presentation at the 2016 Cassandra Summit does a good job of explaining the details.

The Problem We Found With LWTs During Datacenter Migrations

Usually we use one datacenter in Cassandra, but there are circumstances where we sometimes stand up a second datacenter in the cluster and migrate to it, then tear down the original. We typically do this either to change num_tokens, to move data when we’re refreshing hardware, or to physically move to another nearby data center.

The TL:DR

We reasoned through the interaction between LWTs/serial and datacenter migrations and found a hole—there’s no guarantee of LWT correctness during a topology change (that is, a change to the number of replicas) large enough to change the number of replicas needed to satisfy quorum. It turns out that combining LWTs and datacenter migrations can violate consistency guarantees in subtle ways without some specific steps and tools to work around it.

The Long Version

Let’s say you are standing up a new datacenter, and you need to copy an existing datacenter to it. So, you have two datacenters—datacenter A, the existing datacenter, and datacenter B, the new datacenter. Let’s say datacenter A has three replicas you need to copy for simplicity’s sake, and you’re using quorum writes to ensure consistency.

Refresher: What is Quorum-Based Consistency in Cassandra?

Quorum consistency in Cassandra is based on the concept that a specific number of replicas must participate in a read or write operation to ensure consistency and availability—a majority (n/2 +1) of the nodes must respond before considering the operation as successful. This ensures that the data is durably stored and available even if a minority of replicas are unavailable.

You have different types of quorum you can choose from, and here’s how those defaults make a decision: 

  • Local quorum: Two out of the three replicas in the datacenter I’m talking to must respond in order to return success. I don’t care about the other datacenter.
  • Global quorum: Four out of the six total replicas must respond in order to return success, and it doesn’t matter which datacenter they come from.
  • Each quorum: Two out of the three replicas in each datacenter must respond in order to return success.

Most of these quorum types also have a serial equivalent for LWTs.

Type of Quorum Serial Regular
Local LOCAL_SERIAL LOCAL_QUORUM
Each unsupported EACH_QUORUM
Global SERIAL QUORUM

The problem you might run into, however, is that LWTs do not have an each_serial mode. They only have local and global. There’s no way to tell the LWT you want quorum in each datacenter. 

local_serial is good for performance, but transactions on different datacenters could overlap and be inconsistent. serial is more expensive, but normally guarantees correctness as long as all queries agree on cluster size. But what if a query straddles a topology change that changes quorum size? 

Let’s use global quorum to show how this plays out. If a LWT starts when RF=3, at least two hosts must process it. 

While it’s running, the topology changes to two datacenters (A and B) each with RF=3 (so six replicas total) with a quorum of four. There’s a chance that a query affecting the same partition could then run without overlapping nodes, which means consistency guarantees are not maintained for those queries. For that query, quorum is four out of six where those four could be the three replicas in datacenter B and the remaining replica in datacenter A. 

Those two queries are on the same partition, but they’re not overlapping any hosts, so they don’t know about each other. It violates the LWT guarantees.

The Solution to LWT Inconsistency

What we needed was a way to make sure that the definition of “quorum” didn’t change too much in the middle of a LWT running. Some change is okay, as long as old and new are guaranteed to overlap.

To account for this, you need to change the replication factor one level at a time and make sure there are no transactions still running that started before the previous topology change before you make the next. Three replicas with a quorum of two can only change to four replicas with a quorum of three. That way, at least one replica must overlap. The same thing happens when you go from four to five replicas or five to six replicas. This also applies when reducing the replication factor, such as when tearing down the old datacenter after everything has moved to the new one.

Then, you just need to make sure no LWT overlaps multiple changes. You could just wait long enough that they’ve timed out, but it’s better to be sure. This requires querying the internal-only system.paxos table on each host in the cluster between topology changes.

We built a tool that checks to see whether there are still transactions running from before we made a topology change. It reads system.paxos on each host, ignoring any rows with proposal_ballot=null, and records them. Then after a short delay, it re-reads system.paxos, ignoring any rows that weren’t present in the previous run, or any with proposal_ballot=null in either read, or any where in_progress_ballot has changed. Any remaining rows are potentially active transactions. 

This worked well the first few times that we used it, on 3.11.6. To our surprise, when we tried to migrate a cluster running 3.11.10 the tool reported hundreds of thousands of long-running LWTs. After a lot of digging, we found a small (but fortunately well-commented) performance optimization added as part of a correctness fix (CASSANDRA-12126), which means proposal_ballot does not get set to null if the proposal is empty/noop. To work around this, we had to actually parse the proposal field. Fortunately all we need is the is_empty flag in the third field, so no need to reimplement the full parsing code. A big impact to us for a seemingly small and innocuous change piggy-backed onto a correctness fix, but that’s the risk of directly reading internal-only tables. 

We’ve used the tool several times now for migrations with good results, but it’s still relatively basic and limited. It requires running repeatedly until all transactions are complete, and sometimes manual intervention to deal with incomplete transactions. In some cases we’ve been able to force-commit a long-pending LWT by doing a SERIAL read of the partition affected, but in a couple of cases we actually ended up running across LWTs that still didn’t seem to complete. Fortunately in every case so far it was in a temporary table and a little work allowed us to confirm that we no longer needed the partition at all and could just delete it.

Most people who use Cassandra may never run across this problem, and most of those who do will likely never track down what caused the small mystery inconsistency around the time they did a datacenter migration. If you rely on LWTs and are doing a datacenter migration, we definitely recommend going through the extra steps to guarantee consistency until and unless Cassandra implements an EACH_SERIAL consistency level.

Using the Tool

If you want to use the tool for yourself to help maintain consistency through datacenter migrations, you can find it here. Drop a note in the comments to let us know how it’s working for you and if you think of any other ways around this problem—we’re all ears!

If You’ve Made It This Far

You might be interested in signing up for our Developer Newsletter where our resident Chief Technical Evangelist, Pat Patterson, shares the latest and greatest ways you can use B2 Cloud Storage in your applications.

The post New Open Source Tool for Consistency in Cassandra Migrations appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Rapid7’s Data-Centric Approach to AI in Belfast

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/01/05/rapid7s-data-centric-approach-to-ai-in-belfast/

Rapid7’s Data-Centric Approach to AI in Belfast

Rapid7 has expanded significantly in Belfast since establishing a presence back in 2014, resulting in the company’s largest R&D hub outside the US with over 350 people spread across eight floors in our Chichester Street office.  There is a wide range of product development and engineering across the entire Rapid7 platform that happens here, but nearest and dearest to our hearts is that Belfast has really become the epicentre for our more than two decades of investment in data. It has formed the bedrock for our AI, machine learning and data science efforts.

Read on to find out more about the importance of data and AI at Rapid7!

Rapid7’s Data-Centric Approach to AI in Belfast

A Forward-thinking Data Attitude

First up let’s talk data.  We’ve had a specialist data presence in Belfast for a number of years, initially focused on the consumption, distribution, and analytics for quality product usage data, via interfaces such as Amazon SNS/SQS, piping data into time-series data stores like TimescaleDB and InfluxDB.  Product usage data is unique due to its high volume and cardinality, which these data stores are optimized for.  The evolution of data at Rapid7 required more scale, so we’ve been introducing more scalable technologies such as Apache Kafka, Spark, and Iceberg.  This stack will enable multiple entry points for access to our data.

  • Apache Kafka, the heart of our data infrastructure, is a distributed streaming platform allowing us to handle real-time data streams with ease.  Kafka acts as a reliable and scalable pipeline, ingesting massive amounts of data from various sources in real-time. Its pub-sub architecture ensures data is efficiently distributed across the system, with multiple types of consumers per topic enabling teams to process and analyze information as it flows.
  • Transforms run via Apache Spark, serving as the processing engine taking our data to the next level, opening us up to either batch processing or streaming directly from Kafka where we ultimately land the data into an object store with the Iceberg layer on top.
  • Apache Iceberg is an open table format for large-scale datasets, providing ACID transactions, schema evolution, and time travel capabilities. These features are instrumental in maintaining consistency and reliability of our data, which is crucial for AI and analytics at Rapid7.  Additionally, the ability to perform time travel queries with Iceberg allows us to analyze and curate historical data, an essential component in building predictive AI models.

Our Engineers are constantly developing applications on this stack to facilitate ETL pipelines in languages such as Python, Java and Scala, running within K8s clusters.  In keeping with our forward-thinking attitude to data, we continue to adopt new tooling and governance to facilitate growth, such as the introduction of a Data Catalog to visualize lineage with a searchable interface for metadata.  In this way, users self-discover data but also learn how specific datasets are related, giving a clearer understanding of usage.  All this empowers data and AI Engineers to discover and ingest the vast pools of data available at Rapid7.

Our Data-Centric Approach to AI

At the core of our AI engineering is a data-centric approach, in line with the recent gradual trend away from model-centric approaches.  We find model design isn’t always the differentiator: more often it’s all about the data.  In our experience when comparing different models for a classification task, say for example a set of neural networks plus some conventional classifiers, they may all perform fairly similarly with high-quality data.  With data front and centre at Rapid7 as a key part of our overall strategy, major opportunities exist for us to leverage high-quality, high-volume datasets in our AI solutions, leading to better results.

Of course, there are always cases where model design is more influential.  However marginal performance gains, often seen in novelty-focused academia, may not be worth the extra implementation effort or compute expense in practice. Not to say we aren’t across new models and architectures – we review them every week – though the classic saying “avoid unnecessary complexity” also applies to AI.

The Growing Importance of Data in GenAI

Let’s take genAI LLMs as an example.  Anyone following developments may have noticed training data is getting more of the spotlight, again indicative of the data-centric approach.  LLMs, based on transformer architectures, have been getting bigger, and vendors are PRing the latest models hard.  Look closely though and they tend to be trained with different datasets, often in combination with public benchmark data too.  However, if two or three LLMs from different vendors were trained with exactly the same data for long enough we’re willing to bet the performance differences between them might be minimal. Further, we’re seeing researchers push in the other direction, trying to create smaller, less complex open-source LLMs that compare favourably to their much larger commercial counterparts.

Considering this data-centric shift, more complex models in general may not drive performance as much as you think.  The same principles apply to more conventional analytics, machine learning and data science.  Some models we’ve trained also have very small datasets, maybe only 100-200 examples, yet generalise well due to accurately labelled data that is representative of the wild.  Huge datasets are at risk of duplicates or mislabelled examples, essentially significant noise, so it’s about data quality and quantity.  Thankfully we have both in abundance, and our data is of a scale you’re unlikely to find elsewhere.

Expanding our AI Centre of Excellence in Belfast

Rapid7 is further investing in Belfast as part of our new AI Centre of Excellence, encompassing the full range of AI, ML and data science.  We’re on a mission to use data and AI to accelerate threat investigation, detection and response (D&R) capabilities of our Security Operations Centre (SOC).  The AI CoE partners with our data and D&R teams in enabling customers to assess risk, detect threats and automate their security programs.  It ensures AI, ML and data science are applied meaningfully to add customer value, best achieve business objectives and deliver ROI.  Unnecessary complexity is avoided, with a creative, fast-fail, highly iterative approach to accelerate ideas from proof-of-concept to go or no-go.

The group’s make-up is such that our technical skills complement one another; we share our AI, ML and data science knowledge while also contributing on occasion to new external AI policy initiatives with recognised bodies like NIST.  We use, for example, a mix of LLMs, sklearn, PyTorch and more, whilst the team has a track record of publishing award-winning research at the likes of AISec at ACM CCS and with IEEE.  All of this is powered by data, and the AI CoE’s goals are ambitious.

Rapid7’s Data-Centric Approach to AI in Belfast

Interested in joining us?

We are always interested in hearing from people with a desire to be part of something big.  If you want a career move where you can grow and make an impact with data and AI, this is it.  We’re currently recruiting for multiple brand-new data and AI roles in Belfast across various levels of seniority as we expand both teams – and we’d love to hear from you!   Please check out the roles over here!

[$] Kernel-text replication on NUMA systems

Post Syndicated from corbet original https://lwn.net/Articles/956900/

Kernel developers often go out of their way to reduce the memory used by
the kernel itself; that memory is not available for the workloads that
people are actually interested in running on their systems. Lower memory
usage also tends to lead to better performance overall. But there are
times when the expenditure of some extra memory can make the system faster.
The replication of the kernel’s text (executable code) and read-only data
across a NUMA system may be a case in point; patch sets have been posted
adding that capability to two architectures.

Security updates for Friday

Post Syndicated from jake original https://lwn.net/Articles/957005/

Security updates have been issued by Debian (asterisk, chromium, exim4, netatalk, and tomcat9), Fedora (chromium), Gentoo (BlueZ, c-ares, CUPS filters, RDoc, and WebKitGTK+), Oracle (firefox, squid:4, thunderbird, and tigervnc), SUSE (python-aiohttp and python-paramiko), and Ubuntu (linux-intel-iotg).