All posts by Robert Graham

C can be memory-safe

Post Syndicated from Robert Graham original

The idea of memory-safe languages is in the news lately. C/C++ is famous for being the world’s system language (that runs most things) but also infamous for being unsafe. Many want to solve this by hard-forking the world’s system code, either by changing C/C++ into something that’s memory-safe, or rewriting everything in Rust.

Forking is a foolish idea. The core principle of computer science is that we need to live with legacy, not abandon it.

And there’s no need. Modern C compilers already have the ability to be memory-safe; we just need to make minor — and compatible — changes to turn it on. Instead of a hard-fork that abandons legacy systems, this would be a soft-fork that enables memory-safety for new systems.

Consider the most recent memory-safety flaw in OpenSSL. They fixed it by first adding a memory-bounds variable, then putting every write to the buffer behind a macro, PUSHC(), that checks that bounds.
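The OpenSSL excerpt isn't reproduced here, but the shape of the fix is easy to sketch. This is a hypothetical simplification (the macro name matches the OpenSSL fix, but the body and the copy_checked() helper are my own illustration), showing what it means to put every write behind a bounds-checking macro:

```c
#include <stddef.h>

/* Hypothetical simplification of the OpenSSL approach: every write to the
 * output buffer goes through one macro that checks the bounds first,
 * returning failure from the enclosing function instead of overflowing. */
#define PUSHC(buf, size, pos, c)               \
    do {                                       \
        if ((pos) >= (size))                   \
            return 0; /* would overflow */     \
        (buf)[(pos)++] = (c);                  \
    } while (0)

/* Copy a string into out, bounds-checked; returns 1 on success, 0 on overflow. */
static int copy_checked(char *out, size_t outsize, const char *in)
{
    size_t pos = 0;
    for (; *in != '\0'; in++)
        PUSHC(out, outsize, pos, *in);
    PUSHC(out, outsize, pos, '\0');
    return 1;
}
```

The burden here is still on the programmer to remember the macro on every access — which is exactly what a compiler-enforced bounds attribute would eliminate.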

A better (but currently hypothetical) fix would be something like the following:

size_t maxsize CHK_SIZE(outptr) = out ? *outlen : 0;
This would link the memory-bounds maxsize with the memory outptr. The compiler could then be relied upon to do all the bounds checking to prevent buffer overflows; the rest of the code wouldn’t need to change.
An even better (and hypothetical) fix would be to change the function declaration like the following:
int ossl_a2ulabel(const char *in, char *out, size_t *outlen CHK_INOUT_SIZE(out));
That’s the intent, anyway: *outlen is the memory-bounds of out on input, and receives a shorter bounds on output.
This specific feature isn’t in compilers. But gcc and clang already have other similar features. They’ve only been halfway implemented. This feature would be relatively easy to add. I’m currently studying the code to see how I can add it myself. I could just mostly copy what’s done for the alloc_size attribute. But there’s a considerable learning curve, I’d rather just persuade an existing developer of gcc or clang to add the new attributes for me.
Once you give the programmer the ability to fix memory-safety problems like the solution above, you can then enable warnings for unsafe code. The compiler knew the above code was unsafe, but since there was no practical way to fix it, it was pointless to nag the programmer about it. With these new features come warnings about failing to use them.
In other words, it becomes compiler-guided refactoring. Forking code is hard, refactoring is easy.
As the above function shows, the OpenSSL code is already somewhat memory-safe, just based upon the flawed principle of relying upon diligent programmers. We need the compiler to enforce it. With such features, the gap is relatively small: mostly just changing function parameter lists and data structures to link a pointer with its memory-bounds. The refactoring effort would be small, rather than a major rewrite.
This would be a soft-fork. The memory-bounds would work only when compiled with new compilers. The macro would be ignored on older systems. 
Memory-safety is a problem. The idea of abandoning C/C++ isn’t a solution. We already have the beginnings of a solution in modern gcc and clang compilers. We just need to extend it.

I’m still bitter about Slammer


Today is the 20th anniversary of the Slammer worm. I’m still angry over it, so I thought I’d write up my anger. This post will be of interest to nobody; it’s just me venting my bitterness. Get off my lawn!!

Back in the day, I wrote “BlackICE”, an intrusion detection and prevention system that ran as both a desktop version and a network appliance. Most cybersec people from that time remember it as the desktop version, but the bulk of our sales came from the network appliance.

The network appliance competed against other IDSs of the time, such as Snort, an open-source product. For much of the cybersec industry, IDS was Snort — they had no knowledge of how intrusion-detection might work other than from this product, because it was open-source.

My intrusion-detection technology was radically different. The thing that makes me angry is that I couldn’t explain the differences to the community because they weren’t technical enough.

When Slammer hit, Snort and Snort-like products failed. Mine succeeded extremely well. Yet, I didn’t get the credit for this.

The first difference is that I used a custom poll-mode driver instead of interrupts. This is now the norm in the industry, such as with Linux NAPI drivers. The problem with interrupts is that a computer of the time could handle fewer than 50,000 interrupts-per-second. If network traffic arrived faster than that, the computer would hang, spending all its time in the interrupt handler and doing no other useful work. Turning off interrupts and instead polling for packets prevents this problem. The cost is that if the computer isn’t heavily loaded by network traffic, polling wastes CPU and electrical power. Linux NAPI drivers switch between the two: interrupts when traffic is light and polling when traffic is heavy.
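A poll-mode receive loop is simple to sketch. This toy version (a simulated descriptor ring with hypothetical field names, not a real NIC driver) shows the core idea: the CPU repeatedly drains a ring of descriptors that the hardware marks ready, with no interrupts involved:

```c
#include <stddef.h>

#define RING_SIZE 256

/* One receive descriptor: the (simulated) NIC sets `ready` when it has
 * DMA'd a packet into the slot. */
struct rx_desc {
    int ready;
    int len;
};

/* Drain every ready descriptor in one pass, interrupt-free.
 * Returns the number of packets processed. */
static int poll_ring(struct rx_desc ring[RING_SIZE], int *tail)
{
    int processed = 0;
    while (ring[*tail].ready) {
        ring[*tail].ready = 0;           /* hand the slot back to the NIC */
        *tail = (*tail + 1) % RING_SIZE; /* advance to the next descriptor */
        processed++;                     /* process_packet() would go here */
    }
    return processed;
}
```

The outer loop of the appliance just calls this forever; at 100% CPU when idle, but with no per-packet interrupt cost when the link is saturated.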

The consequence is that a typical machine of the time (dual Pentium IIIs) could handle 2-million packets-per-second running my software, far better than the 50,000 packets-per-second of the competitors.

When Slammer hit, it filled a 1-gbps Ethernet with 300,000 packets-per-second. As a consequence, pretty much all other IDS products fell over. Those that survived were attached to slower links — 100-mbps was still common at the time.

An industry luminary even gave a presentation at BlackHat saying that my claimed performance (2-million packets-per-second) was impossible, because everyone knew that computers couldn’t handle traffic that fast. I couldn’t combat that, even by explaining with very small words “but we disable interrupts”.

Now this is the norm. All network drivers are written with polling in mind. Specialized drivers like PF_RING and DPDK do even better. Network appliances are now written using these things. Now you’d expect something like Snort to keep up and not get overloaded with interrupts. What makes me bitter is that back then, this was inexplicable magic.

I wrote an article in PoC||GTFO 0x15 that shows how my portscanner masscan uses this driver, if you want more info.

The second difference with my product was how signatures were written. Everyone else used signatures that triggered on pattern-matching. Instead, my technology included protocol-analysis: code that parsed more than 100 protocols.

The difference is that when there is an exploit of a buffer-overflow vulnerability, pattern-matching searched for patterns unique to the exploit. In my case, we’d measure the length of the buffer, triggering when it exceeded a certain length, finding any attempt to attack the vulnerability.

The reason we could do this was through the use of state-machine parsers. Such analysis was considered heavy-weight and slow, which is why others avoided it. State-machines are faster than pattern-matching, many times faster. Better and faster.

Such parsers are now more common. Modern web-servers (nginx, IIS, lighttpd, etc.) use them to parse HTTP requests. You can tell if a server does this by sending 1-gigabyte of spaces between “GET” and “/”. Apache gives up after 64k of input. State-machines keep going, because while in that state (“between-method-and-uri”), they’ll accept any number of spaces — the only limit is a timeout. Go read the nginx source-code to understand how this works.
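The technique is easy to sketch. This hypothetical parser (my own illustration, not nginx's actual code) handles just the start of a request line, one byte at a time; any number of spaces between the method and the URI simply keeps the machine in one state, consuming no memory:

```c
/* States for the start of an HTTP request line. */
enum state { METHOD, BETWEEN_METHOD_AND_URI, URI, AFTER_URI };

/* Feed one byte to the parser, returning the next state. A gigabyte of
 * spaces just spins in BETWEEN_METHOD_AND_URI; nothing is buffered. */
static enum state step(enum state s, char c)
{
    switch (s) {
    case METHOD:
        return (c == ' ') ? BETWEEN_METHOD_AND_URI : METHOD;
    case BETWEEN_METHOD_AND_URI:
        return (c == ' ') ? BETWEEN_METHOD_AND_URI : URI;
    case URI:
        return (c == ' ') ? AFTER_URI : URI;
    default:
        return AFTER_URI;
    }
}

/* Run the machine over a buffer of input. */
static enum state run(const char *buf, int len)
{
    enum state s = METHOD;
    for (int i = 0; i < len; i++)
        s = step(s, buf[i]);
    return s;
}
```

Because the machine's only memory is the current state, the same code works whether the input arrives in one buffer or spread across thousands of packets.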

I wrote a paper in PoC||GTFO 0x21 that shows the technique by implementing the common wc (word-count) program. A simplified version of this is wc2o.c. Go read the code — it’s crazy.

The upshot is that when Slammer hit, most IDSs didn’t have a signature for it. If they didn’t just fall over, what they triggered on were things like “UDP flood”, not “SQL buffer overflow”. This led many to believe what was happening was a DDoS attack. My product correctly identified the vulnerability being exploited.

The third difference with my product was the event coalescer. Instead of a timestamp, my events had a start-time, end-time, and count of the number of times the event triggered.

Other event systems sometimes have this, with such events as “last event repeated 39003 times”, to prevent the system from clogging up with events.

My system was more complex. For one thing, an attacker may deliberately intermix events, so it can’t simply be the one most-recent event that gets coalesced this way. For another, the attacker could sweep targets or spoof sources. Thus, coalescing needed to aggregate events over address ranges as well as time.
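A minimal sketch of the idea (hypothetical types and field names, not the actual BlackICE code): a record carries a start-time, end-time, count, and source-address range, and new triggers of the same signature fold into it instead of generating fresh events:

```c
#include <stdint.h>

/* One coalesced event record: instead of a single timestamp, it carries a
 * start-time, end-time, count, and a source-address range. */
struct event {
    int      sig_id;     /* which signature fired */
    long     start_time;
    long     end_time;
    long     count;
    uint32_t src_lo;     /* lowest source address seen */
    uint32_t src_hi;     /* highest source address seen */
};

/* Try to fold a new trigger into an existing record. Returns 1 if it
 * coalesced, 0 if the caller must emit a new record instead. */
static int coalesce(struct event *e, int sig_id, long now, uint32_t src,
                    long window)
{
    if (e->sig_id != sig_id || now - e->end_time > window)
        return 0;
    e->end_time = now;
    e->count++;
    if (src < e->src_lo) e->src_lo = src;
    if (src > e->src_hi) e->src_hi = src;
    return 1;
}
```

In practice the sensor would keep a small table of such records (one per active signature), so intermixed events coalesce independently rather than defeating each other.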

Slammer easily filled a gigabit link with 300,000 packets-per-second. Every packet triggered the signature, thus creating 300,000 events-per-second. No system could handle that. To keep up with the load, events had to be reduced somehow.

My event coalescing logic worked. It reduced the load from 300,000 down to roughly 500 events-per-second. This was still slightly more load than could be forwarded to the remote management system. Customers reported that at their consoles, they saw the IDS slowly fall behind, spooling events at the sensor and struggling to ship them up to the management system.

The problem is so persistent that it’s a big flaw in IDSs still to this day. Snort often has signatures that throw away the excess data, but it’s still easy to flood them with packets that overload their event logging.

What was exciting for me is that I’d designed all this in theory, tested using artificial cases, unsure how it would stand up to the real world. Watching it stand up to the real world was exciting: big customers saw it successfully work in practice, with the only complaint that at the centralized console, it fell behind a little.

The point is that I made three radical design choices, unprecedented at the time though more normal now, and they worked. And yet, the industry wasn’t technical enough to recognize that it worked.

For example, a few months later I had a meeting at the Pentagon where a Gartner analyst gave a presentation claiming that only hardware-based IDS would work, because software-based IDS couldn’t keep up. Well, these were my customers. I didn’t refute Gartner so much as my customers did, with their techies standing up and pointing out that when Slammer hit, my “software” product did keep up. Gartner doesn’t test products themselves. They rightly identified the problem of other software using interrupts, but couldn’t conceive that there was a third alternative: “poll mode” drivers.

I apologize to you, the reader, for subjecting you to this vain bitching, but I just want to get this off my chest.

The RISC Deprogrammer


I should write up a larger technical document on this, but in the meantime here is this short(-ish) blogpost. Everything you know about RISC is wrong. It’s some weird nerd cult. Techies frequently mention RISC in conversation, with other techies nodding their heads in agreement, but it’s all wrong. Somehow everyone has been mind-controlled to believe in wrong concepts.

An example is this recent blogpost which starts out saying that “RISC is a set of design principles”. No, it wasn’t. Let’s start from this sort of viewpoint to discuss this odd cult.

What is RISC?

Because of the march of Moore’s Law, every year, more and more parts of a computer could be included onto a single chip. When chip densities reached the point where we could almost fit an entire computer on a chip, designers made tradeoffs, discarding unimportant stuff to make the fit happen. They made tradeoffs, deciding what needed to be included, what needed to change, and what needed to be discarded.

RISC is a set of creative tradeoffs, meaningful at the time (early 1980s), but which were meaningless by the late 1990s.

The interesting part of CPU evolution is the four decades between 1964, with IBM’s System/360 mainframe, and 2007, with Apple’s iPhone. The issue was a 32-bit core with memory-protection, allowing isolation among different programs via virtual memory. These were real computers, from the modern perspective: real computers have at least 32 bits and an MMU (memory management unit).

The year 1975 saw the release of the Intel 8080 and MOS 6502, but these were 8-bit systems without memory protection. This was the point of Moore’s Law where we could get a useful CPU onto a single chip.

In the year 1977 we saw DEC release its VAX minicomputer, having a 32-bit CPU with an MMU. Real computing had moved from insanely expensive mainframes filling entire rooms to less expensive devices that merely filled a rack. But the VAX was far too big to fit onto a chip at this time.

The real interesting evolution of real computing happened in 1980 with Motorola’s 68000 (aka. 68k) processor, essentially the first microprocessor that supported real computing.

But this comes with caveats. Making a microprocessor required creative work deciding what wouldn’t be included. In the case of the 68k, it had only a 16-bit ALU. This meant adding two 32-bit values required passing them through the ALU twice, adding each half separately. Because of this, many call the 68k a 16-bit rather than a 32-bit microprocessor.

More importantly, only the lower 24-bits of the registers were valid for memory addresses. Since it’s memory addressing that makes a real computer “real”, this is the more important measure. But 24-bits allows for 16-megabytes of memory, which is all that anybody could afford to include in a computer anyway. It was more than enough to run a real operating system like Unix. In contrast, 16-bit processors could only address 64-kilobytes of memory, and weren’t really practical for real computing.

The 68k didn’t come with an MMU, but it allowed an external MMU chip. Thus, the early 1980s saw an explosion of workstations and servers consisting of a 68k and an MMU. The most famous came from Sun Microsystems, launched in 1982 with its own custom-designed MMU chip.

Sun and its competitors transformed the industry running Unix. Many point to IBM’s PC from 1981 as the transformative moment in computer history, but these were non-real 16-bit systems that struggled with more than 64k of memory. IBM PC computers wouldn’t become real until 1993 with Microsoft’s Windows NT, supporting full 32 bits, memory-protection, and pre-emptive multitasking.

But except for Windows itself, the rest of computing is dominated by the Unix heritage. The phone in your hand, whether Android or iPhone, is a Unix computer that inherits almost nothing from the IBM PC.

These 32-bit Unix systems from the early 1980s still lagged behind DEC’s VAX in performance. The VAX was considered a mini-supercomputer. The Unix workstations were mere toys in comparison. Too many tradeoffs were made in order to fit everything onto a single chip, too many sacrifices made.

Some people asked “What if we make different tradeoffs?”

Most people thought the VAX was the way of the future, and were all chasing that design. The 68k CPU was essentially a cut down VAX design. But history had anti-VAX designs that worked very differently, notably the CDC 6600 supercomputer from the 1960s and the IBM 801/ROMP processor from the 1970s.

It’s not simply one tradeoff, but a bunch of inter-related tradeoffs. They snowball — each choice you make changes the costs-vs-benefit analysis of other choices, changing them as well.

This is why people can’t agree upon a single definition of RISC. It’s not one tradeoff made in isolation, but a long list of tradeoffs, each part of a larger scheme.

In 1987, Motorola shipped its 68030 version of the 68k processor, chasing the VAX ideal. By then, we had ARM, SPARC, and MIPS processors that significantly outperformed it. Given a budget of roughly 100,000 transistors allowed by Moore’s Law of the time, the RISC tradeoffs were better than VAX-like tradeoffs.

So really, what is RISC?

Let’s define things in terms of 1986, comparing the [ARM, SPARC, MIPS] processors called “RISC” to the [68030, 80386] processors that weren’t “RISC”. They all supported full 32-bit processing, memory-management, and preemptive multitasking operating systems like Unix.

The major ways RISC differed were:

  • fixed-length instructions (32-bits or 4-bytes each)
  • simple instruction decoding
  • horizontal vs. vertical microcode
  • deep pipelines of around 5 stages
  • load/store aka reg-reg
  • simple address modes
  • compilers optimized code
  • more registers

If you are looking for the one thing that defines RISC, it’s the thing that nobody talks about: horizontal microcode.

The VAX/68k/x86 architecture decoded external instructions into internal control ops that were pretty complicated, supporting such things as loops. Each external instruction executed an internal microprogram with a variable number of such operations.

The classic RISC worked differently. Each external instruction decoded into exactly 4 internal ops. Moreover, each op had a fixed purpose:

  1. read from two registers into the ALU (arithmetic-logic unit)
  2. execute a math operation in the ALU
  3. access memory (well, the L1 cache)
  4. write results back into one register

(This explanation has been fudged and simplified, btw).

This internal detail was expressed externally in the instruction set, simplifying decoding. The external instructions specified two registers to read, an ALU opcode, and one register to write. All of this was fit into a constant 32-bits. In contrast, the [68k/x86/VAX] model meant a complex decoding of instructions with a large ROM containing microprograms.
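Fixed-format decoding is cheap to sketch. Assuming a made-up field layout (real MIPS and SPARC layouts differ in their details), extracting the two source registers, the destination, and the opcode from a 32-bit word is just shifts and masks:

```c
#include <stdint.h>

/* A hypothetical RISC-style encoding:
 *   | opcode:6 | rs1:5 | rs2:5 | rd:5 | remaining:11 |
 * Real instruction sets differ, but share the idea: fields live at fixed
 * bit positions, so decode is a handful of gates, not a microcode ROM. */
struct insn {
    unsigned opcode, rs1, rs2, rd;
};

static struct insn decode(uint32_t word)
{
    struct insn i;
    i.opcode = (word >> 26) & 0x3f;  /* which ALU operation */
    i.rs1    = (word >> 21) & 0x1f;  /* first register to read */
    i.rs2    = (word >> 16) & 0x1f;  /* second register to read */
    i.rd     = (word >> 11) & 0x1f;  /* register to write */
    return i;
}
```

Compare this to a VAX-style decoder, which must first determine the instruction's length and operand specifiers before it even knows which fields exist.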

Roughly half of the 68000’s transistors were devoted to this complex decoding logic and ROM. In contrast, for RISC processors, it was closer to 1%. All those transistors could be dedicated to other things. See how tradeoffs snowball? Saving so many transistors on instruction decoding meant being able to support other features elsewhere. It’s not a pure win, however: RISC needed multiple instructions to do the same thing as a single [68k/x86/VAX] instruction.

This meant instructions could be deeply pipelined, meaning overlapped. While reading the registers for the current instruction, the processor can simultaneously be fetching the next instruction and performing the ALU calculation of the previous one. The classic RISC pipeline had 5 stages (the 4 mentioned above plus 1 for fetching the next instruction). Each clock cycle would execute part of 5 instructions simultaneously, each at a different stage in the pipeline.

This was called scalar operation. In previous processors, it took a variable number of clock cycles for an instruction to complete. In RISC, every instruction had a 5-clock-cycle latency from beginning to end. And since execution was overlapped/pipelined, with 5 instructions in flight at once, the throughput was one instruction per clock cycle.

All CPUs are pipelined to some extent, but they need complex interlocks to prevent things from colliding with each other, such as two pipelined instructions trying to read registers at the same time. RISC removed most of those interlocks by strictly regulating what an instruction could do in each stage of the pipeline. Removing these interlocks reduced transistor count and sped things up. This could be one possible definition of RISC that you never hear: it got rid of the interlocks found in other processors.

Some pipeline conflicts were worse. Because of pipelining, the results of an instruction aren’t available until many clock cycles later. What if one instruction writes its result to register #5 (r5), and the very next instruction attempts to read from register #5 (r5)? It’s too soon; it would have to wait more clock cycles for the result.

The answer: don’t do that. Assembly language programmers need to know this complication, and are told to simply not write code that does this, because then the program won’t work.

This was anathema at the time. Throughout history to this point, each new CPU architecture also had a new operating-system written in assembly language, with many applications written in assembly language too. Thus, a programmer-friendly assembly language was considered one of the biggest requirements for any new system. Requiring programmers to know quirks that led to buggy code was simply unacceptable. Everybody knew that programmer-hostile instruction-sets would never work in the market, even if they performed faster and cheaper.

But technology is littered with examples of what everybody knew being wrong. In this case, by 1980 we had the C programming language, essentially a “portable assembly language”, and the Unix operating system written in C. The only people who needed to know about a quirky assembly language were the compiler writers. They would take care of all such problems.

That’s why the history lesson above talks about Unix and real computing. Without Unix and C, RISC wouldn’t have happened. An operating-system written in a high-level language was a prerequisite for RISC. It’s as important an innovation as Moore’s Law allowing 100,000 transistors to fit on a chip.

Because of the lack of complex decoding logic, the transistor budget was freed up to support such things as more registers. The Intel x86 architecture famously had 8 registers, while the RISC competitors typically had as many as 32. The limitation was decode space. It takes 5 bits to specify one of 32 possibilities. Given that almost every instruction specified two registers to read from and one register to write to, that’s 15 bits, or nearly half of the instruction space, leaving 17 bits for other purposes.

The creators of RISC, Hennessy and Patterson, wrote a textbook called “Computer Architecture: A Quantitative Approach”. It’s horrible. It imagines a world where people need to be taught tradeoffs and transistor budgets. But there is no approach other than a quantitative one; it’s like an economics textbook titled “Economics: A Supply And Demand Approach”. And while the textbook has a weird obsession with quantitative theory, it misses non-quantitative tradeoffs, like the fact that RISC couldn’t have happened without C and Unix.

Among the snowballing tradeoffs is the load/store architecture, while at the same time, having fewer addressing modes. It’s here that we need to go back and discuss history — what the heck is an “addressing mode“????

In the beginning, computers had only a single general purpose register, called the accumulator. All calculations, like adding two numbers together, involved reading the second value from memory and combining with the first value already in the accumulator. All calculations, whether arithmetic (add, subtract) or logical (AND, OR, XOR) involved one value already in the register, and another value from memory.

Addresses have to be calculated. For example, when accessing elements in a table, we have to take the row number, multiply it by the size of each row, add an offset into the row for the desired column, then add all that to the address of the start of the table. And after calculating this address, we often want to increment the index to fetch the next row.

If the table base address and row index are held in registers, we might get a complex instruction like the following. This calculates an address using two registers, r10 and r11, fetches the value at that address from memory, then adds it into register r9.

 ADD r9, [r10 + r11*8 + 4]

Such calculations embedded in the instruction-set were necessary for such early computers. While they had only a single general purpose register (the accumulator), they still had multiple special purpose registers used this way for address calculations. 

For complex computers like the VAX, such addressing modes embedded in the instruction-set were no longer necessary, but still desirable. Half the work of a computer is calculating memory addresses. It’s very tedious for programmers to do it manually, and easier when the instruction-set takes care of common memory-access patterns (like accessing cells within a table).

This leads us to the load/store issue.

With many registers, we no longer need to read another value from memory (a reg-mem calculation). We can instead perform the calculation using two registers (reg-reg). The VAX had such reg-reg instructions, but programmers still mostly used the reg-mem instructions with the complex address calculations.

RISC changed this. Calculations were now exclusively reg-reg, where math operations like addition could only operate on registers. To add something from memory, you needed first to load it from memory into a register, using a separate, explicit instruction. Likewise, writing back to memory required an explicit store operation.

This architecture can be called either reg-reg or load/store, with the second name being more popular.
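The reg-mem instruction from the earlier example, ADD r9, [r10 + r11*8 + 4], decomposes under load/store into explicit steps. Sketched here in C (my own illustration, with hypothetical register names as variables), each statement stands in for roughly one RISC instruction:

```c
#include <stdint.h>
#include <string.h>

/* One CISC instruction:   ADD r9, [r10 + r11*8 + 4]
 * becomes, on a load/store machine, an explicit address calculation,
 * an explicit load, then a reg-reg add. */
static int64_t add_from_table(const void *r10 /* table base */,
                              int64_t r11 /* row index */,
                              int64_t r9  /* accumulator */)
{
    const char *addr = (const char *)r10 + r11 * 8 + 4; /* shift + add + add */
    int64_t tmp;
    memcpy(&tmp, addr, sizeof(tmp));                    /* LOAD tmp, [addr] */
    return r9 + tmp;                                    /* ADD r9, r9, tmp  */
}
```

Three or four instructions instead of one, which is why the constrained RISC addressing modes (base register plus index register) matter: they fold part of this arithmetic back into the load itself.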

With RISC, addressing modes were still desirable, but now they applied to only the two load and store instructions.

The available addressing modes were constrained by the limited RISC pipeline and limited 32-bit fixed-length instructions. Since the pipeline allowed for the reading of two registers at the start, adding two registers together to form the address was allowed. The example shown above was too complex, though.

What you are supposed to be reading from all of this is that all of these tradeoffs are linked. Each decision that diverges from the ideal VAX-like architecture snowballed into other decisions that drifted further and further from this ideal, until what we had was something that looked nothing like a VAX.

The upshot of these decisions was being able to reduce a 32-bit MMU CPU into roughly a single chip because it needed fewer transistors, while at the same time performing much faster. It required maybe twice as many instructions to perform the same tasks (mostly due to needing more complex address calculations due to lack of addressing modes), but performed them at maybe 5 times faster, for a significant speed up.

At the time, the VAX was the standard benchmark target. When Sun shipped its first SPARC RISC systems (the Sun-4), they benchmarked about twice as fast as the latest VAX systems, while being considerably cheaper.

The end of RISC

By the late 1980s, everybody knew that RISC was the future. Sure, Intel continued with its x86 and Motorola with its 68000, but that’s because the market wanted backwards compatibility with legacy instruction-sets. Both attempted to build their own RISC alternatives, but failed. When backwards compatibility wasn’t required, everybody created RISC processors, because for 32-bit MMU real computing, they were better. And everybody knew it.

But of course, everybody was eventually wrong. Even as early as the 80486 in 1989, Intel was converting the innards of the processor into something that looked more RISC-like.

The nail in the coffin came in 1995 with Intel’s Pentium Pro processor that supported out-of-order (or OoO) processing. Again, it wasn’t really a new innovation. Out-of-order instructions first appeared on 1960s era supercomputers from CDC and IBM. This was the first time that transistor budgets allowed it to be practically used on single-chip microprocessors.

Transistor budgets grew so high that designers no longer had to make basic, painful tradeoffs. The decisions necessary to cram everything into 100,000 transistors were no longer meaningful when you had more than 1-million transistors to work with. Instruction-set decoding requiring 20k transistors is important with small budgets, but meaningless with large ones.

With OoO, the microarchitecture inside the chip looks roughly the same, regardless if it’s an Intel x86, ARM, SPARC, or whatever.

This was proven in benchmarks. When Intel released its out-of-order Pentium Pro in 1995, it beat all the competing in-order RISC processors on the market.

Everybody was wrong — RISC wasn’t the future, the future was OoO.

One way of describing the Pentium Pro is that it “translates x86 into RISC-like micro-ops“. What that really means is that instead of vertical microcode, it translated things into horizontal, pipelined micro-ops. Most of the typical math operations were split into two micro-ops, one a load/store operation, and the other a reg-reg operation. (Some x86 instructions need even more micro-ops: address calculation, then load/store, then ALU op).

Intel still has an “x86 tax” decoding complex instructions. But in terms of pipeline stages, that tax only applies to the first stage. Typical OoO processors have at least 10 more stages after that. Even RISC instruction-set processors like ARM must translate external instructions into internal micro-ops.

The only significant difference left is the fact that Intel’s instructions are variable-length. The fixed-length instructions of RISC mean that multiple instructions can be fetched at once and decoded in parallel. This is impossible with Intel x86; instructions must at least partially be decoded serially, one before the next. You don’t know where the next instruction starts until you’ve figured out the length of the current one.
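The difference is easy to demonstrate. With fixed 4-byte instructions, the start of instruction n is simply n*4, so several can be decoded in parallel; with variable lengths, finding the nth start means serially walking every length before it. This sketch uses a toy length-prefixed encoding (not the real x86 encoding, where the length emerges only partway through decoding) to show the dependency:

```c
#include <stddef.h>

/* Fixed-length ISA: every instruction start is known without decoding
 * anything, so decoders can all start at once. */
static size_t fixed_start(size_t n)
{
    return n * 4;
}

/* Variable-length ISA (toy encoding: the first byte of each instruction
 * holds its own total length): the only way to find instruction n is to
 * decode each predecessor's length, one before the next. */
static size_t variable_start(const unsigned char *code, size_t n)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++)
        off += code[off];   /* must finish this one to find the next */
    return off;
}
```

That serial chain in variable_start() is exactly the decode bottleneck the hints and loop caches mentioned below are built to bypass.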

Intel and AMD find crafty ways to get around this. For example, AMD has often put hints in its instruction cache (L1I) so that decoders can know the lengths of instructions. Intel has played around with “loop caches” (so-called because they are most useful for loops) that hold instructions after they’ve been decoded, so they don’t need to be decoded again.

The upshot is that for most code, there’s no inherent difference between x86 and RISC, they have essentially the same internal architecture for out-of-order (OoO) processors. No instruction-set has an inherent advantage over the other.

And it’s been that way since 1995.

I mention this because, bizarrely, this cult has persisted for the 30 years since OoO replaced RISC for high-end real computers. It ceased being a useful technical distinction, so why are techies still discussing it?

They persist in believing dumb things, for which no amount of deprogramming is possible. For example, they look at mobile (battery powered) devices and note that they use ARM chips to conserve power. They make the assumption that there must be some sort of inherent power-efficiency advantage.

This isn’t true. These chips consume less power by simply being slower. Fewer transistors mean less power consumption. This meant that while desktops/servers used power-hungry OoO processors, mobile phones went back to the transistor budgets of yesteryear, meaning back to in-order RISC.

But as Moore’s Law turned, transistors got smaller, to the point where even mobile phones got OoO chips. They use clever tricks to keep that OoO chip powered down most of the time, often including an in-order chip that runs slower on less power for minor tasks.

We’ve reached the point where mobile and laptops now use the same chips, your MacBook uses (essentially) the same chip as your iPhone, which is the same chip as Apple desktops.

Now Apple’s M1 ARM (and hence RISC) processor is much better at power consumption than its older Intel x86 chips, but this isn’t because it’s RISC. Apple did a good job of analyzing what people do on mobile devices like laptops and phones, and optimized for that. For example, they added a lot of great JavaScript features, cognizant of the ton of online and semi-offline apps that are written in JavaScript. In contrast, Intel attempts to optimize a chip simultaneously for laptops, desktops, and servers, leading to poor optimization for laptops.

Apple also does crazy things like putting a high end GPU (graphics processor) on the same chip. This has the effect of making their M1 ARM CPU crazy good for desktops for certain applications, those requiring the sorts of memory-bandwidth normally needed by GPUs.

But overall, x86 chips from AMD and Intel are still faster on desktops and servers.

In addition to the fixed-length instructions providing a tiny benefit, ARM has another key advantage, but it has nothing to do with RISC. When they upgraded their instruction-set to support 64-bit instead of just 32-bit, they went back and redesigned it from scratch. This allowed them to optimize the new instruction-set for the OoO pipeline, such as removing some dependencies that slow things down.

This was something that Intel couldn’t do. When it came time to support 64-bit, AMD simply extended the existing 32-bit instructions. A long sequence of code often looks identical between the 32-bit and 64-bit versions of the x86 instruction-sets, whereas they look completely different on ARM 32-bit vs. 64-bit.

What about RISC-V and ARM-on-servers?

We’ve reached the point in tech where the instruction-set doesn’t matter. It’s not simply that code is written in high-level languages. It’s mostly that micro-architectural details have converged.

Take byte-order, for example. Back in the 1980s, most of the major CPUs in the world were big-endian, while Intel bucked the trend being little-endian. The reason is that some engineer made a simple optimization back when the 8008 processor was designed for terminals, and because of backwards compatibility, the poor decision continues to plague x86 today.

Except when it annoys programmers debugging memory dumps, byte-order doesn’t matter. Therefore, all the RISC processors allowed a simple bit to be set to switch processors from big-endian to little-endian mode.
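To make endianness concrete, here's a quick sketch (my illustration, using only Python's standard library) of how the same 32-bit value is laid out in memory under each convention:

```python
import struct

value = 0x0A0B0C0D

big = struct.pack(">I", value)     # most-significant byte first
little = struct.pack("<I", value)  # least-significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a
```

Same number, opposite byte order on disk and in memory — which is exactly why drivers and memory dumps care, and almost nothing else does.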

Over time, that has caused everyone to match Intel’s little-endianness, driven primarily by Linux. The kernel itself supports either mode, but a lot of drivers depend upon byte-order, and user-mode programs developed on x86 sometimes have byte-order bugs. As Linux was ported to architectures like ARM or PowerPC, most of the time it was done in little-endian mode. (You can get PowerPC Linux in big-endian, but the preference is little-endian, because drivers.)

The same effect happens even in things that aren’t strictly CPU related, like memory and I/O. The tech stack has converged so that processors look more and more alike except for the instruction-set.

The convergence of architecture is demonstrated most powerfully by Apple’s M1 transition, where they stopped using Intel’s processors in their computers in favor of their custom ARM processor they created for the iPhone.

The MacBook Air M1 looks identical on the outside to the immediately preceding x86 MacBook. But more to the point, it performs almost identically running x86 code — it runs x86 code at native x86 speeds, but on an ARM CPU. The processors are so similar architecturally that instruction-sets can be converted on the fly: it simply reads the x86 program, transparently translates it to ARM, then runs the ARM version. Previous code-translation attempts incurred massive slowdowns to account for architectural differences, but the M1 cheated by removing any differences that weren’t instruction-set related, allowing smooth translation of the instructions.

Technically, instruction-sets don’t matter, but for business reasons, they still do. Intel and AMD control x86, and prevent others from building compatible processors. ARM lets others build compatible processors (indeed, making no CPUs themselves), but charges them a license fee.

Especially for processors on the low-end, people don’t want to pay license fees.

For that reason, RISC V has become popular. For low-end processors (in-order microcontrollers competing against ARM Cortex-Ms) in the 100,000-transistor range, it matters that an instruction-set be RISC. The only free alternative is the aging MIPS, which has annoying quirks, like “delay slots”, that RISC V fixes. Since RISC V is an open standard, free of license fees, those designing their own low-end processors have adopted it.

For example, nVidia uses RISC V extensively throughout its technology. GPUs contain tiny embedded CPUs to manage things internally. nVidia has ARM licenses, but doesn’t want to pay the per-unit pennies that ARM charges. Likewise, Western Digital (a big hard-drive maker) designed a RISC V core for its drives.

There are a lot of RISC V fans, thanks to the RISC cult, who insist it should go everywhere, but it isn’t going to displace anything at the high end. At the high end, you are going to pay licensing fees for designs anyway. In other words, while big companies have the resources to design small in-order processors, they don’t have the resources to design big OoO processors, and would therefore buy designs from others.

Amazon’s AWS Graviton is a good example (ARM-based servers). They aren’t licensing the instruction-set from ARM so much as the complete OoO CPU design. They include the ARM cores on a chip of Amazon’s design, with memory, I/O, and security features tailored to AWS use cases. Neither the instruction-set architecture nor the micro-architecture particularly matters to Amazon compared to all the other features of their chips.

Lots of big companies are getting into the custom CPU game, licensing ARM cores. Big tech companies tend to have their own programming languages, their own operating systems, their own computer designs, and nowadays their own CPUs. This includes Microsoft, Google, Apple, Facebook, and so on. The advantage of ARM processors (or in the future, possibly RISC V processors) isn’t their RISC nature, or their instruction-sets, but the fact that they are big processor designs that others can include in their own chips. There is no inherent power-efficiency or speed benefit — only the business benefit.


This blogpost is in reaction to that blogpost I link above. That writer just recycles old RISC rhetoric of the past 30 years, like claiming it’s a “design philosophy”. It wasn’t; it was a set of tradeoffs meaningful to small in-order chips — the best way of designing a chip with 100,000 transistors.

The term “RISC” has been obsolete for 30 years, and yet this nonsense continues. One reason is the Hennessy/Patterson textbook that indoctrinates the latest college students. Another reason is the political angle, people hating whoever is dominant (in this case, Intel on the desktop). People believe in RISC, people evangelize RISC. But it’s just a cult, it’s all junk. Any conversation that mentions RISC can be improved by removing the word “RISC”.

OoO replaced RISC as the dominant architecture for CPUs back in 1995, and ever since then, the term “RISC” has been obsolete. The only thing you care about when looking at chips is whether it’s an in-order design or an out-of-order design. Well, that’s if you care about theory. If you care about practice, you care about whether it supports your legacy tooling and code. In the real world, whether you use x86 or ARM or MIPS or PowerPC is simply a matter of legacy market conditions. We still launch rockets to Mars using PowerPC processors because that’s what the market for radiation-hardened CPUs has always used.

DS620slim tiny home server

Post Syndicated from Robert Graham original

In this blogpost, I describe the Synology DS620slim. Mostly these are notes for myself, so when I need to replace something in the future, I can remember how I built the system. It’s a “NAS” (network attached storage) server that has six hot-swappable bays for 2.5 inch laptop drives.

That’s right, laptop 2.5 inch drives. It makes this a tiny server that you can hold in your hand.
The purpose of a NAS is reliable storage. All disk drives eventually fail. If you stick a USB external drive on your desktop for backups, it’ll eventually crash, losing any data on it. A failure is unlikely tomorrow, but a spinning disk will almost certainly fail some time in the next 10 years. If you want to keep things, like photos, for the rest of your life, you need to do something different.
The solution is RAID, an array of redundant disks such that when one fails (or even two), you don’t lose any data. You simply buy a new disk to replace the failed one and keep going. With occasional replacements (as failures happen) it can last decades. My older NAS is 10 years old and I’ve replaced all the disks, one slot replaced twice.

This can be expensive. A NAS requires a separate box in addition to lots of drives. In my case, I’m spending $1500 for 18 terabytes of disk space that would cost only $400 as an external USB drive. But amortized over the expected 10+ year lifespan, I’m paying about $15/month for this home system.
This unit is not just disk drives but also a server. Spending $500 just for a box to hold the drives is a bit expensive, but the advantage is that it’s also a server that’s powered on all the time. I can set up tasks to run on a regular basis that would break if I tried to run them regularly on a laptop or desktop computer.
There are lots of do-it-yourself solutions (like the Radxa Taco carrier board for a Raspberry Pi CM4 running Linux), but I’m choosing this solution because I want something that just works without any hassle, configured for exactly what I need. For example, eventually a disk will fail and I’ll have to replace it, and I know now that this will be effortless when it happens in the future, without having to relearn arcane Linux commands that I forgot years ago.
Despite this, I’m a geek who obsesses about things, so I’m still going to do possibly unnecessary things, like upgrading hardware: memory, network, and fan for an optimized system. Here are all the components of my system:
You can save a bunch of money by going down to 4TB drives (and a 14TB backup USB drive), but I chose the larger 5TB drives.

Disk Drives

The most important reason for choosing this product is the smaller 2.5-inch disk drives (sized for laptops). Otherwise, you should buy one of the larger (much larger) systems that hold standard-sized drives.
The drives will be the largest cost. A 5TB spinning disk costs ~$150; an 8TB flash SSD costs ~$700. Buying six of them is your largest investment. You don’t have to fill up the system, or buy the largest drives, but if you put in the time and effort, you might as well go all the way. On a cost-per-gigabyte basis, the larger drives seem to be the best price.
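To check that math, here's a trivial sketch using the ballpark prices above (actual street prices vary):

```python
# ballpark prices from the text
spinning_price, spinning_tb = 150, 5   # 5TB 2.5" spinning disk
ssd_price, ssd_tb = 700, 8             # 8TB SATA SSD

spinning_per_tb = spinning_price / spinning_tb   # $30/TB
ssd_per_tb = ssd_price / ssd_tb                  # $87.50/TB

print(f"spinning: ${spinning_per_tb:.2f}/TB, SSD: ${ssd_per_tb:.2f}/TB")
```

At roughly $30/TB versus $88/TB, spinning rust is still the clear winner on cost per terabyte.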
As you know, there are only three manufacturers remaining for spinning rust drives: Seagate, Western Digital (WD), and Toshiba. Also as you know, laptops have moved away from rotating disks, adopting SSDs instead. Thus, the 2.5 inch form factor for spinning disks is likely dead. For right now, they are a lot cheaper than SSDs, a fifth of the price. In the future, when a drive dies on the array, I’ll likely have to replace it with an SSD, because a replacement spinning disk is no longer available. The SATA SSD itself is eventually going to disappear (to be replaced by NVMe SSDs), but they should still be around a decade from now when I need replacement drives. (I plan on the NAS lasting a decade before I have to upgrade and move the data).
The internal 5TB drives are a bit expensive. One strategy would be to instead buy external USB drives and “shuck” them, removing the USB enclosure to get at the drives themselves. It’s a common strategy when under certain market conditions, external drives are cheaper than internal drives. I tried buying a $100 5TB Western Digital external drive. It didn’t work — it wasn’t a SATA drive in a USB enclosure, but was natively USB on the circuit board. I’m using it as a Raspberry Pi 4 drive instead for storing blockchain info.
Inserting the drive into the 620slim is easy: just pop out the carrier, add the drive, and pop it back in. The carrier comes with little posts on one side that fit the screw holes, meaning you only need to screw in the other side with 2 screws — or you can forgo the screws altogether.
The carriers have locks, to prevent people from accidentally pulling out a drive, but I don’t use them. In 5 years when a drive fails and I need to replace it, I don’t want to go hunting for these keys. The entire strategy I’m using here is that when failure happens, I’ll fix it right away rather than finding reasons to procrastinate. I’ve had to replace 3 failed drives in my previous NAS, and this worked well.


The DS620slim comes with 2-gigabytes of memory, in a single SO-DIMM slot. There’s a second empty SO-DIMM slot. (SO-DIMMs are the smaller form factor for memory that’s intended for notebook computers and tiny servers).
Synology will officially sell you a 4-gig SO-DIMM to put in the empty slot, bringing total memory to 6-gigs.
Unofficially, you can get two of these, using the second to replace the existing 2-gigs, bringing it to 8-gigs total.
Even more unofficially, you can go to 16gigs. According to Intel’s official spec sheet for the J3355 CPU, it only supports 8-gigs. Such numbers are usually conservative, reflecting the memory available at the time. When larger capacities appear later, they usually work. Such is the case here, where I put in 16-gigs total using Crucial SO-DIMMs (two 8-gig DIMMs).
I recommend expanding memory here, if only an extra 2-gig DIMM to fill the free slot. It’s a quick and easy upgrade: just unscrew the bottom plate and insert the memory.


The unit only comes with gigabit Ethernet. This can be a bottleneck, so we want to speed that up.
It comes with two Ethernet ports, which support aggregation, but I couldn’t get a speed increase. It seems they’ll speed things up if there are at least two devices talking to the NAS, but not when there’s only one client. And if you do have two clients, things will slow down anyway, because accesses are no longer sequential.
The solution is to use a faster Ethernet adapter, like 2.5gig, 5gig, or 10gig. There’s no PCIe slot in the device, but it does have USB 3. I can therefore use a 2.5gbps or 5gbps dongle.
I benchmarked the three options, and found the following performance, in mbps (mega-bits per second). This was measured with large sequential transfers, small or random transfers are roughly the same speed, around 350mbps, for all three adapters.

There’s a big jump in performance using the 2.5gbps adapter, but only a marginal increase using the 5gbps adapter.

Synology doesn’t support the adapters directly. To install them, I used the following steps with the following project:
  1. Enable SSH, using (Control Panel -> Terminal). If you are a geek, you’ve already done this.
  2. Go to this GitHub project and download the r8152-apollolake-2.15.0-5.spk file (from the Releases section) to your local computer. Your DS620slim has an Apollo Lake CPU, so that’s the package we are using.
  3. Use the “Package Center” to do a “Manual” install, and upload this SPK file. If you get an error saying you don’t have permissions, log out and back in. Otherwise, you’ll first get a warning saying the driver isn’t supported by Synology, and eventually you’ll get the error “Failed to install package”. This is supposed to happen.
  4. From the SSH command-line, run the command:
  5. sudo install -m 4755 -o root -D /var/packages/r8152/target/r8152/spk_su /opt/sbin/spk_su
  6. Now repeat the step using “Package Center” to do a “Manual” install. If you didn’t close the window that you had open, you can just click on the “Done” button a second time and it’ll work.
  7. Now reboot, and plug in the USB adapter.
For 5-gbps, you can go through the same process to install the Aquantia aqc111 drivers. I did this to get a Sabrent NT-SS5G adapter to work.
In practice, when transferring large files, you still aren’t going to be able to exceed 2.5gbps much, so I just use the slower adapter. It’s cheaper and uses a lot less electrical power (a 2.5gbps Ethernet adapter is noticeably cooler than a 5gbps, which is in turn noticeably cooler than 10gbps).


The unit comes with a small fan that by default runs in “quiet” mode, but under load the noise becomes noticeable. A cheap $15 replacement, like the Noctua fans famous for this, runs a lot quieter. Replacing the fan doesn’t require any tools, as it’s held in by rubber thingies.
This allows me to run the fan at a higher speed, with less noise, which keeps everything even cooler. Since I plan on a 10 year lifespan with rotating disks, I figure lower temperatures will be better for longevity.

USB drive backups

RAID6 gives pretty good safety, allowing two drives to fail with no data loss. (The term “RAID5” means one redundant disk; “RAID6” means two redundant disks.)
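The single-parity idea is easy to sketch. This is my toy illustration of the XOR parity that RAID5 uses; real RAID6 adds a second, Reed-Solomon-style syndrome so it can survive two simultaneous failures:

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-sized blocks together."""
    return bytes(x ^ y for x, y in zip(a, b))

# two data blocks striped across two disks
d1 = b"photos-part-1"
d2 = b"photos-part-2"

# the parity disk stores the XOR of the data blocks
parity = xor_blocks(d1, d2)

# disk 1 dies: reconstruct its block from parity plus the surviving disk
recovered = xor_blocks(parity, d2)
assert recovered == d1
```

Because XOR is its own inverse, any one missing block can be rebuilt from the parity plus the survivors — which is why you can hot-swap a failed drive and let the array rebuild itself.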
But you should still do backups. The NAS itself can fail. Or ransomware can delete all the files. There are lots of possible failures.
One of the neat things with Synology is that it’s easy to schedule regular backups to an external USB drive.
In my case, I’m using an 18 terabyte USB drive costing $400 for backups. I just schedule it and forget it, backups always happen, and ransomware on Windows machines can delete everything on the NAS but can’t touch the backup.

UPS (Uninterruptable Power Supply)

For a small NAS, I bought a small UPS. This is some weird APC unit that I got on close-out for $100. It’s such a weird little product that I don’t think it was very popular.
It’s a lithium-ion UPS. The price of lithium batteries, especially LiFePO4, is approaching the point where they are competitive with traditional lead-acid batteries. This is especially true considering that they last longer in UPS applications than lead-acid.

File system

Now with hardware out of the way, let’s talk software. Once you insert the drives, plug in the Ethernet, and turn on the power, you access the device with a web browser and configure from there.
There are several choices for how you want to configure RAID and the filesystem.
I chose BTRFS on top of RAID6.
BTRFS is a newer Linux filesystem that’s increasingly becoming the default. Its major feature is that it includes checksums for files as part of their metadata (along with filenames and timestamps). This allows the filesystem to detect when a file has become corrupted, so that the file can be repaired. Bits rot on hard disks, so files can become corrupted over time even if they are never written to or read. “Scrubbing” — periodically re-reading every file and verifying its checksum — catches such corruption so the data can be repaired from the array’s redundancy. With Synology, I simply configure it to scrub the entire filesystem every month.
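The checksum idea can be illustrated in a few lines. This is just my sketch of the principle using CRC-32 from the standard library; BTRFS actually computes crc32c (a different polynomial) per block and stores it in metadata:

```python
import zlib

data = b"contents of family-photo.jpg"
stored = zlib.crc32(data)   # checksum saved in metadata at write time

def scrub(blob: bytes, checksum: int) -> bool:
    """Re-read the data and report whether it still matches its checksum."""
    return zlib.crc32(blob) == checksum

assert scrub(data, stored)                   # healthy file passes

rotted = bytes([data[0] ^ 0x01]) + data[1:]  # simulate a single flipped bit
assert not scrub(rotted, stored)             # the scrub detects the rot
```

A plain filesystem would happily return the rotted bytes; a checksumming one notices the mismatch and can pull a good copy from the RAID redundancy.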
This is not “btrfs-raid”, but “btrfs-on-raid6”. BTRFS has some experimental RAID built-in, but it’s buggy and doesn’t really work. Instead, I first create a RAID6 array combining multiple drives into a single virtual drive, then put BTRFS on top of that.
These boxes are designed to allow multiple filesystems to be created, but I simply create the one. I do have multiple “shares”, though, such as for videos and music, but these are still just directories on the same filesystem.
I also occasionally take “snapshots”. I’m not sure how well that works, since I’ve never restored a snapshot, but in principle it’ll be quicker than restoring from backups.


If you are looking for between 16TB and 20TB, for personal use rather than a large office, it’s rather perfect. Yes, it’ll be 4 times more expensive than just getting an external USB drive, but you get RAID and your own server.
It’s so cute I got a second one and filled it with 2TB SSDs, for database work that spends a lot of time searching through large databases of poorly indexed data (like password dumps).

No, a researcher didn’t find Olympics app spying on you

Post Syndicated from Robert Graham original

For the Beijing 2022 Winter Olympics, the Chinese government requires everyone to download an app onto their phone. It has many security/privacy concerns, as CitizenLab documents. However, another researcher goes further, claiming his analysis proves the app is recording all audio all the time. His analysis is fraudulent. He shows a lot of technical content that looks plausible, but nowhere does he show anything that substantiates his claims.

Average techies may not be able to see this. It all looks technical. Therefore, I thought I’d describe one example of the problems with this data — something the average techie can recognize.

His “evidence” consists of screenshots from reverse-engineering tools, with red arrows pointing to the suspicious bits. An example of one of these screenshots is this one:

This screenshot is that of a reverse-engineering tool (Hopper, I think) that takes code and “disassembles” it. When you dump something into a reverse-engineering tool, it’ll make a few assumptions about what it sees. These assumptions are usually wrong. There’s a process where the human user looks at the analyzed output, does a “sniff-test” on whether it looks reasonable, and works with the tool until it gets the assumptions correct.
That’s the red flag above: the researcher has dumped the results of a reverse-engineering tool without recognizing that something is wrong in the analysis.

It fails the sniff test. Different researchers will notice different things first. Famed Google researcher Tavis Ormandy points out one flaw. In this post, I describe what jumps out first to me. That would be the ‘imul’ (multiplication) instruction shown in the blowup below:

It’s obviously ASCII. In other words, it’s a series of bytes. The tool has tried to interpret these bytes as Intel x86 instructions (like ‘and’, ‘insd’, ‘das’, ‘imul’, etc.). But it’s obviously not Intel x86, because those instructions make no sense.

That ‘imul’ instruction is multiplying something by the (hex) number 0x6b657479. That doesn’t look like a number — it looks like four lower-case ASCII letters. ASCII lower-case letters are in the range 0x61 through 0x7A, so it’s not the single 4-byte number 0x6b657479 but the 4 individual bytes 6b 65 74 79, which map to the ASCII letters ‘k’, ‘e’, ‘t’, ‘y’ (actually, because “little-endian”, reverse order, so “ytek”).

No, no. Techies aren’t expected to be able to read hex this way. Instead, we are expected to recognize what’s going on. I just used a random website to interpret hex bytes as ASCII.
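A couple lines of Python do the same job as that website (the constant is the one from the screenshot):

```python
import struct

imm = 0x6B657479              # the "multiplier" from the disassembly

# x86 stores immediates little-endian, so the bytes in the file
# appear in the reverse order of how the number reads
raw = struct.pack("<I", imm)
decoded = raw.decode("ascii")
print(decoded)                # ytek
```

Four clean ASCII letters falling out of a supposed multiplication constant is the tell.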

There are 26 lower-case letters, roughly 10% of the 256 possible values for a byte. Thus, the chance that a random 4-byte number will consist entirely of lower-case letters is about 1-in-10,000. Moreover, multiplication by strange constants happens even more rarely. You’ll commonly see multiplications by small numbers like 48, or large well-formed numbers like 0x1000000. You pretty much never see multiplication by a number like 0x6b657479, barring something rare like an LCG.
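The back-of-the-envelope numbers check out:

```python
# 26 of the 256 possible byte values are lower-case ASCII letters
p_lower = 26 / 256            # roughly 10% per byte
p_all_four = p_lower ** 4     # all four bytes lower-case letters

print(f"about 1 in {1 / p_all_four:,.0f}")   # about 1 in 9,399
```

So "1-in-10,000" is the right order of magnitude for a random constant to look like four letters by accident.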

QED: this isn’t actually an Intel x86 imul instruction, it’s ASCII text that the tool has tried to interpret as x86 instructions.


At first glance, all those screenshots by the researcher look very technical, which many will assume supports his claims. But when we actually look at them, none of them support his claims. Instead, it’s all just handwaving nonsense. It’s clear the researcher doesn’t understand them, either.

Journalists: stop selling NFTs that you don’t understand

Post Syndicated from Robert Graham original

The reason you don’t really understand NFTs is because the journalists describing them to you don’t understand them, either. We can see that when they attempt to sell an NFT as part of their stories (e.g. AP and NYTimes). They get important details wrong.

The latest is a magazine selling an NFT. As libertarians, you’d think at least they’d get the technical details right. But they didn’t. Instead of selling an NFT of the artwork, it’s just an NFT of a URL. The URL points to OpenSea, which is known to remove artwork from its site (such as in response to DMCA takedown requests).

If you buy that NFT, what you’ll actually get is a token pointing to:

This is just the metadata, which in turn contains a link to the claimed artwork:

If either OpenSea or Google removes the linked content, then any connection between the NFT and the artwork disappears.

It doesn’t have to be this way. The correct way to do NFT artwork is to point instead to a “hash”, which uniquely identifies the work regardless of where it’s located. That $69 million Beeple piece was done this correct way. It’s completely decentralized. If the entire Internet disappeared except for the Ethereum blockchain, that Beeple NFT would still work.
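Here's a sketch of what "pointing to a hash" means. This is my simplified illustration using SHA-256; real NFTs typically embed IPFS content identifiers, which build on the same idea:

```python
import hashlib

artwork = b"...the artwork's actual image bytes..."

# the token stores this digest instead of a URL
token_hash = hashlib.sha256(artwork).hexdigest()

# anyone holding a copy of the file can verify it matches the token,
# no matter which server (or none) hosts the bytes
def verify(candidate: bytes, expected: str) -> bool:
    return hashlib.sha256(candidate).hexdigest() == expected

assert verify(artwork, token_hash)
assert not verify(b"a swapped-in different image", token_hash)
```

With a digest in the token, nobody — not OpenSea, not Google — can silently swap the artwork, because any substitute fails verification.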

This is an analogy for the entire blockchain, cryptocurrency, and Dapp ecosystem: the hype you hear ignores technical details. They promise an entirely decentralized economy controlled by math and code, rather than any human entities. In practice, almost everything cheats, being tied to humans controlling things. In this case, the “NFT artwork” is under the control of OpenSea and not the “owner” of the token.

Journalists have a problem. NFTs selling for millions of dollars are newsworthy, and it’s the journalist’s place to report the news rather than make judgments, such as whether or not it’s a scam. But at the same time, journalists are trying to explain things they don’t understand. Instead of standing outside the story, simply quoting sources, they insert themselves into the story, becoming advocates rather than reporters. They can no longer be trusted as objective observers.

From a fraud perspective, it may not matter that the NFT points to a URL instead of the promised artwork. The entire point of the blockchain is caveat emptor in action. Rules are supposed to be governed by code rather than companies, government, or the courts. There is no undoing of a transaction even if courts were to order it, because it’s math.

But from a journalistic point of view, this is important. They failed at an honest description of what the NFT actually contains. They’ve involved themselves in the story, creating a conflict of interest. It’s now hard for them to point out NFT scams when they themselves have participated in something that, from a certain point of view, could be viewed as a scam.

Example: forensicating the Mesa County system image

Post Syndicated from Robert Graham original

Tina Peters, the election clerk in Mesa County (Colorado), went rogue and dumped disk images of an election computer on the Internet. They are available via BitTorrent [Mesa1][Mesa2]. The Colorado Secretary of State is now suing her over the incident.

The lawsuit describes the facts of the case, how she entered the building with an accomplice on Sunday, May 23, 2021. I thought I’d do some forensics on the image to get more details.

Specifically, I see from the Mesa1 image that she logged on at 4:24pm and was done acquiring the image by 4:30pm, in and (presumably) out in under 7 minutes.

In this blogpost, I go into more detail about how to get that information.

The image

To download the Mesa1 image, you need a program that can access BitTorrent, such as the Brave web browser or a BitTorrent client like qBittorrent. Either click on the “magnet” link or copy/paste into the program you’ll use to download. It takes a minute to gather all the “metadata” associated with the link, but it’ll soon start the download:

What you get is a file named EMSSERVER.E01. This is a container file that holds both the raw disk image and some forensics metadata, like the date it was collected, the forensics investigator, and so on. This container is in the well-known “EnCase Expert Witness” format. EnCase is a commercial product, but its container format is a quasi-standard in the industry.

Some freeware utilities you can use to open this container and view the disk include “FTK Imager”, “Autopsy”, and on the Linux command line, “ewf-tools”.

However you access the E01 file, what you most want to look at is the Windows operating-system logs. These are located in the directory C:\Windows\System32\winevt\Logs. The standard Windows “Event Viewer” application can load these log files to help you view them.

When a USB drive is inserted to create the disk image, these event files are updated and written to disk before the image is taken. Thus, we can see in the event files all the events that happened right before the disk image was made.

Disk image acquisition

Here’s what the event logs on the Mesa1 image tells us about the acquisition of the disk image itself.

The person taking the disk image logged in at 4:24:16pm, directly to the console (not remotely), on their second attempt after first typing an incorrect password. The account used was “emsadmin”. Its NTLM password hash is 9e4ec70af42436e5f0abf0a99e908b7a. This is a “role-based” account rather than an individual’s account, but I think Tina Peters is the person responsible for the “emsadmin” role.

Then, at 4:26:10pm, they connected via USB a Western Digital “easystore™” portable drive that holds 5 terabytes. This was mounted as the F: drive.

The program “Access Data FTK Imager” was run from the USB drive (F:\FTK Imager\FTK Imager.exe) in order to image the system. The image was taken around 4:30pm, local Mountain Time (10:30pm GMT).

It’s impossible to say from this image what happened after it was taken. Presumably, they immediately hit “eject” on the drive, logged off, and disconnected the hard drive. Thus from beginning to end, it took about 7 minutes to take the image once they sat down at the computer.

Dominion Voting Systems

The disk image is that of an “EMS Server”, part of the Dominion Voting suite. This is a server on an air-gapped network (not connected to any other network) within the county offices.

Most manuals for Colorado are online, though some bits and pieces are missing, and can be found in documents posted to other states’ websites (though each state does things a little differently, so such cross-referencing can’t be completely trusted).

The locked room with an air-gapped network  you see in the Mesa County office appears to look like the following, an “EMS Standard” configuration (EMS stands for Election Management System).

This small network is “air gapped”, meaning there is no connection from this network to any other network in the building, nor out to the Internet. By looking at the logs from the Mesa1 image, we can see what this network looks like:

  • The EMS Server is named “EMSERVER” with IP address and MAC address 44-A8-42-30-01-5D. The hard drive matches Dominion’s specs: a 1-terabyte boot drive (C:) and a 2-terabyte data drive (D:) that is shared with the rest of the network as \\EMSERVER\NAS. This also acts as the network’s DHCP and DNS server.
  • At least one network printer, model Dell E310dw.
  • Two EMS Workstations (EMSCLIENT01 and EMSCLIENT02). This is where users spend most of their time, before an election to create the ballots, and after all the ballots have been counted to construct the final tally.
  • Four ImageCast Central (ICC) (ICC01 – ICC04) scanners, for automatically scanning and tabulating ballots.
  • Two Adjudication Workstations (ADJCLIENT01 and ADJCLIENT03). These are used when the scanners reject ballots, such as when somebody does a write-in candidate, or marks two candidates. Humans need to get involved to make the final judgement on what the ballot actually says.

Note these aren’t the machines you’d expect to see in a precinct when you vote (which would predominantly be “ballot marking devices”). These are the machines in the back office that count the votes and store the official results.


What we see here is that “system logs” can tell us a lot of interesting things about the system. There’s good reason to retain them in the future.

On the other hand, they generally can’t answer the most important question: whether the system was hacked and votes flipped.

Mike Lindell claims to have “Absolute Proof” that Chinese hackers flipped votes throughout the country, including Mesa County. If so, this would’ve been the system the Chinese hackers would’ve hacked. Yet, in the system image, there is no evidence of this. By this, I mean the Mesa1 image, the one from before the system logs were deleted (obviously, there would be nothing in the Mesa2 image).

This lack of hacking evidence in the logs isn’t proof that it didn’t happen, though. The fact is, the logs aren’t comprehensive enough to record most hacks, and the hackers could’ve deleted the logs anyway. That’s why system logs aren’t considered “election records” and why laws don’t mandate keeping them: they have some utility, as I’ve shown above, but they really wouldn’t show the things we most want to know.

Debunking: that Jones Alfa-Trump report

Post Syndicated from Robert Graham original

The Alfa-Trump conspiracy-theory has gotten a new life. Among the new things is a report done by Democrat operative Daniel Jones [*]. In this blogpost, I debunk that report.

If you’ll recall, the conspiracy-theory comes from anomalous DNS traffic captured by cybersecurity researchers. In the summer of 2016, while Trump was denying involvement with Russian banks, the Alfa Bank in Russia was doing lookups on the name “”. During this time, additional lookups were also coming from two other organizations with suspicious ties to Trump, Spectrum Health and Heartland Payments.

This is certainly suspicious, but people have taken it further. They have crafted a conspiracy-theory to explain the anomaly, namely that these organizations were secretly connecting to a Trump server.

We know this explanation to be false. There is no Trump server, no real server at all, and no connections. Instead, the name was created and controlled by Cendyn. The server the name points to is for transmitting bulk email and isn’t really configured to accept connections. It’s built for outgoing spam, not incoming connections. The Trump Org had no control over the name or the server. As Cendyn explains, the contract with the Trump Org ended in March 2016, after which they re-used the IP address for other marketing programs, but since they hadn’t changed the DNS settings, this caused lookups of the DNS name.

This still doesn’t answer why Alfa, Spectrum, Heartland, and nobody else were doing the lookups. That’s still a question. But the answer isn’t secret connections to a Trump server. The evidence is pretty solid on that point.

Daniel Jones and Democracy Integrity Project

The report is from Daniel Jones and his Democracy Integrity Project.

It’s at this point that things get squirrely. All sorts of right-wing sites claim he’s a front for George Soros, funds Fusion GPS, and was involved in the Steele Dossier. That’s right-wing conspiracy theory nonsense.

But at the same time, he’s clearly not an independent and objective analyst. He was hired to further the interests of Democrats.

If the data and analysis held up, then partisan ties wouldn’t matter. But they don’t hold up. Jones is clearly trying to be deceptive.

The deception starts by repeatedly referring to the “Trump server”. There is no Trump server. There is a Listrak server operated on behalf of Cendyn. Whether the Trump Org had any control over the name or the server is a key question the report should be trying to prove, not a premise. The report clearly understands this fact, so it can’t be considered a mere mistake, but a deliberate deception.

People make assumptions that a domain name like “” would be controlled by the Trump organization. It wasn’t. When Trump Hotels hired Cendyn to do marketing for them, Cendyn did what they normally do in such cases: register a domain with their client’s name for the sending of bulk emails. They did the same thing with,,, and so on. What’s clear is that the Trump organization had no control over and no direct ties to this domain until after the conspiracy-theory hit the press.

Finding #1 – Alfa Bank, Spectrum Health, and Heartland account for nearly all of the DNS lookups for in the May-September timeframe.

Yup, that’s weird and unexplained.

But it concludes from this that there were connections, saying the following:

In the DNS environment, if “computer X” does a DNS look-up of “Computer Y,” it means that “Computer X” is trying to connect to “Computer Y”.

This is false. That’s certainly the assumption we usually make, and it’s probably true in most cases. But it’s not something we insist upon if there’s reason to doubt it. And since there’s reason to doubt it here, we would need more evidence to make that conclusion.

For example, before the contract was canceled in March 2016, there were DNS lookups for the “” name from all over the place. That’s because the Listrak server was pumping out bulk emails (“spam”) promoting Trump Hotels. Servers receiving the emails would often check the identity of the server through DNS lookups, but without any attempt to connect. This fact is footnoted in the Jones report even as it claims otherwise in the main text.
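The distinction is worth making concrete: a DNS lookup is a single UDP datagram sent to a resolver, never to the host being named. Here's a minimal sketch of what such a query physically looks like, using the placeholder name example.com since the real domain is redacted above:

```python
import struct

def build_dns_query(name: str, txn_id: int = 0x1234) -> bytes:
    """Build a raw DNS A-record query datagram (RFC 1035 format).

    A "DNS lookup" is just this one UDP packet sent to a resolver.
    At no point is any connection made to the host being named.
    """
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack(">HHHHHH", txn_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

# The named host never sees any of this traffic; only the resolver does.
pkt = build_dns_query("example.com")
```

Nothing in that packet opens a TCP session, negotiates anything, or touches the looked-up server, which is why lookup logs alone can't establish that connections happened.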

Obviously, that’s no longer the case after March 2016, when the contract was canceled. But if Cendyn repurposes the server for something else, such lookups can still happen without connections. The DNS records hadn’t changed. So if the server sends out new things from that IP address, unrelated to Trump Org, it’d still cause DNS lookups for the “” domain to happen. It wouldn’t mean anybody was trying to connect to the server.

This is indeed what Cendyn claims, that they repurposed the resources for their hotel meetings app (whereby hotels can schedule conferences and things on their premises).

It’s still suspicious that only those three organizations were involved, but at the same time, it’s clearly false to assume this is evidence of connections.

Finding #2 – Comparison with

The Jones report compared the DNS logs of with the domain of another of Cendyn’s clients, Denihan. Cendyn registered that domain too. Denihan is another hotel company.

This comparison was obviously bogus. The contract with Cendyn ended in March 2016, after which Cendyn claims it repurposed the server. Jones uses the timeframe August 2016 through September 2016 to compare traffic for those two domains. Of course they’d be different. A valid comparison would use a timeframe before March 2016, when both were clients of Cendyn.

Since Jones documents the fact that the contract between Cendyn and Trump Org had ended, they are knowingly comparing apples to oranges. Thus, it’s not a mistake but a deception.

This also points to the fundamental problem with the data-set. We don’t really have a full picture of what happened, such as data going back to 2015. We have a carefully curated subset of the data designed to show just what they want us to see.

Everything points to the domain and Listrak servers being just normal Cendyn stuff used for Cendyn’s purposes. As far as we can tell, that domain worked the same as those of other Cendyn clients, such as,,, and so on. These domains are controlled by Cendyn, not their clients. Cendyn in turn points those names at Listrak servers for sending bulk email.

Finding #3 – Missing SPF record

The Jones report points to missing SPF records, claiming this shows the server is not configured correctly for sending mass emails. It includes this exhibit.

But a review shows that this is the same configuration as for other Cendyn/Listrak bulk email servers. For example, compared to, we find it’s configured the same:

The SPF and DMARC standards were not as widely used in 2016, so misconfigurations were common. Moreover, the domains also lacked a DMARC record. Without DMARC, many receivers won’t reject the emails even when SPF is bad.
Listrak/Cendyn still fail to have proper DMARC records for their clients, which means that some of their bulk email is getting rejected. They should probably fix that. This doesn’t mean Listrak/Cendyn aren’t in the bulk email business, only that they could be better at it.
Thus, we’ve shown that had the perfectly normal Cendyn SPF records. Far from proving this isn’t a bulk email server, the consistency with Cendyn’s normal configuration proves unequivocally that it is.
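To make the DMARC point above concrete, here's a toy sketch of the receiver-side decision. The function and all record values are hypothetical; real mail servers are far more elaborate than this:

```python
def receiver_policy(txt_records: dict, domain: str) -> str:
    """Toy model of how a receiving mail server reacts to SPF/DMARC.

    It only illustrates the point made above: without a published
    DMARC record, there is no policy telling receivers to reject
    mail, so most of it gets delivered regardless of SPF.
    """
    spf = [r for r in txt_records.get(domain, []) if r.startswith("v=spf1")]
    dmarc = [r for r in txt_records.get("_dmarc." + domain, [])
             if r.startswith("v=DMARC1")]
    if not dmarc:
        # No DMARC policy published: SPF results are merely advisory
        return "accept (no DMARC policy published)"
    if "p=reject" in dmarc[0] and not spf:
        return "reject (DMARC p=reject, SPF missing)"
    return "accept"

# Mimics the 2016-era setup described above: no SPF record, no DMARC record
legacy = {"client.example": []}
```

With the empty `legacy` records, mail is accepted despite the missing SPF, which is exactly why the missing-SPF "finding" proves so little.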
Finding #4 – Accepts emails only from specific senders
The Jones report shows that the server in question ( accepts incoming email connections, but rejects email from the public, accepting email only from specific senders. They assume the specific senders would be those from Alfa Bank, Spectrum, and Heartland.
Again, they don’t compare properly to other Cendyn/Listrak systems. If they had, they’d have found that they all are configured the same way. There’s an entire subnet of servers you can test this way:

All these servers show the same messages, allowing incoming email connections but not incoming email messages.

This is a vestigial configuration common to bulk email senders. Spammers only send email. One way to test if somebody is a spammer is to connect back. This configuration makes it appear they’ll accept email even if they won’t, passing the test.
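A rough model of that vestigial configuration might look like the following. The response texts are illustrative, not captured from the real server:

```python
def smtp_responder(command: str) -> str:
    """Toy model of the vestigial bulk-sender configuration described
    above: the greeting succeeds (so connect-back spam tests pass),
    but actual mail submission is refused.
    """
    parts = command.split()
    cmd = parts[0].upper() if parts else ""
    if cmd in ("HELO", "EHLO"):
        return "250 Hello"                    # connect-back test passes
    if cmd in ("MAIL", "RCPT"):
        return "550 relaying not permitted"   # but taking mail is refused
    if cmd == "QUIT":
        return "221 Bye"
    return "502 command not implemented"
```

The point is that "accepts connections but not messages" is one static behavior shown to everyone, not a whitelist of secret correspondents.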
In no way is this evidence of secret communications. It’s not evidence of their claim that somehow Alfa Bank, Spectrum Health, and Heartland would be on the list of allowed senders. We would need additional evidence to make that claim, not an assumption.
Finding #5 – Evidence of human interaction and coordination
The report claims a direct link between Alfa and Trump with the following:
On September 23, 2016, two days after The New York Times approached Alfa Bank, the Trump Organization deleted the email server “” … it would have been a deliberate human action taken by someone working on behalf of the Trump Organization and not by Alfa Bank. An analyst, quoted in the Slate article by Franklin Foer, observed that “the knee was struck in Moscow, and the leg kicked in New York.”
This ‘finding’ is an excellent demonstration of how to identify conspiracy-theories: anomalies that cannot otherwise be explained become proof of the conspiracy. After all, the conspiracy-theory can explain everything.
When I debunked the Alfa-Trump thing back in 2017, reporters grilled me on this specific point. They demanded I come up with an explanation for this coincidence. I told them I had none, but just because I didn’t have one, it didn’t mean it was proof of the conspiracy theory. There could be lots of explanations, just because we don’t know them doesn’t mean they don’t exist. Just because the conspiracy-theory explains it doesn’t mean this is evidence for the conspiracy.
But now we do have another explanation: the FBI called Cendyn on the morning of September 23 and asked them about the domain. As the agent reported back:
“Followed up this morning with Central Dynamics [Cendyn] who confirmed that the domain is an old domain that was set up in approximately 2009 when they were doing business with the Trump Organization that was never used.” — *
Thus, it’s not NYT contacting Alfa Bank that caused the deletion, it’s the FBI calling Cendyn. Thus, there’s no evidence Alfa Bank or Trump Org were even involved. The evidence is quite clear that only Cendyn was involved.
After Cendyn deleted the domain “”, lookups of that name started to fail. The Jones report notes that Alfa Bank then switched to “”. It weaves this into the conspiracy thusly:
The fact that Alfa Bank was the first entity (IP address) to conduct a DNS look-up for “” in the data-set could indicate that someone at Alfa Bank was in some manner made aware of the new Trump Organization server name.
The name “” is part of Cendyn’s infrastructure. For their “” domains, there’s a matching “” domain. We can test that live right now:

This is totally consistent with Cendyn’s re-use of the infrastructure for a new purpose, as it would treat both domain names the same. Rather than evidence suggesting human interaction, it’s evidence suggesting the opposite, that there was no human interaction.

6. The Mandiant report doesn’t refute these findings
After this thing hit the news, Alfa Bank hired Mandiant to come to their offices and investigate. Their report was inconclusive. They didn’t find anything.
Note the difference in language. Things Mandiant can’t explain demonstrate Mandiant’s incompetence, while things Jones can’t explain prove the conspiracy-theory. If Mandiant’s report should be treated as inconclusive and proof of nothing, then so too should the Jones report. The Jones report has even less evidence than the Mandiant report.
7. The public statements by Trump et al. are contradictory and incomplete

The Trump Org, Alfa, and Spectrum Health have no idea what happened. Their statements are consistent with knowing they don’t have secret communications, but not knowing where this DNS data came from. They are unable to refute the allegations, but at the same time are concerned for their reputations, and behave accordingly. Which, of course, means they guess at what’s going on with more confidence than is warranted.
If there were secret communications among them, you’d expect they’d do a better job at coordinating their stories.

In this blogpost, I’ve refuted all the findings of the Jones report. There is still the question of where this DNS anomaly came from, but the allegation that it proves a secret connection between Alfa Bank and a Trump server is clearly false.
Moreover, I’ve shown that the Jones report is not merely wrong, but deliberately deceptive. They repeatedly reference a “Trump Organization Server” even though it’s quite clear from the text they know that no such server exists.
For example, when Cendyn removed the “” DNS record, it was described as the “Trump Organization deleted the email server”. It’s clear they know that Cendyn simply removed the record, and that the Listrak server wasn’t touched. Yet they deliberately phrase things this way in order to deceive.
What we have is Alfa Bank doing DNS queries. What we don’t have is any connection to the Trump Org. Since Jones couldn’t reach the conclusion that Trump Org was involved based on evidence, he instead made it the premise.
This in turn makes it easy to disprove the entire Jones report: there’s not only no evidence of Trump Org involvement, but quite a lot of evidence that Trump Org had no control over the domain or servers. That disproves the entire theory that there were secret connections with Alfa Bank.

Review: Dune (2021)

Post Syndicated from Robert Graham original

One of the most important classic sci-fi stories is the book “Dune” from Frank Herbert. It was recently made into a movie. I thought I’d write a quick review.

The summary is this: just read the book. It’s a classic for a good reason, and you’ll be missing a lot by not reading it.

But the Dune (2021) movie is very good. The most important thing to know is to see it in IMAX. IMAX is this huge screen technology that partly wraps around the viewer, accompanied by huge speakers that overwhelm you with sound. If you watch it in some other format, what was visually stunning becomes merely very pretty.

This is Villeneuve’s trademark, which you can see in his other works, like his sequel to Bladerunner. The purpose is to marvel at the visuals in every scene. The storytelling is just enough to hold the visuals together. I mean, he also seems to do a good job with the storytelling, but it’s just not the reason to go see the movie. (I can’t tell — I’ve read the book, so I see the story differently than those of you who haven’t.)

Beyond the story and visuals, many of the actors’ performances were phenomenal. Javier Bardem’s “Stilgar” character steals his scenes. Stellan Skarsgård exudes evil. The two character actors playing the mentats were each perfect. I found the lead character (Timothée Chalamet) a bit annoying, but simply because he is at this point in the story.

Villeneuve splits the book into two parts. This movie is only the first part. This presents a problem, because up until this point, the main character is just responding to events, not yet the hero who drives them. It doesn’t fit into the traditional Hollywood accounting model. I really want to see the second film even if the first part, released in the post-pandemic turmoil of the movie industry, doesn’t perform well at the box office.

In short, if you haven’t read the books, I’m not sure how well you’ll follow the storytelling. But the visuals (seen at IMAX scale) and the characters are so great that I’m pretty sure most people will enjoy the movie. And go see it on IMAX in order to get the second movie made!!

Fact check: that "forensics" of the Mesa image is crazy

Post Syndicated from Robert Graham original

Tina Peters, the elections clerk from Mesa County (Colorado) went rogue, creating a “disk-image” of the election server, and posting that image to the public Internet. Conspiracy theorists have been analyzing the disk-image trying to find anomalies supporting their conspiracy-theories. A recent example is this “forensics” report. In this blogpost, I debunk that report.

I suppose calling somebody a “conspiracy theorist” is insulting, but there’s three objective ways we can identify them as such.

The first is when they use the logic “everything we can’t explain is proof of the conspiracy“. In other words, since there’s no other rational explanation, the only remaining explanation is the conspiracy-theory. But there can be other possible explanations — just ones unknown to the person because they aren’t smart enough to understand them. We see that here: the person writing this report doesn’t understand some basic concepts, like “airgapped” networks.

This leads to the second way to recognize a conspiracy-theory: when they demand this one thing that’ll clear everything up. Here, it’s demanding that a manual audit/recount of Mesa County be performed. But it won’t satisfy them. The Maricopa audit in neighboring Arizona, whose recount found no fraud, didn’t clear anything up — it just found more anomalies demanding more explanation. It’s like Obama’s birth certificate. The reason he ignored demands to show it was that first, there was no serious question (even if born in Kenya, he’d still be a natural born citizen — just like how Cruz was born in Canada and McCain in Panama), and second, showing the birth certificate wouldn’t change anything at all, as they’d just claim it was fake. There is no possibility of showing a birth certificate that can be proven isn’t fake.

The third way to objectively identify a conspiracy theory is when they repeat objectively crazy things. In this case, they keep demanding that the 2020 election be “decertified”. That’s not a thing. There is no regulation or law where that can happen. The most you can hope for is to use this information to prosecute the fraudster, prosecute the elections clerk who didn’t follow procedure, or convince legislators to change the rules for the next election. But there’s just no way to change the results of the last election even if widespread fraud is now proven.

The document makes 6 individual claims. Let’s debunk them one-by-one.

#1 Data Integrity Violation

The report tracks some logs on how some votes were counted. It concludes:

If the reasons behind these findings cannot be adequately explained, then the county’s election results are indeterminate and must be decertified.

This neatly demonstrates two conditions I cited above. The analyst can’t explain the anomaly not because something bad happened, but because they don’t understand how Dominion’s voting software works. This demand for an explanation is a common attribute of conspiracy theories — the ignorant keep finding things they don’t understand and demand somebody else explain them.

Secondly, there’s the claim that the election results must be “decertified”. It’s something that Trump and his supporters believe is a thing, that somehow the courts will overturn the past election and reinstate Trump. This isn’t a rational claim. It’s not how the courts or the law works or the Constitution works.

#2 Intentional purging of Log Files

This is the issue that convinced Tina Peters to go rogue, that the normal Dominion software update gets rid of all the old system-log files. She leaked two disk-images, before and after the update, to show the disappearance of system-logs. She believes this violates the law demanding the “election records” be preserved. She claims because of this, the election can’t be audited.

Again, we are in crazy territory where they claim things that aren’t true. System-logs aren’t considered election records by any law or regulation. Moreover, they can’t be used to “audit” an election.

Currently, no state/county anywhere treats system-logs as election records (since they can’t be used for “audits”). Maybe this should be different. Maybe you can create a lawsuit where a judge rules that in future elections they must be treated as election records. Maybe you can convince legislatures to pass laws saying system-logs must be preserved. It’s not crazy to say this should be different in the future, it’s just crazy to say that past system-logs were covered under the rules.

And if you did change the rules, the way to preserve them wouldn’t be to let them sit on the C: boot-drive until they eventually rot and disappear (which will happen no matter what). Instead, the process to preserve them would be to copy them elsewhere. The way Dominion works is that all election records that need to be preserved are copied over to the D: data drive.

Which means, by the way, that this entire forensics report is bogus. The Mesa disk image was only of the C: boot-drive, not of the D: data drive. Thus, it’s unable to say which records/logs were preserved or not. Everyone knows that system-logs probably weren’t, because they aren’t auditable election records, so you can still make the claim “system-logs weren’t preserved”. It’s just that you couldn’t make that claim based on a forensics of the C: boot-drive. Again, we are in crazy statements territory that identify something as a conspiracy-theory, weird claims about how reality works.

System-logs cannot be used to audit the vote. That’s confusing the word “audit” with “forensics”. The word “audit” implies you are looking for a definitive result, like whether the vote count was correct, or whether all procedures were followed. Forensics of system-logs can’t tell you that. Instead, they can only lead to indeterminate results.

That’s what you see here. This “forensics” report cannot make any definitive statement based upon the logs. It can find plenty of anomalies, meaning things the forensics investigator can’t understand. But none of that is positive proof of anything. If a hacker had flipped votes on this system, it’s unlikely we would have seen evidence in the log.

#3 Evidence of network connection

The report claims the computer was connected to a network. Of course this is true — it’s not a problem. The network was the one shown in the diagram below:

Specifically, this Mesa image was of the machine labeled “EMS Server” in the above diagram. From my forensics of the network logs, I can see that there are other computers on this network:

  1. Four ICC workstations (named ICC01 through ICC04)
  2. Two Adjudication Workstations (named ADJCLIENT01 and ADJCLIENT03, I don’t know what happened to number 2).
  3. Two EMS Workstations (named EMSCLIENT01 and EMSCLIENT02).
  4. A printer, model Dell E310dw.
The word “airgapped” doesn’t mean the EMS Server is airgapped from any network, but that this entire little network is airgapped from anything else. The security of this network is physical security, the fact that nobody can enter the room who isn’t authorized.
I did my own forensics on the Mesa image and could find none of the normal signs that the server accessed the Internet, and pretty good evidence that most of the time, it was unconnected (it gets mad when it can’t find the Internet and produces logs stating this). This doesn’t mean I proved conclusively no Internet connection was ever made. It’s possible that somebody will find some new thing in that image that shows an Internet connection. It’s just that currently, there’s no reason to believe the “airgap” guarantee of security was violated.
The claimed evidence about the “Microsoft Report Server” is wrong.

#4 Lack of Software Updates
This is just stupid. The cybersecurity community does have this weird fetish demanding that every software update be applied immediately, but there’s good reasons why they aren’t, and ways of mitigating the security risk when they can’t be applied.
Software updates sometimes break things. In sensitive environments where computers must be absolutely predictable, they aren’t applied. This includes live financial systems, medical equipment, and industrial control systems.
This also includes elections. It’s simply not acceptable to cancel or delay an election because a software update broke the computer.
This is why Dominion does what they call a “Trusted Build” process that wipes out the boot-drive (deleting system-logs). To update software, they build an entire new boot image with all the software in a tested, known state. They then apply that boot disk image to all the county machines, which replaces everything on the C: boot-drive with a new version of Windows and all the software. This leaves the D: data drive untouched, where the records are preserved.
If you didn’t do things this way, then sometimes elections will fail.
This is also why having an “airgapped” network is important. The voting machines aren’t going to have software updates regularly applied, so they need to be protected. Firewalls would also be another mitigation strategy.

#5 Existence of SQL Server Management Studio.
This is just a normal part of having an SQL server installed.
Yes, in theory it would make it easy for somebody to change records in the database. But at the same time, such a thing is pretty easy even without SSMS installed. One way is command-line scripts.
#6 Referential Integrity
This “referential integrity” is a reliability concern, not an anti-hacking measure. It just means hackers would need only an extra step if they wanted to delete or change records.

Evidence is something that the expert understands. It’s something they can show, explain, and defend against challengers.
This report contained none of that. It contained instead anomalies the writer couldn’t explain.
Note that this doesn’t mean they weren’t an expert. Obviously, they needed enough expertise to get as far as they did. It’s just a consequence of conspiracy-theories. When searching for proof of your conspiracy-theory when there is none, it means going off into the weeds past your area of expertise.
Give that forensics image to any expert, and they’ll find anomalies they can’t explain. That includes me: I’ve posted some of them to Twitter and had other experts explain them to me. The difference is that I attributed the lack of an explanation to my own ignorance, not a conspiracy.
At some point, we have to call out conspiracy-theories for what they are. This isn’t defending the integrity of elections. If it were, it’d be proposing solutions for future elections. Instead, it’s an attack on the integrity of elections, fighting the peaceful transfer of power by unfounded conspiracy-theory claims.
And we can say this objectively. As I stated above, there’s three objective tests. These are:
  • Anomalies that can’t be explained are claimed to be evidence — when in fact they come from simple ignorance.
  • Demands that something needs explaining, when it really doesn’t, and which won’t satisfy them anyway.
  • Statements of a world view (like that the election can be “decertified” or that system-logs are “election records”) that nobody agrees with.

100 terabyte home NAS

Post Syndicated from Robert Graham original

So, as a nerd, let’s say you need 100 terabytes of home storage. What do you do?

My solution would be a commercial NAS RAID, like from Synology, QNAP, or Asustor. I’m a nerd, and I have setup my own Linux systems with RAID, but I’d rather get a commercial product. When a disk fails, and a disk will always eventually fail, then I want something that will loudly beep at me and make it easy to replace the drive and repair the RAID.

Some choices you have are:

  • vendor (Synology, QNAP, and Asustor are the vendors I know and trust the most)
  • number of bays (you want 8 to 12)
  • redundancy (you want at least 2 if not 3 disks)
  • filesystem (btrfs or ZFS) [not btrfs-raid builtin, but btrfs on top of RAID]
  • drives (NAS optimized between $20/tb and $30/tb)
  • networking (at least 2-gbps bonded, but box probably can’t use all of 10gbps)
  • backup (big external USB drives)

The products I link above all have at least 8 drive bays. When you google “NAS”, you’ll get a list of smaller products. You don’t want them. You want somewhere between 8 and 12 drives.

The reason is that you want two-drive redundancy like RAID6 or RAIDZ2, meaning two additional drives. Everyone tells you one-disk redundancy (like RAID5) is enough, they are wrong. It’s just legacy thinking, because it was sufficient in the past when drives were small. Disks are so big nowadays that you really need two-drive redundancy. If you have a 4-bay unit, then half the drives are used for redundancy. If you have a 12-bay unit, then only 2 out of the 12 drives are being used for redundancy.
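The bay-count arithmetic is easy to check. The 10-TB drive size below is just an assumption to make the numbers round:

```python
def usable_fraction(bays: int, redundancy: int = 2) -> float:
    """Fraction of raw capacity left after two-drive redundancy (RAID6/RAIDZ2)."""
    return (bays - redundancy) / bays

# With 4 bays, half the capacity goes to redundancy; with 12, only a sixth.
drive_tb = 10                      # hypothetical drive size
usable_tb = (12 - 2) * drive_tb    # a full 12-bay RAID6 of such drives
```

That's how a 12-bay unit with two-drive redundancy lands right at the 100-terabyte target while giving up only a sixth of the raw capacity.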

The next decision is the filesystem. There’s only two choices, btrfs and ZFS. The reason is that they both support healing and snapshots. Note btrfs means btrfs-on-RAID6, not btrfs-RAID, which is broken. In other words, btrfs contains its own RAID feature that you don’t want to use.

Over long periods of time, errors creep into the file system. You want to scrub the data occasionally. This means reading the entire filesystem, checksumming the files, and repairing them if there’s a problem. That requires a filesystem that checksums each block of data.
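A toy sketch of what a scrub does, independent of either filesystem's actual on-disk format:

```python
import hashlib

def checksum_blocks(data: bytes, block_size: int = 4096) -> list:
    """Checksum every block, as btrfs/ZFS do when data is written."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def scrub(data: bytes, expected: list, block_size: int = 4096) -> list:
    """Re-read every block and report the indices that no longer match.

    On a real array, a mismatching block would then be rebuilt from
    redundancy (parity or a mirror), not merely reported.
    """
    return [i for i, h in enumerate(checksum_blocks(data, block_size))
            if h != expected[i]]

clean = bytes(8192)                # two pristine 4K blocks
sums = checksum_blocks(clean)      # recorded at write time
rotted = b"\x01" + clean[1:]       # a single flipped byte of "bit rot"
```

A plain filesystem without per-block checksums can't even detect that flipped byte, let alone repair it, which is the whole case for btrfs/ZFS here.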

Another thing you want is snapshots, to guard against things like ransomware. This means you mark the files you want to keep, and even if a workstation attempts to change or delete the file, it’ll still be held on the disk.

QNAP uses ZFS while others like Synology and Asustor use btrfs. I really don’t know which is better.

It’s cheaper to buy the NAS diskless, then add your own disk drives. If you can’t do this, then you’ll be helpless when a drive fails and needs to be replaced.

Drives cost between $20/tb and $30/tb right now. This recent article has a good buying guide. You probably want to get a NAS optimized hard drive. You probably want to double-check that it’s CMR instead of SMR — CMR is “conventional” and SMR is “shingled” magnetic recording. SMR is bad. There’s only three hard drive makers (Seagate, Western Digital, and Toshiba), so there’s not a big selection.

Working with such large data sets over 1-gbps is painful. These units allow 802.3ad link aggregation as well as faster Ethernet. Some have 10gbe built-in, others allow a PCIe adapter to be plugged in.

However, due to the overhead of spinning disks, you are unlikely to get 10gbps speeds. I mention this because 10gbps copper Ethernet sucks, so is not necessarily a buying criteria. You may prefer multigig/NBASE-T that only does 5gbps with relaxed cabling requirements and lower power consumption.

This means that your NAS decision is going to be made with your home networking decision. I use a couple of these multigig switches as something that doesn’t cost too much for home networking.

Even though RAID is pretty darn reliable, you still need a backup solution. The way I do this is with external USB hard drives. I schedule the NAS to backup to those drives automatically. As a home user, tapes aren’t an effective solution, so you are stuck with USB drives.

In the end, this means that your total storage costs, with the NAS server, the drives, and the backup drives, is going to cost you 3x the price of the raw storage. Spinning drives fail often. If you plan on keeping your data around for the next decade, there’s no way to do this without 3x the cost for storage.
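A back-of-the-envelope version of that 3x figure, with made-up but plausible prices:

```python
# All prices are assumptions: $25/TB drives, roughly $1000 for a
# 12-bay diskless NAS enclosure.
raw_tb = 100
price_per_tb = 25
nas_price = 1000

data_drives = 12 * 10 * price_per_tb    # 12x 10-TB drives (RAID6 -> ~100 TB usable)
backup_drives = raw_tb * price_per_tb   # external USB drives covering the full set
total = nas_price + data_drives + backup_drives
ratio = total / (raw_tb * price_per_tb) # total vs. the price of raw storage alone
```

With these numbers the multiplier comes out around 2.6x, and it creeps toward 3x once you factor in replacing failed drives over a decade.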

I choose Synology because I have the most familiarity with the software, and its software gets the best reviews. But QNAP and Asustor also have great reputations. 

Note that I’ve made the assumption here that you’ll want “desktop NAS” solutions. There are also rackmount solutions available.

Check: that Republican audit of Maricopa

Post Syndicated from Robert Graham original

Author: Robert Graham (@erratarob)

Later today (Friday, September 24, 2021), Republican auditors release their final report on what they found with elections in Maricopa county. Draft copies of the report have already circulated online. In this blogpost, I write up my comments on the cybersecurity portions of their draft.

The three main problems are:

  • They misapply cybersecurity principles that are meaningful for normal networks, but which don’t really apply to the “air gapped” networks we see here.
  • They make some errors about technology, especially networking.
  • They overstretch themselves to find dirt, claiming that the things they don't understand are evidence of something bad.

In the parts below, I pick apart individual pieces from that document to demonstrate these criticisms. I focus on section 7, the cybersecurity section, and ignore the other parts of the document, where others are more qualified than I to opine.

In short, when corrected, section 7 is nearly empty of any content.

Software and Patch Management, part 1

They claim Dominion is deficient at one of the best-known cybersecurity practices: applying patches.

It’s not true. The systems are “air gapped”, disconnected from the typical sort of threat that exploits unpatched systems. The primary security of the system is physical. Frequent patching isn’t expected.

This is standard practice in other industries with hard reliability constraints, like industrial or medical systems. Patches in those systems can destabilize computers and kill people, so these industries are risk averse and resist applying them. They prefer to mitigate the threat in other ways, such as with firewalls and air gaps.

Yes, this approach is controversial. There are some in the cybersecurity community who use lack of patches as a bludgeon with which to bully any who don't apply every patch immediately. But this is because patching is more a political issue than a technical one. In the real, non-political world we live in, most things don't get immediately patched all the time.

Software and Patch Management, part 2

The auditors claim new software executables were added to the system, despite the rules against installing new software. This isn't necessarily true.

There are many reasons why Windows may create new software executables even when no new software is added. One reason is "Features on Demand", or FOD. You'll see new executables appear in C:\Windows\WinSxS for these. Another reason is Microsoft's .NET runtime, which causes binary x86 executables to be created from bytecode. You'll see these in the C:\Windows\assembly directory.
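As a sketch of the triage the auditors could have done, here's a hypothetical classifier that buckets a "new" executable by the directories mentioned above. The categories and heuristic are my own assumptions for illustration, not anything from the audit:

```python
from pathlib import PureWindowsPath

def classify_new_executable(path: str) -> str:
    """Guess why a 'new' executable appeared, based on its location.

    Heuristic sketch: WinSxS holds Features-on-Demand and Windows
    servicing payloads, and C:\\Windows\\assembly holds native images
    that .NET generates from bytecode. Anything else deserves a
    closer look before claiming new software was installed.
    """
    parts = [p.lower() for p in PureWindowsPath(path).parts]
    if "winsxs" in parts:
        return "Features on Demand / Windows servicing"
    if "assembly" in parts:
        return ".NET native image (generated from bytecode)"
    return "unexplained -- investigate"

print(classify_new_executable(r"C:\Windows\WinSxS\amd64_foo\telnet.exe"))
print(classify_new_executable(r"C:\Windows\assembly\NativeImages_v4\System.ni.dll"))
print(classify_new_executable(r"C:\Users\admin\Desktop\tool.exe"))
```

Only the last category would support the auditors' claim; they made no such distinction.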

The auditors simply counted the number of new executables, with no indication of which category they fell into. Maybe they are right; maybe new software was installed or old software updated. It's just that their mere counting of executable files doesn't show understanding of these differences.

Log Management

The auditors claim that a central log management system should be used.

This obviously wouldn’t apply to “air gapped” systems, because it would need a connection to an external network.

Dominion already designates their EMSERVER as the central log repository for their little air gapped network. Important files from C: are copied to D:, a RAID10 drive. This is a perfectly adequate solution; adding yet another computer to their little network would be overkill, and would add as many security problems as it solved.

One could argue more Windows logs need to be preserved, but that would simply mean archiving the logs from the C: drive onto the D: drive, not that you need to connect to the Internet to centrally log files.

Credential Management

Like the other sections, this claim is out of place given the airgapped nature of the network.

Dominion simply uses “role based security” instead of normal user accounts. It’s a well known technique, and considered very appropriate for this sort of environment.

The auditors claim account passwords must "be changed every 90 days". This is a well-known fallacy in cybersecurity. It took years to get NIST to remove it from their recommendations. If CISA still has it in their recommendations for election systems, then CISA is wrong.

Ideally, accounts wouldn’t be created until they were needed. In practice, system administrators aren’t available (again, it’s an airgapped system, so no remote administration). Dominions alternative is to create the accounts ahead of time, suc has “adjuser09”, waiting for the 9th person you hire that might use that account.

They are all given the same default password to start, like "Arizona2019!!!". Some customers choose to change the default password, but obviously Maricopa did not. This is weak – but not a big deal, since the primary security is from controlling physical access.

Lack of Baseline for Host and Network Activity

They claim some sort of baselining should be done. This is absurd. Baselines are always problematic, but would be especially so in this case.

The theory of baselines is that a network's traffic is somewhat predictable on a day-to-day basis. This obviously doesn't apply to election systems, which are highly variable day-to-day, especially on election day.

Baselining is the sort of thing you do with a dedicated threat hunting team. It's incredibly inappropriate for a small installation like this.

Network Related Data

The auditors asked for unreasonable access to network data, in the worst way possible, triggering the refusal to hand it over. They didn't ask for reasonable data. They blame Maricopa County for the conflict, but it's really themselves who are to blame.

A reasonable request would take the MAC addresses from the election machines and ask for any matching records Maricopa might have in their Splunk, DHCP, or ARP logs. Matches shouldn't be found, but if they were, the auditors should then ask for flow data for the associated IP addresses.
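Answering a request like that is nearly trivial for the county. Here's a hypothetical sketch of the matching step; the MAC addresses and log entries are made up for illustration:

```python
# Hypothetical: match election-machine MAC addresses against DHCP logs.
# Every MAC address and log line here is made up for illustration.

election_macs = {"00:0c:29:aa:bb:cc", "00:0c:29:11:22:33"}

dhcp_log = [
    # (timestamp, mac, leased ip)
    ("2020-11-03 08:01", "3c:52:82:de:ad:01", "10.1.4.17"),
    ("2020-11-03 08:05", "00:0c:29:aa:bb:cc", "10.1.4.99"),
    ("2020-11-03 09:12", "a4:83:e7:00:11:22", "10.1.4.23"),
]

# If the election network really is air gapped, this list should be empty.
matches = [(ts, mac, ip) for ts, mac, ip in dhcp_log
           if mac.lower() in election_macs]

for ts, mac, ip in matches:
    print(f"{ts}: election machine {mac} got lease {ip} -- ask for flow data")
```

An empty result supports the air gap; any hit is the IP address you'd then request flow data for.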

They are correct in identifying this as a very important issue. Dominion security depends upon an airgap. If auditors find a network connection, it's bad. It's not catastrophic, and sometimes machines are disconnected from one network and attached to another at times other than during the election. But this would very much be a useful part of a report – if only they had made a reasonable request that didn't demand Maricopa spend their entire yearly budget to comply.

Other Devices Connected to the Election Network

The auditors complain they weren’t given access to the router identified by

It probably doesn’t exist.

Routers aren’t needed by devices that are on the same local Ethernet. They wouldn’t exist on a single-segment air gapped network. But typical operating-system configuration demands one be configured anyway, so it’s common to put in a dummy router address even if it’s unused.

If you see messages like this one in the logs, it means the router wasn't there.

The auditors are right in identifying this as an important issue. If there were such a router, then this would cast doubt whether the network was “airgapped”.

Note that if such a router did exist, it would almost certainly be a NAT. This would still offer some firewall protection, just not as strong as an air gap.

7.5.4 Anonymous Logins

They see something in the security logs they don’t understand, and blame Maricopa’s lack of network data (“the routers”) for their inability to explain it.

This is an extraordinarily inappropriate claim, based not on expert understanding of what they see in the logs, but on complete ignorance. There's no reason to believe that getting access to Maricopa County network logs would explain what's going on here.

This demonstrates they are on a fishing expedition, and that everything they see that they can't explain is used as evidence of a conspiracy, either of Maricopa withholding data, or of election fraud.

The Dominion suite of applications and services is oddly constructed and will produce anomalies. Comparing against a general Windows server not running Dominion’s suite is meaningless.

7.5.5 Dual Boot System Discovered

The auditors claim something about “dual-homed hosts” or “jump-boxes”. That’s not how these terms are normally used. These terms normally refer to a box with access to two separate networks, not a box with two separate operating systems.

This requires no nefarious explanation. This is commonly seen in corporate networks, either because somebody simply added a new drive to re-install the operating-system, or repurposed an old drive from another system as a data drive and simply forgot to wipe it. The BIOS points to the one they intend to boot from and ignores the fact that the other can also boot.

There are endless non-nefarious explanations for what is seen here. It's not even clear it's a failure of their build process, which focuses on what's on the boot drive and not what's on other drives in the system.

7.5.6 EMS Operating System Logs Not Preserved

It is true the EMS operating-system logs are not preserved (well, generally not preserved). By this I refer to the generic Windows logs, the same logs that your own Windows desktop keeps.

The auditors claim that this violates the law. That's false. The "electronic records" laws don't cover the operating-system. The laws are instead intended to preserve the records of the election software running on top of the operating-system, not those of the operating-system itself.

This issue has long been known. You don’t need an auditor’s report to tell you that these logs aren’t generally preserved – everyone has known this for a long time, including those who certified Dominion.

The subtext of this claim is the continued argument by Republicans that the fact they can’t find evidence for 2020 election fraud is because key data is missing. That’s the argument of Tina Peters, the former clerk of a county in the neighboring state of Colorado, who claims their elections cannot be audited because they don’t have the Windows operating-system logs.

It’s not true. System logs are as likely to cause confusion, as they do above with the “anonymous logins” issue. They are unlikely to provide proof of votes being flipped in a hack. If there was massive fraud, as detected by recounts of paper ballots, I’d certainly want such system logs to search for how it happened. But I wouldn’t use such logs in order to audit the vote.

Note that the description of “deleting” log entries by overfilling the logs is wrong. If it were important to preserve such logs, then they would be copied right after the election. They wouldn’t be left to rot on the boot drive for months afterwards.

As a forensics guy, I would certainly support the idea that Dominion should both enable more logs and preserve them after each election. They don't require excessive storage and can be saved automatically in the last phase of an election. But their lack really isn't all that important; they are mostly just full of junk.


We live in a pluralistic democracy, meaning there are many centers of power, each competing with each other. It’s inherently valid for one side to question and challenge the other side. But this can go too far, to the point where you are challenging the stability of our republic.

The Republican party is split. Some are upholding that principle of pluralism, wanting to make sure future elections are secure and fair. Others are attacking that principle, challenging the peaceful transfer of power in the last election with baseless accusations of fraud.

This split is seen in Arizona, where Republicans have demanded an audit by highly partisan auditors. An early draft of their report straddles that split, containing some reasonable attempts to create recommendations for future elections while simultaneously providing fodder for the other side to believe the last election was stolen.

A common problem with auditors is that when they can't find the clear evidence they were looking for, they fill their reports with things they don't understand. I think I see that here. The auditors make technical errors in ways that call their competence into question, but incompetence likely isn't the explanation. Instead, they kept searching past where they were strong into areas where they were weak, looking for as much dirt as possible. Thus, in this report, we see where they are technically weak.

Trumpists, meaning those attacking the peaceful transfer of power with baseless accusations of fraud, will certainly use this report to champion their cause, despite the headline portion that confirms the vote count. But for the rest of us, we should welcome this report. Elections do need to be fixed, and while it’s unlikely we’ll fix them in the ways suggested in this report, it will add visibility into the process which we can use to debate improvements.

This blogpost is only a first draft. While the technical bits in section 7 look fairly straightforward to me, I’m guessing that people who don’t understand them will come up with weird conspiracy-theories about them. Thus, I’m guessing I’ll have to write another blogpost in a week debunking some of the crazier ideas.

That Alfa-Trump Sussman indictment

Post Syndicated from Robert Graham original

Five years ago, online magazine Slate broke a story about how DNS packets showed secret communications between Alfa Bank in Russia and the Trump Organization, proving a link that Trump denied. I was the only prominent tech expert who debunked this as just a conspiracy-theory[*][*][*].

Last week, I was vindicated by the indictment of a lawyer involved, Michael Sussman. It tells a story of where this data came from, and some problems with it.

But we should first avoid reading too much into this indictment. It cherry picks data supporting its argument while excluding anything that disagrees with it. We see chat messages expressing doubt in the DNS data. If chat messages existed expressing confidence in the data, we wouldn’t see them in the indictment.

In addition, the indictment tries to make strong ties to the Hillary campaign and the Steele Dossier, but ultimately, the ties are weak. It looks to me like an outsider trying to ingratiate themselves with the Hillary campaign rather than being part of a grand Clinton-led conspiracy against Trump.

With these caveats, we do see some important things about where the data came from.

We see how Tech-Executive-1 used his position at cyber-security companies to search private data (namely, private DNS logs) for anything that might link Trump to somebody nefarious, including Russian banks. In other words, a link between Trump and Alfa Bank wasn't something they accidentally found; it was one of the many thousands of links they looked for.

Such a technique has long been known as a problem in science. If you cast the net wide enough, you are sure to find things that would otherwise be statistically unlikely. In other words, if you do hundreds of tests of hydroxychloroquine or ivermectin on Covid-19, you are sure to find results so statistically unlikely that they wouldn't happen more than 1% of the time by chance.
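This multiple-comparisons effect is easy to demonstrate with a simulation. Below, every "study" is a pure coin-flip with no real effect, yet running a thousand of them reliably produces several results that look significant at roughly the 1% level. The numbers are a sketch, not a model of any real trial:

```python
import random

random.seed(42)  # deterministic for the example

def fake_study(n_patients=100):
    """A 'study' of a drug with zero real effect: each patient
    recovers with probability 0.5 regardless of treatment."""
    recovered = sum(random.random() < 0.5 for _ in range(n_patients))
    return recovered / n_patients

# Run 1000 independent null studies and flag the "impressive" ones.
# With n=100 and p=0.5, a recovery rate >= 0.62 happens only about 1%
# of the time in any single study -- but we're not doing a single study.
results = [fake_study() for _ in range(1000)]
impressive = [r for r in results if r >= 0.62]

print(f"'significant' findings from a drug that does nothing: {len(impressive)}")
```

Searching world-wide DNS logs for thousands of possible Trump links is the same trick: cast a wide enough net and anomalies are guaranteed.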

If you search world-wide DNS logs, you are certain to find weird anomalies that you can’t explain. Unexplained computer anomalies happen all the time, as every user of computers can tell you.

We’ve seen from the start that the data was highly manipulated. It’s likely that the data is real, that the DNS requests actually happened, but at the same time, it’s been stripped of everything that might cast doubt on the data. In this indictment we see why: before the data was found the purpose was to smear Trump. The finders of the data don’t want people to come to the best explanation, they want only explainations that hurt Trump.

Trump had no control over the domain in question. Instead, it was created by a hotel marketing firm the Trump Organization hired, Cendyn. It's Cendyn who put Trump's name in the domain. A broader collection of DNS information, including Cendyn's other clients, would show whether this was normal or not.

In other words, a possible explanation of the data, hints of a Trump-Alfa connection, has always been the dishonesty of those who collected the data. The above indictment confirms they were at this level of dishonesty. It doesn’t mean the DNS requests didn’t happen, but that their anomalous nature can be created by deletion of explanatory data.

Lastly, we see in this indictment the problem with “experts”.

Sadly, this didn’t happen. Even experts are biased. The original Slate story quoted Paul Vixie, who hates Trump, who was willing to believe it rather than question it. It’s not necessarily Vixie’s fault: the Slate reporter gave the experts they quoted a brief taste of the data, then pretended their response was a full in-depth analysis, rather than a quick hot-take. It’s not clear that Vixie really still stands behind the conclusions in the story.

But of the rest of the “experts” in the field, few really care. Most hate Trump, and therefore, wouldn’t challenge anything that hurts Trump. Experts who like Trump also wouldn’t put the work into it, because nobody would listen to them. Most people choose sides — they don’t care about the evidence.

This indictment vindicates my analysis in those blogposts linked above. My analysis shows convincingly that Trump had no real connection to the domain. I can't explain the anomaly, why Alfa Bank is so interested in a domain containing the word "trump", but I can show that conspiratorial communications is the least likely explanation.

How not to get caught in law-enforcement geofence requests

Post Syndicated from Robert Graham original

I thought I’d write up a response to this question from well-known 4th Amendment and CFAA lawyer Orin Kerr:

First, let me address the second part of his tweet, whether I'm technically qualified to answer this. I'm not sure; I have only 80% confidence that I am. Hence, I'm writing this answer as a blogpost, hoping people will correct me if I'm wrong.

There is a simple answer and it’s this: just disable “Location” tracking in the settings on the phone. Both iPhone and Android have a one-click button to tap that disables everything.

The trick is knowing which thing to disable. On the iPhone it’s called “Location Services”. On the Android, it’s simply called “Location”.

If you do start googling around for answers, you'll find articles from people upset that Google is still tracking them. That's because they disabled "Location History" but not "Location". This left "Location Services" and "Web and App Activity" still tracking them. Disabling "Location" on the phone disables all of these things [*].

It’s that simple: one click and done, and Google won’t be able to report your location in a geofence request.

I’m pretty confident in this answer, despite what your googling around will tell you about Google’s pernicious ways. But I’m only 80% confident in my answer. Technology is complex and constantly changing.

Note that the answer is very different for mobile phone companies, like AT&T or T-Mobile. They have their own ways of knowing about your phone’s location independent of whatever Google or Apple do on the phone itself. Because of modern 4G/LTE, cell towers must estimate both your direction and distance from the tower. I’ve confirmed that they can know your location to within 50 feet. There are limitations to this, it depends upon whether you are simply in range of the tower or have an active phone call in progress. Thus, I think law enforcement prefers asking Google.
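That direction-plus-distance estimate reduces to simple trigonometry. Here's a rough sketch of how a tower's antenna bearing and measured distance pin down a handset; the flat-earth approximation and the tower coordinates are my own assumptions for illustration:

```python
import math

def locate_handset(tower_lat, tower_lon, bearing_deg, distance_m):
    """Rough position fix from one tower: follow the antenna bearing
    for the measured distance. Uses a local flat-earth approximation,
    good enough at cell-tower ranges. Coordinates are made up."""
    meters_per_deg_lat = 111_320
    meters_per_deg_lon = 111_320 * math.cos(math.radians(tower_lat))
    theta = math.radians(bearing_deg)       # 0 = north, 90 = east
    north = distance_m * math.cos(theta)
    east = distance_m * math.sin(theta)
    return (tower_lat + north / meters_per_deg_lat,
            tower_lon + east / meters_per_deg_lon)

# A handset measured 1200m northeast of a (made-up) tower:
lat, lon = locate_handset(36.1147, -115.1728, 45.0, 1200)
print(f"estimated handset position: {lat:.5f}, {lon:.5f}")
```

The 50-foot figure comes from how precisely the bearing and timing-advance distance can be measured, not from anything on the phone itself.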

Another example is how my car uses Google Maps all the time, and doesn’t have privacy settings. I don’t know what it reports to Google. So when I rob a bank, my phone won’t betray me, but my car will.

Note that "disabling GPS" isn't sufficient. The location-settings screen itself mentions that the phone relies upon WiFi, Bluetooth, and cell tower info to also confirm your location. Tricking GPS will do little to stop your phone from knowing your location.

I only know about this from the phone side of things and not actual legal cases. I’d love to see the sort of geofence results the FBI gets. There might be some subtle thing that I missed about how Android works with mobile companies, such as this old story where Android phones reported cell tower information to Google (since removed). Or worse, there might be something completely obvious I should’ve known about that everyone seems to know, but for some reason I simply forgot.

Both Apple and Google are upfront about what private information they do and don't track and how to disable it. Thus, while I think they may do something by accident hidden from view, I don't think there's anything going on that isn't documented. And for this concern, what's documented is that simply turning off the "Location" button is enough.

Update: Many comments note that Google does log the IP address of requests, and that IP addresses can sometimes be geolocated.

Well, yes and no. It's not something companies log in that way. Thus, when given a geofence request for everything within a certain physical location, logs containing only IP addresses wouldn't be covered by the request. The log would need a record of the physical location to be covered. Moreover, geolocation by IP address is incredibly inaccurate, often telling you only the city or neighborhood where the IP address is located. Even if Google logged a record of its best guess about location, I'm still not sure whether it would be an appropriate response to a geofence request.

In any event, this wouldn’t apply to mobile IP addresses. In America, consumer mobile phones don’t have public IP addresses by share the same pool of private addresses. Thus, the IP address from a mobile phone is meaningless for location purposes.

Now you can create a hypothetical situation like the following:

  • a Capitol Hill protestor logs onto a nearby WiFi (meaning: it’s not the mobile IP address in question, but the IP address of the WiFi hotspot)
  • the geolocation record of that WiFi hotspot is actually accurate
  • Google resolves that geolocation when it logs the IP address of requests
  • they give such IP/location logs in response to geofence request

Then, yes, my argument is defeated, a hypothetical geofence request might then get you.

Which I actually like. It’s a good demonstration of why I doubt myself at the top of the post. I don’t think this scenario is likely, and hence don’t consider it a reasonable rebuttal, but “unlikely” doesn’t mean “impossible”. I’m still pretty confident that a one-click disabling “Location” is all you need to defeat geofence warrants given to Google.

Note that the discussion of this blogpost is just about the "geofence request to Google". This "Capitol Hill WiFi" hypothetical is unlikely to help with requests by location, but of course would help with requests by IP address. Law enforcement could certainly ask Google for a list of users that came in via the Capitol Hill WiFi IP address.

Of course you can’t trust scientists on politics

Post Syndicated from Robert Graham original

Many people make the same claim as this tweet. It's obviously wrong. Yes, the right-wing has a problem with science, but this isn't it.

First of all, people trust airplanes because of their long track record of safety, not because of any claims made by scientists. Secondly, people distrust “scientists” when politics is involved because of course scientists are human and can get corrupted by their political (or religious) beliefs.

And thirdly, the concept of "trusting scientific authority" is wrong, since the bedrock principle of science is distrusting authority. What defines science is how often prevailing scientific beliefs are challenged.

Carl Sagan has many quotes along these lines that eloquently express this:

A central lesson of science is that to understand complex issues (or even simple ones), we must try to free our minds of dogma and to guarantee the freedom to publish, to contradict, and to experiment. Arguments from authority are unacceptable.

If you are “arguing from authority”, like Paul Graham is doing above, then you are fundamentally misunderstanding both the principles of science and its history.

We know where this controversy comes from: politics. The above tweet isn’t complaining about the $400 billion U.S. market for alternative medicines, a largely non-political example. It’s complaining about political issues like vaccines, global warming, and evolution.

The reason those on the right-wing resist these things isn't that they are inherently anti-science; it's that the left-wing has corrupted and politicized these topics. The "Green New Deal" contains very little that is "Green" and much that is "New Deal", for example. The left goes from the fact "carbon dioxide absorbs infrared" to justify "we need to promote labor unions".

Take Marjorie Taylor Greene's (MTG) claim that she doesn't believe in the Delta variant because she doesn't believe in evolution. Her argument is laughably stupid, of course, but it starts with the way the left has politicized the term "evolution".

The “Delta” variant didn’t arise from “evolution”, it arose because of “mutation” and “natural selection”. We know the “mutation” bit is true, because we can sequence the complete DNA and detect that changes happen. We know that “selection” happens, because we see some variants overtake others in how fast they spread.

Yes, “evolution” is synonymous with mutation plus selection, but it’s also a politically loaded term that means a lot of additional things. The public doesn’t understand mutation and natural-selection, because these concepts are not really taught in school. Schools don’t teach students to understand these things, they teach students to believe.

The focus of science education in school is indoctrinating students into believing in "evolution" rather than teaching the mechanisms of "mutation" and "natural selection". We see the conflict in things like describing the evolution of the eyeball, which Creationists "reasonably" believe is too complex to have evolved this way. I put "reasonably" in quotes here because it's just the "God of the gaps" argument, which credits God for everything that science can't explain, which isn't very smart. But at the same time, science textbooks go too far, refusing to admit their gaps in knowledge here. The fossil record shows a lot of complexity arising over time through steady change — it just doesn't show anything about eyeballs.

In other words, it’s possible for a kid to graduate high-school with a full understanding of science, including mutation, selection, and the fossil record, while believing God created the eyeball. This is anathema to educators, who would rather students “believe in evolution” than understand it.

Thus, “believing” in the “evolution” of the Delta variant becomes this horrible political debate because the left-wing has corrupted science. You have politicians like MTG virtue signaling their opposition to evolution in what should be a non-political, neutral science discussion.

The political debate over vaccines isn’t the vaccines themselves, but forcing people to become vaccinated.

The evidence is clear that the covid vaccines are in your own (and your kids’) best interest. If we left it there, few would be challenging the science. There is no inherent right-wing opposition to vaccines. Indeed, Trump championed the covid vaccines, trying to take credit for their development. 

But the left-wing chose a different argument, that covid vaccines are in the best interest of society, and therefore, that government must coerce/force people to become vaccinated. It’s at this point that political opposition appears on the right-wing. It’s the same whether you are describing the debate in the United States, Europe, or Asia.

We know the juvenile method by which people defend their political positions. Once people decide to oppose "forcible vaccination", they then build a position that vaccines aren't "good" anyway.

Thus, you’ll get these nonsense arguments from people who have get their opinions from dodgy blogs/podcasts, like “these don’t even meet the definition of a vaccine”. The started from the political goal first, and then looked for things that might support it, no matter how intellectually vacuous. It’s frustrating trying to argue against the garbage arguments they’ll toss up.

But at the same time, the left is no better. The tweet above is an equally vacuous meme, one they repeat because it sounds good, not because they've put much thought into it. It's simply an argument that strokes the prejudices of those who repeat it, rather than a robust argument that can change the minds of opponents. It's obviously false: people trust planes because of their track record, not because of what scientists claim. They trust scientists and doctors on non-political things, but rightly distrust their pronouncements on politically tainted issues. And lastly, the above argument is completely anti-scientific — science is all about questioning and doubting.

Risk analysis for DEF CON 2021

Post Syndicated from Robert Graham original

It’s the second year of the pandemic and the DEF CON hacker conference wasn’t canceled. However, the Delta variant is spreading. I thought I’d do a little bit of risk analysis. TL;DR: I’m not canceling my ticket, but changing my plans what I do in Vegas during the convention.

First, a note about risk analysis. For many people, “risk” means something to avoid. They work in a binary world, labeling things as either “risky” (to be avoided) or “not risky”. But real risk analysis is about shades of gray, trying to quantify things.

The Delta variant is a mutation out of India that, at the moment, is particularly affecting the UK. Cases are nearly up to their pre-vaccination peaks in that country.

Note that the UK has already vaccinated nearly 70% of their population — more than the United States. In both the UK and US there are few preventive measures in place (no lockdowns, no masks) other than vaccines.


Thus, the UK graph is somewhat predictive of what will happen in the United States. If we time things from when the latest wave hit the same levels as the peak of the first wave, then it looks like the USA is only about 1.5 months behind the UK.

It’s another interesting lesson about risk analysis. Most people experience these things as sudden changes. One moment, everything seems fine, and cases are decreasing. The next moment, we are experiencing a major new wave of infections. It’s especially jarring when the thing we are tracking is exponential. But we can compare the curves and see that things are totally predictable. In about another 1.5 months, the US will experience a wave that looks similar to the UK wave.

Sometimes the problem is that the change is inconceivable. We saw that recently with 1-in-100 year floods in Germany. Weather forecasters predicted 1-in-100 level of floods days in advance, but they still surprised many people.

Nevada is ahead of the curve in the US, probably because Vegas is such a hub for unvaccinated people going on vacation. Because of exponential growth, there's a good chance that in 2 weeks, that peak will be triple where it is now. It may not look like "time to cancel your ticket" now, but it probably will in 2 weeks when the event takes place. In other words, the closer we get to the event, the more people will look at this graph and cancel their tickets.
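That "triple in 2 weeks" eyeball estimate is just exponential extrapolation, which takes a few lines to sketch. The starting case count and tripling time below are illustrative assumptions, not actual Nevada data:

```python
import math

# Illustrative assumptions, not real Nevada numbers.
cases_now = 400          # current daily cases
tripling_weeks = 2       # observed tripling time of the current wave

# Continuous growth rate implied by tripling every 2 weeks.
rate_per_week = math.log(3) / tripling_weeks

def project(weeks):
    """Project daily cases `weeks` from now, assuming growth continues."""
    return cases_now * math.exp(rate_per_week * weeks)

for weeks in (0, 2, 4, 6):
    print(f"week {weeks}: ~{project(weeks):,.0f} daily cases")
```

The point isn't the exact numbers but the shape: as long as the growth rate holds, the curve's future is already determined, which is why waves feel sudden but are predictable.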

The risk is really high for the unvaccinated, but much less for the vaccinated. We see that in the death rates in the UK, which are still low, even accounting for the 2 week lag that you see between spikes in infections and spikes in deaths. This is partly due to the fact that while the new variant infects the vaccinated, it doesn’t cause much harm. Also, I suspect it’s due to how much better we are at treating infections if they do require a hospital visit.

But still, death isn’t the major concern. It appears the major concern is long term-lung (and other organ) damage caused by even mild cases. Thus, one should fear infection even if one believes they have no chance of dying.

So here’s my personal risk analysis: I’m not canceling my ticket. Instead, I’m changing my plans of what I do. For the most part, this means that wherever there’s a crowd, go someplace else.
It also means I’m going to take this opportunity to do things I’ve never had the opportunity to do before: go outside of Vegas. I plan on renting a car to go down to the Grand Canyon, Hoover Dam, and do hikes around the area (like along Lake Mead, up in the canyons, and so on). This means spending most of my time away from people.
During the pandemic, outdoor activities (without masks, socially distanced) are among the safest things you can do, especially considering the exercise and vitamin D that you’ll be getting.
Also, airplanes aren’t much of a worry. They have great filtration and as far as anybody can tell, haven’t resulted in superspreader events this entire pandemic.
The real point of this blogpost is the idea of “predictions”. This post predicts that US infection rates will be spiking in 1.5 months in a curve that looks similar to the UK, and that in 2 weeks during DEFCON, Nevada’s infection rates will be around 3 times higher. The biggest lesson about risk analysis is that it’s usually done in hindsight, judging what people should’ve known once the outcome is known. It’s much harder doing it the other way around, estimating what might happen in the future.
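The arithmetic behind that tripling prediction is just compound doubling. A minimal sketch (the roughly 9-day doubling time is my illustrative assumption, not an official figure):

```python
def project(current, doubling_days, days_ahead):
    """Project exponential growth forward given a doubling time."""
    return current * 2 ** (days_ahead / doubling_days)

# With a ~9-day doubling time, infections roughly triple in 2 weeks:
# project(100, 9, 14) is about 294.
```

The same function shows why these waves feel sudden: the last doubling adds as many cases as all the previous ones combined.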

Ransomware: Quis custodiet ipsos custodes

Post Syndicated from Robert Graham original

Many claim that “ransomware” is due to cybersecurity failures. It’s not really true. We are adequately protecting users and computers. The failure is in the inability of cybersecurity guardians to protect themselves. Ransomware doesn’t make the news when it only accesses the files normal users have access to. The big ransomware news events happened because ransomware elevated itself to that of an “administrator” over the network, giving it access to all files, including online backups.

Generic improvements in cybersecurity will help only a little, because they don’t specifically address this problem. Likewise, blaming ransomware on how it breached perimeter defenses (phishing, patches, password reuse) will only produce marginal improvements. Ransomware solutions need to instead focus on looking at the typical human-operated ransomware killchain, identify how they typically achieve “administrator” credentials, and fix those problems. In particular, large organizations need to redesign how they handle Windows “domains” and “segment” networks.

I read a lot of lazy op-eds on ransomware. Most of them claim that the problem is due to some sort of moral weakness (laziness, stupidity, greed, slovenliness, lust). They suggest things like “taking cybersecurity more seriously” or “do better at basic cyber hygiene”. These are “unfalsifiable” — things that nobody would disagree with, meaning they are things the speaker doesn’t really have to defend. They don’t rest upon technical authority but moral authority: anybody, regardless of technical qualifications, can have an opinion on ransomware as long as they phrase it in such terms.

Another flaw of these “unfalsifiable” solutions is that they are not measurable. There’s no standard definition for “best practices” or “basic cyber hygiene”, so there is no way to tell whether you are already doing such things, or what gap you need to overcome to reach this standard. Worse, some people point to the “NIST Cybersecurity Framework” as the “basics” — but that’s a framework for all cybersecurity practices. In other words, anything short of doing everything possible is considered a failure to follow the basics.

In this post, I try to focus on specifics, while at the same time, making sure things are broadly applicable. It’s detailed enough that people will disagree with my solutions.

The thesis of this blogpost is that we are failing to protect “administrative” accounts. The big ransomware attacks happen because the hackers got administrative control over the network, usually the Windows domain admin. It’s with administrative control that they are able to cause such devastation, able to reach all the files in the network, while also being able to delete backups.

The Kaseya attacks highlight this particularly well. The company produces a product that is in turn used by “Managed Service Providers” (MSPs) to administer the security of small and medium sized businesses. Hackers found and exploited a vulnerability in the product, which gave them administrative control over 1000 small and medium sized businesses around the world.

The underlying problems start with the way their software gives indiscriminate administrative access over computers. Then, this software was written using standard software techniques, meaning, with the standard vulnerabilities that most software has (such as “SQL injection”). It wasn’t written in a paranoid, careful way that you’d hope for software that poses this much danger.

A good analogy is airplanes. A common joke refers to the “black box” flight-recorders that survive airplane crashes, that maybe we should make the entire airplane out of that material. The reason we can’t do this is that airplanes would be too heavy to fly. The same is true of software: airplane software is written with extreme paranoia knowing that bugs can lead to airplanes falling out of the sky. You wouldn’t want to write all software to that standard, because it’d be too costly.

This analogy tells us we can’t write all software to the highest possible standard. However, we should write administrative software (like Kaseya) to this sort of standard. Anything less invites something like the massive attack we saw in the last couple weeks.

Another illustrative example is the “PrintNightmare” bug. The federal government issued a directive telling everyone under its authority (executive branch, military) to disable the Print Spooler on “domain controllers”. The issue here is that this service should never have been enabled on “domain controllers” in the first place.

Windows security works by putting all the security eggs into a single basket known as “Active Directory”, which is managed by several “Domain Controller” (AD DC) servers. Hacking a key DC gives the ransomware hacker full control over the network. Thus, we should be paranoid about protecting DCs. They should not be running any service other than those needed to fulfill their mission. The more additional services they provide, like “printing”, the larger the attack surface, the more likely they can get hacked, allowing hackers full control over the network. 

Yet, I rarely see Domain Controllers with this level of paranoid security. Instead, when an organization has a server, they load it up with lots of services, including those for managing domains. Microsoft’s advice on securing domain controllers “recommends” a more paranoid attitude, but only as one of the many other things it “recommends”.

When you look at detailed analysis of ransomware killchains, you’ll find the most frequently used technique is “domain admin account hijacking”. Once a hacker controls a desktop computer, they wait for an administrator to log in, then steal the administrator’s credentials. There are various ways this happens, the most famous being “pass-the-hash” (which itself is outdated, but a good analogy for still-current techniques). Hijacking even restricted administrator accounts can lead to elevation to unrestricted administrator privileges over the entire network.

If you had to fix only one thing in your network, it would be this specific problem.

Unfortunately, I only know how to attack this problem as a pentester, I don’t know how to defend against it. I feel that separating desktop admins and server/domain admins into separate, non-overlapping groups is the answer, but I don’t know how to achieve this in practice. I don’t have enough experience as a defender to know how to make reasonable tradeoffs.

In addition to attacking servers and accounts, ransomware attackers also target networks. Organizations focus on “perimeter security”, where the major security controls are between the public Internet and the internal organization. They also need an internal perimeter, between the organization’s network and the core servers.

There are lots of tools for doing this: VLANs, port-isolation, network segmentation, read-only Domain Controllers, and the like.

As an attacker, I see the lack of these techniques. I don’t know why defenders don’t use them more. There might be good reasons. I suspect the biggest problem is inertia: networks were designed back when these solutions were hard, and change would break things.

In summary, I see that the major problem exploited by ransomware is that we don’t protect “administrators” enough. We don’t do enough to protect administrative software, servers, accounts, or network segments. When we look at ransomware, the big cases that get splashed across the news, it’s not because they compromised a single desktop, but because they got administrative control over the entire network and thus were able to encrypt everything.

Sadly, as a person experienced in attack (red-team) and exploiting these problems, I can see the problem. However, I have little experience as a defender (blue-team), and while solutions look easy in theory, I’m not sure what can be done in practice to mitigate these threats.

I do know that general hand-waving, exhorting people to “take security seriously” and perform “cyber hygiene” is the least helpful answer to the problem.

Some quick notes on SDR

Post Syndicated from Robert Graham original

I’m trying to create perfect screen captures of SDR to explain the world of radio around us. In this blogpost, I’m going to discuss some of the imperfect captures I’m getting, specifically, some notes about WiFi and Bluetooth.

An SDR is a “software defined radio” which digitally samples radio waves and uses number crunching to decode the signal into data. Among the simplest things an SDR can do is look at a chunk of spectrum and see signal strength. This is shown below, where I’m monitoring part of the famous 2.4 GHz spectrum used by WiFi/Bluetooth/microwave-ovens:

There are two panes. The top shows the current signal strength as a graph. The bottom pane is the “waterfall” graph showing signal strength over time, displaying strength as colors: black means almost no signal, blue means some, and yellow means a strong signal.

The signal strength graph is a bowl shape, because we are actually sampling at a specific frequency of 2.42 GHz, and the further away from this “center”, the less accurate the analysis. Thus, the algorithms think there is more signal the further away from the center we are.

What we do see here is two peaks, at 2.402 GHz toward the left and 2.426 GHz toward the right (which I’ve marked with the red line). These are the “Bluetooth beacon” channels. I was able to capture the screen at the moment some packets were sent, showing signal at this point. Below in the waterfall chart, we see packets constantly being sent at these frequencies.

We are surrounded by devices giving off packets here: our phones, our watches, “tags” attached to devices, televisions, remote controls, speakers, computers, and so on. This is a picture from my home, showing only my devices and perhaps my neighbors. In a crowded area, these two bands are saturated with traffic.

The 2.4 GHz region also includes WiFi. So I connected to a WiFi access-point to watch the signal.

WiFi uses more bandwidth than Bluetooth. The term “bandwidth” is used today to mean “faster speeds”, but it comes from the world of radio where it quite literally means the width of the band. The width of the Bluetooth transmissions seen above is 2 MHz, the width of the WiFi band shown here is 20 MHz.

It took about 50 screenshots before getting these two. I had to hit the “capture” button right at the moment things were being transmitted. An easier way is a setting that graphs the current signal strength compared to the maximum recently seen as a separate line. That’s shown below: the instant it was taken, there was no signal, but it shows the maximum of recent signals as a separate line:

You can see there is WiFi traffic on multiple channels. My traffic is on channel #1 at 2.412 GHz. My neighbor has traffic on channel #6 at 2.437 GHz. Another neighbor has traffic on channel #8 at 2.447 GHz. WiFi splits the spectrum assigned to it into 11 overlapping channels set 5 MHz apart.
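The channel layout is simple arithmetic: channel 1 is centered at 2.412 GHz, and each subsequent channel sits 5 MHz higher. A quick sketch of that mapping (the function name is mine):

```python
def channel_center_mhz(channel):
    """Center frequency in MHz of a 2.4 GHz WiFi channel (1-13).
    Channels are spaced 5 MHz apart, starting at 2412 MHz for channel 1."""
    return 2407 + 5 * channel

# channel 1 -> 2412 MHz, channel 6 -> 2437 MHz, channel 8 -> 2447 MHz
```

Since each channel is 20 MHz wide but neighbors are only 5 MHz apart, only channels 1, 6, and 11 avoid overlapping each other.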
Now the reason I wanted to take these pictures was to highlight the difference between old WiFi (802.11b) and new WiFi (802.11n). The newer standard uses the spectrum more efficiently. Notice in the picture above how signal strength for a WiFi channel is strongest in the center but gets weaker toward the edges. That means it’s not fully using all the band.
Newer WiFi uses a different scheme to encode data into radio waves, using all the band given to it. We can see the difference in shape below, when I change from 802.11b to 802.11n:

Instead of a curve it’s more of a square block. It fills its entire 20 MHz bandwidth instead of only using the center.
What we see here is the limits of math and physics, known as the Shannon Limit, that governs the maximum possible speed for something like WiFi (or mobile phone radios like LTE). It’s simply the size of that box: its width times its height. The width is measured in frequency, 20 MHz wide. Its height is the signal strength measured above the noise floor (which should be a straight line across the bottom of our graph, but as I mentioned before, is shown in this SDR by a curved line increasingly inaccurate near the edges).
As we move toward faster and faster speeds, we cannot exceed this theoretical limit.
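The limit itself is easy to compute: capacity = bandwidth × log2(1 + SNR). A sketch, where the 25 dB SNR is my illustrative assumption:

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon limit: max bits/second for a given band and signal-to-noise."""
    snr_linear = 10 ** (snr_db / 10)   # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# A 20 MHz channel at 25 dB SNR tops out around 166 Mbit/s.
# Doubling the width (or raising the SNR) raises the ceiling.
```

Note how the formula mirrors the box analogy: bandwidth is the width, and the log term grows with the signal's height above the noise floor.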
One solution is directional antennas, such as the yagi antennas you see on top of houses or satellite dishes. A directional antenna or dish means getting a stronger signal with less noise — thus, increasing the “height” of the box.
The same effect can be achieved with something called “phased arrays”, using multiple antennas that transmit/receive at (very) slightly different times, such that the waves they produce reinforce each other in one direction but cancel each other out in other directions. This is how SpaceX “Starlink” space-based Internet works. The low Earth orbit satellites whizzing by overhead travel too fast to keep an antenna pointed at them, so their antenna is a phased array instead. The antennas are fixed, but the timing is slightly altered to aim the beam toward the satellite.
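The steering math is just geometry: each element is delayed in proportion to its position times the sine of the steering angle. A sketch, where the 12 GHz frequency and half-wavelength spacing are my illustrative assumptions:

```python
import math

SPEED_OF_LIGHT = 3e8  # meters per second

def steering_phases(n_elements, spacing_m, freq_hz, steer_deg):
    """Per-element phase shifts (radians) that aim a linear array's beam
    at steer_deg away from broadside."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    k = 2 * math.pi / wavelength       # wavenumber
    theta = math.radians(steer_deg)
    return [k * i * spacing_m * math.sin(theta) for i in range(n_elements)]

# At 0 degrees every element fires in phase; steering off-axis adds a
# linearly increasing phase ramp across the array.
```

Changing the ramp electronically re-aims the beam with no moving parts, which is why it works for tracking fast-moving satellites.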
What’s even more interesting is MIMO: receiving different signals on different antennas. With fancy circuits and math, doubling the number of antennas doubles the effective bandwidth.
The latest mobile phones and WiFi use MIMO and phased arrays to increase bandwidth.
But mostly, higher frequencies give more bandwidth. That’s why WiFi at 5 GHz is better — bands are a minimum of 40 MHz (instead of 20 MHz as in 2.4 GHz WiFi), are more commonly 80 MHz, and can go up to 160 MHz.
Anyway, these are more imperfect pictures I’m creating to explain WiFi and Bluetooth. At some point in time, I’ll be generating more perfect ones.

When we’ll get a 128-bit CPU

Post Syndicated from Robert Graham original

On Hacker News, this article claiming “You won’t live to see a 128-bit CPU” is trending. Sadly, it was non-technical, so didn’t really contain anything useful. I thought I’d write up some technical notes.

The issue isn’t the CPU, but memory. It’s not about the size of computations, but when CPUs will need more than 64-bits to address all the memory future computers will have. It’s a simple question of math and Moore’s Law.

Today, Intel’s server CPUs support 48-bit addresses, which is enough to address 256-terabytes of memory — in theory. In practice, Amazon’s AWS cloud servers are offered up to 24-terabytes, or 45-bit addresses, in the year 2020.

Doing the math, it means we have 19 bits or 38 years left before we exceed the 64-bit registers in modern processors. This means that by the year 2058, we’ll exceed the current address size and need to move to 128 bits. Most people reading this blogpost will be alive to see that, though probably retired.
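That projection can be sketched as a one-liner, assuming the usual Moore's-law rule of thumb of one extra address bit every 2 years:

```python
def year_64bit_exhausted(current_addr_bits, base_year=2020, years_per_bit=2):
    """Estimate when memory sizes outgrow 64-bit addresses, assuming
    memory doubles (consuming one address bit) every years_per_bit years."""
    bits_left = 64 - current_addr_bits
    return base_year + bits_left * years_per_bit

# 45-bit machines in 2020 -> 19 bits of headroom -> around 2058
```

The 2-year doubling is the fragile assumption here; slow it down and the date slides well past mid-century.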

There are lots of reasons to suspect that this event will come both sooner and later.

It could come sooner if storage merges with memory. We are moving away from rotating platters of rust toward solid-state storage like flash. There are post-flash technologies like Intel’s Optane that promise storage that can be accessed at speeds close to that of memory. We already have machines needing petabytes (at least 50-bits worth) of storage.

Addresses often contain more than just the memory address, but also some sort of description about the memory. For many applications, 56 bits is the maximum, as they use the remaining 8 bits for tags.
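Tagged pointers are just bit masking. A sketch of the idea with an 8-bit tag in the top byte (the 56/8 split follows the paragraph above; the helper names are mine):

```python
ADDR_BITS = 56
ADDR_MASK = (1 << ADDR_BITS) - 1       # low 56 bits hold the address

def tag_pointer(addr, tag):
    """Pack an 8-bit tag into the top byte of a 64-bit word."""
    return (tag << ADDR_BITS) | (addr & ADDR_MASK)

def untag_pointer(word):
    """Split a tagged word back into (tag, address)."""
    return word >> ADDR_BITS, word & ADDR_MASK
```

The hardware must mask off the tag before every dereference, which is why apps that adopt this trick become dependent on addresses staying at 56 bits.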

Combining those two points, we may be only 12 years away from people starting to argue for 128-bit registers in the CPU.

Or, it could come later because few applications need more than 64-bits, other than databases and file-systems.

Previous transitions were delayed for this reason, as the x86 history shows. The first Intel CPUs were 16-bits addressing 20-bits of memory, and the Pentium Pro was 32-bits addressing 36-bits worth of memory.

The few applications that needed the extra memory could deal with the pain of using multiple numbers for addressing. Databases used Intel’s address extensions; almost nobody else did. It took 20 years, from the initial release of the 64-bit MIPS R4000 in 1990, until the average Intel desktop processor shipped as 64-bit in 2010 and mainstream apps needed larger addresses.

For the transition beyond 64-bits, it’ll likely take even longer, and might never happen. Working with large datasets needing more than 64-bit addresses will be such a specialized discipline that it’ll happen behind libraries or operating-systems anyway.

So let’s look at the internal cost of larger registers, if we expand registers to hold larger addresses.

We already have 512-bit CPUs — with registers that large. My laptop uses one. It supports AVX-512, a form of “SIMD” that packs multiple small numbers in one big register, so that it can perform identical computations on many numbers at once, in parallel, rather than sequentially. Indeed, even very low-end processors have been 128-bit for a long time — for “SIMD”.

In other words, we can have a large register file with wide registers, and handle the bandwidth of shipping those registers around the CPU performing computations on them. Today’s processors already handle this for certain types of computations.

But just because we can do many 64-bit computations at once (“SIMD”) still doesn’t mean we can do a 128-bit computation (“scalar”). Simple problems like “carry” get difficult as numbers get larger. Just because SIMD can do multiple small computations doesn’t tell us what one large computation will cost. This was why it took an extra decade for Intel to make the transition — they added 64-bit MMX registers for SIMD a decade before they added 64-bit for normal computations.
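The carry problem can be sketched directly: a 128-bit add is two 64-bit adds chained by the carry out of the low word, a serial dependency that independent SIMD lanes don't give you for free:

```python
MASK64 = (1 << 64) - 1

def add128(a_hi, a_lo, b_hi, b_lo):
    """Add two 128-bit numbers, each held as a (high, low) pair of
    64-bit words, the way a 64-bit CPU would have to."""
    lo = a_lo + b_lo
    carry = lo >> 64                   # did the low words overflow?
    hi = (a_hi + b_hi + carry) & MASK64
    return hi, lo & MASK64
```

The high half cannot be computed until the low half's carry is known, which is exactly the dependency that makes wide scalar arithmetic harder than wide SIMD.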

The above discussion is about speed, but it’s also a concern for power consumption. Mobile devices were a decade later than desktops adopting 64 bits, exceeding the 32-bit barrier only recently. It’s likely they’ll be decades late getting to 128 bits. Even if you live to see supercomputers transition to 128 bits, you probably won’t live to see your mobile device transition.

Now let’s look at the market. What the last 40 years has taught us is that old technology doesn’t really die, it just stops growing, with all the growth happening in some new direction. 40 years ago, IBM dominated computing with their mainframes. Their mainframe business is as large as ever, it’s just that all the growth in the industry has been in other directions than the mainframe. The same thing happened to Microsoft’s business: Windows still dominates the desktop, but all the growth in the last 15 years has bypassed the desktop, moving to mobile devices and the cloud.

40 years from now, it won’t be an issue of mainstream processors jumping from 64 bits to 128 bits, like the previous transitions. I’m pretty sure we’ll have ossified into some 64-bit standard like ARM. Instead, I think 128-bit systems will come with a bunch of other radical changes. It’ll happen on the side of computers, much like how GPUs evolved separately from mainstream CPUs and then became increasingly integrated into them.

Anatomy of how you get pwned

Post Syndicated from Robert Graham original

Today, somebody had a problem: they kept seeing a popup on their screen, an obvious scam trying to sell them McAfee anti-virus. Where was this coming from?

In this blogpost, I follow this rabbit hole on down. It starts with “search engine optimization” links and leads to an entire industry of tricks and scams: exploiting popups, trying to infect your machine with viruses, and stealing emails or credit card numbers.

Evidence of the attack first appeared with occasional popups like the following. The popup isn’t part of any webpage.

This is obviously a trick. But from where? How did it “get on the machine”?

There’s lots of possible answers. But the most obvious answer (to most people), that your machine is infected with a virus, is likely wrong. Viruses are generally silent, doing evil things in the background. When you see something like this, you aren’t infected … yet.

Instead, things popping up with warnings are almost entirely due to evil websites. But that’s confusing, since this popup doesn’t appear within a web page. It’s off to one side of the screen, nowhere near the web browser.

Moreover, we spent some time diagnosing this. We restarted the web browser in “troubleshooting mode” with all extensions disabled and went to a clean website like Twitter. The popup still kept happening.

As it turns out, he had another window with Firefox running under a different profile. So while he cleaned out everything in this one profile, he wasn’t aware the other one was still running.

This happens a lot in investigations. We first rule out the obvious things, and then struggle to find the less obvious explanation — when it was the obvious thing all along.

In this case, the reason the popup wasn’t attached to a browser window is because it’s a new type of popup notification that’s supposed to act more like an app and less like a web page. It has a hidden web page underneath called a “service worker”, so the popups keep happening when you think the webpage is closed.

Once we figured out the mistake of the other Firefox profile, we quickly tracked this down and saw that indeed, it was in the Notification list with Permissions set to Allow. Simply changing this solved the problem.

Note that the above picture of the popup has a little wheel in the lower right. We are taught not to click on dangerous things, so the user in this case was avoiding it. However, had the user clicked on it, it would’ve led him straight to the solution. I can’t recommend you click on such a thing and trust it, because that means in the future, malicious tricks will contain such safe-looking icons that aren’t so safe.

Anyway, the next question is: which website did this come from?

The answer is Google.

In the news today was the story of the Michigan guys who tried to kidnap the governor. The user googled “attempted kidnap sentencing guidelines”. This search produced a page with the following top result:

Google labels this a “featured snippet”. This isn’t an advertisement, nor a “promoted” result. But it’s a link that Google’s algorithms think is somehow more worthy than the rest.

This happened because hackers tricked Google’s algorithms. It’s been a constant cat and mouse game for 20 years, in an industry known as “search engine optimization” or SEO. People are always trying to trick google into placing their content highest, both legitimate companies and the quasi-illegitimate that we see here. In this case, they seem to have succeeded.
The way this trick works is that the hackers posted a PDF instead of a webpage containing the desired text. Since PDF documents are much less useful for SEO purposes, google apparently trusts them more.
But the hackers have found a way to make PDFs more useful. They designed it to appear like a webpage with the standard CAPTCHA. You click anywhere on the page, such as on “I’m not a robot”, and it takes you to the real website.

But where is the text I was promised in Google’s search result? It’s there, behind the image. PDF files have layers. You can put images on top that hide the text underneath. Humans only see the top layer, but google’s indexing spiders see all the layers and will index the hidden text. You can verify this by downloading the PDF and using tools to examine the raw text:

If you click on the “I am not robot” in the fake PDF, it takes you to a page like the following:

Here’s where the “hack” happened. The user misclicked on “Allow” instead of “Block” — accidentally. Once they did that, popups started happening, even when this window appeared to go away.

The lesson here is that “misclicks happen”. Even the most knowledgeable users, the smartest of cybersecurity experts, will eventually misclick themselves.

As described above, once we identified this problem, we were able to safely turn off the popups by going to Firefox’s “Notification Permissions”.

Note that the screenshots above are a mixture of Firefox images from the original user, and pictures of Chrome where I tried to replicate the attack in one of my browsers. I didn’t succeed — I still haven’t been able to get any popups appearing on my computer.

So I tried a bunch of different browsers: Firefox, Chrome, and Brave on both Windows and macOS.

Each browser produced a different result, a sort of A/B testing based on the User-Agent (the string sent to webservers that identifies which browser you are using). Sometimes following the hostile link from that PDF attempted to install a popup script as in our original example, but sometimes it tried something else.

For example, on my Firefox, it tried to download a ZIP file containing a virus:

When I attempt to download, Firefox tells me it’s a virus — probably because Firefox knows the site where it came from is evil.

However, Microsoft’s free anti-virus didn’t catch it. One reason is that it comes as an encrypted zip file. In order to open the file, you have to first read the unencrypted text file to get the password — something humans can do but anti-virus products aren’t able to do (or at least, not well).

So I opened the password file to get the password (“257048169”) and extracted the virus. This is mostly safe — as long as I don’t run it. Viruses are harmless sitting on your machine as long as they aren’t running. I say “mostly” because even for experts, “misclicks happen”, and if I’m not careful, I may infect my machine.

Anyway, I want to see what the virus actually is. The easiest way to do that is upload it to VirusTotal, a website that runs all the known anti-virus programs on a submission to see what triggers what. It tells me that somebody else uploaded the same sample 2 hours ago, and that a bunch of anti-virus vendors detect it, with the following names:
With VirusTotal, you can investigate why anti-virus products think it may be a virus. 
For example, anti-virus companies will run viruses to see what they do. They run them in “emulated” machines that are a lot slower, but safer. If viruses find themselves running in an emulated environment, then they stop doing all the bad behaviors the anti-virus programs might detect. So they repeatedly check the timestamp to see how fast they are running: if too slow, they assume emulation.
But this itself is a bad behavior. This timestamp detection is one of the behaviors the anti-virus programs triggered on as suspicious.
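A toy sketch of that timing trick (the loop size and threshold are arbitrary assumptions, purely to show the idea):

```python
import time

def looks_emulated(threshold_s=0.5, iterations=1_000_000):
    """Crude emulation check: if a trivial loop runs suspiciously slowly,
    assume we're inside a slow anti-virus emulator."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i
    return (time.perf_counter() - start) > threshold_s
```

Calling a timer in a tight loop like this is itself an odd thing for a legitimate program to do, which is exactly why anti-virus engines flag the pattern.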

You can go investigate on VirusTotal other things it found with this virus.

Viruses and disconnected popups weren’t the only trick. In yet another attempt with web browsers, the hostile site attempted to open lots and lots of windows full of advertising. This is a direct way they earn money: hacking the advertising companies rather than hacking you.

In yet another attempt with another browser, this time from my MacBook air, it asked for an email address:

I happily obliged, giving it a fake address.

At this point, the hackers are going to try to use the same email and password to log into Gmail, into a few banks, and so on. It’s one of the top hacks these days (if not the most important hack), since most people reuse the same password for everything. Even though it’s not asking you for your Gmail or bank password, most of the time people will simply reuse them anyway. (This is why you need to keep important passwords separate from unimportant ones, and write down your passwords or use a password manager).
Anyway, I now get the next webpage. This is a straight up attempt to steal my credit card — maybe. 

This is a website called “” that promises streaming movies, for free signup, but requires a credit card.

This may be a quasi-legitimate website. I say “quasi” because their goal isn’t outright credit card fraud, but a “dark pattern” whereby they make it easy to sign up for the first month free with a credit card, and then make it nearly impossible to stop the service, where they continue to bill you month after month. As long as the charges are small each month, most people won’t bother going through all the effort of canceling the service. And since it’s not actually fraud, people won’t call their credit card company and reverse the charges, since they actually did sign up for the service and haven’t canceled it.
It’s a slimy thing the Trump campaign did in the last election. Their website asked for one time donations but tricked people into unwittingly making it a regular donation. This caused a lot of “chargebacks” as people complained to their credit card company.
In truth, everyone uses the same pattern: make it easy to sign up, get you to sign up for more than you realize, and then make it hard to cancel. I thought I’d canceled an AT&T phone but found out they’d kept billing me for 3 years, despite the phone no longer existing or using their network.
They probably have a rewards program. In other words, they aren’t out there doing SEO hacking of google themselves. Instead, they pay others to do it for them, and then give a percentage of the profit, either for incoming links or, more likely, for “conversion”: money whenever somebody actually enters their credit card number and signs up.
Those people are in turn a different middleman. It probably goes like this:
  • somebody skilled at SEO optimization, who sends links to a broker
  • a broker who then forwards those links to other middlemen
  • middlemen who then deliver those links to sites like that actually ask for an email address or credit card
There’s probably even more layers — like any fine tuned industry, there are lots of specialists who focus on doing their job well.
Okay, I’ll play along, and I enter a credit card number to see what happens (I have a bunch of used debit cards to play this game). This leads to an error message saying the website is down and they can’t deliver videos for me, but then pops up another box asking for my email, from yet another movie website:

This leads to yet another site:

It’s an endless series. Once a site “converts” you, it then simply sells the link back to another middleman, who then forwards you on to the next. I could probably sit there all day with fake email addresses and credit cards and still not come to the end of it all.


So here’s what we found.
First, there was a “search engine optimization” hacker who specializes in getting their content at the top of search results for random terms.
Second, they pass hits off to a broker who distributes the hits to various hackers who pay them. These hackers will try to exploit you with:
  • popups pretending to be anti-virus warnings that show up outside the browser
  • actual virus downloads in encrypted zips that try to evade anti-virus, but not well
  • endless new windows selling you advertising
  • stealing your email address and password, hoping that you’ve simply reused one from legitimate websites, like Gmail or your bank
  • signups for free movie websites that try to get your credit card and charge you legally
Even experts get confused. I had trouble helping this user track down exactly where the popup was coming from. Also, any expert can misclick and make the wrong thing happen — this user had been clicking the right thing “Block” for years and accidentally hit “Allow” this one time.