Tag Archives: Rust

A History of HTML Parsing at Cloudflare: Part 2

Post Syndicated from Andrew Galloni original https://blog.cloudflare.com/html-parsing-2/

A History of HTML Parsing at Cloudflare: Part 2

A History of HTML Parsing at Cloudflare: Part 2

The second blog post in the series on HTML rewriters picks up the story in 2017 after the launch of the Cloudflare edge compute platform Cloudflare Workers. It became clear that the developers using workers wanted the same HTML rewriting capabilities that we used internally, but accessible via a JavaScript API.

This blog post describes the building of a streaming HTML rewriter/parser with a CSS-selector based API in Rust. It is used as the back-end for the Cloudflare Workers HTMLRewriter. We have open-sourced the library (LOL HTML) as it can also be used as a stand-alone HTML rewriting/parsing library.

The major change compared to LazyHTML, the previous rewriter, is the dual-parser architecture required to overcome the additional performance overhead of wrapping/unwrapping each token when propagating tokens to the workers runtime. The remainder of the post describes a CSS selector matching engine inspired by a Virtual Machine approach to regular expression matching.

v2 : Give it to everyone and make it faster

In 2017, Cloudflare introduced an edge compute platform – Cloudflare Workers. It was no surprise that customers quickly required the same HTML rewriting capabilities that we were using internally. Our team was impressed with the platform and decided to migrate some of our features to Workers. The goal was to improve our developer experience working with modern JavaScript rather than statically linked NGINX modules implemented in C with a Lua API.

It is possible to rewrite HTML in Workers, though for that you needed a third party JavaScript package (such as Cheerio). These packages are not designed for HTML rewriting on the edge due to the latency, speed and memory considerations described in the previous post.

JavaScript is really fast but it still can’t always produce performance comparable to native code for some tasks – parsing being one of those. Customers typically needed to buffer the whole content of the page to do the rewriting resulting in considerable output latency and memory consumption that often exceeded the memory limits enforced by the Workers runtime.

We started to think about how we could reuse the technology in Workers. LazyHTML was a perfect fit in terms of parsing performance, but it had two issues:

  1. API ergonomics: LazyHTML produces a stream of HTML tokens. This is sufficient for our internal needs. However, for an average user, it is not as convenient as the jQuery-like API of Cheerio.
  2. Performance: Even though LazyHTML is tremendously fast, integration with the Workers runtime adds even more limitations. LazyHTML operates as a simple parse-modify-serialize pipeline, which means that it produces tokens for the whole content of the page. All of these tokens then have to be propagated to the Workers runtime and wrapped inside a JavaScript object and then unwrapped and fed back to LazyHTML for serialization. This is an extremely expensive operation which would nullify the performance benefit of LazyHTML.

A History of HTML Parsing at Cloudflare: Part 2
LazyHTML with V8

LOL HTML

We needed something new, designed with Workers requirements in mind, using a language with the native speed and safety guarantees (it’s incredibly easy to shoot yourself in the foot doing parsing). Rust was the obvious choice as it provides the native speed and the best guarantee of memory safety which minimises the attack surface of untrusted input. Wherever possible the Low Output Latency HTML rewriter (LOL HTML) uses all the previous optimizations developed for LazyHTML such as tag name hashing.

Dual-parser architecture

Most developers are familiar and prefer to use CSS selector-based APIs (as in Cheerio, jQuery or DOM itself) for HTML mutation tasks. We decided to base our API on CSS selectors as well. Although this meant additional implementation complexity, the decision created even more opportunities for parsing optimizations.

As selectors define the scope of the content that should be rewritten, we realised we can skip the content that is not in this scope and not produce tokens for it. This not only significantly speeds up the parsing itself, but also avoids the performance burden of the back and forth interactions with the JavaScript VM. As ever the best optimization is not to do something.

A History of HTML Parsing at Cloudflare: Part 2

Considering the tasks required, LOL HTML’s parser consists of two internal parsers:

  • Lexer – a regular full parser, that produces output for all types of content that it encounters;
  • Tag scanner – looks for start and end tags and skips parsing the rest of the content. The tag scanner parses only the tag name and feeds it to the selector matcher. The matcher will switch parser to the lexer if there was a match or additional information about the tag (such as attributes) are required for matching.

The parser switches back to the tag scanner as soon as input leaves the scope of all selector matches. The tag scanner may also sometimes switch the parser to the Lexer – if it requires additional tag information for the parsing feedback simulation.

A History of HTML Parsing at Cloudflare: Part 2
LOL HTML architecture

Having two different parser implementations for the same grammar will increase development costs and is error-prone due to implementation inconsistencies. We minimize these risks by implementing a small Rust macro-based DSL which is similar in spirit to Ragel. The DSL program describes Nondeterministic finite automaton states and actions associated with each state transition and matched input byte.

An example of a DSL state definition:

tag_name_state {
   whitespace => ( finish_tag_name?; --> before_attribute_name_state )
   b'/'       => ( finish_tag_name?; --> self_closing_start_tag_state )
   b'>'       => ( finish_tag_name?; emit_tag?; --> data_state )
   eof        => ( emit_raw_without_token_and_eof?; )
   _          => ( update_tag_name_hash; )
}

The DSL program gets expanded by the Rust compiler into not quite as beautiful, but extremely efficient Rust code.

We no longer need to reimplement the code that drives the parsing process for each of our parsers. All we need to do is to define different action implementations for each. In the case of the tag scanner, the majority of these actions are a no-op, so the Rust compiler does the NFA optimization job for us: it optimizes away state branches with no-op actions and even whole states if all of the branches have no-op actions. Now that’s cool.

Byte slice processing optimisations

Moving to a memory-safe language provided new challenges. Rust has great memory safety mechanisms, however sometimes they have a runtime performance cost.

The task of the parser is to scan through the input and find the boundaries of lexical units of the language – tokens and their internal parts. For example, an HTML start tag token consists of multiple parts: a byte slice of input that represents the tag name and multiple pairs of input slices that represent attributes and values:

struct StartTagToken<'i> {
   name: &'i [u8],
   attributes: Vec<(&'i [u8], &'i [u8])>,
   self_closing: bool
}

As Rust uses bound checks on memory access, construction of a token might be a relatively expensive operation. We need to be capable of constructing thousands of them in a fraction of second, so every CPU instruction counts.

Following the principle of doing as little as possible to improve performance we use a “token outline” representation of tokens: instead of having memory slices for token parts we use numeric ranges which are lazily transformed into a byte slice when required.

struct StartTagTokenOutline {
   name: Range<usize>,
   attributes: Vec<(Range<usize>, Range<usize>)>,
   self_closing: bool
}

As you might have noticed, with this approach we are no longer bound to the lifetime of the input chunk which turns out to be very useful. If a start tag is spread across multiple input chunks we can easily update the token that is currently in construction, as new chunks of input arrive by just adjusting integer indices. This allows us to avoid constructing a new token with slices from the new input memory region (it could be the input chunk itself or the internal parser’s buffer).

This time we can’t get away with avoiding the conversion of input character encoding; we expose a user-facing API that operates on JavaScript strings and input HTML can be of any encoding. Luckily, as we can still parse without decoding and only encode and decode within token bounds by a request (though we still can’t do that for UTF-16 encoding).

So, when a user requests an element’s tag name in the API, internally it is still represented as a byte slice in the character encoding of the input, but when provided to the user it gets dynamically decoded. The opposite process happens when a user sets a new tag name.

For selector matching we can still operate on the original encoding representation – because we know the input encoding ahead of time we preemptively convert values in a selector to the page’s character encoding, so comparisons can be done without decoding fields of each token.

As you can see, the new parser architecture along with all these optimizations produced great performance results:

A History of HTML Parsing at Cloudflare: Part 2
Average parsing time depending on the input size – lower is better

LOL HTML’s tag scanner is typically twice as fast as LazyHTML and the lexer has comparable performance, outperforming LazyHTML on bigger inputs. Both are a few times faster than the tokenizer from html5ever – another parser implemented in Rust used in the Mozilla’s Servo browser engine.

CSS selector matching VM

With an impressively fast parser on our hands we had only one thing missing – the CSS selector matcher. Initially we thought we could just use Servo’s CSS selector matching engine for this purpose. After a couple of days of experimentation it turned out that it is not quite suitable for our task.

It did not work well with our dual parser architecture. We first need to to match just a tag name from the tag scanner, and then, if we fail, query the lexer for the attributes. The selectors library wasn’t designed with this architecture in mind so we needed ugly hacks to bail out from matching in case of insufficient information. It was inefficient as we needed to start matching again after the bailout doing twice the work. There were other problems, such as the integration of lazy character decoding and integration of tag name comparison using tag name hashes.

Matching direction

The main problem encountered was the need to backtrack all the open elements for matching. Browsers match selectors from right to left and traverse all ancestors of an element. This StackOverflow has a good explanation of why they do it this way. We would need to store information about all open elements and their attributes – something that we can’t do while operating with tight memory constraints. This matching approach would be inefficient for our case – unlike browsers, we expect to have just a few selectors and a lot of elements. In this case it is much more efficient to match selectors from left to right.

And this is when we had a revelation. Consider the following CSS selector:

body > div.foo  img[alt] > div.foo ul

It can be split into individual components attributed to a particular element with hierarchical combinators in between:

body > div.foo img[alt] > div.foo  ul
---    ------- --------   -------  --

Each component is easy to match having a start tag token – it’s just a matter of comparison of token fields with values in the component. Let’s dive into abstract thinking and imagine that each such component is a character in the infinite alphabet of all possible components:

Selector componentCharacter
bodya
div.foob
img[alt]c
uld

Let’s rewrite our selector with selector components replaced by our imaginary characters:

a > b c > b d

Does this remind you of something?

A   `>` combinator can be considered a child element, or “immediately followed by”.

The ` ` (space) is a descendant element can be thought of as there might be zero or more elements in between.

There is a very well known abstraction to express these relations – regular expressions. The selector replacing combinators can be replaced with a regular expression syntax:

ab.*cb.*d

We transformed our CSS selector into a regular expression that can be executed on the sequence of start tag tokens. Note that not all CSS selectors can be converted to such a regular grammar and the input on which we match has some specifics, which we’ll discuss later. However, it was a good starting point: it allowed us to express a significant subset of selectors.

Implementing a Virtual Machine

Next, we started looking at non-backtracking algorithms for regular expressions. The virtual machine approach seemed suitable for our task as it was possible to have a non-backtracking implementation that was flexible enough to work around differences between real regular expression matching on strings and our abstraction.

VM-based regular expression matching is implemented as one of the engines in many regular expression libraries such as regexp2 and Rust’s regex. The basic idea is that instead of building an NFA or DFA for a regular expression it is instead converted into DSL assembly language with instructions later executed by the virtual machine – regular expressions are treated as programs that accept strings for matching.

Since the VM program is just a representation of NFA with ε-transitions it can exist in multiple states simultaneously during the execution, or, in other words, spawns multiple threads. The regular expression matches if one or more states succeed.

For example, consider the following VM instructions:

  • expect c – waits for next input character, aborts the thread if doesn’t equal to the instruction’s operand;
  • jmp L – jump to label ‘L’;
  • thread L1, L2 – spawns threads for labels L1 and L2, effectively splitting the execution;
  • match – succeed the thread with a match;

For example, using this instructions set regular expression “ab*c” can be translated into:

    expect a
L1: thread L2, L3
L2: expect b
    jmp L1
L3: expect c
    match

Let’s try to translate the regular expression ab.*cb.*d from the selector we saw earlier:

    expect a
    expect b
L1: thread L2, L3
L2: expect [any]
    jmp L1
L3: expect c
    expect b
L4: thread L5, L6
L5: expect [any]
    jmp L4
L6: expect d
    match

That looks complex! Though this assembly language is designed for regular expressions in general, and regular expressions can be much more complex than our case. For us the only kind of repetition that matters is “.*”. So, instead of expressing it with multiple instructions we can use just one called hereditary_jmp:

    expect a
    expect b
    hereditary_jmp L1
L1: expect c
    expect b
    hereditary_jmp L2
L2: expect d
    match

The instruction tells VM to memoize instruction’s label operand and unconditionally spawn a thread with a jump to this label on each input character.

There is one significant distinction between the string input of regular expressions and the input provided to our VM. The input can shrink!

A regular string is just a contiguous sequence of characters, whereas we operate on a sequence of open elements. As new tokens arrive this sequence can grow as well as shrink. Assume we represent <div> as ‘a’ character in our imaginary language, so having <div><div><div> input we can represent it as aaa, if the next token in the input is </div> then our “string” shrinks to aa.

You might think at this point that our abstraction doesn’t work and we should try something else. What we have as an input for our machine is a stack of open elements and we needed a stack-like structure to store our hereditrary_jmp instruction labels that VM had seen so far. So, why not store it on the open element stack? If we store the next instruction pointer on each of stack items on which the expect instruction was successfully executed, we’ll have a full snapshot of the VM state, so we can easily roll back to it if our stack shrinks.

With this implementation we don’t need to store anything except a tag name on the stack, and, considering that we can use the tag name hashing algorithm, it is just a 64-bit integer per open element. As an additional small optimization, to avoid traversing of the whole stack in search of active hereditary jumps on each new input we store an index of the first ancestor with a hereditary jump on each stack item.

For example, having the following selector “body > div span” we’ll have the following VM program (let’s get rid of labels and just use instruction indices instead):

0| expect <body>
1| expect <div>
2| hereditary_jmp 3
3| expect <span>
4| match

Having an input “<body><div><div><a>” we’ll have the following stack:

A History of HTML Parsing at Cloudflare: Part 2

Now, if the next token is a start tag <span> the VM will first try to execute the selectors program from the beginning and will fail on the first instruction. However, it will also look for any active hereditary jumps on the stack. We have one which jumps to the instructions at index 3. After jumping to this instruction the VM successfully produces a match. If we get yet another <span> start tag later it will much as well following the same steps which is exactly what we expect for the descendant selector.

If we then receive a sequence of “</span></span></div></a></div>” end tags our stack will contain only one item:

A History of HTML Parsing at Cloudflare: Part 2

which instructs VM to jump to instruction at index 1, effectively rolling back to matching the div component of the selector.

We mentioned earlier that we can bail out from the matching process if we only have a tag name from the tag scanner and we need to obtain more information by running the lexer? With a VM approach it is as easy as stopping the execution of the current instruction and resuming it later when we get the required information.

Duplicate selectors

As we need a separate program for each selector we need to match, how can we stop the same simple components doing the same job? The AST for our selector matching program is a radix tree-like structure whose edge labels are simple selector components and nodes are hierarchical combinators.
For example for the following selectors:

body > div > link[rel]
body > span
body > span a

we’ll get the following AST:

A History of HTML Parsing at Cloudflare: Part 2

If selectors have common prefixes we can match them just once for all these selectors. In the compilation process, we flatten this structure into a vector of instructions.

[not] JIT-compilation

For performance reasons compiled instructions are macro-instructions – they incorporate multiple basic VM instruction calls. This way the VM can execute only one macro instruction per input token. Each of the macro instructions compiled using the so-called “[not] JIT-compilation” (the same approach to the compilation is used in our other Rust project – wirefilter).

Internally the macro instruction contains expect and following jmp, hereditary_jmp and match basic instructions. In that sense macro-instructions resemble microcode making it easy to suspend execution of a macro instruction if we need to request attributes information from the lexer.

What’s next

It is obviously not the end of the road, but hopefully, we’ve got a bit closer to it. There are still multiple bits of functionality that need to be implemented and certainly, there is a space for more optimizations.

If you are interested in the topic don’t hesitate to join us in development of LazyHTML and LOL HTML at GitHub and, of course, we are always happy to see people passionate about technology here at Cloudflare, so don’t hesitate to contact us if you are too :).

A History of HTML Parsing at Cloudflare: Part 1

Post Syndicated from Andrew Galloni original https://blog.cloudflare.com/html-parsing-1/

A History of HTML Parsing at Cloudflare: Part 1

A History of HTML Parsing at Cloudflare: Part 1

To coincide with the launch of streaming HTML rewriting functionality for Cloudflare Workers we are open sourcing the Rust HTML rewriter (LOL  HTML) used to back the Workers HTMLRewriter API. We also thought it was about time to review the history of HTML rewriting at Cloudflare.

The first blog post will explain the basics of a streaming HTML rewriter and our particular requirements. We start around 8 years ago by describing the group of ‘ad-hoc’ parsers that were created with specific functionality such as to rewrite e-mail addresses or minify HTML. By 2016 the state machine defined in the HTML5 specification could be used to build a single spec-compliant HTML pluggable rewriter, to replace the existing collection of parsers. The source code for this rewriter is now public and available here: https://github.com/cloudflare/lazyhtml.

The second blog post will describe the next iteration of rewriter. With the launch of the edge compute platform Cloudflare Workers we came to realise that developers wanted the same HTML rewriting capabilities with a JavaScript API. The post describes the thoughts behind a low latency streaming HTML rewriter with a CSS-selector based API. We open-sourced the Rust library as it can also be used as a stand-alone HTML rewriting/parsing library.

What is a streaming HTML rewriter ?

A streaming HTML rewriter takes either a HTML string or byte stream input, parses it into tokens or any other structured intermediate representation (IR) – such as an Abstract Syntax Tree (AST). It then performs transformations on the tokens before converting back to HTML. This provides the ability to modify, extract or add to an existing HTML document as the bytes are being  processed. Compare this with a standard HTML tree parser which needs to retrieve the entire file to generate a full DOM tree. The tree-based rewriter will both take longer to deliver the first processed bytes and require significantly more memory.

A History of HTML Parsing at Cloudflare: Part 1
HTML rewriter

For example; consider you own a large site with a lot of historical content that you want to now serve over HTTPS. You will quickly run into the problem of resources (images, scripts, videos) being served over HTTP. This ‘mixed content’ opens a security hole and browsers will warn or block these resources. It can be difficult or even impossible to update every link on every page of a website. With a streaming HTML rewriter you can select the URI attribute of any HTML tag and change any HTTP links to HTTPS. We built this very feature Automatic HTTPS rewrites back in 2016 to solve mixed content issues for our customers.

The reader may already be wondering: “Isn’t this a solved problem, aren’t there many widely used open-source browsers out there with HTML parsers that can be used for this purpose?”. The reality is that writing code to run in 190+ PoPs around the world with a strict low latency requirement turns even seemingly trivial problems into complex engineering challenges.

The following blog posts will detail the journey of how starting with a simple idea of finding email addresses within an HTML page led to building an almost spec compliant HTML parser and then on to a CSS selector matching Virtual Machine. We learned a lot on this journey. I hope you find some of this as interesting as we did.

Rewriting at the edge

When rewriting content through Cloudflare we do not want to impact site performance. The balance in designing a streaming HTML rewriter is to minimise the pause in response byte flow by holding onto as little information as possible whilst retaining the ability to rewrite matching tokens.

The difference in requirements compared to an HTML parser used in a browser include:

Output latency

For browsers, the Document Object Model (DOM) is the end product of the parsing process but in our case we have to parse, rewrite and serialize back to HTML. In the case of Cloudflare’s reverse proxy any content processing on the edge server results in latency between the server and an eyeball. It is desirable to minimize the latency impact of HTML handling, which involves parsing, rewriting and serializing back to HTML. In all of these stages we want to be as fast as possible to minimize latency.

Parser throughput

Let’s assume that usually browsers rarely need to deal with HTML pages bigger than 1Mb in size and an average page load time is somewhere around 3s at best. HTML parsing is not the main bottleneck of the page loading process as the browser will be blocked on running scripts and loading other render-critical resources. We can roughly estimate that ~3Mbps is an acceptable throughput for browser’s HTML parser. At Cloudflare we have hundreds of megabytes of traffic per CPU, so we need a parser that is faster by an order of magnitude.

Memory limitations

As most users must realise, browsers have the luxury of being able to consume memory. For example, this simple HTML markup when opened in a browser will consume a significant chunk of your system memory before eventually killing a browser tab (and all this memory will be consumed by the parser) :

<script>
   document.write('<');
   while(true) {
      document.write('aaaaaaaaaaaaaaaaaaaaaaaa');
   }
</script>

Unfortunately, buffering of some fraction of the input is inevitable even for streaming HTML rewriting. Consider these 2 HTML snippets:

<div foo="bar" qux="qux">

<div foo="bar" qux="qux"

These seemingly similar fragments of HTML will be treated completely differently when encountered at the end of an HTML page. The first fragment will be parsed as a start tag and the second one will be ignored. By just seeing a `<` character followed by a tag name, the parser can’t determine if it has found a start tag or not. It needs to traverse the input in the search of the closing `>` to make a decision, buffering all content in between, so it can later be emitted to the consumer as a start tag token.

This requirement forces browsers to indefinitely buffer content before eventually giving up with the out-of-memory error.

In our case, we can’t afford to spend hundreds of megabytes of memory parsing a single HTML file (actual constraints are even tighter – even using a dozen kilobytes for each request would be unacceptable). We need to be much more sophisticated than other implementations in terms of memory usage and gracefully handle all the situations where provided memory capacity is insufficient to accomplish parsing.

v0 : “Ad-hoc parsers”

As usual with big projects, it all started pretty innocently.

Find and obfuscate an email

In 2010, Cloudflare decided to provide a feature that would stop popular email scrapers. The basic idea of this protection was to find and obfuscate emails on pages and later decode them back in the browser with injected JavaScript code. Sounds easy, right? You search for anything that looks like an email, encode it and then decode it with some JavaScript magic and present the result to the end-user.

However, even such a seemingly simple task already requires solving several issues. First of all, we need to define what an email is, and there is no simple answer. Even the infamous regex supposedly covering the entire RFC is, in fact, outdated and incomplete as the new RFC added lots of valid email constructions, including Unicode support. Let’s not go down that rabbit hole for now and instead focus on a higher-level issue: transforming streaming content.

Content from the network comes in packets, which have to be buffered and parsed as HTTP by our servers. You can’t predict how the content will be split, which means you always need to buffer some of it because content that is going to be replaced can be present in multiple input chunks.

Let’s say we decided to go with a simple regex like `[\w.][email protected][\w.]+`. If the content that comes through contains the email “[email protected]”, it might be split in the following chunks:

A History of HTML Parsing at Cloudflare: Part 1

In order to keep good Time To First Byte (TTFB) and consistent speed, we want to ensure that the preceding chunk is emitted as soon as we determine that it’s not interesting for replacement purposes.

The easiest way to do that is to transform our regex into a state machine, or a finite automata. While you could do that by hand, you will end up with hard-to-maintain and error-prone code. Instead, Ragel was chosen to transform regular expressions into efficient native state machine code. Ragel doesn’t try to take care of buffering or anything other than traversing the state machine. It provides a syntax that not only describes patterns, but can also associate custom actions (code in a host language) with any given state.

In our case we can pass through buffers until we match the beginning of an email. If we subsequently find out the pattern is not an email we can bail out from buffering as soon as the pattern stops matching. Otherwise, we can retrieve the matched email and replace it with new content.

To turn our pattern into a streaming parser we can remember the position of the potential start of an email and, unless it was already discarded or replaced by the end of the current input, store the unhandled part in a permanent buffer. Then, when a new chunk comes, we can process it separately, resuming from a state Ragel remembers itself, but then use both the buffered chunk and a new one to either emit or obfuscate.

Now that we have solved the problem of matching email patterns in text, we need to deal with the fact that they need to be obfuscated on pages. This is when the first hints of HTML “parsing” were introduced.

I’ve put “parsing” in quotes because, rather than implementing the whole parser, the email filter (as the module was called) didn’t attempt to replicate the whole HTML grammar, but rather added custom Ragel patterns just for skipping over comments and tags where emails should not be obfuscated.

This was a reasonable approach, especially back in 2010 – four years before the HTML5 specification, when all browsers had their own quirks handling of HTML. However, as you can imagine, this approach did not scale well. If you’re trying to work around quirks in other parsers, you start gaining more and more quirks in your own, and then work around these too. Simultaneously, new features started to be added, which also required modifying HTML on the fly (like automatic insertion of Google Analytics script), and an existing module seemed to be the best place for that. It grew to handle more and more tags, operations and syntactic edge cases.

Now let’s minify..

In 2011, Cloudflare decided to also add minification to allow customers to speed up their websites even if they had not employed minification themselves. For that, we decided to use an existing streaming minifier – jitify. It already had NGINX bindings, which made it a great candidate for integration into the existing pipeline.

Unfortunately, just like most other parsers from that time as well as ours described above, it had its own processing rules for HTML, JavaScript and CSS, which weren’t precise but rather tried to parse content on a best-effort basis. This led to us having two independent streaming parsers that were incompatible and could produce bugs either individually or only in combination.

v1 : “(Almost) HTML5 Spec compliant parser”

Over the years engineers kept adding new features to the ever-growing state machines, while fixing new bugs arising from imprecise syntax implementations, conflicts between various parsers, and problems in features themselves.

By 2016, it was time to get out of the multiple ad hoc parsers business and do things ‘the right way’.

The next section(s) will describe how we built our HTML5 compliant parser starting from the specification state machine. Using only this state machine it should have been straight-forward to build a parser. You may be aware that historically the parsing of HTML had not been entirely strict which meant to not break existing implementations the building of an actual DOM was required for parsing. This is not possible for a streaming rewriter so a simulator of the parser feedback was developed. In terms of performance, it is always better not to do something. We  then describe why the rewriter can be ‘lazy’ and not perform the expensive encoding and decoding of text when rewriting HTML. The surprisingly difficult problem of deciding if a response is HTML is then detailed.

HTML5

By 2016, HTML5 had defined precise syntax rules for parsing and compatibility with legacy content and custom browser implementations. It was already implemented by all browsers and many 3rd-party implementations.

The HTML5 parsing specification defines basic HTML syntax in the form of a state machine. We already had experience with Ragel for similar use cases, so there was no question about what to use for the new streaming parser. Despite the complexity of the grammar, the translation of the specification to Ragel syntax was straightforward. The code looks simpler than the formal description of the state machine, thanks to the ability to mix regex syntax with explicit transitions.

A History of HTML Parsing at Cloudflare: Part 1
A visualisation of a small fraction of the HTML state machine. Source: https://twitter.com/RReverser/status/715937136520916992

HTML5 parsing requires a ‘DOM’

However, HTML has a history. To not break existing implementations HTML5 is specified with  recovery procedures for incorrect tag nesting, ordering, unclosed tags, missing attributes and all the other possible quirks that used to work in older browsers.  In order to resolve these issues, the specification expects a tree builder to drive the lexer, essentially meaning you can’t correctly tokenize HTML (split into separate tags) without a DOM.

A History of HTML Parsing at Cloudflare: Part 1
HTML parsing flow as defined by the specification

For this reason, most parsers don’t even try to perform streaming parsing and instead take the input as a whole and produce a document tree as an output. This is not something we could do for streaming transformation without adding significant delays to page loading.

An existing HTML5 JavaScript parser – parse5 – had already implemented spec-compliant tree parsing using a streaming tokenizer and rewriter. To avoid having to create a full DOM the concept of a “parser feedback simulator” was introduced.

Tree builder feedback

As you can guess from the name, this is a module that aims to simulate a full parser’s feedback to the tokenizer, without actually building the whole DOM, but instead preserving only the required information and context necessary for correctly driving the state machine.

After rigorous testing and upstreaming a test runner to parse5, we found this technique to be suitable for the majority of even poorly written pages on the Internet, and employed it in LazyHTML.

A History of HTML Parsing at Cloudflare: Part 1
LazyHTML architecture

Avoiding decoding – everything is ASCII

Now that we had a streaming tokenizer working, we wanted to make sure that it was fast enough so that users didn’t notice any slowdowns to their pages as they go through the parser and transformations. Otherwise it would completely circumvent any optimisations we’d want to attempt on the fly.

It would not only cause a performance hit due to decoding and re-encoding any modified HTML content, but also significantly complicates our implementation due to multiple sources of potential encoding information  required to determine the character encoding, including sniffing of the first 1 KB of the content.

The “living” HTML Standard specification permits only encodings defined in the Encoding Standard. If we look carefully through those encodings, as well as a remark on Character encodings section of the HTML spec, we find that all of them are ASCII-compatible with the exception of UTF-16 and ISO-2022-JP.

This means that any ASCII text will be represented in such encodings exactly as it would be in ASCII, and any non-ASCII text will be represented by bytes outside of the ASCII range. This property allows us to safely tokenize, compare and even modify original HTML without decoding or even knowing which particular encoding it contains. It is possible as all the token boundaries in HTML grammar are represented by an ASCII character.

We need to detect UTF-16 by sniffing and either decode or skip such documents without modification. We chose the latter to avoid potential security-sensitive bugs which are common with UTF-16, and because the character encoding is seen in less than 0.1% of known character encodings luckily.

The only issue left with this approach is that in most places the HTML tokenization specification  requires you to replace U+0000 (NUL) characters with U+FFFD (replacement character) during parsing. Presumably, this was added as a security precaution against bugs in C implementations of old engines which could treat NUL character, encoded in ASCII / UTF-8 / … as a 0x00 byte, as the end of the string (yay, null-terminated strings…). It’s problematic for us because U+FFFD is outside of the ASCII range, and will be represented by different sequences of bytes in different encodings. We don’t know the encoding of the document, so this will lead to corruption of the output.

Luckily, we’re not in the same business as browser vendors, and don’t worry about NUL characters in strings as much – we use “fat pointer” string representation, in which the length of the string is determined not by the position of the NUL character, but stored along with the data pointer as an integer field:

typedef struct {
   const char *data;
   size_t length;
} lhtml_string_t;

Instead, we can quietly ignore these parts of the spec (sorry!), and keep U+0000 characters as-is and add them as such to tag, attribute names, and other strings, and later re-emit to the document. This is safe to do, because it doesn’t affect any state machine transitions, but merely preserves original 0x00 bytes and delegates their replacement to the parser in the end user’s browser.

Content type madness

We want to be lazy and minimise false positives. We only want to spend time parsing, decoding and rewriting actual HTML rather than breaking images or JSON. So the question is how do you decide if something is a HTML document. Can you just use the Content-Type for example ? A comment left in the source code best describes the reality.

/*
Dear future generations. I didn't like this hack either and hoped
we could do the right thing instead. Unfortunately, the Internet
was a bad and scary place at the moment of writing. If this
ever changes and websites become more standards compliant,
please do remove it just like I tried.
Many websites use PHP which sets Content-Type: text/html by
default. There is no error or warning if you don't provide own
one, so most websites don't bother to change it and serve
JSON API responses, private keys and binary data like images
with this default Content-Type, which we would happily try to
parse and transforms. This not only hurts performance, but also
easily breaks response data itself whenever some sequence inside
it happens to look like a valid HTML tag that we are interested
in. It gets even worse when JSON contains valid HTML inside of it
and we treat it as such, and append random scripts to the end
breaking APIs critical for popular web apps.
This hack attempts to mitigate the risk by ensuring that the
first significant character (ignoring whitespaces and BOM)
is actually `<` - which increases the chances that it's indeed HTML.
That way we can potentially skip some responses that otherwise
could be rendered by a browser as part of AJAX response, but this
is still better than the opposite situation.
*/

The reader might think that it’s a rare edge case, however, our observations show that almost 25% of the traffic served through Cloudflare with the “text/html” content type is unlikely to be HTML.

A History of HTML Parsing at Cloudflare: Part 1

The trouble doesn’t end there: it turns out that there is a considerable amount of XML content served with the “text/html” content type which can’t be always processed correctly when treated as HTML.

Over time bailouts for binary data, JSON, AMP and correctly identifying HTML fragments leads to the content sniffing logic which can be described by the following diagram:

A History of HTML Parsing at Cloudflare: Part 1

This is a good example of divergence between formal specifications and reality.

Tag name comparison optimisation

But just having fast parsing is not enough – we have functionality that consumes the output of the parser, rewrites it and feeds it back for the serialization. And all the memory and time constraints that we have for the parser are applicable for this code as well, as it is a part of the same content processing pipeline.

It’s a common requirement to compare parsed HTML tag names, e.g. to determine if the current tag should be rewritten or not. A naive implementation will use regular per-byte comparison which can require traversing the whole tag name. We were able to narrow this operation to a single integer comparison instruction in the majority of cases by using specially designed hashing algorithm.

The tag names of all standard HTML elements contain only alphabetical ASCII characters and digits from 1 to 6 (in numbered header tags, i.e. <h1> – <h6>). Comparison of tag names is case-insensitive, so we only need 26 characters to represent alphabetical characters. Using the same basic idea as arithmetic coding, we can represent each of the possible 32 characters of a  tag name using just 5 bits and, thus, fit up to floor(64 / 5) = 12 characters in a 64-bit integer which is enough for all the standard tag names and any other tag names that satisfy the same requirements! The great part is that we don’t even need to additionally traverse a tag name to hash it – we can do that as we parse the tag name consuming the input byte by byte.

However, there is one problem with this hashing algorithm and the culprit is not so obvious: to fit all 32 characters in 5 bits we need to use all possible bit combinations including 00000. This means that if the leading character of the tag name is represented with 00000 then we will not be able to differentiate between a varying number of consequent repetitions of this character.

For example, considering that ‘a’ is encoded as 00000 and ‘b’ as 00001 :

Tag nameBit representationEncoded value
ab00000 000011
aab00000 00000 000011

Luckily, we know that HTML grammar doesn’t allow the first character of a tag name to be anything except an ASCII alphabetical character, so reserving numbers from 0 to 5 (00000b-00101b) for digits and numbers from 6 to 31 (00110b – 11111b) for ASCII alphabetical characters solves the problem.

LazyHTML

After taking everything  mentioned  above into  consideration the LazyHTML (https://github.com/cloudflare/lazyhtml) library was created. It is a fast streaming HTML parser and serializer with a token based C-API derived from the HTML5 lexer written in Ragel. It provides a pluggable transformation pipeline to allow multiple transformation handlers to be chained together.

An example of a function that transforms `href` property of links:

// define static string to be used for replacements
static const lhtml_string_t REPLACEMENT = {
   .data = "[REPLACED]",
   .length = sizeof("[REPLACED]") - 1
};

static void token_handler(lhtml_token_t *token, void *extra /* this can be your state */) {
  if (token->type == LHTML_TOKEN_START_TAG) { // we're interested only in start tags
    const lhtml_token_starttag_t *tag = &token->start_tag;
    if (tag->type == LHTML_TAG_A) { // check whether tag is of type <a>
      const size_t n_attrs = tag->attributes.count;
      const lhtml_attribute_t *attrs = tag->attributes.items;
      for (size_t i = 0; i < n_attrs; i++) { // iterate over attributes
        const lhtml_attribute_t *attr = &attrs[i];
        if (lhtml_name_equals(attr->name, "href")) { // match the attribute name
          attr->value = REPLACEMENT; // set the attribute value
        }
      }
    }
  }
  lhtml_emit(token, extra); // pass transformed token(s) to next handler(s)
}

So, is it correct and how fast is  it?

It is HTML5 compliant as tested against the official test suites. As part of the work several contributions were sent to the specification itself for clarification / simplification of the spec language.

Unlike the previous parser(s), it didn’t bail out on any of the 2,382,625 documents from HTTP Archive, although 0.2% of documents exceeded expected bufferization limits as they were in fact JavaScript or RSS or other types of content incorrectly served with Content-Type: text/html, and since anything is valid HTML5, the parser tried to parse e.g. a<b; x=3; y=4 as incomplete tag with attributes. This is very rare (and goes to even lower amount of 0.03% when two error-prone advertisement networks are excluded from those results), but still needs to be accounted for and is a valid case for bailing out.

As for the benchmarks, In September 2016 using an example which transforms the HTML spec itself (7.9 MB HTML file) by replacing every <a href> (only that property only in those tags) to a static value. It was compared against the few existing and popular HTML parsers (only tokenization mode was used for the fair comparison, so that they don’t need to build AST and so on), and timings in milliseconds for 100 iterations are the following (lazy mode means that we’re using raw strings whenever possible, the other one serializes each token just for comparison):

A History of HTML Parsing at Cloudflare: Part 1

The results show that LazyHTML parser speeds are around an order of magnitude faster.

That concludes the first post in our series on HTML rewriters at Cloudflare. The next post describes how we built  a new streaming rewriter on top of the ideas of LazyHTML. The major update was to provide an easier to use CSS selector API.  It provides the back-end for the Cloudflare workers HTMLRewriter JavaScript API.

🤠 The Wrangler CLI: Deploying Rust with WASM on Cloudflare Workers

Post Syndicated from Ashley Williams original https://blog.cloudflare.com/introducing-wrangler-cli/

🤠 The Wrangler CLI: Deploying Rust with WASM on Cloudflare Workers
Wrangler is a CLI tool for building Rust WebAssembly Workers

🤠 The Wrangler CLI: Deploying Rust with WASM on Cloudflare Workers

Today, we’re open sourcing and announcing wrangler, a CLI tool for building, previewing, and publishing Rust and WebAssembly Cloudflare Workers.

If that sounds like some word salad to you, that’s a reasonable reaction. All three of the technologies involved are relatively new and upcoming: WebAssembly, Rust, and Cloudflare Workers.

Why WebAssembly?

Cloudflare’s mission is to help build a better Internet. We see Workers as an extension of the already incredibly powerful Web Platform, where JavaScript has allowed users to go from building small bits of interactivity, to building full applications. Node.js first extended this from the client to the server- unifying web application development around a single language – JavaScript. By choosing to use V8 isolates (the technology that powers both Node.js and the most popular browser, Chrome), we sought to make its Workers product a fully compatible, new platform for the Web, eliding the distinction between server and client. By leveraging its large global network of servers, Workers allows users to run code as close as possible to end users, eliminating the latency associated server-side logic or large client-side bundles.

But not everyone wants to write JavaScript, and JavaScript is not well suited to express every application. WebAssembly emerged in 2017 as a way to further extend the Web Platform to applications, such as games and other high resource intensive programs, that were previously excluded by the limitations of JavaScript.

V8 isolates give us both JavaScript and WebAssembly. This means that you can leverage the prototyping power and extensive ecosystem of JavaScript, alongside the power of WebAssembly, which in addition to fast, predictable, performance, also opens up the wealth of libraries written in languages that can target WebAssembly, like C, C++, and Rust.

WebAssembly on Workers eliminates trade-offs that were originally considered irresolvable: low-latency, high-performing, and Web Platform compatible- pick three.

Why Rust?

Rust is a relatively new programming language with the goal of “empowering everyone to build reliable and efficient software”. It’s a systems level language that offers its users a high amount of control, while still seeking to offer an ergonomic, friendly and modern development experience.

The Rust-WebAssembly Working Group made incredible efforts last year to build out a suite of developer tools for WebAssembly. At Cloudflare, we’re excited to support those efforts with paid developer hours and leverage those efforts to empower our users to start harnessing the power of WebAssembly on Workers now.

There are several other toolchains including Emscripten (C, C++) and AssemblyScript (TypeScript) that we’re eager to support in the future. Rust is just the beginning (but we think it’s a pretty great place to start!).

Why now?

When developing new, highly technical, products, it’s easy to get caught up in the promise and vision- often to the detriment of getting the technology into the hands of the folks who will be using it every day in the future.

We want to broaden the community that has access at the early stages of this technology- who can bring their valuable perspectives and experience, and help us shape the future of these tools.

The first step to accomplishing that is building the tools that can enable folks to engage with the new platform. wrangler is a that enabling tool. It’s just enough to unblock users who were previously unable to interact with the platform because there was no paved path.

We don’t plan to stop here. Folks will rightly note that there are some critical developer workflow steps that are missing from wrangler: linting, testing, benchmarking, and size profiling are a few that come to mind. We’ve got some big plans and we’re excited to build out more, but we’re eager to release this now to enable more folks to participate in the process. The best way to know what developers need is to ask and listen- by creating and open sourcing wrangler in such an early phase we’re hoping to shorten the feedback cycle between product and user- and build the right thing, faster.

You can install wrangler using cargo:

cargo install wrangler

To get started, head on over to the Cloudflare docs and follow the tutorial. You’ll build and preview a Cloudflare Worker that uses Rust compiled to WebAssembly to parse Markdown.

Don’t stop there though. Please check out the repo, file some issues, build project templates, and write about your experience- we want to hear from you.

We’re really excited to see what y’all build!

Flight Sim Company Threatens Reddit Mods Over “Libelous” DRM Posts

Post Syndicated from Andy original https://torrentfreak.com/flight-sim-company-threatens-reddit-mods-over-libellous-drm-posts-180604/

Earlier this year, in an effort to deal with piracy of their products, flight simulator company FlightSimLabs took drastic action by installing malware on customers’ machines.

The story began when a Reddit user reported something unusual in his download of FlightSimLabs’ A320X module. A file – test.exe – was being flagged up as a ‘Chrome Password Dump’ tool, something which rang alarm bells among flight sim fans.

As additional information was made available, the story became even more sensational. After first dodging the issue with carefully worded statements, FlightSimLabs admitted that it had installed a password dumper onto ALL users’ machines – whether they were pirates or not – in an effort to catch a particular software cracker and launch legal action.

It was an incredible story that no doubt did damage to FlightSimLabs’ reputation. But the now the company is at the center of a new storm, again centered around anti-piracy measures and again focused on Reddit.

Just before the weekend, Reddit user /u/walkday reported finding something unusual in his A320X module, the same module that caused the earlier controversy.

“The latest installer of FSLabs’ A320X puts two cmdhost.exe files under ‘system32\’ and ‘SysWOW64\’ of my Windows directory. Despite the name, they don’t open a command-line window,” he reported.

“They’re a part of the authentication because, if you remove them, the A320X won’t get loaded. Does someone here know more about cmdhost.exe? Why does FSLabs give them such a deceptive name and put them in the system folders? I hate them for polluting my system folder unless, of course, it is a dll used by different applications.”

Needless to say, the news that FSLabs were putting files into system folders named to make them look like system files was not well received.

“Hiding something named to resemble Window’s “Console Window Host” process in system folders is a huge red flag,” one user wrote.

“It’s a malware tactic used to deceive users into thinking the executable is a part of the OS, thus being trusted and not deleted. Really dodgy tactic, don’t trust it and don’t trust them,” opined another.

With a disenchanted Reddit userbase simmering away in the background, FSLabs took to Facebook with a statement to quieten down the masses.

“Over the past few hours we have become aware of rumors circulating on social media about the cmdhost file installed by the A320-X and wanted to clear up any confusion or misunderstanding,” the company wrote.

“cmdhost is part of our eSellerate infrastructure – which communicates between the eSellerate server and our product activation interface. It was designed to reduce the number of product activation issues people were having after the FSX release – which have since been resolved.”

The company noted that the file had been checked by all major anti-virus companies and everything had come back clean, which does indeed appear to be the case. Nevertheless, the critical Reddit thread remained, bemoaning the actions of a company which probably should have known better than to irritate fans after February’s debacle. In response, however, FSLabs did just that once again.

In private messages to the moderators of the /r/flightsim sub-Reddit, FSLabs’ Marketing and PR Manager Simon Kelsey suggested that the mods should do something about the thread in question or face possible legal action.

“Just a gentle reminder of Reddit’s obligations as a publisher in order to ensure that any libelous content is taken down as soon as you become aware of it,” Kelsey wrote.

Noting that FSLabs welcomes “robust fair comment and opinion”, Kelsey gave the following advice.

“The ‘cmdhost.exe’ file in question is an entirely above board part of our anti-piracy protection and has been submitted to numerous anti-virus providers in order to verify that it poses no threat. Therefore, ANY suggestion that current or future products pose any threat to users is absolutely false and libelous,” he wrote, adding:

“As we have already outlined in the past, ANY suggestion that any user’s data was compromised during the events of February is entirely false and therefore libelous.”

Noting that FSLabs would “hate for lawyers to have to get involved in this”, Kelsey advised the /r/flightsim mods to ensure that no such claims were allowed to remain on the sub-Reddit.

But after not receiving the response he would’ve liked, Kelsey wrote once again to the mods. He noted that “a number of unsubstantiated and highly defamatory comments” remained online and warned that if something wasn’t done to clean them up, he would have “no option” than to pass the matter to FSLabs’ legal team.

Like the first message, this second effort also failed to have the desired effect. In fact, the moderators’ response was to post an open letter to Kelsey and FSLabs instead.

“We sincerely disagree that you ‘welcome robust fair comment and opinion’, demonstrated by the censorship on your forums and the attempted censorship on our subreddit,” the mods wrote.

“While what you do on your forum is certainly your prerogative, your rules do not extend to Reddit nor the r/flightsim subreddit. Removing content you disagree with is simply not within our purview.”

The letter, which is worth reading in full, refutes Kelsey’s claims and also suggests that critics of FSLabs may have been subjected to Reddit vote manipulation and coordinated efforts to discredit them.

What will happen next is unclear but the matter has now been placed in the hands of Reddit’s administrators who have agreed to deal with Kelsey and FSLabs’ personally.

It’s a little early to say for sure but it seems unlikely that this will end in a net positive for FSLabs, no matter what decision Reddit’s admins take.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

FCC Asks Amazon & eBay to Help Eliminate Pirate Media Box Sales

Post Syndicated from Andy original https://torrentfreak.com/fcc-asks-amazon-ebay-to-help-eliminate-pirate-media-box-sales-180530/

Over the past several years, anyone looking for a piracy-configured set-top box could do worse than search for one on Amazon or eBay.

Historically, people deploying search terms including “Kodi” or “fully-loaded” were greeted by page after page of Android-type boxes, each ready for illicit plug-and-play entertainment consumption following delivery.

Although the problem persists on both platforms, people are now much less likely to find infringing devices than they were 12 to 24 months ago. Under pressure from entertainment industry groups, both Amazon and eBay have tightened the screws on sellers of such devices. Now, however, both companies have received requests to stem sales from a completetey different direction.

In a letter to eBay CEO Devin Wenig and Amazon CEO Jeff Bezos first spotted by Ars, FCC Commissioner Michael O’Rielly calls on the platforms to take action against piracy-configured boxes that fail to comply with FCC equipment authorization requirements or falsely display FCC logos, contrary to United States law.

“Disturbingly, some rogue set-top box manufacturers and distributors are exploiting the FCC’s trusted logo by fraudulently placing it on devices that have not been approved via the Commission’s equipment authorization process,” O’Rielly’s letter reads.

“Specifically, nine set-top box distributors were referred to the FCC in October for enabling the unlawful streaming of copyrighted material, seven of which displayed the FCC logo, although there was no record of such compliance.”

While O’Rielly admits that the copyright infringement aspects fall outside the jurisdiction of the FCC, he says it’s troubling that many of these devices are used to stream infringing content, “exacerbating the theft of billions of dollars in American innovation and creativity.”

As noted above, both Amazon and eBay have taken steps to reduce sales of pirate boxes on their respective platforms on copyright infringement grounds, something which is duly noted by O’Rielly. However, he points out that devices continue to be sold to members of the public who may believe that the devices are legal since they’re available for sale from legitimate companies.

“For these reasons, I am seeking your further cooperation in assisting the FCC in taking steps to eliminate the non-FCC compliant devices or devices that fraudulently bear the FCC logo,” the Commissioner writes (pdf).

“Moreover, if your company is made aware by the Commission, with supporting evidence, that a particular device is using a fraudulent FCC label or has not been appropriately certified and labeled with a valid FCC logo, I respectfully request that you commit to swiftly removing these products from your sites.”

In the event that Amazon and eBay take action under this request, O’Rielly asks both platforms to hand over information they hold on offending manufacturers, distributors, and suppliers.

Amazon was quick to respond to the FCC. In a letter published by Ars, Amazon’s Public Policy Vice President Brian Huseman assured O’Rielly that the company is not only dedicated to tackling rogue devices on copyright-infringement grounds but also when there is fraudulent use of the FCC’s logos.

Noting that Amazon is a key member of the Alliance for Creativity and Entertainment (ACE) – a group that has been taking legal action against sellers of infringing streaming devices (ISDs) and those who make infringing addons for Kodi-type systems – Huseman says that dealing with the problem is a top priority.

“Our goal is to prevent the sale of ISDs anywhere, as we seek to protect our customers from the risks posed by these devices, in addition to our interest in protecting Amazon Studios content,” Huseman writes.

“In 2017, Amazon became the first online marketplace to prohibit the sale of streaming media players that promote or facilitate piracy. To prevent the sale of these devices, we proactively scan product listings for signs of potentially infringing products, and we also invest heavily in sophisticated, automated real-time tools to review a variety of data sources and signals to identify inauthentic goods.

“These automated tools are supplemented by human reviewers that conduct manual investigations. When we suspect infringement, we take immediate action to remove suspected listings, and we also take enforcement action against sellers’ entire accounts when appropriate.”

Huseman also reveals that since implementing a proactive policy against such devices, “tens of thousands” of listings have been blocked from Amazon. In addition, the platform has been making criminal referrals to law enforcement as well as taking civil action (1,2,3) as part of ACE.

“As noted in your letter, we would also appreciate the opportunity to collaborate further with the FCC to remove non-compliant devices that improperly use the FCC logo or falsely claim FCC certification. If any FCC non-compliant devices are identified, we seek to work with you to ensure they are not offered for sale,” Huseman concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

[$] Killing processes that don’t want to die

Post Syndicated from jake original https://lwn.net/Articles/754980/rss

Suppose you have a program running on your system that you don’t quite
trust. Maybe it’s a program submitted by a student to an automated
grading system. Or maybe it’s a QEMU device model running in a Xen
control domain ("domain 0" or “dom0”), and you want to make sure
that even
if an attacker from a rogue virtual machine manages to take over the QEMU
process,
they can’t do any further harm. There are many things you want to do as far
as restricting its ability
to do mischief. But one thing in particular you probably want to do
is to be able to reliably kill the process once you think it should be
done. This turns out to be quite a bit more tricky than you’d think.

Google’s Chrome Web Store Spammed With Dodgy ‘Pirate’ Movie Links

Post Syndicated from Andy original https://torrentfreak.com/googles-chrome-web-store-spammed-with-dodgy-pirate-movie-links-180527/

Launched in 2010, Google’s Chrome Store is the go-to place for people looking to pimp their Chrome browser.

Often referred to as apps and extensions, the programs offered by the platform run in Chrome and can perform a dazzling array of functions, from improving security and privacy, to streaming video or adding magnet links to torrent sites.

Also available on the Chrome Store are themes, which can be installed locally to change the appearance of the Chrome browser.

While there are certainly plenty to choose from, some additions to the store over the past couple of months are not what most people have come to expect from the add-on platform.

Free movies on Chrome’s Web Store?

As the image above suggests, unknown third parties appear to be exploiting the Chrome Store’s ‘theme’ section to offer visitors access to a wide range of pirate movies including Black Panther, Avengers: Infinity War and Rampage.

When clicking through to the page offering Ready Player One, for example, users are presented with a theme that apparently allows them to watch the movie online in “Full HD Online 4k.”

Of course, the whole scheme is a dubious scam which eventually leads users to Vioos.co, a platform that tries very hard to give the impression of being a pirate streaming portal but actually provides nothing of use.

Nothing to see here

In fact, as soon as one clicks the play button on movies appearing on Vioos.co, visitors are re-directed to another site called Zumastar which asks people to “create a free account” to “access unlimited downloads & streaming.”

“With over 20 million titles, Zumastar is your number one entertainment resource. Join hundreds of thousands of satisfied members and enjoy the hottest movies,” the site promises.

With this kind of marketing, perhaps we should think about this offer for a second. Done. No thanks.

In extended testing, some visits to Vioos.co resulted in a redirection to EtnaMedia.net, a domain that was immediately blocked by MalwareBytes due to suspected fraud. However, after allowing the browser to make the connection, TF was presented with another apparent subscription site.

We didn’t follow through with a sign-up but further searches revealed upset former customers complaining of money being taken from their credit cards when they didn’t expect that to happen.

Quite how many people have signed up to Zumastar or EtnaMedia via this convoluted route from Google’s Chrome Store isn’t clear but a worrying number appear to have installed the ‘themes’ (if that’s what they are) offered on each ‘pirate movie’ page.

At the time of writing the ‘free Watch Rampage Online Full Movie’ ‘theme’ has 2,196 users, the “Watch Avengers Infinity War Full Movie” variant has 974, the ‘Watch Ready Player One 2018 Full HD’ page has 1,031, and the ‘Watch Black Panther Online Free 123putlocker’ ‘theme’ has more than 1,800. Clearly, a worrying number of people will click and install just about anything.

We haven’t tested the supposed themes to see what they do but it’s a cast-iron guarantee that they don’t offer the movies displayed and there’s always a chance they’ll do something awful. As a rule of thumb, it’s nearly always wise to steer clear of anything with “full movie” in the title, they can rarely be trusted.

Finally, those hoping to get some guidance on quality from the reviews on the Chrome Store will be bitterly disappointed.

Garbage reviews, probably left by the scammers

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

BPI Wants Piracy Dealt With Under New UK Internet ‘Clean-Up’ Laws

Post Syndicated from Andy original https://torrentfreak.com/bpi-wants-music-piracy-dealt-with-under-uk-internet-clean-up-laws-180523/

For the past several years, the UK Government has expressed a strong desire to “clean up” the Internet.

Strong emphasis has been placed on making the Internet safer for children but that’s just the tip of a much larger iceberg.

This week, the Government published its response to the Internet Safety Strategy green paper, stating unequivocally that more needs to be done to tackle “online harm”.

Noting that six out of ten people report seeing inappropriate or harmful content online, the Government said that work already underway with social media companies to protect users had borne fruit but overall industry response has been less satisfactory.

As a result, the Government will now carry through with its threat to introduce new legislation, albeit with the assistance of technology companies, children’s charities and other stakeholders.

“Digital technology is overwhelmingly a force for good across the world and we must always champion innovation and change for the better,” said Matt Hancock, Secretary of State for Digital, Culture, Media and Sport.

“At the same time I have been clear that we have to address the Wild West elements of the Internet through legislation, in a way that supports innovation. We strongly support technology companies to start up and grow, and we want to work with them to keep our citizens safe.”

While emphasis is being placed on hot-button topics such as cyberbullying and online child exploitation, the Government is clear that it wishes to tackle “the full range” of online harms. That has been greeted by UK music group BPI with a request that the Government introduces new measures to tackle Internet piracy.

In a statement issued this week, BPI chief executive Geoff Taylor welcomed the move towards legislative change and urged the Government to encompass the music industry and beyond.

“This is a vital opportunity to protect consumers and boost the UK’s music and creative industries. The BPI has long pressed for internet intermediaries and online platforms to take responsibility for the content that they promote to users,” Taylor said.

“Government should now take the power in legislation to require online giants to take effective, proactive measures to clean illegal content from their sites and services. This will keep fans away from dodgy sites full of harmful content and prevent criminals from undermining creative businesses that create UK jobs.”

The BPI has published four initial requests, each of which provides food for thought.

The demand to “establish a new fast-track process for blocking illegal sites” is not entirely unexpected, particularly given the expense of launching applications for blocking injunctions at the High Court.

“The BPI has taken a large number of actions against individual websites – 63 injunctions are in place against sites that are wholly or mainly infringing and whose business is simply to profit from criminal activity,” the BPI says.

Those injunctions can be expanded fairly easily to include new sites operating under similar banners or facilitating access to those already covered, but it’s clear the BPI would like something more streamlined. Voluntary schemes, such as the one in place in Portugal, could be an option but it’s unclear how troublesome that could be for ISPs. New legislation could solve that dilemma, however.

Another big thorn in the side for groups like the BPI are people and entities that post infringing content. The BPI is very good at taking these listings down from sites and search engines in particular (more than 600 million requests to date) but it’s a game of whac-a-mole the group would rather not engage in.

With that in mind, the BPI would like the Government to impose new rules that would compel online platforms to stop content from being re-posted after it’s been taken down while removing the accounts of repeat infringers.

Thirdly, the BPI would like the Government to introduce penalties for “online operators” who do not provide “transparent contact and ownership information.” The music group isn’t any more specific than that, but the suggestion is that operators of some sites have a tendency to hide in the shadows, something which frustrates enforcement activity.

Finally, and perhaps most interestingly, the BPI is calling on the Government to legislate for a new “duty of care” for online intermediaries and platforms. Specifically, the BPI wants “effective action” taken against businesses that use the Internet to “encourage” consumers to access content illegally.

While this could easily encompass pirate sites and services themselves, this proposal has the breadth to include a wide range of offenders, from people posting piracy-focused tutorials on monetized YouTube channels to those selling fully-loaded Kodi devices on eBay or social media.

Overall, the BPI clearly wants to place pressure on intermediaries to take action against piracy when they’re in a position to do so, and particularly those who may not have shown much enthusiasm towards industry collaboration in the past.

“Legislation in this Bill, to take powers to intervene with respect to operators that do not co-operate, would bring focus to the roundtable process and ensure that intermediaries take their responsibilities seriously,” the BPI says.

The Department for Digital, Culture, Media & Sport and the Home Office will now work on a White Paper, to be published later this year, to set out legislation to tackle “online harms”. The BPI and similar entities will hope that the Government takes their concerns on board.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/755205/rss

Security updates have been issued by Debian (gitlab and packagekit), Fedora (glibc, postgresql, and webkitgtk4), Oracle (java-1.7.0-openjdk, java-1.8.0-openjdk, kernel, libvirt, and qemu-kvm), Red Hat (java-1.7.0-openjdk, kernel-rt, qemu-kvm, and qemu-kvm-rhev), SUSE (openjpeg2, qemu, and squid3), and Ubuntu (kernel, linux, linux-aws, linux-azure, linux-gcp, linux-kvm, linux-oem, linux, linux-aws, linux-kvm,, linux-hwe, linux-azure, linux-gcp, linux-oem, linux-lts-trusty, linux-lts-xenial, linux-aws, qemu, and xdg-utils).

[$] Securing the container image supply chain

Post Syndicated from corbet original https://lwn.net/Articles/754443/rss

“Security is hard” is a tautology, especially in the fast-moving world
of container orchestration. We have previously covered various aspects of
Linux container
security through, for example, the Clear Containers implementation
or the broader question of Kubernetes and
security
, but those are mostly concerned with container isolation; they do not address the
question of trusting a container’s contents. What is a container running?
Who built it and when? Even assuming we have good programmers and solid
isolation layers, propagating that good code around a Kubernetes cluster
and making strong assertions on the integrity of that supply chain is far
from trivial. The 2018 KubeCon
+ CloudNativeCon Europe
event featured some projects that could
eventually solve that problem.

Canonical on trust and security in the Snap Store

Post Syndicated from corbet original https://lwn.net/Articles/754502/rss

Here’s a
posting from Canonical
concerning the cryptocurrency-mining app that
was discovered in its Snap Store. “Several years ago when we started
the work on snap packages, we understood that we could not instantly
implement an alternative that was completely safe from all perspectives. In
addition to being safe, it had to be useful. So the challenge we gave
ourselves was to significantly improve the situation immediately, and then
pave the road for incremental improvements that could be rolled out
gradually.

Some notes on eFail

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/05/some-notes-on-efail.html

I’ve been busy trying to replicate the “eFail” PGP/SMIME bug. I thought I’d write up some notes.

PGP and S/MIME encrypt emails, so that eavesdroppers can’t read them. The bugs potentially allow eavesdroppers to take the encrypted emails they’ve captured and resend them to you, reformatted in a way that allows them to decrypt the messages.

Disable remote/external content in email

The most important defense is to disable “external” or “remote” content from being automatically loaded. This is when HTML-formatted emails attempt to load images from remote websites. This happens legitimately when they want to display images, but not fill up the email with them. But most of the time this is illegitimate, they hide images on the webpage in order to track you with unique IDs and cookies. For example, this is the code at the end of an email from politician Bernie Sanders to his supporters. Notice the long random number assigned to track me, and the width/height of this image is set to one pixel, so you don’t even see it:

Such trackers are so pernicious they are disabled by default in most email clients. This is an example of the settings in Thunderbird:

The problem is that as you read email messages, you often get frustrated by the fact the error messages and missing content, so you keep adding exceptions:

The correct defense against this eFail bug is to make sure such remote content is disabled and that you have no exceptions, or at least, no HTTP exceptions. HTTPS exceptions (those using SSL) are okay as long as they aren’t to a website the attacker controls. Unencrypted exceptions, though, the hacker can eavesdrop on, so it doesn’t matter if they control the website the requests go to. If the attacker can eavesdrop on your emails, they can probably eavesdrop on your HTTP sessions as well.

Some have recommended disabling PGP and S/MIME completely. That’s probably overkill. As long as the attacker can’t use the “remote content” in emails, you are fine. Likewise, some have recommend disabling HTML completely. That’s not even an option in any email client I’ve used — you can disable sending HTML emails, but not receiving them. It’s sufficient to just disable grabbing remote content, not the rest of HTML email rendering.

I couldn’t replicate the direct exfiltration

There rare two related bugs. One allows direct exfiltration, which appends the decrypted PGP email onto the end of an IMG tag (like one of those tracking tags), allowing the entire message to be decrypted.

An example of this is the following email. This is a standard HTML email message consisting of multiple parts. The trick is that the IMG tag in the first part starts the URL (blog.robertgraham.com/…) but doesn’t end it. It has the starting quotes in front of the URL but no ending quotes. The ending will in the next chunk.

The next chunk isn’t HTML, though, it’s PGP. The PGP extension (in my case, Enignmail) will detect this and automatically decrypt it. In this case, it’s some previous email message I’ve received the attacker captured by eavesdropping, who then pastes the contents into this email message in order to get it decrypted.

What should happen at this point is that Thunderbird will generate a request (if “remote content” is enabled) to the blog.robertgraham.com server with the decrypted contents of the PGP email appended to it. But that’s not what happens. Instead, I get this:

I am indeed getting weird stuff in the URL (the bit after the GET /), but it’s not the PGP decrypted message. Instead what’s going on is that when Thunderbird puts together a “multipart/mixed” message, it adds it’s own HTML tags consisting of lines between each part. In the email client it looks like this:

The HTML code it adds looks like:

That’s what you see in the above URL, all this code up to the first quotes. Those quotes terminate the quotes in the URL from the first multipart section, causing the rest of the content to be ignored (as far as being sent as part of the URL).

So at least for the latest version of Thunderbird, you are accidentally safe, even if you have “remote content” enabled. Though, this is only according to my tests, there may be a work around to this that hackers could exploit.

STARTTLS

In the old days, email was sent plaintext over the wire so that it could be passively eavesdropped on. Nowadays, most providers send it via “STARTTLS”, which sorta encrypts it. Attackers can still intercept such email, but they have to do so actively, using man-in-the-middle. Such active techniques can be detected if you are careful and look for them.
Some organizations don’t care. Apparently, some nation states are just blocking all STARTTLS and forcing email to be sent unencrypted. Others do care. The NSA will passively sniff all the email they can in nations like Iraq, but they won’t actively intercept STARTTLS messages, for fear of getting caught.
The consequence is that it’s much less likely that somebody has been eavesdropping on you, passively grabbing all your PGP/SMIME emails. If you fear they have been, you should look (e.g. send emails from GMail and see if they are intercepted by sniffing the wire).

You’ll know if you are getting hacked

If somebody attacks you using eFail, you’ll know. You’ll get an email message formatted this way, with multipart/mixed components, some with corrupt HTML, some encrypted via PGP. This means that for the most part, your risk is that you’ll be attacked only once — the hacker will only be able to get one message through and decrypt it before you notice that something is amiss. Though to be fair, they can probably include all the emails they want decrypted as attachments to the single email they sent you, so the risk isn’t necessarily that you’ll only get one decrypted.
As mentioned above, a lot of attackers (e.g. the NSA) won’t attack you if its so easy to get caught. Other attackers, though, like anonymous hackers, don’t care.
Somebody ought to write a plugin to Thunderbird to detect this.

Summary

It only works if attackers have already captured your emails (though, that’s why you use PGP/SMIME in the first place, to guard against that).
It only works if you’ve enabled your email client to automatically grab external/remote content.
It seems to not be easily reproducible in all cases.
Instead of disabling PGP/SMIME, you should make sure your email client hast remote/external content disabled — that’s a huge privacy violation even without this bug.

Notes: The default email client on the Mac enables remote content by default, which is bad:

Announcing Rust 1.26

Post Syndicated from ris original https://lwn.net/Articles/754166/rss

The Rust team has announced
the release of version 1.26.0 of the Rust programming language. “The past few releases have had a steady stream of relatively minor additions. We’ve been working on a lot of stuff, however, and it’s all starting to land in stable. 1.26 is possibly the most feature-packed release since Rust 1.0.

Securing Your Cryptocurrency

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-up-your-cryptocurrency/

Securing Your Cryptocurrency

In our blog post on Tuesday, Cryptocurrency Security Challenges, we wrote about the two primary challenges faced by anyone interested in safely and profitably participating in the cryptocurrency economy: 1) make sure you’re dealing with reputable and ethical companies and services, and, 2) keep your cryptocurrency holdings safe and secure.

In this post, we’re going to focus on how to make sure you don’t lose any of your cryptocurrency holdings through accident, theft, or carelessness. You do that by backing up the keys needed to sell or trade your currencies.

$34 Billion in Lost Value

Of the 16.4 million bitcoins said to be in circulation in the middle of 2017, close to 3.8 million may have been lost because their owners no longer are able to claim their holdings. Based on today’s valuation, that could total as much as $34 billion dollars in lost value. And that’s just bitcoins. There are now over 1,500 different cryptocurrencies, and we don’t know how many of those have been misplaced or lost.



Now that some cryptocurrencies have reached (at least for now) staggering heights in value, it’s likely that owners will be more careful in keeping track of the keys needed to use their cryptocurrencies. For the ones already lost, however, the owners have been separated from their currencies just as surely as if they had thrown Benjamin Franklins and Grover Clevelands over the railing of a ship.

The Basics of Securing Your Cryptocurrencies

In our previous post, we reviewed how cryptocurrency keys work, and the common ways owners can keep track of them. A cryptocurrency owner needs two keys to use their currencies: a public key that can be shared with others is used to receive currency, and a private key that must be kept secure is used to spend or trade currency.

Many wallets and applications allow the user to require extra security to access them, such as a password, or iris, face, or thumb print scan. If one of these options is available in your wallets, take advantage of it. Beyond that, it’s essential to back up your wallet, either using the backup feature built into some applications and wallets, or manually backing up the data used by the wallet. When backing up, it’s a good idea to back up the entire wallet, as some wallets require additional private data to operate that might not be apparent.

No matter which backup method you use, it is important to back up often and have multiple backups, preferable in different locations. As with any valuable data, a 3-2-1 backup strategy is good to follow, which ensures that you’ll have a good backup copy if anything goes wrong with one or more copies of your data.

One more caveat, don’t reuse passwords. This applies to all of your accounts, but is especially important for something as critical as your finances. Don’t ever use the same password for more than one account. If security is breached on one of your accounts, someone could connect your name or ID with other accounts, and will attempt to use the password there, as well. Consider using a password manager such as LastPass or 1Password, which make creating and using complex and unique passwords easy no matter where you’re trying to sign in.

Approaches to Backing Up Your Cryptocurrency Keys

There are numerous ways to be sure your keys are backed up. Let’s take them one by one.

1. Automatic backups using a backup program

If you’re using a wallet program on your computer, for example, Bitcoin Core, it will store your keys, along with other information, in a file. For Bitcoin Core, that file is wallet.dat. Other currencies will use the same or a different file name and some give you the option to select a name for the wallet file.

To back up the wallet.dat or other wallet file, you might need to tell your backup program to explicitly back up that file. Users of Backblaze Backup don’t have to worry about configuring this, since by default, Backblaze Backup will back up all data files. You should determine where your particular cryptocurrency, wallet, or application stores your keys, and make sure the necessary file(s) are backed up if your backup program requires you to select which files are included in the backup.

Backblaze B2 is an option for those interested in low-cost and high security cloud storage of their cryptocurrency keys. Backblaze B2 supports 2-factor verification for account access, works with a number of apps that support automatic backups with encryption, error-recovery, and versioning, and offers an API and command-line interface (CLI), as well. The first 10GB of storage is free, which could be all one needs to store encrypted cryptocurrency keys.

2. Backing up by exporting keys to a file

Apps and wallets will let you export your keys from your app or wallet to a file. Once exported, your keys can be stored on a local drive, USB thumb drive, DAS, NAS, or in the cloud with any cloud storage or sync service you wish. Encrypting the file is strongly encouraged — more on that later. If you use 1Password or LastPass, or other secure notes program, you also could store your keys there.

3. Backing up by saving a mnemonic recovery seed

A mnemonic phrase, mnemonic recovery phrase, or mnemonic seed is a list of words that stores all the information needed to recover a cryptocurrency wallet. Many wallets will have the option to generate a mnemonic backup phrase, which can be written down on paper. If the user’s computer no longer works or their hard drive becomes corrupted, they can download the same wallet software again and use the mnemonic recovery phrase to restore their keys.

The phrase can be used by anyone to recover the keys, so it must be kept safe. Mnemonic phrases are an excellent way of backing up and storing cryptocurrency and so they are used by almost all wallets.

A mnemonic recovery seed is represented by a group of easy to remember words. For example:

eye female unfair moon genius pipe nuclear width dizzy forum cricket know expire purse laptop scale identify cube pause crucial day cigar noise receive

The above words represent the following seed:

0a5b25e1dab6039d22cd57469744499863962daba9d2844243fec 9c0313c1448d1a0b2cd9e230a78775556f9b514a8be45802c2808e fd449a20234e9262dfa69

These words have certain properties:

  • The first four letters are enough to unambiguously identify the word.
  • Similar words are avoided (such as: build and built).

Bitcoin and most other cryptocurrencies such as Litecoin, Ethereum, and others use mnemonic seeds that are 12 to 24 words long. Other currencies might use different length seeds.

4. Physical backups — Paper, Metal

Some cryptocurrency holders believe that their backup, or even all their cryptocurrency account information, should be stored entirely separately from the internet to avoid any risk of their information being compromised through hacks, exploits, or leaks. This type of storage is called “cold storage.” One method of cold storage involves printing out the keys to a piece of paper and then erasing any record of the keys from all computer systems. The keys can be entered into a program from the paper when needed, or scanned from a QR code printed on the paper.

Printed public and private keys

Printed public and private keys

Some who go to extremes suggest separating the mnemonic needed to access an account into individual pieces of paper and storing those pieces in different locations in the home or office, or even different geographical locations. Some say this is a bad idea since it could be possible to reconstruct the mnemonic from one or more pieces. How diligent you wish to be in protecting these codes is up to you.

Mnemonic recovery phrase booklet

Mnemonic recovery phrase booklet

There’s another option that could make you the envy of your friends. That’s the CryptoSteel wallet, which is a stainless steel metal case that comes with more than 250 stainless steel letter tiles engraved on each side. Codes and passwords are assembled manually from the supplied part-randomized set of tiles. Users are able to store up to 96 characters worth of confidential information. Cryptosteel claims to be fireproof, waterproof, and shock-proof.

image of a Cryptosteel cold storage device

Cryptosteel cold wallet

Of course, if you leave your Cryptosteel wallet in the pocket of a pair of ripped jeans that gets thrown out by the housekeeper, as happened to the character Russ Hanneman on the TV show Silicon Valley in last Sunday’s episode, then you’re out of luck. That fictional billionaire investor lost a USB drive with $300 million in cryptocoins. Let’s hope that doesn’t happen to you.

Encryption & Security

Whether you store your keys on your computer, an external disk, a USB drive, DAS, NAS, or in the cloud, you want to make sure that no one else can use those keys. The best way to handle that is to encrypt the backup.

With Backblaze Backup for Windows and Macintosh, your backups are encrypted in transmission to the cloud and on the backup server. Users have the option to add an additional level of security by adding a Personal Encryption Key (PEK), which secures their private key. Your cryptocurrency backup files are secure in the cloud. Using our web or mobile interface, previous versions of files can be accessed, as well.

Our object storage cloud offering, Backblaze B2, can be used with a variety of applications for Windows, Macintosh, and Linux. With B2, cryptocurrency users can choose whichever method of encryption they wish to use on their local computers and then upload their encrypted currency keys to the cloud. Depending on the client used, versioning and life-cycle rules can be applied to the stored files.

Other backup programs and systems provide some or all of these capabilities, as well. If you are backing up to a local drive, it is a good idea to encrypt the local backup, which is an option in some backup programs.

Address Security

Some experts recommend using a different address for each cryptocurrency transaction. Since the address is not the same as your wallet, this means that you are not creating a new wallet, but simply using a new identifier for people sending you cryptocurrency. Creating a new address is usually as easy as clicking a button in the wallet.

One of the chief advantages of using a different address for each transaction is anonymity. Each time you use an address, you put more information into the public ledger (blockchain) about where the currency came from or where it went. That means that over time, using the same address repeatedly could mean that someone could map your relationships, transactions, and incoming funds. The more you use that address, the more information someone can learn about you. For more on this topic, refer to Address reuse.

Note that a downside of using a paper wallet with a single key pair (type-0 non-deterministic wallet) is that it has the vulnerabilities listed above. Each transaction using that paper wallet will add to the public record of transactions associated with that address. Newer wallets, i.e. “deterministic” or those using mnemonic code words support multiple addresses and are now recommended.

There are other approaches to keeping your cryptocurrency transaction secure. Here are a couple of them.

Multi-signature

Multi-signature refers to requiring more than one key to authorize a transaction, much like requiring more than one key to open a safe. It is generally used to divide up responsibility for possession of cryptocurrency. Standard transactions could be called “single-signature transactions” because transfers require only one signature — from the owner of the private key associated with the currency address (public key). Some wallets and apps can be configured to require more than one signature, which means that a group of people, businesses, or other entities all must agree to trade in the cryptocurrencies.

Deep Cold Storage

Deep cold storage ensures the entire transaction process happens in an offline environment. There are typically three elements to deep cold storage.

First, the wallet and private key are generated offline, and the signing of transactions happens on a system not connected to the internet in any manner. This ensures it’s never exposed to a potentially compromised system or connection.

Second, details are secured with encryption to ensure that even if the wallet file ends up in the wrong hands, the information is protected.

Third, storage of the encrypted wallet file or paper wallet is generally at a location or facility that has restricted access, such as a safety deposit box at a bank.

Deep cold storage is used to safeguard a large individual cryptocurrency portfolio held for the long term, or for trustees holding cryptocurrency on behalf of others, and is possibly the safest method to ensure a crypto investment remains secure.

Keep Your Software Up to Date

You should always make sure that you are using the latest version of your app or wallet software, which includes important stability and security fixes. Installing updates for all other software on your computer or mobile device is also important to keep your wallet environment safer.

One Last Thing: Think About Your Testament

Your cryptocurrency funds can be lost forever if you don’t have a backup plan for your peers and family. If the location of your wallets or your passwords is not known by anyone when you are gone, there is no hope that your funds will ever be recovered. Taking a bit of time on these matters can make a huge difference.

To the Moon*

Are you comfortable with how you’re managing and backing up your cryptocurrency wallets and keys? Do you have a suggestion for keeping your cryptocurrencies safe that we missed above? Please let us know in the comments.


*To the Moon — Crypto slang for a currency that reaches an optimistic price projection.

The post Securing Your Cryptocurrency appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Supply-Chain Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/supply-chain_se.html

Earlier this month, the Pentagon stopped selling phones made by the Chinese companies ZTE and Huawei on military bases because they might be used to spy on their users.

It’s a legitimate fear, and perhaps a prudent action. But it’s just one instance of the much larger issue of securing our supply chains.

All of our computerized systems are deeply international, and we have no choice but to trust the companies and governments that touch those systems. And while we can ban a few specific products, services or companies, no country can isolate itself from potential foreign interference.

In this specific case, the Pentagon is concerned that the Chinese government demanded that ZTE and Huawei add “backdoors” to their phones that could be surreptitiously turned on by government spies or cause them to fail during some future political conflict. This tampering is possible because the software in these phones is incredibly complex. It’s relatively easy for programmers to hide these capabilities, and correspondingly difficult to detect them.

This isn’t the first time the United States has taken action against foreign software suspected to contain hidden features that can be used against us. Last December, President Trump signed into law a bill banning software from the Russian company Kaspersky from being used within the US government. In 2012, the focus was on Chinese-made Internet routers. Then, the House Intelligence Committee concluded: “Based on available classified and unclassified information, Huawei and ZTE cannot be trusted to be free of foreign state influence and thus pose a security threat to the United States and to our systems.”

Nor is the United States the only country worried about these threats. In 2014, China reportedly banned antivirus products from both Kaspersky and the US company Symantec, based on similar fears. In 2017, the Indian government identified 42 smartphone apps that China subverted. Back in 1997, the Israeli company Check Point was dogged by rumors that its government added backdoors into its products; other of that country’s tech companies have been suspected of the same thing. Even al-Qaeda was concerned; ten years ago, a sympathizer released the encryption software Mujahedeen Secrets, claimed to be free of Western influence and backdoors. If a country doesn’t trust another country, then it can’t trust that country’s computer products.

But this trust isn’t limited to the country where the company is based. We have to trust the country where the software is written — and the countries where all the components are manufactured. In 2016, researchers discovered that many different models of cheap Android phones were sending information back to China. The phones might be American-made, but the software was from China. In 2016, researchers demonstrated an even more devious technique, where a backdoor could be added at the computer chip level in the factory that made the chips ­ without the knowledge of, and undetectable by, the engineers who designed the chips in the first place. Pretty much every US technology company manufactures its hardware in countries such as Malaysia, Indonesia, China and Taiwan.

We also have to trust the programmers. Today’s large software programs are written by teams of hundreds of programmers scattered around the globe. Backdoors, put there by we-have-no-idea-who, have been discovered in Juniper firewalls and D-Link routers, both of which are US companies. In 2003, someone almost slipped a very clever backdoor into Linux. Think of how many countries’ citizens are writing software for Apple or Microsoft or Google.

We can go even farther down the rabbit hole. We have to trust the distribution systems for our hardware and software. Documents disclosed by Edward Snowden showed the National Security Agency installing backdoors into Cisco routers being shipped to the Syrian telephone company. There are fake apps in the Google Play store that eavesdrop on you. Russian hackers subverted the update mechanism of a popular brand of Ukrainian accounting software to spread the NotPetya malware.

In 2017, researchers demonstrated that a smartphone can be subverted by installing a malicious replacement screen.

I could go on. Supply-chain security is an incredibly complex problem. US-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that. We can’t trust anyone, yet we have no choice but to trust everyone. Our phones, computers, software and cloud systems are touched by citizens of dozens of different countries, any one of whom could subvert them at the demand of their government. And just as Russia is penetrating the US power grid so they have that capability in the event of hostilities, many countries are almost certainly doing the same thing at the consumer level.

We don’t know whether the risk of Huawei and ZTE equipment is great enough to warrant the ban. We don’t know what classified intelligence the United States has, and what it implies. But we do know that this is just a minor fix for a much larger problem. It’s doubtful that this ban will have any real effect. Members of the military, and everyone else, can still buy the phones. They just can’t buy them on US military bases. And while the US might block the occasional merger or acquisition, or ban the occasional hardware or software product, we’re largely ignoring that larger issue. Solving it borders on somewhere between incredibly expensive and realistically impossible.

Perhaps someday, global norms and international treaties will render this sort of device-level tampering off-limits. But until then, all we can do is hope that this particular arms race doesn’t get too far out of control.

This essay previously appeared in the Washington Post.

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/754021/rss

Security updates have been issued by Debian (kernel), Gentoo (rsync), openSUSE (Chromium), Oracle (kernel), Red Hat (kernel and kernel-rt), Scientific Linux (kernel), SUSE (kernel and php7), and Ubuntu (dpdk, libraw, linux, linux-lts-trusty, linux-snapdragon, and webkit2gtk).

ISPs Win Landmark Case to Protect Privacy of Alleged Pirates

Post Syndicated from Andy original https://torrentfreak.com/isps-win-landmark-case-protect-privacy-alleged-pirates-180508/

With waves of piracy settlement letters being sent out across the world, the last line of defense for many accused Internet users has been their ISPs.

In a number of regions, notably the United States, Europe, and the UK, most ISPs have given up the fight, handing subscriber details over to copyright trolls with a minimum of resistance. However, there are companies out there prepared to stand up for their customers’ rights, if eventually.

Over in Denmark, Telenor grew tired of tens of thousands of requests for subscriber details filed by a local law firm on behalf of international copyright troll groups. It previously complied with demands to hand over the details of individuals behind 22,000 IP addresses, around 11% of the 200,000 total handled by ISPs in Denmark. But with no end in sight, the ISP dug in its heels.

“We think there is a fundamental legal problem because the courts do not really decide what is most important: the legal security of the public or the law firms’ commercial interests,” Telenor’s Legal Director Mette Eistrøm Krüger said last year.

Assisted by rival ISP Telia, Telenor subsequently began preparing a case to protect the interests of their customers, refusing in the meantime to comply with disclosure requests in copyright cases. But last October, the District Court ruled against the telecoms companies, ordering them to provide identities to the copyright trolls.

Undeterred, the companies took their case to the Østre Landsret, one of Denmark’s two High Courts. Yesterday their determination paid off with a resounding victory for the ISPs and security for the individuals behind approximately 4,000 IP addresses targeted by Copyright Collection Ltd via law firm Njord Law.

“In its order based on telecommunications legislation, the Court has weighed subscribers’ rights to confidentiality of information regarding their use of the Internet against the interests of rightsholders to obtain information for the purpose of prosecuting claims against the subscribers,” the Court said in a statement.

Noting that the case raised important questions of European Union law and the European Convention on Human Rights, the High Court said that after due consideration it would overrule the decision of the District Court. The rights of the copyright holders do not trump the individuals right to privacy, it said.

“The telecommunications companies are therefore not required to disclose the names and addresses of their subscribers,” the Court ruled.

Telenor welcomed the decision, noting that it had received countless requests from law firms to disclose the identities of thousands of subscribers but had declined to hand them over, a decision that has now been endorsed by the High Court.

“This is an important victory for our right to protect our customers’ data,” said Telenor Denmark’s Legal Director, Mette Eistrøm Krüger.

“At Telenor we protect our customers’ data and trust – therefore it has been our conviction that we cannot be forced into almost automatically submitting personal data on our customers simply to support some private actors who are driven by commercial interests.”

Noting that it’s been putting up a fight since 2016 against handing over customers’ data for purposes other than investigating serious crime, Telenor said that the clarity provided by the decision is most welcome.

“We and other Danish telecom companies are required to log customer data for the police to fight serious crime and terrorism – but the legislation has just been insufficient in relation to the use of logged data,” Krüger said.

“Therefore I am pleased that with this judgment the High Court has stated that customers’ legal certainty is most important in these cases.”

The decision was also welcomed by Telia Denmark, with Legal Director Lasse Andersen describing the company as being “really really happy” with “a big win.”

“It is a victory for our customers and for all telecom companies’ customers,” Andersen said.

“They can now feel confident that the data that we collect about them cannot be disclosed for purposes other than the terms under which they are collected as determined by the jurisdiction.

“Therefore, anyone and everybody cannot claim our data. We are pleased that throughout the process we have determined that we will not hand over our data to anyone other than the police with a court order,” Andersen added.

But as the ISPs celebrate, the opposite is true for Njord Law and its copyright troll partners.

“It is a sad message to the Danish film and television industry that the possibilities for self-investigating illegal file sharing are complicated and that the work must be left to the police’s scarce resources,” said Jeppe Brogaard Clausen of Njord Law.

While the ISPs finally stood up for users in these cases, Telenor in particular wishes to emphasize that supporting the activities of pirates is not its aim. The company says it does not support illegal file-sharing “in any way” and is actively working with anti-piracy outfit Rights Alliance to prevent unauthorized downloading of movies and other content.

The full decision of the Østre Landsret can be found here (Danish, pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Саморегулиране на медиите за борба с онлайн дезинформацията

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/08/jti/

 

Journalism Trust Initiative (JTI) е  инициатива за саморегулиране на медиите, предназначена да насърчава качествената журналистика в новата информационна екосистема. Това е идея на Репортери без граници  съвместно с партньори като Агенция Франс Прес (АФП) и  Европейския съюз за радио и телевизия (EBU).

В рамките на инициативата ще бъдат създадени система от стандарти, след което ще може да се провежда сертифициране.

Очакваното значение на стандартите  – според първоначалните текстове, свързани с инициативата:

  • ново средство за борба с дезинформацията и  защита на надеждната и качествена информация;
  • ползи за доставчици на съдържание, които се присъединят към инициативата и прилагат стандартите;
  • повече прозрачност по отношение на доставчиците на съдържание;
  • по-добра видимост онлайн за качественото съдържание;
  • повече  рекламни приходи, тъй като рекламодателите ще могат да разпознават  качествени медии;
  • обществена подкрепа за  качествените медии;
  • основа за знак за качество и доверие.

Стандартите ще бъдат разработени за период 12-18 месеца със сътрудничество на френския орган по стандартизация AFNOR  и германския орган за стандартизация Deutsches Institut für Normung (DIN).

Компанията Google е информирала Репортери без граници, че е взела решение да участва в инициативата. 

До 18 май е открита регистрация за участие.

Повече   – на страницата на инициативата в интернет.