Tag Archives: trap

Naturebytes’ weatherproof Pi and camera case

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/naturebytes-weatherproof-pi-and-camera-case/

Naturebytes are making their weatherproof Wildlife Cam Case available as a standalone product for the first time, a welcome addition to the Raspberry Pi ecosystem that should take some of the hassle out of your outdoor builds.

A robin on a bird feeder in a garden with a Naturebytes Wildlife Cam mounted beside it

Weatherproofing digital making projects

People often use Raspberry Pis and Camera Modules for outdoor projects, but weatherproofing your set-up can be tricky. You need to keep water — and tiny creatures — out, but you might well need access for wires and cables, whether for power or sensors; if you’re using a camera, it’ll need something clear and cleanable in front of the lens. You can use sealant, but if you need to adjust anything that you’ve applied it to, you’ll have to remove it and redo it. While we’ve seen a few reasonable options available to buy, the choice has never been what you’d call extensive.

The Naturebytes case

For all these reasons, I was pleased to learn that Naturebytes, the wildlife camera people, are releasing their Wildlife Cam Case as a standalone product for the first time.

Naturebytes case open

The Wildlife Cam Case is ideal for nature camera projects, of course, but it’ll also be useful for anyone who wants to take their Pi outdoors. It has weatherproof lenses that are transparent to visible and IR light, for all your nature observation projects. Its opening is hinged to allow easy access to your hardware, and the case has waterproof access for cables. Inside, there’s a mount for fixing any model of Raspberry Pi and camera, as well as many other components. On top of all that, the case comes with a sturdy nylon strap to make it easy to attach it to a post or a tree.

Naturebytes case additional components

Order yours now!

At the moment, Naturebytes are producing a limited run of the cases. The first batch of 50 are due to be dispatched next week to arrive just in time for the Bank Holiday weekend in the UK, so get them while they’re hot. It’s the perfect thing for recording a timelapse of exactly how quickly the slugs obliterate your vegetable seedlings, and of lots more heartening things that must surely happen in gardens other than mine.

The post Naturebytes’ weatherproof Pi and camera case appeared first on Raspberry Pi.

UK soldiers design Raspberry Pi bomb disposal robot

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/uk-soldiers-design-raspberry-pi-bomb-disposal-robot/

Three soldiers in the British Army have used a Raspberry Pi to build an autonomous robot, as part of their Foreman of Signals course.

Meet The Soldiers Revolutionising Bomb Disposal

Three soldiers from Blandford Camp have successfully designed and built an autonomous robot as part of their Foreman of Signals Course at the Dorset Garrison.

Autonomous robots

Forces Radio BFBS carried a story last week about Staff Sergeant Jolley, Sergeant Rana, and Sergeant Paddon, also known as the “Project ROVER” team. As part of their Foreman of Signals training, their task was to design an autonomous robot that can move between two specified points, take a temperature reading, and transmit the information to a remote computer. The team comments that, while semi-autonomous robots have been used as far back as 9/11 for tasks like finding people trapped under rubble, nothing like their robot and on a similar scale currently exists within the British Army.

The ROVER buggy

Their build is named ROVER, which stands for Remote Obstacle aVoiding Environment Robot. It’s a buggy that moves on caterpillar tracks, and it’s tethered; we wonder whether that might be because it doesn’t currently have an on-board power supply. A demo shows the robot moving forward, then changing its path when it encounters an obstacle. The team is using RealVNC‘s remote access software to allow ROVER to send data back to another computer.

Applications for ROVER

Dave Ball, Senior Lecturer in charge of the Foreman of Signals course, comments that the project is “a fantastic opportunity for [the team] to, even only halfway through the course, showcase some of the stuff they’ve learnt and produce something that’s really quite exciting.” The Project ROVER team explains that the possibilities for autonomous robots like this one are extensive: they include mine clearance, bomb disposal, and search-and-rescue campaigns. They point out that existing semi-autonomous hardware is not as easy to program as their build. In contrast, they say, “with the invention of the Raspberry Pi, this has allowed three very inexperienced individuals to program a robot very capable of doing these things.”

We make Raspberry Pi computers because we want building things with technology to be as accessible as possible. So it’s great to see a project like this, made by people who aren’t techy and don’t have a lot of computing experience, but who want to solve a problem and see that the Pi is an affordable and powerful tool that can help.

The post UK soldiers design Raspberry Pi bomb disposal robot appeared first on Raspberry Pi.

Vetter: Linux Kernel Maintainer Statistics

Post Syndicated from corbet original https://lwn.net/Articles/752563/rss

Daniel Vetter looks at
some kernel-development statistics
, with a focus on patches written by
the maintainers who commit them. “Naively extrapolating the relative trend predicts that around the year 2025 large numbers of kernel maintainers will do nothing else than be the bottleneck, preventing everyone else from getting their work merged and not contributing anything of their own. The kernel community imploding under its own bureaucratic weight being the likely outcome of that.

This is a huge contrast to the ‘everything is getting better, bigger, and
the kernel community is very healthy’ fanfare touted at keynotes and the
yearly kernel report. In my opinion, the kernel community is very much not
looking like it is coping with its growth well and an overall healthy
community.”

Pirates Taunt Amazon Over New “Turd Sandwich” Prime Video Quality

Post Syndicated from Andy original https://torrentfreak.com/pirates-taunt-amazon-over-new-turd-sandwich-prime-video-quality-180419/

Even though they generally aren’t paying for the content they consume, don’t fall into the trap of believing that all pirates are eternally grateful for even poor quality media.

Without a doubt, some of the most quality-sensitive individuals are to be found in pirate communities and they aren’t scared to make their voices known when release groups fail to come up with the best possible goods.

This week there’s been a sustained chorus of disapproval over the quality of pirate video releases sourced from Amazon Prime. The anger is usually directed at piracy groups who fail to capture content in the correct manner but according to a number of observers, the problem is actually at Amazon’s end.

Discussions on Reddit, for example, report that episodes in a single TV series have been declining in filesize and bitrate, from 1.56 GB in 720p at a 3073 kb/s video bitrate for episode 1, down to 907 MB in 720p at just 1514 kb/s video bitrate for episode 10.

Numerous theories as to why this may be the case are being floated around, including that Amazon is trying to save on bandwidth expenses. While this is a possibility, the company hasn’t made any announcements to that end.

Indeed, one legitimate customer reported that he’d raised the quality issue with Amazon and they’d said that the problem was “probably on his end”.

“I have Amazon Prime Video and I noticed the quality was always great for their exclusive shows, so I decided to try buying the shows on Amazon instead of iTunes this year. I paid for season pass subscriptions for Legion, Billions and Homeland this year,” he wrote.

“Just this past weekend, I have noticed a significant drop in details compared to weeks before! So naturally I assumed it was an issue on my end. I started trying different devices, calling support, etc, but nothing really helped.

“Billions continued to look like a blurry mess, almost like I was watching a standard definition DVD instead of the crystal clear HD I paid for and have experienced in the past! And when I check the previous episodes, sure enough, they look fantastic again. What the heck??”

With Amazon distancing itself from the issues, piracy groups have already begun to dig in the knife. Release group DEFLATE has been particularly critical.

“Amazon, in their infinite wisdom, have decided to start fucking with the quality of their encodes. They’re now reaching Netflix’s subpar 1080p.H264 levels, and their H265 encodes aren’t even close to what Netflix produces,” the group said in a file attached to S02E07 of The Good Fight released on Sunday.

“Netflix is able to produce drastic visual improvements with their H265 encodes compared to H264 across every original. In comparison, Amazon can’t decide whether H265 or H264 is going to produce better results, and as a result we suffer for it.”

Arrr! The quality be fallin’

So what’s happening exactly?

A TorrentFreak source (who tells us he’s been working in the BluRay/DCP authoring business for the last 10 years) was kind enough to give us two opinions, one aimed at the techies and another at us mere mortals.

“In technical terms, it appears [Amazon has] increased the CRF [Constant Rate Factor] value they use when encoding for both the HEVC [H265] and H264 streams. Previously, their H264 streams were using CRF 18 and a max bitrate of 15Mbit/s, which usually resulted in file sizes of roughly 3GB, or around 10Mbit/s. Similarly with their HEVC streams, they were using CRF 20 and resulting in streams which were around the same size,” he explained.

“In the past week, the H264 streams have decreased by up to 50% for some streams. While there are no longer any x264 headers embedded in the H264 streams, the HEVC streams still retain those headers and the CRF value used has been increased, so it does appear this change has been done on purpose.”

In layman’s terms, our source believes that Amazon had previously been using an encoding profile that was “right on the edge of relatively good quality” which kept bitrates relatively low but high enough to ensure no perceivable loss of quality.

“H264 streams encoded with CRF 18 could provide an acceptable compromise between quality and file size, where the loss of detail is often negligible when watched at regular viewing distances, at a desk, or in a lounge room on a larger TV,” he explained.

“Recently, it appears these values have been intentionally changed in order to lower the bitrate and file sizes for reasons unknown. As a result, the quality of some streams has been reduced by up to 50% of their previous values. This has introduced a visual loss of quality, comparable to that of viewing something in standard definition versus high definition.”

With the situation failing to improve during the week, by the time piracy group DEFLATE released S03E14 of Supergirl on Tuesday their original criticism had transformed into flat-out insults.

“These are only being done in H265 because Amazon have shit the bed, and it’s a choice between a turd sandwich and a giant douche,” they wrote, offering these images as illustrative of the problem and these indicating what should be achievable.

With DEFLATE advising customers to start complaining to Amazon, the memes have already begun, with unfavorable references to now-defunct group YIFY (which was often chastized for its low quality rips) and even a spin on one of the most well known anti-piracy campaigns.

You wouldn’t download stream….

TorrentFreak contacted Amazon Prime for comment on both the recent changes and growing customer complaints but at the time of publication we were yet to receive a response.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Reddit Copyright Complaints Jump 138% But Almost Half Get Rejected

Post Syndicated from Andy original https://torrentfreak.com/reddit-copyright-complaints-jump-138-but-almost-half-get-rejected-180411/

So-called ‘transparency reports’ are becoming increasingly popular with Internet-based platforms and their users. Among other things, they provide much-needed insight into how outsiders attempt to censor content published online and what actions are taken in response.

Google first started publishing its report in 2010, Twitter followed in 2012, and they’ve now been joined by a multitude of major companies including Microsoft, Facebook and Cloudflare.

As one of the world’s most recognized sites, Reddit joined the transparency party fairly late, publishing its first report in early 2015. While light on detail, it revealed that in the previous year the site received just 218 requests to remove content, 81% of which were DMCA-style copyright notices. A significant 62% of those copyright-related requests were rejected.

Over time, Reddit’s reporting has become a little more detailed. Last April it revealed that in 2016, the platform received ‘just’ 3,294 copyright removal requests for the entire year. However, what really caught the eye is how many notices were rejected. In just 610 instances, Reddit was required to remove content from the site, a rejection rate of 81%.

Having been a year since Reddit’s last report, the company has just published its latest edition, covering the period January 1, 2017 to December 31, 2017.

“Reddit publishes this transparency report every year as part of our ongoing commitment to keep you aware of the trends on the various requests regarding private Reddit user account information or removal of content posted to Reddit,” the company said in a statement.

“Reddit believes that maintaining this transparency is extremely important. We want you to be aware of this information, consider it carefully, and ask questions to keep us accountable.”

The detailed report covers a wide range of topics, including government requests for the preservation or production of user information (there were 310) and even an instruction to monitor one Reddit user’s activities in real time via a so-called ‘Trap and Trace’ order.

In copyright terms, there has been significant movement. In 2017, Reddit received 7,825 notifications of alleged copyright infringement under the Digital Millennium Copyright Act, that’s up roughly 138% over the 3,294 notifications received in 2016.

For a platform of Reddit’s unquestionable size, these volumes are not big. While the massive percentage increase is notable, the site still receives less than 10 complaints each day. For comparison, Google receives millions every week.

But perhaps most telling is that despite receiving more than 7,800 DMCA-style takedown notices, these resulted in Reddit carrying out just 4,352 removals. This means that for whatever reasons (Reddit doesn’t specify), 3,473 requests were denied, a rejection rate of 44.38%. Google, on the other hand, removes around 90% of content reported.

DMCA notices can be declared invalid for a number of reasons, from incorrect formatting through to flat-out abuse. In many cases, copyright law is incorrectly applied and it’s not unknown for complainants to attempt a DMCA takedown to stifle speech or perceived competition.

Reddit says it tries to take all things into consideration before removing content.

“Reddit reviews each DMCA takedown notice carefully, and removes content where a valid report is received, as required by the law,” the company says.

“Reddit considers whether the reported content may fall under an exception listed in the DMCA, such as ‘fair use,’ and may ask for clarification that will assist in the review of the removal request.”

Considering the numbers of community-focused “subreddits” dedicated to piracy (not just general discussion, but actual links to content), the low numbers of copyright notices received by Reddit continues to baffle.

There are sections in existence right now offering many links to movies and TV shows hosted on various file-hosting sites. They’re the type of links that are targeted all the time whenever they appear in Google search but copyright owners don’t appear to notice or care about them on Reddit.

Finally, it would be nice if Reddit could provide more information in next year’s report, including detail on why so many requests are rejected. Perhaps regular submission of notices to the Lumen Database would be something Reddit would consider for the future.

Reddit’s Transparency Report for 2017 can be found here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

[$] wait_var_event()

Post Syndicated from corbet original https://lwn.net/Articles/750774/rss

One of the trickiest aspects to concurrency in the kernel is waiting for a
specific event to take place. There is a wide variety of possible events,
including a process exiting, the last reference to a data structure going
away, a device completing an operation, or a timeout occurring.
Waiting is surprisingly hard to get right — race conditions abound to trap
the unwary — so the kernel has
accumulated a
large set of wait_event_*() macros to make the task easier. An
attempt to add a new one, though, has led to the generalization of specific
types of waits for 4.17.

A geometric Rust adventure

Post Syndicated from Eevee original https://eev.ee/blog/2018/03/30/a-geometric-rust-adventure/

Hi. Yes. Sorry. I’ve been trying to write this post for ages, but I’ve also been working on a huge writing project, and apparently I have a very limited amount of writing mana at my disposal. I think this is supposed to be a Patreon reward from January. My bad. I hope it’s super great to make up for the wait!

I recently ported some math code from C++ to Rust in an attempt to do a cool thing with Doom. Here is my story.

The problem

I presented it recently as a conundrum (spoilers: I solved it!), but most of those details are unimportant.

The short version is: I have some shapes. I want to find their intersection.

Really, I want more than that: I want to drop them all on a canvas, intersect everything with everything, and pluck out all the resulting polygons. The input is a set of cookie cutters, and I want to press them all down on the same sheet of dough and figure out what all the resulting contiguous pieces are. And I want to know which cookie cutter(s) each piece came from.

But intersection is a good start.

Example of the goal.  Given two squares that overlap at their corners, I want to find the small overlap piece, plus the two L-shaped pieces left over from each square

I’m carefully referring to the input as shapes rather than polygons, because each one could be a completely arbitrary collection of lines. Obviously there’s not much you can do with shapes that aren’t even closed, but at the very least, I need to handle concavity and multiple disconnected polygons that together are considered a single input.

This is a non-trivial problem with a lot of edge cases, and offhand I don’t know how to solve it robustly. I’m not too eager to go figure it out from scratch, so I went hunting for something I could build from.

(Infuriatingly enough, I can just dump all the shapes out in an SVG file and any SVG viewer can immediately solve the problem, but that doesn’t quite help me. Though I have had a few people suggest I just rasterize the whole damn problem, and after all this, I’m starting to think they may have a point.)

Alas, I couldn’t find a Rust library for doing this. I had a hard time finding any library for doing this that wasn’t a massive fully-featured geometry engine. (I could’ve used that, but I wanted to avoid non-Rust dependencies if possible, since distributing software is already enough of a nightmare.)

A Twitter follower directed me towards a paper that described how to do very nearly what I wanted and nothing else: “A simple algorithm for Boolean operations on polygons” by F. Martínez (2013). Being an academic paper, it’s trapped in paywall hell; sorry about that. (And as I understand it, none of the money you’d pay to get the paper would even go to the authors? Is that right? What a horrible and predatory system for discovering and disseminating knowledge.)

The paper isn’t especially long, but it does describe an awful lot of subtle details and is mostly written in terms of its own reference implementation. Rather than write my own implementation based solely on the paper, I decided to try porting the reference implementation from C++ to Rust.

And so I fell down the rabbit hole.

The basic algorithm

Thankfully, the author has published the sample code on his own website, if you want to follow along. (It’s the bottom link; the same author has, confusingly, published two papers on the same topic with similar titles, four years apart.)

If not, let me describe the algorithm and how the code is generally laid out. The algorithm itself is based on a sweep line, where a vertical line passes across the plane and ✨ does stuff ✨ as it encounters various objects. This implementation has no physical line; instead, it keeps track of which segments from the original polygon would be intersecting the sweep line, which is all we really care about.

A vertical line is passing rightwards over a couple intersecting shapes.  The line current intersects two of the shapes' sides, and these two sides are the "sweep list"

The code is all bundled inside a class with only a single public method, run, because… that’s… more object-oriented, I guess. There are several helper methods, and state is stored in some attributes. A rough outline of run is:

  1. Run through all the line segments in both input polygons. For each one, generate two SweepEvents (one for each endpoint) and add them to a std::deque for storage.

    Add pointers to the two SweepEvents to a std::priority_queue, the event queue. This queue uses a custom comparator to order the events from left to right, so the top element is always the leftmost endpoint.

  2. Loop over the event queue (where an “event” means the sweep line passed over the left or right end of a segment). Encountering a left endpoint means the sweep line is newly touching that segment, so add it to a std::set called the sweep list. An important point is that std::set is ordered, and the sweep list uses a comparator that keeps segments in order vertically.

    Encountering a right endpoint means the sweep line is leaving a segment, so that segment is removed from the sweep list.

  3. When a segment is added to the sweep list, it may have up to two neighbors: the segment above it and the segment below it. Call possibleIntersection to check whether it intersects either of those neighbors. (This is nearly sufficient to find all intersections, which is neat.)

  4. If possibleIntersection detects an intersection, it will split each segment into two pieces then and there. The old segment is shortened in-place to become the left part, and a new segment is created for the right part. The new endpoints at the point of intersection are added to the event queue.

  5. Some bookkeeping is done along the way to track which original polygons each segment is inside, and eventually the segments are reconstructed into new polygons.

Hopefully that’s enough to follow along. It took me an inordinately long time to tease this out. The comments aren’t especially helpful.

1
    std::deque<SweepEvent> eventHolder;    // It holds the events generated during the computation of the boolean operation

Syntax and basic semantics

The first step was to get something that rustc could at least parse, which meant translating C++ syntax to Rust syntax.

This was surprisingly straightforward! C++ classes become Rust structs. (There was no inheritance here, thankfully.) All the method declarations go away. Method implementations only need to be indented and wrapped in impl.

I did encounter some unnecessarily obtuse uses of the ternary operator:

1
(prevprev != sl.begin()) ? --prevprev : prevprev = sl.end();

Rust doesn’t have a ternary — you can use a regular if block as an expression — so I expanded these out.

C++ switch blocks become Rust match blocks, but otherwise function basically the same. Rust’s enums are scoped (hallelujah), so I had to explicitly spell out where enum values came from.

The only really annoying part was changing function signatures; C++ types don’t look much at all like Rust types, save for the use of angle brackets. Rust also doesn’t pass by implicit reference, so I needed to sprinkle a few &s around.

I would’ve had a much harder time here if this code had relied on any remotely esoteric C++ functionality, but thankfully it stuck to pretty vanilla features.

Language conventions

This is a geometry problem, so the sample code unsurprisingly has its own home-grown point type. Rather than port that type to Rust, I opted to use the popular euclid crate. Not only is it code I didn’t have to write, but it already does several things that the C++ code was doing by hand inline, like dot products and cross products. And all I had to do was add one line to Cargo.toml to use it! I have no idea how anyone writes C or C++ without a package manager.

The C++ code used getters, i.e. point.x (). I’m not a huge fan of getters, though I do still appreciate the need for them in lowish-level systems languages where you want to future-proof your API and the language wants to keep a clear distinction between attribute access and method calls. But this is a point, which is nothing more than two of the same numeric type glued together; what possible future logic might you add to an accessor? The euclid authors appear to side with me and leave the coordinates as public fields, so I took great joy in removing all the superfluous parentheses.

Polygons are represented with a Polygon class, which has some number of Contours. A contour is a single contiguous loop. Something you’d usually think of as a polygon would only have one, but a shape with a hole would have two: one for the outside, one for the inside. The weird part of this arrangement was that Polygon implemented nearly the entire STL container interface, then waffled between using it and not using it throughout the rest of the code. Rust lets anything in the same module access non-public fields, so I just skipped all that and used polygon.contours directly. Hell, I think I made contours public.

Finally, the SweepEvent type has a pol field that’s declared as an enum PolygonType (either SUBJECT or CLIPPING, to indicate which of the two inputs it is), but then some other code uses the same field as a numeric index into a polygon’s contours. Boy I sure do love static typing where everything’s a goddamn integer. I wanted to extend the algorithm to work on arbitrarily many input polygons anyway, so I scrapped the enum and this became a usize.


Then I got to all the uses of STL. I have only a passing familiarity with the C++ standard library, and this code actually made modest use of it, which caused some fun days-long misunderstandings.

As mentioned, the SweepEvents are stored in a std::deque, which is never read from. It took me a little thinking to realize that the deque was being used as an arena: it’s the canonical home for the structs so pointers to them can be tossed around freely. (It can’t be a std::vector, because that could reallocate and invalidate all the pointers; std::deque is probably a doubly-linked list, and guarantees no reallocation.)

Rust’s standard library does have a doubly-linked list type, but I knew I’d run into ownership hell here later anyway, so I think I replaced it with a Rust Vec to start with. It won’t compile either way, so whatever. We’ll get back to this in a moment.

The list of segments currently intersecting the sweep line is stored in a std::set. That type is explicitly ordered, which I’m very glad I knew already. Rust has two set types, HashSet and BTreeSet; unsurprisingly, the former is unordered and the latter is ordered. Dropping in BTreeSet and fixing some method names got me 90% of the way there.

Which brought me to the other 90%. See, the C++ code also relies on finding nodes adjacent to the node that was just inserted, via STL iterators.

1
2
3
next = prev = se->posSL = it = sl.insert(se).first;
(prev != sl.begin()) ? --prev : prev = sl.end();
++next;

I freely admit I’m bad at C++, but this seems like something that could’ve used… I don’t know, 1 comment. Or variable names more than two letters long. What it actually does is:

  1. Add the current sweep event (se) to the sweep list (sl), which returns a pair whose first element is an iterator pointing at the just-inserted event.

  2. Copies that iterator to several other variables, including prev and next.

  3. If the event was inserted at the beginning of the sweep list, set prev to the sweep list’s end iterator, which in C++ is a legal-but-invalid iterator meaning “the space after the end” or something. This is checked for in later code, to see if there is a previous event to look at. Otherwise, decrement prev, so it’s now pointing at the event immediately before the inserted one.

  4. Increment next normally. If the inserted event is last, then this will bump next to the end iterator anyway.

In other words, I need to get the previous and next elements from a BTreeSet. Rust does have bidirectional iterators, which BTreeSet supports… but BTreeSet::insert only returns a bool telling me whether or not anything was inserted, not the position. I came up with this:

1
2
3
let mut maybe_below = active_segments.range(..segment).last().map(|v| *v);
let mut maybe_above = active_segments.range(segment..).next().map(|v| *v);
active_segments.insert(segment);

The range method returns an iterator over a subset of the tree. The .. syntax makes a range (where the right endpoint is exclusive), so ..segment finds the part of the tree before the new segment, and segment.. finds the part of the tree after it. (The latter would start with the segment itself, except I haven’t inserted it yet, so it’s not actually there.)

Then the standard next() and last() methods on bidirectional iterators find me the element I actually want. But the iterator might be empty, so they both return an Option. Also, iterators tend to return references to their contents, but in this case the contents are already references, and I don’t want a double reference, so the map call dereferences one layer — but only if the Option contains a value. Phew!

This is slightly less efficient than the C++ code, since it has to look up where segment goes three times rather than just one. I might be able to get it down to two with some more clever finagling of the iterator, but microsopic performance considerations were a low priority here.

Finally, the event queue uses a std::priority_queue to keep events in a desired order and efficiently pop the next one off the top.

Except priority queues act like heaps, where the greatest (i.e., last) item is made accessible.

Sorting out sorting

C++ comparison functions return true to indicate that the first argument is less than the second argument. Sweep events occur from left to right. You generally implement sorts so that the first thing comes, erm, first.

But sweep events go in a priority queue, and priority queues surface the last item, not the first. This C++ code handled this minor wrinkle by implementing its comparison backwards.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
struct SweepEventComp : public std::binary_function<SweepEvent, SweepEvent, bool> { // for sorting sweep events
// Compare two sweep events
// Return true means that e1 is placed at the event queue after e2, i.e,, e1 is processed by the algorithm after e2
bool operator() (const SweepEvent* e1, const SweepEvent* e2)
{
    if (e1->point.x () > e2->point.x ()) // Different x-coordinate
        return true;
    if (e2->point.x () > e1->point.x ()) // Different x-coordinate
        return false;
    if (e1->point.y () != e2->point.y ()) // Different points, but same x-coordinate. The event with lower y-coordinate is processed first
        return e1->point.y () > e2->point.y ();
    if (e1->left != e2->left) // Same point, but one is a left endpoint and the other a right endpoint. The right endpoint is processed first
        return e1->left;
    // Same point, both events are left endpoints or both are right endpoints.
    if (signedArea (e1->point, e1->otherEvent->point, e2->otherEvent->point) != 0) // not collinear
        return e1->above (e2->otherEvent->point); // the event associate to the bottom segment is processed first
    return e1->pol > e2->pol;
}
};

Maybe it’s just me, but I had a hell of a time just figuring out what problem this was even trying to solve. I still have to reread it several times whenever I look at it, to make sure I’m getting the right things backwards.

Making this even more ridiculous is that there’s a second implementation of this same sort, with the same name, in another file — and that one’s implemented forwards. And doesn’t use a tiebreaker. I don’t entirely understand how this even compiles, but it does!

I painstakingly translated this forwards to Rust. Unlike the STL, Rust doesn’t take custom comparators for its containers, so I had to implement ordering on the types themselves (which makes sense, anyway). I wrapped everything in the priority queue in a Reverse, which does what it sounds like.

I’m fairly pleased with Rust’s ordering model. Most of the work is done in Ord, a trait with a cmp() method returning an Ordering (one of Less, Equal, and Greater). No magic numbers, no need to implement all six ordering methods! It’s incredible. Ordering even has some handy methods on it, so the usual case of “order by this, then by this” can be written as:

1
2
return self.point().x.cmp(&other.point().x)
    .then(self.point().y.cmp(&other.point().y));

Well. Just kidding! It’s not quite that easy. You see, the points here are composed of floats, and floats have the fun property that not all of them are comparable. Specifically, NaN is not less than, greater than, or equal to anything else, including itself. So IEEE 754 float ordering cannot be expressed with Ord. Unless you want to just make up an answer for NaN, but Rust doesn’t tend to do that.

Rust’s float types thus implement the weaker PartialOrd, whose method returns an Option<Ordering> instead. That makes the above example slightly uglier:

1
2
return self.point().x.partial_cmp(&other.point().x).unwrap()
    .then(self.point().y.partial_cmp(&other.point().y).unwrap())

Also, since I use unwrap() here, this code will panic and take the whole program down if the points are infinite or NaN. Don’t do that.

This caused some minor inconveniences in other places; for example, the general-purpose cmp::min() doesn’t work on floats, because it requires an Ord-erable type. Thankfully there’s a f64::min(), which handles a NaN by returning the other argument.

(Cool story: for the longest time I had this code using f32s. I’m used to translating int to “32 bits”, and apparently that instinct kicked in for floats as well, even floats spelled double.)

The only other sorting adventure was this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Due to overlapping edges the resultEvents array can be not wholly sorted
bool sorted = false;
while (!sorted) {
    sorted = true;
    for (unsigned int i = 0; i < resultEvents.size (); ++i) {
        if (i + 1 < resultEvents.size () && sec (resultEvents[i], resultEvents[i+1])) {
            std::swap (resultEvents[i], resultEvents[i+1]);
            sorted = false;
        }
    }
}

(I originally misread this comment as saying “the array cannot be wholly sorted” and had no idea why that would be the case, or why the author would then immediately attempt to bubble sort it.)

I’m still not sure why this uses an ad-hoc sort instead of std::sort. But I’m used to taking for granted that general-purpose sorting implementations are tuned to work well for almost-sorted data, like Python’s. Maybe C++ is untrustworthy here, for some reason. I replaced it with a call to .sort() and all seemed fine.

Phew! We’re getting there. Finally, my code appears to type-check.

But now I see storm clouds gathering on the horizon.

Ownership hell

I have a problem. I somehow run into this problem every single time I use Rust. The solutions are never especially satisfying, and all the hacks I might use if forced to write C++ turn out to be unsound, which is even more annoying because rustc is just sitting there with this smug “I told you so expression” and—

The problem is ownership, which Rust is fundamentally built on. Any given value must have exactly one owner, and Rust must be able to statically convince itself that:

  1. No reference to a value outlives that value.
  2. If a mutable reference to a value exists, no other references to that value exist at the same time.

This is the core of Rust. It guarantees at compile time that you cannot lose pointers to allocated memory, you cannot double-free, you cannot have dangling pointers.

It also completely thwarts a lot of approaches you might be inclined to take if you come from managed languages (where who cares, the GC will take care of it) or C++ (where you just throw pointers everywhere and hope for the best apparently).

For example, pointer loops are impossible. Rust’s understanding of ownership and lifetimes is hierarchical, and it simply cannot express loops. (Rust’s own doubly-linked list type uses raw pointers and unsafe code under the hood, where “unsafe” is an escape hatch for the usual ownership rules. Since I only recently realized that pointers to the inside of a mutable Vec are a bad idea, I figure I should probably not be writing unsafe code myself.)

This throws a few wrenches in the works.

Problem the first: pointer loops

I immediately ran into trouble with the SweepEvent struct itself. A SweepEvent pulls double duty: it represents one endpoint of a segment, but each left endpoint also handles bookkeeping for the segment itself — which means that most of the fields on a right endpoint are unused. Also, and more importantly, each SweepEvent has a pointer to the corresponding SweepEvent at the other end of the same segment. So a pair of SweepEvents point to each other.

Rust frowns upon this. In retrospect, I think I could’ve kept it working, but I also think I’m wrong about that.

My first step was to wrench SweepEvent apart. I moved all of the segment-stuff (which is virtually all of it) into a single SweepSegment type, and then populated the event queue with a SweepEndpoint tuple struct, similar to:

1
2
3
4
5
6
enum SegmentEnd {
    Left,
    Right,
}

struct SweepEndpoint<'a>(&'a SweepSegment, SegmentEnd);

This makes SweepEndpoint essentially a tuple with a name. The 'a is a lifetime and says, more or less, that a SweepEndpoint cannot outlive the SweepSegment it references. Makes sense.

Problem solved! I no longer have mutually referential pointers. But I do still have pointers (well, references), and they have to point to something.

Problem the second: where’s all the data

Which brings me to the problem I always run into with Rust. I have a bucket of things, and I need to refer to some of them multiple times.

I tried half a dozen different approaches here and don’t clearly remember all of them, but I think my core problem went as follows. I translated the C++ class to a Rust struct with some methods hanging off of it. A simplified version might look like this.

1
2
3
4
struct Algorithm {
    arena: LinkedList<SweepSegment>,
    event_queue: BinaryHeap<SweepEndpoint>,
}

Ah, hang on — SweepEndpoint needs to be annotated with a lifetime, so Rust can enforce that those endpoints don’t live longer than the segments they refer to. No problem?

1
2
3
4
struct Algorithm<'a> {
    arena: LinkedList<SweepSegment>,
    event_queue: BinaryHeap<SweepEndpoint<'a>>,
}

Okay! Now for some methods.

1
2
3
4
5
6
7
8
fn run(&mut self) {
    self.arena.push_back(SweepSegment{ data: 5 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    for event in &self.event_queue {
        println!("{:?}", event)
    }
}

Aaand… this doesn’t work. Rust “cannot infer an appropriate lifetime for autoref due to conflicting requirements”. The trouble is that self.arena.back() takes a reference to self.arena, and then I put that reference in the event queue. But I promised that everything in the event queue has lifetime 'a, and I don’t actually know how long self lives here; I only know that it can’t outlive 'a, because that would invalidate the references it holds.

A little random guessing let me to change &mut self to &'a mut self — which is fine because the entire impl block this lives in is already parameterized by 'a — and that makes this compile! Hooray! I think that’s because I’m saying self itself has exactly the same lifetime as the references it holds onto, which is true, since it’s referring to itself.

Let’s get a little more ambitious and try having two segments.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
fn run(&'a mut self) {
    self.arena.push_back(SweepSegment{ data: 5 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    self.arena.push_back(SweepSegment{ data: 17 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    for event in &self.event_queue {
        println!("{:?}", event)
    }
}

Whoops! Rust complains that I’m trying to mutate self.arena while other stuff is referring to it. And, yes, that’s true — I have references to it in the event queue, and Rust is preventing me from potentially deleting everything from the queue when references to it still exist. I’m not actually deleting anything here, of course (though I could be if this were a Vec!), but Rust’s type system can’t encode that (and I dread the thought of a type system that can).

I struggled with this for a while, and rapidly encountered another complete showstopper:

1
2
3
4
5
6
fn run(&'a mut self) {
    self.mutate_something();
    self.mutate_something();
}

fn mutate_something(&'a mut self) {}

Rust objects that I’m trying to borrow self mutably, twice — once for the first call, once for the second.

But why? A borrow is supposed to end automatically once it’s no longer used, right? Maybe if I throw some braces around it for scope… nope, that doesn’t help either.

It’s true that borrows usually end automatically, but here I have explicitly told Rust that mutate_something() should borrow with the lifetime 'a, which is the same as the lifetime in run(). So the first call explicitly borrows self for at least the rest of the method. Removing the lifetime from mutate_something() does fix this error, but if that method tries to add new segments, I’m back to the original problem.

Oh no. The mutation in the C++ code is several calls deep. Porting it directly seems nearly impossible.

The typical solution here — at least, the first thing people suggest to me on Twitter — is to wrap basically everything everywhere in Rc<RefCell<T>>, which gives you something that’s reference-counted (avoiding questions of ownership) and defers borrow checks until runtime (avoiding questions of mutable borrows). But that seems pretty heavy-handed here — not only does RefCell add .borrow() noise anywhere you actually want to interact with the underlying value, but do I really need to refcount these tiny structs that only hold a handful of floats each?

I set out to find a middle ground.

Solution, kind of

I really, really didn’t want to perform serious surgery on this code just to get it to build. I still didn’t know if it worked at all, and now I had to rearrange it without being able to check if I was breaking it further. (This isn’t Rust’s fault; it’s a natural problem with porting between fairly different paradigms.)

So I kind of hacked it into working with minimal changes, producing a grotesque abomination which I’m ashamed to link to. Here’s how!

First, I got rid of the class. It turns out this makes lifetime juggling much easier right off the bat. I’m pretty sure Rust considers everything in a struct to be destroyed simultaneously (though in practice it guarantees it’ll destroy fields in order), which doesn’t leave much wiggle room. Locals within a function, on the other hand, can each have their own distinct lifetimes, which solves the problem of expressing that the borrows won’t outlive the arena.

Speaking of the arena, I solved the mutability problem there by switching to… an arena! The typed-arena crate (a port of a type used within Rust itself, I think) is an allocator — you give it a value, and it gives you back a reference, and the reference is guaranteed to be valid for as long as the arena exists. The method that does this is sneaky and takes &self rather than &mut self, so Rust doesn’t know you’re mutating the arena and won’t complain. (One drawback is that the arena will never free anything you give to it, but that’s not a big problem here.)


My next problem was with mutation. The main loop repeatedly calls possibleIntersection with pairs of segments, which can split either or both segment. Rust definitely doesn’t like that — I’d have to pass in two &muts, both of which are mutable references into the same arena, and I’d have a bunch of immutable references into that arena in the sweep list and elsewhere. This isn’t going to fly.

This is kind of a shame, and is one place where Rust seems a little overzealous. Something like this seems like it ought to be perfectly valid:

1
2
3
4
let mut v = vec![1u32, 2u32];
let a = &mut v[0];
let b = &mut v[1];
// do stuff with a, b

The trouble is, Rust only knows the type signature, which here is something like index_mut(&'a mut self, index: usize) -> &'a T. Nothing about that says that you’re borrowing distinct elements rather than some core part of the type — and, in fact, the above code is only safe because you’re borrowing distinct elements. In the general case, Rust can’t possibly know that. It seems obvious enough from the different indexes, but nothing about the type system even says that different indexes have to return different values. And what if one were borrowed as &mut v[1] and the other were borrowed with v.iter_mut().next().unwrap()?

Anyway, this is exactly where people start to turn to RefCell — if you’re very sure you know better than Rust, then a RefCell will skirt the borrow checker while still enforcing at runtime that you don’t have more than one mutable borrow at a time.

But half the lines in this algorithm examine the endpoints of a segment! I don’t want to wrap the whole thing in a RefCell, or I’ll have to say this everywhere:

1
if segment1.borrow().point.x < segment2.borrow().point.x { ... }

Gross.

But wait — this code only mutates the points themselves in one place. When a segment is split, the original segment becomes the left half, and a new segment is created to be the right half. There’s no compelling need for this; it saves an allocation for the left half, but it’s not critical to the algorithm.

Thus, I settled on a compromise. My segment type now looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
struct SegmentPacket {
    // a bunch of flags and whatnot used in the algorithm
}
struct SweepSegment {
    left_point: MapPoint,
    right_point: MapPoint,
    faces_outwards: bool,
    index: usize,
    order: usize,
    packet: RefCell<SegmentPacket>,
}

I do still need to call .borrow() or .borrow_mut() to get at the stuff in the “packet”, but that’s far less common, so there’s less noise overall. And I don’t need to wrap it in Rc because it’s part of a type that’s allocated in the arena and passed around only via references.


This still leaves me with the problem of how to actually perform the splits.

I’m not especially happy with what I came up with, I don’t know if I can defend it, and I suspect I could do much better. I changed possibleIntersection so that rather than performing splits, it returns the points at which each segment needs splitting, in the form (usize, Option<MapPoint>, Option<MapPoint>). (The usize is used as a flag for calling code and oughta be an enum, but, isn’t yet.)

Now the top-level function is responsible for all arena management, and all is well.

Except, er. possibleIntersection is called multiple times, and I don’t want to copy-paste a dozen lines of split code after each call. I tried putting just that code in its own function, which had the world’s most godawful signature, and that didn’t work because… uh… hm. I can’t remember why, exactly! Should’ve written that down.

I tried a local closure next, but closures capture their environment by reference, so now I had references to a bunch of locals for as long as the closure existed, which meant I couldn’t mutate those locals. Argh. (This seems a little silly to me, since the closure’s references cannot possibly be used for anything if the closure isn’t being called, but maybe I’m missing something. Or maybe this is just a limitation of lifetimes.)

Increasingly desperate, I tried using a macro. But… macros are hygienic, which means that any new name you use inside a macro is different from any name outside that macro. The macro thus could not see any of my locals. Usually that’s good, but here I explicitly wanted the macro to mess with my locals.

I was just about to give up and go live as a hermit in a cabin in the woods, when I discovered something quite incredible. You can define local macros! If you define a macro inside a function, then it can see any locals defined earlier in that function. Perfect!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
macro_rules! _split_segment (
    ($seg:expr, $pt:expr) => (
        {
            let pt = $pt;
            let seg = $seg;
            // ... waaay too much code ...
        }
    );
);

loop {
    // ...
    // This is possibleIntersection, renamed because Rust rightfully complains about camelCase
    let cross = handle_intersections(Some(segment), maybe_above);
    if let Some(pt) = cross.1 {
        segment = _split_segment!(segment, pt);
    }
    if let Some(pt) = cross.2 {
        maybe_above = Some(_split_segment!(maybe_above.unwrap(), pt));
    }
    // ...
}

(This doesn’t actually quite match the original algorithm, which has one case where a segment can be split twice. I realized that I could just do the left-most split, and a later iteration would perform the other split. I sure hope that’s right, anyway.)

It’s a bit ugly, and I ran into a whole lot of implicit behavior from the C++ code that I had to fix — for example, the segment is sometimes mutated just before it’s split, purely as a shortcut for mutating the left part of the split. But it finally compiles! And runs! And kinda worked, a bit!

Aftermath

I still had a lot of work to do.

For one, this code was designed for intersecting two shapes, not mass-intersecting a big pile of shapes. The basic algorithm doesn’t care about how many polygons you start with — all it sees is segments — but the code for constructing the return value needed some heavy modification.

The biggest change by far? The original code traced each segment once, expecting the result to be only a single shape. I had to change that to trace each side of each segment once, since the vast bulk of the output consists of shapes which share a side. This violated a few assumptions, which I had to hack around.

I also ran into a couple very bad edge cases, spent ages debugging them, then found out that the original algorithm had a subtle workaround that I’d commented out because it was awkward to port but didn’t seem to do anything. Whoops!

The worst was a precision error, where a vertical line could be split on a point not quite actually on the line, which wreaked all kinds of havoc. I worked around that with some tasteful rounding, which is highly dubious but makes the output more appealing to my squishy human brain. (I might switch to the original workaround, but I really dislike that even simple cases can spit out points at 1500.0000000000003. The whole thing is parameterized over the coordinate type, so maybe I could throw a rational type in there and cross my fingers?)

All that done, I finally, finally, after a couple months of intermittent progress, got what I wanted!

This is Doom 2’s MAP01. The black area to the left of center is where the player starts. Gray areas indicate where the player can walk from there, with lighter shades indicating more distant areas, where “distance” is measured by the minimum number of line crossings. Red areas can’t be reached at all.

(Note: large playable chunks of the map, including the exit room, are red. That’s because those areas are behind doors, and this code doesn’t understand doors yet.)

(Also note: The big crescent in the lower-right is also black because I was lazy and looked for the player’s starting sector by checking the bbox, and that sector’s bbox happens to match.)

The code that generated this had to go out of its way to delete all the unreachable zones around solid walls. I think I could modify the algorithm to do that on the fly pretty easily, which would probably speed it up a bit too. Downside is that the algorithm would then be pretty specifically tied to this problem, and not usable for any other kind of polygon intersection, which I would think could come up elsewhere? The modifications would be pretty minor, though, so maybe I could confine them to a closure or something.

Some final observations

It runs surprisingly slowly. Like, multiple seconds. Unless I add --release, which speeds it up by a factor of… some number with multiple digits. Wahoo. Debug mode has a high price, especially with a lot of calls in play.

The current state of this code is on GitHub. Please don’t look at it. I’m very sorry.

Honestly, most of my anguish came not from Rust, but from the original code relying on lots of fairly subtle behavior without bothering to explain what it was doing or even hint that anything unusual was going on. God, I hate C++.

I don’t know if the Rust community can learn from this. I don’t know if I even learned from this. Let’s all just quietly forget about it.

Now I just need to figure this one out…

[$] Read-only dynamic data

Post Syndicated from corbet original https://lwn.net/Articles/750215/rss

Kernel developers go to some lengths to mark read-only data so that it can
be protected by the system’s memory-management unit.
Memory that cannot be changed cannot be altered by an attacker to corrupt the
system. But the kernel’s mechanisms for managing read-only memory do not
work for memory that must be initialized after the initial system bootstrap
has completed. A patch set from Igor Stoppa
seeks to change that situation by creating a new API just for
late-initialized read-only data.

Needed: Associate Front End Developer

Post Syndicated from Yev original https://www.backblaze.com/blog/needed-associate-front-end-developer/

Want to work at a company that helps customers in over 150 countries around the world protect the memories they hold dear? Do you want to challenge yourself with a business that serves consumers, SMBs, Enterprise, and developers?

If all that sounds interesting, you might be interested to know that Backblaze is looking for an Associate Front End Developer!

Backblaze is a 10 year old company. Providing great customer experiences is the “secret sauce” that enables us to successfully compete against some of technology’s giants. We’ll finish the year at ~$20MM ARR and are a profitable business. This is an opportunity to have your work shine at scale in one of the fastest growing verticals in tech — Cloud Storage.

You will utilize HTML, ReactJS, CSS and jQuery to develop intuitive, elegant user experiences. As a member of our Front End Dev team, you will work closely with our web development, software design, and marketing teams.

On a day to day basis, you must be able to convert image mockups to HTML or ReactJS – There’s some production work that needs to get done. But you will also be responsible for helping build out new features, rethink old processes, and enabling third party systems to empower our marketing, sales, and support teams.

Our Associate Front End Developer must be proficient in:

  • HTML, CSS, Javascript (ES5)
  • jQuery, Bootstrap (with responsive targets)
  • Understanding of ensuring cross-browser compatibility and browser security for features
  • Basic SEO principles and ensuring that applications will adhere to them
  • Familiarity with ES2015+, ReactJS, unit testing
  • Learning about third party marketing and sales tools through reading documentation. Our systems include Google Tag Manager, Google Analytics, Salesforce, and Hubspot.
  • React Flux, Redux, SASS, Node experience is a plus

We’re looking for someone that is:

  • Passionate about building friendly, easy to use Interfaces and APIs.
  • Likes to work closely with other engineers, support, and marketing to help customers.
  • Is comfortable working independently on a mutually agreed upon prioritization queue (we don’t micromanage, we do make sure tasks are reasonably defined and scoped).
  • Diligent with quality control. Backblaze prides itself on giving our team autonomy to get work done, do the right thing for our customers, and keep a pace that is sustainable over the long run. As such, we expect everyone that checks in code that is stable. We also have a small QA team that operates as a secondary check when needed.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Strong desire to work for a small fast, paced company.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Comfort with well behaved pets in the office.

This position is located in San Mateo, California. Regular attendance in the office is expected.

Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

If this sounds like you…
Send an email to: jobscontact@backblaze.com with:

  1. Associate Front End Dev in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Needed: Associate Front End Developer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Challenges of Opening a Data Center — Part 2

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/factors-for-choosing-data-center/

Rows of storage pods in a data center

This is part two of a series on the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process.

In Part 1 of this series, we looked at the different types of data centers, the importance of location in planning a data center, data center certification, and the single most expensive factor in running a data center, power.

In Part 2, we continue to look at factors that need to considered both by those interested in a dedicated data center and those seeking to colocate in an existing center.

Power (continued from Part 1)

In part 1, we began our discussion of the power requirements of data centers.

As we discussed, redundancy and failover is a chief requirement for data center power. A redundantly designed power supply system is also a necessity for maintenance, as it enables repairs to be performed on one network, for example, without having to turn off servers, databases, or electrical equipment.

Power Path

The common critical components of a data center’s power flow are:

  • Utility Supply
  • Generators
  • Transfer Switches
  • Distribution Panels
  • Uninterruptible Power Supplies (UPS)
  • PDUs

Utility Supply is the power that comes from one or more utility grids. While most of us consider the grid to be our primary power supply (hats off to those of you who manage to live off the grid), politics, economics, and distribution make utility supply power susceptible to outages, which is why data centers must have autonomous power available to maintain availability.

Generators are used to supply power when the utility supply is unavailable. They convert mechanical energy, usually from motors, to electrical energy.

Transfer Switches are used to transfer electric load from one source or electrical device to another, such as from one utility line to another, from a generator to a utility, or between generators. The transfer could be manually activated or automatic to ensure continuous electrical power.

Distribution Panels get the power where it needs to go, taking a power feed and dividing it into separate circuits to supply multiple loads.

A UPS, as we touched on earlier, ensures that continuous power is available even when the main power source isn’t. It often consists of batteries that can come online almost instantaneously when the current power ceases. The power from a UPS does not have to last a long time as it is considered an emergency measure until the main power source can be restored. Another function of the UPS is to filter and stabilize the power from the main power supply.

Data Center UPS

Data center UPSs

PDU stands for the Power Distribution Unit and is the device that distributes power to the individual pieces of equipment.

Network

After power, the networking connections to the data center are of prime importance. Can the data center obtain and maintain high-speed networking connections to the building? With networking, as with all aspects of a data center, availability is a primary consideration. Data center designers think of all possible ways service can be interrupted or lost, even briefly. Details such as the vulnerabilities in the route the network connections make from the core network (the backhaul) to the center, and where network connections enter and exit a building, must be taken into consideration in network and data center design.

Routers and switches are used to transport traffic between the servers in the data center and the core network. Just as with power, network redundancy is a prime factor in maintaining availability of data center services. Two or more upstream service providers are required to ensure that availability.

How fast a customer can transfer data to a data center is affected by: 1) the speed of the connections the data center has with the outside world, 2) the quality of the connections between the customer and the data center, and 3) the distance of the route from customer to the data center. The longer the length of the route and the greater the number of packets that must be transferred, the more significant a factor will be played by latency in the data transfer. Latency is the delay before a transfer of data begins following an instruction for its transfer. Generally latency, not speed, will be the most significant factor in transferring data to and from a data center. Packets transferred using the TCP/IP protocol suite, which is the conceptual model and set of communications protocols used on the internet and similar computer networks, must be acknowledged when received (ACK’d) and requires a communications roundtrip for each packet. If the data is in larger packets, the number of ACKs required is reduced, so latency will be a smaller factor in the overall network communications speed.

Latency generally will be less significant for data storage transfers than for cloud computing. Optimizations such as multi-threading, which is used in Backblaze’s Cloud Backup service, will generally improve overall transfer throughput if sufficient bandwidth is available.

Those interested in testing the overall speed and latency of their connection to Backblaze’s data centers can use the Check Your Bandwidth tool on our website.
Data center telecommunications equipment

Data center telecommunications equipment

Data center under floor cable runs

Data center under floor cable runs

Cooling

Computer, networking, and power generation equipment generates heat, and there are a number of solutions employed to rid a data center of that heat. The location and climate of the data center is of great importance to the data center designer because the climatic conditions dictate to a large degree what cooling technologies should be deployed that in turn affect the power used and the cost of using that power. The power required and cost needed to manage a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Innovation is strong in this area and many new approaches to efficient and cost-effective cooling are used in the latest data centers.

Switch's uninterruptible, multi-system, HVAC Data Center Cooling Units

Switch’s uninterruptible, multi-system, HVAC Data Center Cooling Units

There are three primary ways data center cooling can be achieved:

Room Cooling cools the entire operating area of the data center. This method can be suitable for small data centers, but becomes more difficult and inefficient as IT equipment density and center size increase.

Row Cooling concentrates on cooling a data center on a row by row basis. In its simplest form, hot aisle/cold aisle data center design involves lining up server racks in alternating rows with cold air intakes facing one way and hot air exhausts facing the other. The rows composed of rack fronts are called cold aisles. Typically, cold aisles face air conditioner output ducts. The rows the heated exhausts pour into are called hot aisles. Typically, hot aisles face air conditioner return ducts.

Rack Cooling tackles cooling on a rack by rack basis. Air-conditioning units are dedicated to specific racks. This approach allows for maximum densities to be deployed per rack. This works best in data centers with fully loaded racks, otherwise there would be too much cooling capacity, and the air-conditioning losses alone could exceed the total IT load.

Security

Data Centers are high-security facilities as they house business, government, and other data that contains personal, financial, and other secure information about businesses and individuals.

This list contains the physical-security considerations when opening or co-locating in a data center:

Layered Security Zones. Systems and processes are deployed to allow only authorized personnel in certain areas of the data center. Examples include keycard access, alarm systems, mantraps, secure doors, and staffed checkpoints.

Physical Barriers. Physical barriers, fencing and reinforced walls are used to protect facilities. In a colocation facility, one customers’ racks and servers are often inaccessible to other customers colocating in the same data center.

Backblaze racks secured in the data center

Backblaze racks secured in the data center

Monitoring Systems. Advanced surveillance technology monitors and records activity on approaching driveways, building entrances, exits, loading areas, and equipment areas. These systems also can be used to monitor and detect fire and water emergencies, providing early detection and notification before significant damage results.

Top-tier providers evaluate their data center security and facilities on an ongoing basis. Technology becomes outdated quickly, so providers must stay-on-top of new approaches and technologies in order to protect valuable IT assets.

To pass into high security areas of a data center requires passing through a security checkpoint where credentials are verified.

Data Center security

The gauntlet of cameras and steel bars one must pass before entering this data center

Facilities and Services

Data center colocation providers often differentiate themselves by offering value-added services. In addition to the required space, power, cooling, connectivity and security capabilities, the best solutions provide several on-site amenities. These accommodations include offices and workstations, conference rooms, and access to phones, copy machines, and office equipment.

Additional features may consist of kitchen facilities, break rooms and relaxation lounges, storage facilities for client equipment, and secure loading docks and freight elevators.

Moving into A Data Center

Moving into a data center is a major job for any organization. We wrote a post last year, Desert To Data in 7 Days — Our New Phoenix Data Center, about what it was like to move into our new data center in Phoenix, Arizona.

Desert To Data in 7 Days — Our New Phoenix Data Center

Visiting a Data Center

Our Director of Product Marketing Andy Klein wrote a popular post last year on what it’s like to visit a data center called A Day in the Life of a Data Center.

A Day in the Life of a Data Center

Would you Like to Know More about The Challenges of Opening and Running a Data Center?

That’s it for part 2 of this series. If readers are interested, we could write a post about some of the new technologies and trends affecting data center design and use. Please let us know in the comments.

Here's a tip!Here’s a tip on finding all the posts tagged with data center on our blog. Just follow https://www.backblaze.com/blog/tag/data-center/.

Don’t miss future posts on data centers and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.

The post The Challenges of Opening a Data Center — Part 2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Getting product security engineering right

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2018/02/getting-product-security-engineering.html

Product security is an interesting animal: it is a uniquely cross-disciplinary endeavor that spans policy, consulting,
process automation, in-depth software engineering, and cutting-edge vulnerability research. And in contrast to many
other specializations in our field of expertise – say, incident response or network security – we have virtually no
time-tested and coherent frameworks for setting it up within a company of any size.

In my previous post, I shared some thoughts
on nurturing technical organizations and cultivating the right kind of leadership within. Today, I figured it would
be fitting to follow up with several notes on what I learned about structuring product security work – and about actually
making the effort count.

The “comfort zone” trap

For security engineers, knowing your limits is a sought-after quality: there is nothing more dangerous than a security
expert who goes off script and starts dispensing authoritatively-sounding but bogus advice on a topic they know very
little about. But that same quality can be destructive when it prevents us from growing beyond our most familiar role: that of
a critic who pokes holes in other people’s designs.

The role of a resident security critic lends itself all too easily to a sense of supremacy: the mistaken
belief that our cognitive skills exceed the capabilities of the engineers and product managers who come to us for help
– and that the cool bugs we file are the ultimate proof of our special gift. We start taking pride in the mere act
of breaking somebody else’s software – and then write scathing but ineffectual critiques addressed to executives,
demanding that they either put a stop to a project or sign off on a risk. And hey, in the latter case, they better
brace for our triumphant “I told you so” at some later date.

Of course, escalations of this type have their place, but they need to be a very rare sight; when practiced routinely, they are a telltale
sign of a dysfunctional team. We might be failing to think up viable alternatives that are in tune with business or engineering needs; we might
be very unpersuasive, failing to communicate with other rational people in a language they understand; or it might be that our tolerance for risk
is badly out of whack with the rest of the company. Whatever the cause, I’ve seen high-level escalations where the security team
spoke of valiant efforts to resist inexplicably awful design decisions or data sharing setups; and where product leads in turn talked about
pressing business needs randomly blocked by obstinate security folks. Sometimes, simply having them compare their notes would be enough to arrive
at a technical solution – such as sharing a less sensitive subset of the data at hand.

To be effective, any product security program must be rooted in a partnership with the rest of the company, focused on helping them get stuff done
while eliminating or reducing security risks. To combat the toxic us-versus-them mentality, I found it helpful to have some team members with
software engineering backgrounds, even if it’s the ownership of a small open-source project or so. This can broaden our horizons, helping us see
that we all make the same mistakes – and that not every solution that sounds good on paper is usable once we code it up.

Getting off the treadmill

All security programs involve a good chunk of operational work. For product security, this can be a combination of product launch reviews, design consulting requests, incoming bug reports, or compliance-driven assessments of some sort. And curiously, such reactive work also has the property of gradually expanding to consume all the available resources on a team: next year is bound to bring even more review requests, even more regulatory hurdles, and even more incoming bugs to triage and fix.

Being more tractable, such routine tasks are also more readily enshrined in SDLs, SLAs, and all kinds of other official documents that are often mistaken for a mission statement that justifies the existence of our teams. Soon, instead of explaining to a developer why they should fix a particular problem right away, we end up pointing them to page 17 in our severity classification guideline, which defines that “severity 2” vulnerabilities need to be resolved within a month. Meanwhile, another policy may be telling them that they need to run a fuzzer or a web application scanner for a particular number of CPU-hours – no matter whether it makes sense or whether the job is set up right.

To run a product security program that scales sublinearly, stays abreast of future threats, and doesn’t erect bureaucratic speed bumps just for the sake of it, we need to recognize this inherent tendency for operational work to take over – and we need to reign it in. No matter what the last year’s policy says, we usually don’t need to be doing security reviews with a particular cadence or to a particular depth; if we need to scale them back 10% to staff a two-quarter project that fixes an important API and squashes an entire class of bugs, it’s a short-term risk we should feel empowered to take.

As noted in my earlier post, I find contingency planning to be a valuable tool in this regard: why not ask ourselves how the team would cope if the workload went up another 30%, but bad financial results precluded any team growth? It’s actually fun to think about such hypotheticals ahead of the time – and hey, if the ideas sound good, why not try them out today?

Living for a cause

It can be difficult to understand if our security efforts are structured and prioritized right; when faced with such uncertainty, it is natural to stick to the safe fundamentals – investing most of our resources into the very same things that everybody else in our industry appears to be focusing on today.

I think it’s important to combat this mindset – and if so, we might as well tackle it head on. Rather than focusing on tactical objectives and policy documents, try to write down a concise mission statement explaining why you are a team in the first place, what specific business outcomes you are aiming for, how do you prioritize it, and how you want it all to change in a year or two. It should be a fluid narrative that reads right and that everybody on your team can take pride in; my favorite way of starting the conversation is telling folks that we could always have a new VP tomorrow – and that the VP’s first order of business could be asking, “why do you have so many people here and how do I know they are doing the right thing?”. It’s a playful but realistic framing device that motivates people to get it done.

In general, a comprehensive product security program should probably start with the assumption that no matter how many resources we have at our disposal, we will never be able to stay in the loop on everything that’s happening across the company – and even if we did, we’re not going to be able to catch every single bug. It follows that one of our top priorities for the team should be making sure that bugs don’t happen very often; a scalable way of getting there is equipping engineers with intuitive and usable tools that make it easy to perform common tasks without having to worry about security at all. Examples include standardized, managed containers for production jobs; safe-by-default APIs, such as strict contextual autoescaping for XSS or type safety for SQL; security-conscious style guidelines; or plug-and-play libraries that take care of common crypto or ACL enforcement tasks.

Of course, not all problems can be addressed on framework level, and not every engineer will always reach for the right tools. Because of this, the next principle that I found to be worth focusing on is containment and mitigation: making sure that bugs are difficult to exploit when they happen, or that the damage is kept in check. The solutions in this space can range from low-level enhancements (say, hardened allocators or seccomp-bpf sandboxes) to client-facing features such as browser origin isolation or Content Security Policy.

The usual consulting, review, and outreach tasks are an important facet of a product security program, but probably shouldn’t be the sole focus of your team. It’s also best to avoid undue emphasis on vulnerability showmanship: while valuable in some contexts, it creates a hypercompetitive environment that may be hostile to less experienced team members – not to mention, squashing individual bugs offers very limited value if the same issue is likely to be reintroduced into the codebase the next day. I like to think of security reviews as a teaching opportunity instead: it’s a way to raise awareness, form partnerships with engineers, and help them develop lasting habits that reduce the incidence of bugs. Metrics to understand the impact of your work are important, too; if your engagements are seen mostly as a yet another layer of red tape, product teams will stop reaching out to you for advice.

The other tenet of a healthy product security effort requires us to recognize at a scale and given enough time, every defense mechanism is bound to fail – and so, we need ways to prevent bugs from turning into incidents. The efforts in this space may range from developing product-specific signals for the incident response and monitoring teams; to offering meaningful vulnerability reward programs and nourishing a healthy and respectful relationship with the research community; to organizing regular offensive exercises in hopes of spotting bugs before anybody else does.

Oh, one final note: an important feature of a healthy security program is the existence of multiple feedback loops that help you spot problems without the need to micromanage the organization and without being deathly afraid of taking chances. For example, the data coming from bug bounty programs, if analyzed correctly, offers a wonderful way to alert you to systemic problems in your codebase – and later on, to measure the impact of any remediation and hardening work.

SUPER game night 3: GAMES MADE QUICK??? 2.0

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/23/super-game-night-3-games-made-quick-2-0/

Game night continues with a smorgasbord of games from my recent game jam, GAMES MADE QUICK??? 2.0!

The idea was to make a game in only a week while watching AGDQ, as an alternative to doing absolutely nothing for a week while watching AGDQ. (I didn’t submit a game myself; I was chugging along on my Anise game, which isn’t finished yet.)

I can’t very well run a game jam and not play any of the games, so here’s some of them in no particular order! Enjoy!

These are impressions, not reviews. I try to avoid major/ending spoilers, but big plot points do tend to leave impressions.

Weather Quest, by timlmul

short · rpg · jan 2017 · (lin)/mac/win · free on itch · jam entry

Weather Quest is its author’s first shipped game, written completely from scratch (the only vendored code is a micro OO base). It’s very short, but as someone who has also written LÖVE games completely from scratch, I can attest that producing something this game-like in a week is a fucking miracle. Bravo!

For reference, a week into my first foray, I think I was probably still writing my own Tiled importer like an idiot.

Only Mac and Windows builds are on itch, but it’s a LÖVE game, so Linux folks can just grab a zip from GitHub and throw that at love.

FINAL SCORE: ⛅☔☀

Pancake Numbers Simulator, by AnorakThePrimordial

short · sim · jan 2017 · lin/mac/win · free on itch · jam entry

Given a stack of N pancakes (of all different sizes and in no particular order), the Nth pancake number is the most flips you could possibly need to sort the pancakes in order with the smallest on top. A “flip” is sticking a spatula under one of the pancakes and flipping the whole sub-stack over. There’s, ah, a video embedded on the game page with some visuals.

Anyway, this game lets you simulate sorting a stack via pancake flipping, which is surprisingly satisfying! I enjoy cleaning up little simulated messes, such as… incorrectly-sorted pancakes, I guess?

This probably doesn’t work too well as a simulator for solving the general problem — you’d have to find an optimal solution for every permutation of N pancakes to be sure you were right. But it’s a nice interactive illustration of the problem, and if you know the pancake number for your stack size of choice (which I wish the game told you — for seven pancakes, it’s 8), then trying to restore a stack in that many moves makes for a nice quick puzzle.

FINAL SCORE: \(\frac{18}{11}\)

Framed Animals, by chridd

short · metroidvania · jan 2017 · web/win · free on itch · jam entry

The concept here was to kill the frames, save the animals, which is a delightfully literal riff on a long-running AGDQ/SGDQ donation incentive — people vote with their dollars to decide whether Super Metroid speedrunners go out of their way to free the critters who show you how to walljump and shinespark. Super Metroid didn’t have a showing at this year’s AGDQ, and so we have this game instead.

It’s rough, but clever, and I got really into it pretty quickly — each animal you save gives you a new ability (in true Metroid style), and you get to test that ability out by playing as the animal, with only that ability and no others, to get yourself back to the most recent save point.

I did, tragically, manage to get myself stuck near what I think was about to be the end of the game, so some of the animals will remain framed forever. What an unsatisfying conclusion.

Gravity feels a little high given the size of the screen, and like most tile-less platformers, there’s not really any way to gauge how high or long your jump is before you leap. But I’m only even nitpicking because I think this is a great idea and I hope the author really does keep working on it.

FINAL SCORE: $136,596.69

Battle 4 Glory, by Storyteller Games

short · fighter · jan 2017 · win · free on itch · jam entry

This is a Smash Bros-style brawler, complete with the four players, the 2D play area in a 3D world, and the random stage obstacles showing up. I do like the Smash style, despite not otherwise being a fan of fighting games, so it’s nice to see another game chase that aesthetic.

Alas, that’s about as far as it got — which is pretty far for a week of work! I don’t know what more to say, though. The environments are neat, but unless I’m missing something, the only actions at your disposal are jumping and very weak melee attacks. I did have a good few minutes of fun fruitlessly mashing myself against the bumbling bots, as you can see.

FINAL SCORE: 300%

Icnaluferu Guild, Year Sixteen, by CHz

short · adventure · jan 2017 · web · free on itch · jam entry

Here we have the first of several games made with bitsy, a micro game making tool that basically only supports walking around, talking to people, and picking up items.

I tell you this because I think half of my appreciation for this game is in the ways it wriggled against those limits to emulate a Zelda-like dungeon crawler. Everything in here is totally fake, and you can’t really understand just how fake unless you’ve tried to make something complicated with bitsy.

It’s pretty good. The dialogue is entertaining (the rest of your party develops distinct personalities solely through oneliners, somehow), the riffs on standard dungeon fare are charming, and the Link’s Awakening-esque perspective walls around the edges of each room are fucking glorious.

FINAL SCORE: 2 bits

The Lonely Tapes, by JTHomeslice

short · rpg · jan 2017 · web · free on itch · jam entry

Another bitsy entry, this one sees you play as a Wal— sorry, a JogDawg, which has lost its cassette tapes and needs to go recover them!

(A cassette tape is like a VHS, but for music.)

(A VHS is—)

I have the sneaking suspicion that I missed out on some musical in-jokes, due to being uncultured swine. I still enjoyed the game — it’s always clear when someone is passionate about the thing they’re writing about, and I could tell I was awash in that aura even if some of it went over my head. You know you’ve done good if someone from way outside your sphere shows up and still has a good time.

FINAL SCORE: Nine… Inch Nails? They’re a band, right? God I don’t know write your own damn joke

Pirate Kitty-Quest, by TheKoolestKid

short · adventure · jan 2017 · win · free on itch · jam entry

I completely forgot I’d even given “my birthday” and “my cat” as mostly-joking jam themes until I stumbled upon this incredible gem. I don’t think — let me just check here and — yeah no this person doesn’t even follow me on Twitter. I have no idea who they are?

BUT THEY MADE A GAME ABOUT ANISE AS A PIRATE, LOOKING FOR TREASURE

PIRATE. ANISE

PIRATE ANISE!!!

This game wins the jam, hands down. 🏆

FINAL SCORE: Yarr, eight pieces o’ eight

CHIPS Mario, by NovaSquirrel

short · platformer · jan 2017 · (lin/mac)/win · free on itch · jam entry

You see this? This is fucking witchcraft.

This game is made with MegaZeux. MegaZeux games look like THIS. Text-mode, bound to a grid, with two colors per cell. That’s all you get.

Until now, apparently?? The game is a tech demo of “unbound” sprites, which can be drawn on top of the character grid without being aligned to it. And apparently have looser color restrictions.

The collision is a little glitchy, which isn’t surprising for a MegaZeux platformer; I had some fun interactions with platforms a couple times. But hey, goddamn, it’s free-moving Mario, in MegaZeux, what the hell.

(I’m looking at the most recently added games on DigitalMZX now, and I notice that not only is this game in the first slot, but NovaSquirrel’s MegaZeux entry for Strawberry Jam last February is still in the seventh slot. RIP, MegaZeux. I’m surprised a major feature like this was even added if the community has largely evaporated?)

FINAL SCORE: n/a, disqualified for being probably summoned from the depths of Hell

d!¢< pic, by 573 Games

short · story · jan 2017 · web · free on itch · jam entry

This is a short story about not sending dick pics. It’s very short, so I can’t say much without spoiling it, but: you are generally prompted to either text something reasonable, or send a dick pic. You should not send a dick pic.

It’s a fascinating artifact, not because of the work itself, but because it’s so terse that I genuinely can’t tell what the author was even going for. And this is the kind of subject where the author was, surely, going for something. Right? But was it genuinely intended to be educational, or was it tongue-in-cheek about how some dudes still don’t get it? Or is it side-eying the player who clicks the obviously wrong option just for kicks, which is the same reason people do it for real? Or is it commentary on how “send a dick pic” is a literal option for every response in a real conversation, too, and it’s not that hard to just not do it — unless you are one of the kinds of people who just feels a compulsion to try everything, anything, just because you can? Or is it just a quick Twine and I am way too deep in this? God, just play the thing, it’s shorter than this paragraph.

I’m also left wondering when it is appropriate to send a dick pic. Presumably there is a correct time? Hopefully the author will enter Strawberry Jam 2 to expound upon this.

FINAL SCORE: 3½” 😉

Marble maze, by Shtille

short · arcade · jan 2017 · win · free on itch · jam entry

Ah, hm. So this is a maze navigated by rolling a marble around. You use WASD to move the marble, and you can also turn the camera with the arrow keys.

The trouble is… the marble’s movement is always relative to the world, not the camera. That means if you turn the camera 30° and then try to move the marble, it’ll move at a 30° angle from your point of view.

That makes navigating a maze, er, difficult.

Camera-relative movement is the kind of thing I take so much for granted that I wouldn’t even think to do otherwise, and I think it’s valuable to look at surprising choices that violate fundamental conventions, so I’m trying to take this as a nudge out of my comfort zone. What could you design in an interesting way that used world-relative movement? Probably not the player, but maybe something else in the world, as long as you had strong landmarks? Hmm.

FINAL SCORE: ᘔ

Refactor: flight, by fluffy

short · arcade · jan 2017 · lin/mac/win · free on itch · jam entry

Refactor is a game album, which is rather a lot what it sounds like, and Flight is one of the tracks. Which makes this a single, I suppose.

It’s one of those games where you move down an oddly-shaped tunnel trying not to hit the walls, but with some cute twists. Coins and gems hop up from the bottom of the screen in time with the music, and collecting them gives you points. Hitting a wall costs you some points and kills your momentum, but I don’t think outright losing is possible, which is great for me!

Also, the monk cycles through several animal faces. I don’t know why, and it’s very good. One of those odd but memorable details that sits squarely on the intersection of abstract, mysterious, and a bit weird, and refuses to budge from that spot.

The music is great too? Really chill all around.

FINAL SCORE: 🎵🎵🎵🎵

The Adventures of Klyde

short · adventure · jan 2017 · web · free on itch · jam entry

Another bitsy game, this one starring a pig (humorously symbolized by a giant pig nose with ears) who must collect fruit and solve some puzzles.

This is charmingly nostalgic for me — it reminds me of some standard fare in engines like MegaZeux, where the obvious things to do when presented with tiles and pickups were to make mazes. I don’t mean that in a bad way; the maze is the fundamental environmental obstacle.

A couple places in here felt like invisible teleport mazes I had to brute-force, but I might have been missing a hint somewhere. I did make it through with only a little trouble, but alas — I stepped in a bad warp somewhere and got sent to the upper left corner of the starting screen, which is surrounded by walls. So Klyde’s new life is being trapped eternally in a nowhere space.

FINAL SCORE: 19/20 apples

And more

That was only a third of the games, and I don’t think even half of the ones I’ve played. I’ll have to do a second post covering the rest of them? Maybe a third?

Or maybe this is a ludicrous format for commenting on several dozen games and I should try to narrow it down to the ones that resonated the most for Strawberry Jam 2? Maybe??

Cloud Babble: The Jargon of Cloud Storage

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/what-is-cloud-computing/

Cloud Babble

One of the things we in the technology business are good at is coming up with names, phrases, euphemisms, and acronyms for the stuff that we create. The Cloud Storage market is no different, and we’d like to help by illuminating some of the cloud storage related terms that you might come across. We know this is just a start, so please feel free to add in your favorites in the comments section below and we’ll update this post accordingly.

Clouds

The cloud is really just a collection of purpose built servers. In a public cloud the servers are shared between multiple unrelated tenants. In a private cloud, the servers are dedicated to a single tenant or sometimes a group of related tenants. A public cloud is off-site, while a private cloud can be on-site or off-site – or on-prem or off-prem, if you prefer.

Both Sides Now: Hybrid Clouds

Speaking of on-prem and off-prem, there are Hybrid Clouds or Hybrid Data Clouds depending on what you need. Both are based on the idea that you extend your local resources (typically on-prem) to the cloud (typically off-prem) as needed. This extension is controlled by software that decides, based on rules you define, what needs to be done where.

A Hybrid Data Cloud is specific to data. For example, you can set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.

A Hybrid Cloud is similar to a Hybrid Data Cloud except it also extends compute. For example, at the end of the quarter, you can spin up order processing application instances off-prem as needed to add to your on-prem capacity. Of course, determining where the transactional data used and created by these applications resides can be an interesting systems design challenge.

Clouds in my Coffee: Fog

Typically, public and private clouds live in large buildings called data centers. Full of servers, networking equipment, and clean air, data centers need lots of power, lots of networking bandwidth, and lots of space. This often limits where data centers are located. The further away you are from a data center, the longer it generally takes to get your data to and from there. This is known as latency. That’s where “Fog” comes in.

Fog is often referred to as clouds close to the ground. Fog, in our cloud world, is basically having a “little” data center near you. This can make data storage and even cloud based processing faster for everyone nearby. Data, and less so processing, can be transferred to/from the Fog to the Cloud when time is less a factor. Data could also be aggregated in the Fog and sent to the Cloud. For example, your electric meter could report its minute-by-minute status to the Fog for diagnostic purposes. Then once a day the aggregated data could be send to the power company’s Cloud for billing purposes.

Another term used in place of Fog is Edge, as in computing at the Edge. In either case, a given cloud (data center) usually has multiple Edges (little data centers) connected to it. The connection between the Edge and the Cloud is sometimes known as the middle-mile. The network in the middle-mile can be less robust than that required to support a stand-alone data center. For example, the middle-mile can use 1 Gbps lines, versus a data center, which would require multiple 10 Gbps lines.

Heavy Clouds No Rain: Data

We’re all aware that we are creating, processing, and storing data faster than ever before. All of this data is stored in either a structured or more likely an unstructured way. Databases and data warehouses are structured ways to store data, but a vast amount of data is unstructured – meaning the schema and data access requirements are not known until the data is queried. A large pool of unstructured data in a flat architecture can be referred to as a Data Lake.

A Data Lake is often created so we can perform some type of “big data” analysis. In an over simplified example, let’s extend the lake metaphor a bit and ask the question; “how many fish are in our lake?” To get an answer, we take a sufficient sample of our lake’s water (data), count the number of fish we find, and extrapolate based on the size of the lake to get an answer within a given confidence interval.

A Data Lake is usually found in the cloud, an excellent place to store large amounts of non-transactional data. Watch out as this can lead to our data having too much Data Gravity or being locked in the Hotel California. This could also create a Data Silo, thereby making a potential data Lift-and-Shift impossible. Let me explain:

  • Data Gravity — Generally, the more data you collect in one spot, the harder it is to move. When you store data in a public cloud, you have to pay egress and/or network charges to download the data to another public cloud or even to your own on-premise systems. Some public cloud vendors charge a lot more than others, meaning that depending on your public cloud provider, your data could financially have a lot more gravity than you expected.
  • Hotel California — This is like Data Gravity but to a lesser scale. Your data is in the Hotel California if, to paraphrase, “your data can check out any time you want, but it can never leave.” If the cost of downloading your data is limiting the things you want to do with that data, then your data is in the Hotel California. Data is generally most valuable when used, and with cloud storage that can include archived data. This assumes of course that the archived data is readily available, and affordable, to download. When considering a cloud storage project always figure in the cost of using your own data.
  • Data Silo — Over the years, businesses have suffered from organizational silos as information is not shared between different groups, but instead needs to travel up to the top of the silo before it can be transferred to another silo. If your data is “trapped” in a given cloud by the cost it takes to share such data, then you may have a Data Silo, and that’s exactly opposite of what the cloud should do.
  • Lift-and-Shift — This term is used to define the movement of data or applications from one data center to another or from on-prem to off-prem systems. The move generally occurs all at once and once everything is moved, systems are operational and data is available at the new location with few, if any, changes. If your data has too much gravity or is locked in a hotel, a data lift-and-shift may break the bank.

I Can See Clearly Now

Hopefully, the cloudy terms we’ve covered are well, less cloudy. As we mentioned in the beginning, our compilation is just a start, so please feel free to add in your favorite cloud term in the comments section below and we’ll update this post with your contributions. Keep your entries “clean,” and please no words or phrases that are really adverts for your company. Thanks.

The post Cloud Babble: The Jargon of Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Early Challenges: Managing Cash Flow

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/managing-cash-flow/

Cash flow projection charts

This post by Backblaze’s CEO and co-founder Gleb Budman is the eighth in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants
  7. The Decision on Transparency
  8. Early Challenges: Managing Cash Flow

Use the Join button above to receive notification of new posts in this series.

Running out of cash is one of the quickest ways for a startup to go out of business. When you are starting a company the question of where to get cash is usually the top priority, but managing cash flow is critical for every stage in the lifecycle of a company. As a primarily bootstrapped but capital-intensive business, managing cash flow at Backblaze was and still is a key element of our success and requires continued focus. Let’s look at what we learned over the years.

Raising Your Initial Funding

When starting a tech business in Silicon Valley, the default assumption is that you will immediately try to raise venture funding. There are certainly many advantages to raising funding — not the least of which is that you don’t need to be cash-flow positive since you have cash in the bank and the expectation is that you will have a “burn rate,” i.e. you’ll be spending more than you make.

Note: While you’re not expected to be cash-flow positive, that doesn’t mean you don’t have to worry about cash. Cash-flow management will determine your burn rate. Whether you can get to cash-flow breakeven or need to raise another round of funding is a direct byproduct of your cash flow management.

Also, raising funding takes time (most successful fundraising cycles take 3-6 months start-to-finish), and time at a startup is in short supply. Constantly trying to raise funding can take away from product development and pursuing growth opportunities. If you’re not successful in raising funding, you then have to either shut down or find an alternate method of funding the business.

Sources of Funding

Depending on the stage of the company, type of company, and other factors, you may have access to different sources of funding. Let’s list a number of them:

Customers

Sales — the best kind of funding. It is non-dilutive, doesn’t have to be paid back, and is a direct metric of the success of your company.

Pre-Sales — some customers may be willing to pay you for a product in beta, a test, or pre-pay for a product they’ll receive when finished. Pre-Sales income also is great because it shares the characteristics of cash from sales, but you get the cash early. It also can be a good sign that the product you’re building fills a market need. We started charging for Backblaze computer backup while it was still in private beta, which allowed us to not only collect cash from customers, but also test the billing experience and users’ real desire for the service.

Services — if you’re a service company and customers are paying you for that, great. You can effectively scale for the number of hours available in a day. As demand grows, you can add more employees to increase the total number of billable hours.

Note: If you’re a product company and customers are paying you to consult, that can provide much needed cash, and could provide feedback toward the right product. However, it can also distract from your core business, send you down a path where you’re building a product for a single customer, and addict you to a path that prevents you from building a scalable business.

Investors

Yourself — you likely are putting your time into the business, and deferring salary in the process. You may also put your own cash into the business either as an investment or a loan.

Angels — angels are ideal as early investors since they are used to investing in businesses with little to no traction. AngelList is a good place to find them, though finding people you’re connected with through someone that knows you well is best.

Crowdfunding — a component of the JOBS Act permitted entrepreneurs to raise money from nearly anyone since May 2016. The SEC imposes limits on both investors and the companies. This article goes into some depth on the options and sites available.

VCs — VCs are ideal for companies that need to raise at least a few million dollars and intend to build a business that will be worth over $1 billion.

Debt

Friends & Family — F&F are often the first people to give you money because they are investing in you. It’s great to have some early supporters, but it also can be risky to take money from people who aren’t used to the risks. The key advice here is to only take money from people who won’t mind losing it. If someone is talking about using their children’s college funds or borrowing from their 401k, say ‘no thank you’ — even if they’re sure they want to loan you money.

Bank Loans — a variety of loan types exist, but most either require the company to have been operational for a couple years, be able to borrow against money the company has or is making, or be able to get a personal guarantee from the founders whereby their own credit is on the line. Fundera provides a good overview of loan options and can help secure some, but most will not be an option for a brand new startup.

Grants

Government — in some areas there is the potential for government grants to facilitate research. The SBIR program facilitates some such grants.

At Backblaze, we used a number of these options:

• Investors/Yourself
We loaned a cumulative total of a couple hundred thousand dollars to the company and invested our time by going without a salary for a year and a half.
• Customers/Pre-Sales
We started selling the Backblaze service while it was still in beta.
• Customers/Sales
We launched v1.0 and kept selling.
• Investors/Angels
After a year and a half, we raised $370k from 11 angels. All of them were either people whom we knew personally or were a strong recommendation from a mutual friend.
• Debt/Loans
After a couple years we were able to get equipment leases whereby the Storage Pods and hard drives were used as collateral to secure the lease on them.
• Investors/VCs
Ater five years we raised $5m from TMT Investments to add to the balance sheet and invest in growth.

The variety and quantity of sources we used is by no means uncommon.

GAAP vs. Cash

Most companies start tracking financials based on cash, and as they scale they switch to GAAP (Generally Accepted Accounting Principles). Cash is easier to track — we got paid $XXXX and spent $YYY — and as often mentioned, is required for the business to stay alive. GAAP has more subtlety and complexity, but provides a clearer picture of how the business is really doing. Backblaze was on a ‘cash’ system for the first few years, then switched to GAAP. For this post, I’m going to focus on things that help cash flow, not GAAP profitability.

Stages of Cash Flow Management

All-spend

In a pure service business (e.g. solo proprietor law firm), you may have no expenses other than your time, so this stage doesn’t exist. However, in a product business there is a period of time where you are building the product and have nothing to sell. You have zero cash coming in, but have cash going out. Your cash-flow is completely negative and you need funds to cover that.

Sales-generating

Starting to see cash come in from customers is thrilling. I initially had our system set up to email me with every $5 payment we received. You’re making sales, but not covering expenses.

Ramen-profitable

But it takes a lot of $5 payments to pay for servers and salaries, so for a while expenses are likely to outstrip sales. Getting to ramen-profitable is a critical stage where sales cover the business expenses and are “paying enough for the founders to eat ramen.” This extends the runway for a business, but is not completely sustainable, since presumably the founders can’t (or won’t) live forever on a subsistence salary.

Business-profitable

This is the ultimate stage whereby the business is truly profitable, including paying everyone market-rate salaries. A business at this stage is self-sustaining. (Of course, market shifts and plenty of other challenges can kill the business, but cash-flow issues alone will not.)

Note, I’m using the word ‘profitable’ here to mean this is still on a cash-basis.

Backblaze was in the all-spend stage for just over a year, during which time we built the service and hadn’t yet made the service available to customers. Backblaze was in the sales-generating stage for nearly another year before the company was barely ramen-profitable where sales were covering the company expenses and paying the founders minimum wage. (I say ‘barely’ since minimum wage in the SF Bay Area is arguably never subsistence.) It took almost three more years before the company was business-profitable, paying everyone including the founders market-rate.

Cash Flow Forecasting

When raising funding it’s helpful to think of milestones reached. You don’t necessarily need enough cash on day one to last for the next 100 years of the company. Some good milestones to consider are how much cash you need to prove there is a market need, prove you can build a product to meet that need, or get to ramen-profitable.

Two things to consider:

1) Unit Economics (COGS)

If your product is 100% software, this may not be relevant. Once software is built it costs effectively nothing to deliver the product to one customer or one million customers. However, in most businesses there is some incremental cost to provide the product. If you’re selling a hardware device, perhaps you sell it for $100 but it costs you $50 to make it. This is called “COGS” (Cost of Goods Sold).

Many products rely on cloud services where the costs scale with growth. That model works great, but it’s still important to understand what the costs are for the cloud service you use per unit of product you sell.

Support is often done by the founders early-on in a business, but that is another real cost to factor in and estimate on a per-user basis. Taking all of the per unit costs combined, you may charge $10/month/user for your service, but if it costs you $7/month/user in cloud services, you’re only netting $3/month/user.

2) Operating Expenses (OpEx)

These are expenses that don’t scale with the number of product units you sell. Typically this includes research & development, sales & marketing, and general & administrative expenses. Presumably there is a certain level of these functions required to build the product, market it, sell it, and run the organization. You can choose to invest or cut back on these, but you’ll still make the same amount per product unit.

Incremental Net Profit Per Unit

If you’ve calculated your COGS and your unit economics are “upside down,” where the amount you charge is less than that it costs you to provide your service, it’s worth thinking hard about how that’s going to change over time. If it will not change, there is no scale that will make the business work. Presuming you do make money on each unit of product you sell — what is sometimes referred to as “Contribution Margin” — consider how many of those product units you need to sell to cover your operating expenses as described above.

Calculating Your Profit

The math on getting to ramen-profitable is simple:

(Number of Product Units Sold x Contribution Margin) - Operating Expenses = Profit

If your operating expenses include subsistence salaries for the founders and profit > $0, you’re ramen-profitable.

Improving Cash Flow

Having access to sources of cash, whether from selling to customers or other methods, is excellent. But needing less cash gives you more choices and allows you to either dilute less, owe less, or invest more.

There are two ways to improve cash flow:

1) Collect More Cash

The best way to collect more cash is to provide more value to your customers and as a result have them pay you more. Additional features/products/services can allow this. However, you can also collect more cash by changing how you charge for your product. If you have a subscription, changing from charging monthly to yearly dramatically improves your cash flow. If you have a product that customers use up, selling a year’s supply instead of selling them one-by-one can help.

2) Spend Less Cash

Reducing COGS is a fantastic way to spend less cash in a scalable way. If you can do this without harming the product or customer experience, you win. There are a myriad of ways to also reduce operating expenses, including taking sub-market salaries, using your home instead of renting office space, staying focused on your core product, etc.

Ultimately, collecting more and spending less cash dramatically simplifies the process of getting to ramen-profitable and later to business-profitable.

Be Careful (Why GAAP Matters)

A word of caution: while running out of cash will put you out of business immediately, overextending yourself will likely put you out of business not much later. GAAP shows how a business is really doing; cash doesn’t. If you only focus on cash, it is possible to commit yourself to both delivering products and repaying loans in the future in an unsustainable fashion. If you’re taking out loans, watch the total balance and monthly payments you’re committing to. If you’re asking customers for pre-payment, make sure you believe you can deliver on what they’ve paid for.

Summary

There are numerous challenges to building a business, and ensuring you have enough cash is amongst the most important. Having the cash to keep going lets you keep working on all of the other challenges. The frameworks above were critical for maintaining Backblaze’s cash flow and cash balance. Hopefully you can take some of the lessons we learned and apply them to your business. Let us know what works for you in the comments below.

The post Early Challenges: Managing Cash Flow appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Random with care

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/02/random-with-care/

Hi! Here are a few loose thoughts about picking random numbers.

A word about crypto

DON’T ROLL YOUR OWN CRYPTO

This is all aimed at frivolous pursuits like video games. Hell, even video games where money is at stake should be deferring to someone who knows way more than I do. Otherwise you might find out that your deck shuffles in your poker game are woefully inadequate and some smartass is cheating you out of millions. (If your random number generator has fewer than 226 bits of state, it can’t even generate every possible shuffling of a deck of cards!)

Use the right distribution

Most languages have a random number primitive that spits out a number uniformly in the range [0, 1), and you can go pretty far with just that. But beware a few traps!

Random pitches

Say you want to pitch up a sound by a random amount, perhaps up to an octave. Your audio API probably has a way to do this that takes a pitch multiplier, where I say “probably” because that’s how the only audio API I’ve used works.

Easy peasy. If 1 is unchanged and 2 is pitched up by an octave, then all you need is rand() + 1. Right?

No! Pitch is exponential — within the same octave, the “gap” between C and C♯ is about half as big as the gap between B and the following C. If you pick a pitch multiplier uniformly, you’ll have a noticeable bias towards the higher pitches.

One octave corresponds to a doubling of pitch, so if you want to pick a random note, you want 2 ** rand().

Random directions

For two dimensions, you can just pick a random angle with rand() * TAU.

If you want a vector rather than an angle, or if you want a random direction in three dimensions, it’s a little trickier. You might be tempted to just pick a random point where each component is rand() * 2 - 1 (ranging from −1 to 1), but that’s not quite right. A direction is a point on the surface (or, equivalently, within the volume) of a sphere, and picking each component independently produces a point within the volume of a cube; the result will be a bias towards the corners of the cube, where there’s much more extra volume beyond the sphere.

No? Well, just trust me. I don’t know how to make a diagram for this.

Anyway, you could use the Pythagorean theorem a few times and make a huge mess of things, or it turns out there’s a really easy way that even works for two or four or any number of dimensions. You pick each coordinate from a Gaussian (normal) distribution, then normalize the resulting vector. In other words, using Python’s random module:

1
2
3
4
5
6
def random_direction():
    x = random.gauss(0, 1)
    y = random.gauss(0, 1)
    z = random.gauss(0, 1)
    r = math.sqrt(x*x + y*y + z*z)
    return x/r, y/r, z/r

Why does this work? I have no idea!

Note that it is possible to get zero (or close to it) for every component, in which case the result is nonsense. You can re-roll all the components if necessary; just check that the magnitude (or its square) is less than some epsilon, which is equivalent to throwing away a tiny sphere at the center and shouldn’t affect the distribution.

Beware Gauss

Since I brought it up: the Gaussian distribution is a pretty nice one for choosing things in some range, where the middle is the common case and should appear more frequently.

That said, I never use it, because it has one annoying drawback: the Gaussian distribution has no minimum or maximum value, so you can’t really scale it down to the range you want. In theory, you might get any value out of it, with no limit on scale.

In practice, it’s astronomically rare to actually get such a value out. I did a hundred million trials just to see what would happen, and the largest value produced was 5.8.

But, still, I’d rather not knowingly put extremely rare corner cases in my code if I can at all avoid it. I could clamp the ends, but that would cause unnatural bunching at the endpoints. I could reroll if I got a value outside some desired range, but I prefer to avoid rerolling when I can, too; after all, it’s still (astronomically) possible to have to reroll for an indefinite amount of time. (Okay, it’s really not, since you’ll eventually hit the period of your PRNG. Still, though.) I don’t bend over backwards here — I did just say to reroll when picking a random direction, after all — but when there’s a nicer alternative I’ll gladly use it.

And lo, there is a nicer alternative! Enter the beta distribution. It always spits out a number in [0, 1], so you can easily swap it in for the standard normal function, but it takes two “shape” parameters α and β that alter its behavior fairly dramatically.

With α = β = 1, the beta distribution is uniform, i.e. no different from rand(). As α increases, the distribution skews towards the right, and as β increases, the distribution skews towards the left. If α = β, the whole thing is symmetric with a hump in the middle. The higher either one gets, the more extreme the hump (meaning that value is far more common than any other). With a little fiddling, you can get a number of interesting curves.

Screenshots don’t really do it justice, so here’s a little Wolfram widget that lets you play with α and β live:

Note that if α = 1, then 1 is a possible value; if β = 1, then 0 is a possible value. You probably want them both greater than 1, which clamps the endpoints to zero.

Also, it’s possible to have either α or β or both be less than 1, but this creates very different behavior: the corresponding endpoints become poles.

Anyway, something like α = β = 3 is probably close enough to normal for most purposes but already clamped for you. And you could easily replicate something like, say, NetHack’s incredibly bizarre rnz function.

Random frequency

Say you want some event to have an 80% chance to happen every second. You (who am I kidding, I) might be tempted to do something like this:

1
2
if random() < 0.8 * dt:
    do_thing()

In an ideal world, dt is always the same and is equal to 1 / f, where f is the framerate. Replace that 80% with a variable, say P, and every tic you have a P / f chance to do the… whatever it is.

Each second, f tics pass, so you’ll make this check f times. The chance that any check succeeds is the inverse of the chance that every check fails, which is \(1 – \left(1 – \frac{P}{f}\right)^f\).

For P of 80% and a framerate of 60, that’s a total probability of 55.3%. Wait, what?

Consider what happens if the framerate is 2. On the first tic, you roll 0.4 twice — but probabilities are combined by multiplying, and splitting work up by dt only works for additive quantities. You lose some accuracy along the way. If you’re dealing with something that multiplies, you need an exponent somewhere.

But in this case, maybe you don’t want that at all. Each separate roll you make might independently succeed, so it’s possible (but very unlikely) that the event will happen 60 times within a single second! Or 200 times, if that’s someone’s framerate.

If you explicitly want something to have a chance to happen on a specific interval, you have to check on that interval. If you don’t have a gizmo handy to run code on an interval, it’s easy to do yourself with a time buffer:

1
2
3
4
5
6
timer += dt
# here, 1 is the "every 1 seconds"
while timer > 1:
    timer -= 1
    if random() < 0.8:
        do_thing()

Using while means rolls still happen even if you somehow skipped over an entire second.

(For the curious, and the nerds who already noticed: the expression \(1 – \left(1 – \frac{P}{f}\right)^f\) converges to a specific value! As the framerate increases, it becomes a better and better approximation for \(1 – e^{-P}\), which for the example above is 0.551. Hey, 60 fps is pretty accurate — it’s just accurately representing something nowhere near what I wanted. Er, you wanted.)

Rolling your own

Of course, you can fuss with the classic [0, 1] uniform value however you want. If I want a bias towards zero, I’ll often just square it, or multiply two of them together. If I want a bias towards one, I’ll take a square root. If I want something like a Gaussian/normal distribution, but with clearly-defined endpoints, I might add together n rolls and divide by n. (The normal distribution is just what you get if you roll infinite dice and divide by infinity!)

It’d be nice to be able to understand exactly what this will do to the distribution. Unfortunately, that requires some calculus, which this post is too small to contain, and which I didn’t even know much about myself until I went down a deep rabbit hole while writing, and which in many cases is straight up impossible to express directly.

Here’s the non-calculus bit. A source of randomness is often graphed as a PDF — a probability density function. You’ve almost certainly seen a bell curve graphed, and that’s a PDF. They’re pretty nice, since they do exactly what they look like: they show the relative chance that any given value will pop out. On a bog standard bell curve, there’s a peak at zero, and of course zero is the most common result from a normal distribution.

(Okay, actually, since the results are continuous, it’s vanishingly unlikely that you’ll get exactly zero — but you’re much more likely to get a value near zero than near any other number.)

For the uniform distribution, which is what a classic rand() gives you, the PDF is just a straight horizontal line — every result is equally likely.


If there were a calculus bit, it would go here! Instead, we can cheat. Sometimes. Mathematica knows how to work with probability distributions in the abstract, and there’s a free web version you can use. For the example of squaring a uniform variable, try this out:

1
PDF[TransformedDistribution[u^2, u \[Distributed] UniformDistribution[{0, 1}]], u]

(The \[Distributed] is a funny tilde that doesn’t exist in Unicode, but which Mathematica uses as a first-class operator. Also, press shiftEnter to evaluate the line.)

This will tell you that the distribution is… \(\frac{1}{2\sqrt{u}}\). Weird! You can plot it:

1
Plot[%, {u, 0, 1}]

(The % refers to the result of the last thing you did, so if you want to try several of these, you can just do Plot[PDF[…], u] directly.)

The resulting graph shows that numbers around zero are, in fact, vastly — infinitely — more likely than anything else.

What about multiplying two together? I can’t figure out how to get Mathematica to understand this, but a great amount of digging revealed that the answer is -ln x, and from there you can plot them both on Wolfram Alpha. They’re similar, though squaring has a much better chance of giving you high numbers than multiplying two separate rolls — which makes some sense, since if either of two rolls is a low number, the product will be even lower.

What if you know the graph you want, and you want to figure out how to play with a uniform roll to get it? Good news! That’s a whole thing called inverse transform sampling. All you have to do is take an integral. Good luck!


This is all extremely ridiculous. New tactic: Just Simulate The Damn Thing. You already have the code; run it a million times, make a histogram, and tada, there’s your PDF. That’s one of the great things about computers! Brute-force numerical answers are easy to come by, so there’s no excuse for producing something like rnz. (Though, be sure your histogram has sufficiently narrow buckets — I tried plotting one for rnz once and the weird stuff on the left side didn’t show up at all!)

By the way, I learned something from futzing with Mathematica here! Taking the square root (to bias towards 1) gives a PDF that’s a straight diagonal line, nothing like the hyperbola you get from squaring (to bias towards 0). How do you get a straight line the other way? Surprise: \(1 – \sqrt{1 – u}\).

Okay, okay, here’s the actual math

I don’t claim to have a very firm grasp on this, but I had a hell of a time finding it written out clearly, so I might as well write it down as best I can. This was a great excuse to finally set up MathJax, too.

Say \(u(x)\) is the PDF of the original distribution and \(u\) is a representative number you plucked from that distribution. For the uniform distribution, \(u(x) = 1\). Or, more accurately,

$$
u(x) = \begin{cases}
1 & \text{ if } 0 \le x \lt 1 \\
0 & \text{ otherwise }
\end{cases}
$$

Remember that \(x\) here is a possible outcome you want to know about, and the PDF tells you the relative probability that a roll will be near it. This PDF spits out 1 for every \(x\), meaning every number between 0 and 1 is equally likely to appear.

We want to do something to that PDF, which creates a new distribution, whose PDF we want to know. I’ll use my original example of \(f(u) = u^2\), which creates a new PDF \(v(x)\).

The trick is that we need to work in terms of the cumulative distribution function for \(u\). Where the PDF gives the relative chance that a roll will be (“near”) a specific value, the CDF gives the relative chance that a roll will be less than a specific value.

The conventions for this seem to be a bit fuzzy, and nobody bothers to explain which ones they’re using, which makes this all the more confusing to read about… but let’s write the CDF with a capital letter, so we have \(U(x)\). In this case, \(U(x) = x\), a straight 45° line (at least between 0 and 1). With the definition I gave, this should make sense. At some arbitrary point like 0.4, the value of the PDF is 1 (0.4 is just as likely as anything else), and the value of the CDF is 0.4 (you have a 40% chance of getting a number from 0 to 0.4).

Calculus ahoy: the PDF is the derivative of the CDF, which means it measures the slope of the CDF at any point. For \(U(x) = x\), the slope is always 1, and indeed \(u(x) = 1\). See, calculus is easy.

Okay, so, now we’re getting somewhere. What we want is the CDF of our new distribution, \(V(x)\). The CDF is defined as the probability that a roll \(v\) will be less than \(x\), so we can literally write:

$$V(x) = P(v \le x)$$

(This is why we have to work with CDFs, rather than PDFs — a PDF gives the chance that a roll will be “nearby,” whatever that means. A CDF is much more concrete.)

What is \(v\), exactly? We defined it ourselves; it’s the do something applied to a roll from the original distribution, or \(f(u)\).

$$V(x) = P\!\left(f(u) \le x\right)$$

Now the first tricky part: we have to solve that inequality for \(u\), which means we have to do something, backwards to \(x\).

$$V(x) = P\!\left(u \le f^{-1}(x)\right)$$

Almost there! We now have a probability that \(u\) is less than some value, and that’s the definition of a CDF!

$$V(x) = U\!\left(f^{-1}(x)\right)$$

Hooray! Now to turn these CDFs back into PDFs, all we need to do is differentiate both sides and use the chain rule. If you never took calculus, don’t worry too much about what that means!

$$v(x) = u\!\left(f^{-1}(x)\right)\left|\frac{d}{dx}f^{-1}(x)\right|$$

Wait! Where did that absolute value come from? It takes care of whether \(f(x)\) increases or decreases. It’s the least interesting part here by far, so, whatever.

There’s one more magical part here when using the uniform distribution — \(u(\dots)\) is always equal to 1, so that entire term disappears! (Note that this only works for a uniform distribution with a width of 1; PDFs are scaled so the entire area under them sums to 1, so if you had a rand() that could spit out a number between 0 and 2, the PDF would be \(u(x) = \frac{1}{2}\).)

$$v(x) = \left|\frac{d}{dx}f^{-1}(x)\right|$$

So for the specific case of modifying the output of rand(), all we have to do is invert, then differentiate. The inverse of \(f(u) = u^2\) is \(f^{-1}(x) = \sqrt{x}\) (no need for a ± since we’re only dealing with positive numbers), and differentiating that gives \(v(x) = \frac{1}{2\sqrt{x}}\). Done! This is also why square root comes out nicer; inverting it gives \(x^2\), and differentiating that gives \(2x\), a straight line.

Incidentally, that method for turning a uniform distribution into any distribution — inverse transform sampling — is pretty much the same thing in reverse: integrate, then invert. For example, when I saw that taking the square root gave \(v(x) = 2x\), I naturally wondered how to get a straight line going the other way, \(v(x) = 2 – 2x\). Integrating that gives \(2x – x^2\), and then you can use the quadratic formula (or just ask Wolfram Alpha) to solve \(2x – x^2 = u\) for \(x\) and get \(f(u) = 1 – \sqrt{1 – u}\).

Multiply two rolls is a bit more complicated; you have to write out the CDF as an integral and you end up doing a double integral and wow it’s a mess. The only thing I’ve retained is that you do a division somewhere, which then gets integrated, and that’s why it ends up as \(-\ln x\).

And that’s quite enough of that! (Okay but having math in my blog is pretty cool and I will definitely be doing more of this, sorry, not sorry.)

Random vs varied

Sometimes, random isn’t actually what you want. We tend to use the word “random” casually to mean something more like chaotic, i.e., with no discernible pattern. But that’s not really random. In fact, given how good humans can be at finding incidental patterns, they aren’t all that unlikely! Consider that when you roll two dice, they’ll come up either the same or only one apart almost half the time. Coincidence? Well, yes.

If you ask for randomness, you’re saying that any outcome — or series of outcomes — is acceptable, including five heads in a row or five tails in a row. Most of the time, that’s fine. Some of the time, it’s less fine, and what you really want is variety. Here are a couple examples and some fairly easy workarounds.

NPC quips

The nature of games is such that NPCs will eventually run out of things to say, at which point further conversation will give the player a short brush-off quip — a slight nod from the designer to the player that, hey, you hit the end of the script.

Some NPCs have multiple possible quips and will give one at random. The trouble with this is that it’s very possible for an NPC to repeat the same quip several times in a row before abruptly switching to another one. With only a few options to choose from, getting the same option twice or thrice (especially across an entire game, which may have numerous NPCs) isn’t all that unlikely. The notion of an NPC quip isn’t very realistic to start with, but having someone repeat themselves and then abruptly switch to something else is especially jarring.

The easy fix is to show the quips in order! Paradoxically, this is more consistently varied than choosing at random — the original “order” is likely to be meaningless anyway, and it already has the property that the same quip can never appear twice in a row.

If you like, you can shuffle the list of quips every time you reach the end, but take care here — it’s possible that the last quip in the old order will be the same as the first quip in the new order, so you may still get a repeat. (Of course, you can just check for this case and swap the first quip somewhere else if it bothers you.)

That last behavior is, in fact, the canonical way that Tetris chooses pieces — the game simply shuffles a list of all 7 pieces, gives those to you in shuffled order, then shuffles them again to make a new list once it’s exhausted. There’s no avoidance of duplicates, though, so you can still get two S blocks in a row, or even two S and two Z all clumped together, but no more than that. Some Tetris variants take other approaches, such as actively avoiding repeats even several pieces apart or deliberately giving you the worst piece possible.

Random drops

Random drops are often implemented as a flat chance each time. Maybe enemies have a 5% chance to drop health when they die. Legally speaking, over the long term, a player will see health drops for about 5% of enemy kills.

Over the short term, they may be desperate for health and not survive to see the long term. So you may want to put a thumb on the scale sometimes. Games in the Metroid series, for example, have a somewhat infamous bias towards whatever kind of drop they think you need — health if your health is low, missiles if your missiles are low.

I can’t give you an exact approach to use, since it depends on the game and the feeling you’re going for and the variables at your disposal. In extreme cases, you might want to guarantee a health drop from a tough enemy when the player is critically low on health. (Or if you’re feeling particularly evil, you could go the other way and deny the player health when they most need it…)

The problem becomes a little different, and worse, when the event that triggers the drop is relatively rare. The pathological case here would be something like a raid boss in World of Warcraft, which requires hours of effort from a coordinated group of people to defeat, and which has some tiny chance of dropping a good item that will go to only one of those people. This is why I stopped playing World of Warcraft at 60.

Dialing it back a little bit gives us Enter the Gungeon, a roguelike where each room is a set of encounters and each floor only has a dozen or so rooms. Initially, you have a 1% chance of getting a reward after completing a room — but every time you complete a room and don’t get a reward, the chance increases by 9%, up to a cap of 80%. Once you get a reward, the chance resets to 1%.

The natural question is: how frequently, exactly, can a player expect to get a reward? We could do math, or we could Just Simulate The Damn Thing.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
from collections import Counter
import random

histogram = Counter()

TRIALS = 1000000
chance = 1
rooms_cleared = 0
rewards_found = 0
while rewards_found < TRIALS:
    rooms_cleared += 1
    if random.random() * 100 < chance:
        # Reward!
        rewards_found += 1
        histogram[rooms_cleared] += 1
        rooms_cleared = 0
        chance = 1
    else:
        chance = min(80, chance + 9)

for gaps, count in sorted(histogram.items()):
    print(f"{gaps:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
  1 |   0.98%
  2 |   9.91% #########
  3 |  17.00% ################
  4 |  20.23% ####################
  5 |  19.21% ###################
  6 |  15.05% ###############
  7 |   9.69% #########
  8 |   5.07% #####
  9 |   2.09% ##
 10 |   0.63%
 11 |   0.12%
 12 |   0.03%
 13 |   0.00%
 14 |   0.00%
 15 |   0.00%

We’ve got kind of a hilly distribution, skewed to the left, which is up in this histogram. Most of the time, a player should see a reward every three to six rooms, which is maybe twice per floor. It’s vanishingly unlikely to go through a dozen rooms without ever seeing a reward, so a player should see at least one per floor.

Of course, this simulated a single continuous playthrough; when starting the game from scratch, your chance at a reward always starts fresh at 1%, the worst it can be. If you want to know about how many rewards a player will get on the first floor, hey, Just Simulate The Damn Thing.

1
2
3
4
5
6
7
  0 |   0.01%
  1 |  13.01% #############
  2 |  56.28% ########################################################
  3 |  27.49% ###########################
  4 |   3.10% ###
  5 |   0.11%
  6 |   0.00%

Cool. Though, that’s assuming exactly 12 rooms; it might be worth changing that to pick at random in a way that matches the level generator.

(Enter the Gungeon does some other things to skew probability, which is very nice in a roguelike where blind luck can make or break you. For example, if you kill a boss without having gotten a new gun anywhere else on the floor, the boss is guaranteed to drop a gun.)

Critical hits

I suppose this is the same problem as random drops, but backwards.

Say you have a battle sim where every attack has a 6% chance to land a devastating critical hit. Presumably the same rules apply to both the player and the AI opponents.

Consider, then, that the AI opponents have exactly the same 6% chance to ruin the player’s day. Consider also that this gives them an 0.4% chance to critical hit twice in a row. 0.4% doesn’t sound like much, but across an entire playthrough, it’s not unlikely that a player might see it happen and find it incredibly annoying.

Perhaps it would be worthwhile to explicitly forbid AI opponents from getting consecutive critical hits.

In conclusion

An emerging theme here has been to Just Simulate The Damn Thing. So consider Just Simulating The Damn Thing. Even a simple change to a random value can do surprising things to the resulting distribution, so unless you feel like differentiating the inverse function of your code, maybe test out any non-trivial behavior and make sure it’s what you wanted. Probability is hard to reason about.

VPN Provider Jailed For Five Years After Helping Thousands Breach China’s Firewall

Post Syndicated from Andy original https://torrentfreak.com/vpn-provider-jailed-for-five-years-after-helping-thousands-breach-chinas-firewall-171222/

The Chinese government’s grip on power is matched by its determination to control access to information. To that end, it seeks to control what people in China can see on the Internet, thereby limiting the effect of outside influences on society.

The government tries to reach these goals by use of the so-called Great Firewall, a complex system that grants access to some foreign resources while denying access to others. However, technologically advanced citizens are able to bypass this state censorship by using circumvention techniques including Virtual Private Networks (VPNs).

While large numbers of people use such services, in January 2017 the government gave its clearest indication yet that it would begin to crack down on people offering Great Firewall-evading tools.

Operating such a service without a corresponding telecommunications business license constitutes an offense, the government said. Now we have a taste of how serious the government is on this matter.

According to an announcement from China’s Procuratorate Daily, Wu Xiangyang, a resident of the Guangxi autonomous region, has just been jailed for five-and-a-half years and fined 500,000 yuan ($75,920) for building and selling access to VPNs without an appropriate license.

It’s alleged that between 2013 and June 2017, Wu Xiangyang sold VPN server access to the public via his own website, FangouVPN / Where Dog VPN, and Taobao, a Chinese online shopping site similar to eBay and Amazon.

The member accounts provided by the man allowed customers to browse foreign websites, without being trapped behind China’s Great Firewall. He also sold custom hardware routers that came read-configured to use the VPN service, granting access to the wider Internet, contrary to the wishes of Chinese authorities.

Prosecutors say that the illegal VPN business had revenues of 792,638 yuan (US$120,377) and profits of around 500,000 yuan ($75,935). SCMP reports that the company previously boasted on Twitter at having 8,000 foreigners and 5,000 businesses using its services to browse blocked websites.

This is at least the second big sentence handed down to a Chinese citizen for providing access to VPNs. Back in September, it was revealed that Deng Jiewei, a 26-year-old from the city of Dongguan in the Guangdong province, had been jailed for nine months after offering a similar service to the public for around a year.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons