Tag Archives: realtime

Open Sourcing Mantis: A Platform For Building Cost-Effective, Realtime, Operations-Focused…

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/open-sourcing-mantis-a-platform-for-building-cost-effective-realtime-operations-focused-5b8ff387813a?source=rss----2615bd06b42e---4

Open Sourcing Mantis: A Platform For Building Cost-Effective, Realtime, Operations-Focused Applications

By Jeff Chao on behalf of the Mantis team

Today we’re excited to announce that we’re open sourcing Mantis, a platform that helps Netflix engineers better understand the behavior of their applications to ensure the highest quality experience for our members. We believe the challenges we face here at Netflix are not necessarily unique to Netflix which is why we’re sharing it with the broader community.

As a streaming microservices ecosystem, the Mantis platform provides engineers with capabilities to minimize the costs of observing and operating complex distributed systems without compromising on operational insights. Engineers have built cost-efficient applications on top of Mantis to quickly identify issues, trigger alerts, and apply remediations to minimize or completely avoid downtime to the Netflix service. Where other systems may take over ten minutes to process metrics accurately, Mantis reduces that from tens of minutes down to seconds, effectively reducing our Mean-Time-To-Detect. This is crucial because any amount of downtime is brutal and comes with an incredibly high impact to our members — every second counts during an outage.

As the company continues to grow our member base, and as those members use the Netflix service even more, having cost-efficient, rapid, and precise insights into the operational health of our systems is only growing in importance. For example, a five-minute outage today is equivalent to a two-hour outage at the time of our last Mantis blog post.

Mantis Makes It Easy to Answer New Questions

The traditional way of working with metrics and logs alone is not sufficient for large-scale and growing systems. Metrics and logs require that you to know what you want to answer ahead of time. Mantis on the other hand allows us to sidestep this drawback completely by giving us the ability to answer new questions without having to add new instrumentation. Instead of logs or metrics, Mantis enables a democratization of events where developers can tap into an event stream from any instrumented application on demand. By making consumption on-demand, you’re able to freely publish all of your data to Mantis.

Mantis is Cost-Effective in Answering Questions

Publishing 100% of your operational data so that you’re able to answer new questions in the future is traditionally cost prohibitive at scale. Mantis uses an on-demand, reactive model where you don’t pay the cost for these events until something is subscribed to their stream. To further reduce cost, Mantis reissues the same data for equivalent subscribers. In this way, Mantis is differentiated from other systems by allowing us to achieve streaming-based observability on events while empowering engineers with the tooling to reduce costs that would otherwise become detrimental to the business.

From the beginning, we’ve built Mantis with this exact guiding principle in mind: Let’s make sure we minimize the costs of observing and operating our systems without compromising on required and opportunistic insights.

Guiding Principles Behind Building Mantis

The following are the guiding principles behind building Mantis.

  1. We should have access to raw events. Applications that publish events into Mantis should be free to publish every single event. If we prematurely transform events at this stage, then we’re already at a disadvantage when it comes to getting insight since data in its original form is already lost.
  2. We should be able to access these events in realtime. Operational use cases are inherently time sensitive by nature. The traditional method of publishing, storing, and then aggregating events in batch is too slow. Instead, we should process and serve events one at a time as they arrive. This becomes increasingly important with scale as the impact becomes much larger in far less time.
  3. We should be able to ask new questions of this data without having to add new instrumentation to your applications. It’s not possible to know ahead of time every single possible failure mode our systems might encounter despite all the rigor built in to make these systems resilient. When these failures do inevitably occur, it’s important that we can derive new insights with this data. You should be able to publish as large of an event with as much context as you want. That way, when you think of a new questions to ask of your systems in the future, the data will be available for you to answer those questions.
  4. We should be able to do all of the above in a cost-effective way. As our business critical systems scale, we need to make sure the systems in support of these business critical systems don’t end up costing more than the business critical systems themselves.

With these guiding principles in mind, let’s take a look at how Mantis brings value to Netflix.

How Mantis Brings Value to Netflix

Mantis has been in production for over four years. Over this period several critical operational insight applications have been built on top of the Mantis platform.

A few noteworthy examples include:

Realtime monitoring of Netflix streaming health which examines all of Netflix’s streaming video traffic in realtime and accurately identifies negative impact on the viewing experience with fine-grained granularity. This system serves as an early warning indicator of the overall health of the Netflix service and will trigger and alert relevant teams within seconds.

Contextual Alerting which analyzes millions of interactions between dozens of Netflix microservices in realtime to identify anomalies and provide operators with rich and relevant context. The realtime nature of these Mantis-backed aggregations allows the Mean-Time-To-Detect to be cut down from tens of minutes to a few seconds. Given the scale of Netflix this makes a huge impact.

Raven which allows users to perform ad-hoc exploration of realtime data from hundreds of streaming sources using our Mantis Query Language (MQL).

Cassandra Health check which analyzes rich operational events in realtime to generate a holistic picture of the health of every Cassandra cluster at Netflix.

Alerting on Log data which detects application errors by processing data from thousands of Netflix servers in realtime.

Chaos Experimentation monitoring which tracks user experience during a Chaos exercise in realtime and triggers an abort of the chaos exercise in case of an adverse impact.

Realtime Personally Identifiable Information (PII) data detection samples data across all streaming sources to quickly identify transmission of sensitive data.

Try It Out Today

To learn more about Mantis, you can check out the main Mantis page. You can try out Mantis today by spinning up your first Mantis cluster locally using Docker or using the Mantis CLI to bootstrap a minimal cluster in AWS. You can also start contributing to Mantis by getting the code on Github or engaging with the community on the users or dev mailing list.

Acknowledgements

A lot of work has gone into making Mantis successful at Netflix. We’d like to thank all the contributors, in alphabetical order by first name, that have been involved with Mantis at various points of its existence:

Andrei Ushakov, Ben Christensen, Ben Schmaus, Chris Carey, Cody Rioux, Daniel Jacobson, Danny Yuan, Erik Meijer, Indrajit Roy Choudhury, Jeff Chao, Josh Evans, Justin Becker, Kathrin Probst, Kevin Lew, Neeraj Joshi, Nick Mahilani, Piyush Goyal, Prashanth Ramdas, Ram Vaithalingam, Ranjit Mavinkurve, Sangeeta Narayanan, Santosh Kalidindi, Seth Katz, Sharma Podila, Zhenzhong Xu.


Open Sourcing Mantis: A Platform For Building Cost-Effective, Realtime, Operations-Focused… was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

View Stonehenge in real time via Raspberry Pi

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/view-stonehenge-in-real-time-via-raspberry-pi/

You can see how the skies above Stonehenge affect the iconic stones via a web browser thanks to a Raspberry Pi computer.

Stonehenge

Stonehenge is Britain’s greatest monument and it currently attracts more than 1.5 million visitors each year. It’s possible to walk around the iconic stone circle and visit the Neolithic houses outside the visitor centre. Yet, worries about potential damage have forced preservationists to limit access.

With that in mind, Eric Winbolt, Interim Head of Digital/Innovation at English Heritage, had a brainwave. “We decided to give people an idea of what it’s like to see the sunrise and sunset within the circle, and allow them to enjoy the skies over Stonehenge in real time without actually stepping inside,” he explains.

This could have been achieved by permanently positioning a camera within the stone circle, but this was ruled out for fear of being too intrusive. Instead, Eric and developers from The Bespoke Pixel agency snapped a single panoramic shot of the circle’s interior using a large 8K high-res, 360-degree camera when the shadows and light were quite neutral.

“We then took the sky out of the image with the aim of capturing an approximation of the view without impacting on the actual stones themselves,” Eric says.

Stone me

By taking a separate hemispherical snapshot of the sky from a nearby position and merging it with the master photograph of the stones, the team discovered they could create a near real-time effect for online visitors. They used an off-the-shelf, upwards-pointing, 220-degree fish-eye lens camera connected to a Raspberry Pi 3 Model A+ computer, taking images once every four minutes.

This Raspberry Pi was also fitted with a Pimoroni Enviro pHAT containing atmospheric, air pressure, and light sensors. Captured light values from the sky image were then used to alter the colour values of the master image of the stones so that the light on Stonehenge, as seen via the web, reflected the ambient light of the sky.

What can you see?

“What it does is give a view of the stones as it looks right now, or at least within a few minutes,” says Eric. “It also means the effect doesn’t look like two images simply Photoshopped together.”

Indeed, coder Mark Griffiths says the magic all runs from Node.js. “It uses a Python shell to get the sensor data and integrates with Amazon’s AWS and an IoT messaging service called DweetPro to tie all the events together,” he adds.

There was also a lot of experimentation. “We used the HAT via the I2C connectors so that we could mount it away from the main board to get better temperature readings,” says Mark, “We also tried a number of experiments with different cameras, lenses, and connections and it became clear that just connecting the camera via USB didnít allow access to the full functionality and resolutions.”

Mark reverse-engineered the camera’s WiFi connection and binary protocol to work out how to communicate with it via Raspberry Pi so that full-quality images could be taken and downloaded. “We also found the camera’s WiFi connection would time out after several days,” reveals Mark, “so we had to use a relay board connected via the GPIO pins.”
With such issues resolved, the team then created an easy-to-use online interface that lets users click boxes and see the view over the past 24 hours. They also added a computer model to depict the night sky.

“Visitors can go to the website day and night and allow the tool to pan around Stonehenge or pause it and pan manually, viewing the stones as they would be at the time of visiting,” Eric says. “It can look especially good on a smart television. It’s very relaxing.”

View the stones in realtime right now by visiting the English Heritage website.

The post View Stonehenge in real time via Raspberry Pi appeared first on Raspberry Pi.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/749087/rss

Security updates have been issued by CentOS (389-ds-base, dhcp, kernel, libreoffice, php, quagga, and ruby), Debian (ming, util-linux, vips, and zsh), Fedora (community-mysql, php, ruby, and transmission), Gentoo (newsbeuter), Mageia (libraw and mbedtls), openSUSE (php7 and python-Django), Red Hat (MRG Realtime 2.5), and SUSE (kernel).

[$] Deadline scheduling part 1 — overview and theory

Post Syndicated from corbet original https://lwn.net/Articles/743740/rss

The deadline scheduler enables the user to specify a realtime task’s
requirements
using well-defined realtime abstractions, allowing the system to make
the best scheduling decisions, guaranteeing the scheduling of realtime
tasks even in higher-load systems.
This article, the first in a series of two, provides an introduction to
realtime scheduling (deadline
scheduling in particular) and some of the theory behind it.

Potential impact of the Intel ME vulnerability

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/49611.html

(Note: this is my personal opinion based on public knowledge around this issue. I have no knowledge of any non-public details of these vulnerabilities, and this should not be interpreted as the position or opinion of my employer)

Intel’s Management Engine (ME) is a small coprocessor built into the majority of Intel CPUs[0]. Older versions were based on the ARC architecture[1] running an embedded realtime operating system, but from version 11 onwards they’ve been small x86 cores running Minix. The precise capabilities of the ME have not been publicly disclosed, but it is at minimum capable of interacting with the network[2], display[3], USB, input devices and system flash. In other words, software running on the ME is capable of doing a lot, without requiring any OS permission in the process.

Back in May, Intel announced a vulnerability in the Advanced Management Technology (AMT) that runs on the ME. AMT offers functionality like providing a remote console to the system (so IT support can connect to your system and interact with it as if they were physically present), remote disk support (so IT support can reinstall your machine over the network) and various other bits of system management. The vulnerability meant that it was possible to log into systems with enabled AMT with an empty authentication token, making it possible to log in without knowing the configured password.

This vulnerability was less serious than it could have been for a couple of reasons – the first is that “consumer”[4] systems don’t ship with AMT, and the second is that AMT is almost always disabled (Shodan found only a few thousand systems on the public internet with AMT enabled, out of many millions of laptops). I wrote more about it here at the time.

How does this compare to the newly announced vulnerabilities? Good question. Two of the announced vulnerabilities are in AMT. The previous AMT vulnerability allowed you to bypass authentication, but restricted you to doing what AMT was designed to let you do. While AMT gives an authenticated user a great deal of power, it’s also designed with some degree of privacy protection in mind – for instance, when the remote console is enabled, an animated warning border is drawn on the user’s screen to alert them.

This vulnerability is different in that it allows an authenticated attacker to execute arbitrary code within the AMT process. This means that the attacker shouldn’t have any capabilities that AMT doesn’t, but it’s unclear where various aspects of the privacy protection are implemented – for instance, if the warning border is implemented in AMT rather than in hardware, an attacker could duplicate that functionality without drawing the warning. If the USB storage emulation for remote booting is implemented as a generic USB passthrough, the attacker could pretend to be an arbitrary USB device and potentially exploit the operating system through bugs in USB device drivers. Unfortunately we don’t currently know.

Note that this exploit still requires two things – first, AMT has to be enabled, and second, the attacker has to be able to log into AMT. If the attacker has physical access to your system and you don’t have a BIOS password set, they will be able to enable it – however, if AMT isn’t enabled and the attacker isn’t physically present, you’re probably safe. But if AMT is enabled and you haven’t patched the previous vulnerability, the attacker will be able to access AMT over the network without a password and then proceed with the exploit. This is bad, so you should probably (1) ensure that you’ve updated your BIOS and (2) ensure that AMT is disabled unless you have a really good reason to use it.

The AMT vulnerability applies to a wide range of versions, everything from version 6 (which shipped around 2008) and later. The other vulnerability that Intel describe is restricted to version 11 of the ME, which only applies to much more recent systems. This vulnerability allows an attacker to execute arbitrary code on the ME, which means they can do literally anything the ME is able to do. This probably also means that they are able to interfere with any other code running on the ME. While AMT has been the most frequently discussed part of this, various other Intel technologies are tied to ME functionality.

Intel’s Platform Trust Technology (PTT) is a software implementation of a Trusted Platform Module (TPM) that runs on the ME. TPMs are intended to protect access to secrets and encryption keys and record the state of the system as it boots, making it possible to determine whether a system has had part of its boot process modified and denying access to the secrets as a result. The most common usage of TPMs is to protect disk encryption keys – Microsoft Bitlocker defaults to storing its encryption key in the TPM, automatically unlocking the drive if the boot process is unmodified. In addition, TPMs support something called Remote Attestation (I wrote about that here), which allows the TPM to provide a signed copy of information about what the system booted to a remote site. This can be used for various purposes, such as not allowing a compute node to join a cloud unless it’s booted the correct version of the OS and is running the latest firmware version. Remote Attestation depends on the TPM having a unique cryptographic identity that is tied to the TPM and inaccessible to the OS.

PTT allows manufacturers to simply license some additional code from Intel and run it on the ME rather than having to pay for an additional chip on the system motherboard. This seems great, but if an attacker is able to run code on the ME then they potentially have the ability to tamper with PTT, which means they can obtain access to disk encryption secrets and circumvent Bitlocker. It also means that they can tamper with Remote Attestation, “attesting” that the system booted a set of software that it didn’t or copying the keys to another system and allowing that to impersonate the first. This is, uh, bad.

Intel also recently announced Intel Online Connect, a mechanism for providing the functionality of security keys directly in the operating system. Components of this are run on the ME in order to avoid scenarios where a compromised OS could be used to steal the identity secrets – if the ME is compromised, this may make it possible for an attacker to obtain those secrets and duplicate the keys.

It’s also not entirely clear how much of Intel’s Secure Guard Extensions (SGX) functionality depends on the ME. The ME does appear to be required for SGX Remote Attestation (which allows an application using SGX to prove to a remote site that it’s the SGX app rather than something pretending to be it), and again if those secrets can be extracted from a compromised ME it may be possible to compromise some of the security assumptions around SGX. Again, it’s not clear how serious this is because it’s not publicly documented.

Various other things also run on the ME, including stuff like video DRM (ensuring that high resolution video streams can’t be intercepted by the OS). It may be possible to obtain encryption keys from a compromised ME that allow things like Netflix streams to be decoded and dumped. From a user privacy or security perspective, these things seem less serious.

The big problem at the moment is that we have no idea what the actual process of compromise is. Intel state that it requires local access, but don’t describe what kind. Local access in this case could simply require the ability to send commands to the ME (possible on any system that has the ME drivers installed), could require direct hardware access to the exposed ME (which would require either kernel access or the ability to install a custom driver) or even the ability to modify system flash (possible only if the attacker has physical access and enough time and skill to take the system apart and modify the flash contents with an SPI programmer). The other thing we don’t know is whether it’s possible for an attacker to modify the system such that the ME is persistently compromised or whether it needs to be re-compromised every time the ME reboots. Note that even the latter is more serious than you might think – the ME may only be rebooted if the system loses power completely, so even a “temporary” compromise could affect a system for a long period of time.

It’s also almost impossible to determine if a system is compromised. If the ME is compromised then it’s probably possible for it to roll back any firmware updates but still report that it’s been updated, giving admins a false sense of security. The only way to determine for sure would be to dump the system flash and compare it to a known good image. This is impractical to do at scale.

So, overall, given what we know right now it’s hard to say how serious this is in terms of real world impact. It’s unlikely that this is the kind of vulnerability that would be used to attack individual end users – anyone able to compromise a system like this could just backdoor your browser instead with much less effort, and that already gives them your banking details. The people who have the most to worry about here are potential targets of skilled attackers, which means activists, dissidents and companies with interesting personal or business data. It’s hard to make strong recommendations about what to do here without more insight into what the vulnerability actually is, and we may not know that until this presentation next month.

Summary: Worst case here is terrible, but unlikely to be relevant to the vast majority of users.

[0] Earlier versions of the ME were built into the motherboard chipset, but as portions of that were incorporated onto the CPU package the ME followed
[1] A descendent of the SuperFX chip used in Super Nintendo cartridges such as Starfox, because why not
[2] Without any OS involvement for wired ethernet and for wireless networks in the system firmware, but requires OS support for wireless access once the OS drivers have loaded
[3] Assuming you’re using integrated Intel graphics
[4] “Consumer” is a bit of a misnomer here – “enterprise” laptops like Thinkpads ship with AMT, but are often bought by consumers.

comment count unavailable comments

[$] A report from the Realtime Summit

Post Syndicated from jake original https://lwn.net/Articles/738001/rss

The 2017
Realtime Summit
(RT-Summit) was hosted by the Czech Technical University on
Saturday, October 21 in Prague, just before the Embedded Linux
Conference. It
was attended by more than 50 individuals with backgrounds ranging from
academic to
industrial, and some local students daring enough to spend a day with that
group. Guest author Mathieu Poirier provides summaries of some of the
talks from the summit.

[$] The state of the realtime union

Post Syndicated from jake original https://lwn.net/Articles/737367/rss

The 2017
Realtime Summit
was held October 21 at Czech Technical University
in Prague to discuss all manner of topics related to realtime Linux.
Nearly two years ago, a collaborative
project
was formed with the goal of mainlining the realtime patch set. At the
summit, project
lead Thomas Gleixner reported on the progress that has been made and the
plans for the future.

[$] Safety-critical realtime with Linux

Post Syndicated from corbet original https://lwn.net/Articles/734694/rss

Doing realtime processing with a general-purpose operating-system like
Linux can be a challenge by itself, but safety-critical realtime processing
ups the ante considerably. During a session at Open Source Summit North
America, Wolfgang Maurer discussed the difficulties involved in this kind
of work and what Linux has to offer.

[$] Notes from the LPC scheduler microconference

Post Syndicated from corbet original https://lwn.net/Articles/734039/rss

The scheduler
workloads microconference
at the 2017 Linux Plumbers Conference covered
several aspects of the kernel’s CPU scheduler. While workloads were on the
agenda, so were a rework of the realtime scheduler’s push/pull mechanism, a
distinctly different approach to multi-core scheduling, and the use of
tracing for workload simulation and analysis. As the following summary
shows, CPU scheduling has not yet reached a point where all of the
important questions have been answered.

A few tidbits on networking in games

Post Syndicated from Eevee original https://eev.ee/blog/2017/05/22/a-few-tidbits-on-networking-in-games/

Nova Dasterin asks, via Patreon:

How about do something on networking code, for some kind of realtime game (platformer or MMORPG or something). 😀

Ah, I see. You’re hoping for my usual detailed exploration of everything I know about networking code in games.

Well, joke’s on you! I don’t know anything about networking.

Wait… wait… maybe I know one thing.

Doom

Surprise! The thing I know is, roughly, how multiplayer Doom works.

Doom is 100% deterministic. Its random number generator is really a list of shuffled values; each request for a random number produces the next value in the list. There is no seed, either; a game always begins at the first value in the list. Thus, if you play the game twice with exactly identical input, you’ll see exactly the same playthrough: same damage, same monster behavior, and so on.

And that’s exactly what a Doom demo is: a file containing a recording of player input. To play back a demo, Doom runs the game as normal, except that it reads input from a file rather than the keyboard.

Multiplayer works the same way. Rather than passing around the entirety of the world state, Doom sends the player’s input to all the other players. Once a node has received input from every connected player, it advances the world by one tic. There’s no client or server; every peer talks to every other peer.

You can read the code if you want to, but at a glance, I don’t think there’s anything too surprising here. Only sending input means there’s not that much to send, and the receiving end just has to queue up packets from every peer and then play them back once it’s heard from everyone. The underlying transport was pluggable (this being the days before we’d even standardized on IP), which complicated things a bit, but the Unix port that’s on GitHub just uses UDP. The Doom Wiki has some further detail.

This approach is very clever and has a few significant advantages. Bandwidth requirements are fairly low, which is important if it happens to be 1993. Bandwidth and processing requirements are also completely unaffected by the size of the map, since map state never touches the network.

Unfortunately, it has some drawbacks as well. The biggest is that, well, sometimes you want to get the world state back in sync. What if a player drops and wants to reconnect? Everyone has to quit and reconnect to one another. What if an extra player wants to join in? It’s possible to load a saved game in multiplayer, but because the saved game won’t have an actor for the new player, you can’t really load it; you’d have to start fresh from the beginning of a map.

It’s fairly fundamental that Doom allows you to save your game at any moment… but there’s no way to load in the middle of a network game. Everyone has to quit and restart the game, loading the right save file from the command line. And if some players load the wrong save file… I’m not actually sure what happens! I’ve seen ZDoom detect the inconsistency and refuse to start the game, but I suspect that in vanilla Doom, players would have mismatched world states and their movements would look like nonsense when played back in each others’ worlds.

Ah, yes. Having the entire game state be generated independently by each peer leads to another big problem.

Cheating

Maybe this wasn’t as big a deal with Doom, where you’d probably be playing with friends or acquaintances (or coworkers). Modern games have matchmaking that pits you against strangers, and the trouble with strangers is that a nontrivial number of them are assholes.

Doom is a very moddable game, and it doesn’t check that everyone is using exactly the same game data. As long as you don’t change anything that would alter the shape of the world or change the number of RNG rolls (since those would completely desynchronize you from other players), you can modify your own game however you like, and no one will be the wiser. For example, you might change the light level in a dark map, so you can see more easily than the other players. Lighting doesn’t affect the game, only how its drawn, and it doesn’t go over the network, so no one would be the wiser.

Or you could alter the executable itself! It knows everything about the game state, including the health and loadout of the other players; altering it to show you this information would give you an advantage. Also, all that’s sent is input; no one said the input had to come from a human. The game knows where all the other players are, so you could modify it to generate the right input to automatically aim at them. Congratulations; you’ve invented the aimbot.

I don’t know how you can reliably fix these issues. There seems to be an entire underground ecosystem built around playing cat and mouse with game developers. Perhaps the most infamous example is World of Warcraft, where people farm in-game gold as automatically as possible to sell to other players for real-world cash.

Egregious cheating in multiplayer really gets on my nerves; I couldn’t bear knowing that it was rampant in a game I’d made. So I will probably not be working on anything with random matchmaking anytime soon.

Starbound

Let’s jump to something a little more concrete and modern.

Starbound is a procedurally generated universe exploration game — like Terraria in space. Or, if you prefer, like Minecraft in space and also flat. Notably, it supports multiplayer, using the more familiar client/server approach. The server uses the same data files as single-player, but it runs as a separate process; if you want to run a server on your own machine, you run the server and then connect to localhost with the client.

I’ve run a server before, but that doesn’t tell me anything about how it works. Starbound is an interesting example because of the existence of StarryPy — a proxy server that can add some interesting extra behavior by intercepting packets going to and from the real server.

That means StarryPy necessarily knows what the protocol looks like, and perhaps we can glean some insights by poking around in it. Right off the bat there’s a list of all the packet types and rough shapes of their data.

I modded StarryPy to print out every single decoded packet it received (from either the client or the server), then connected and immediately disconnected. (Note that these aren’t necessarily TCP packets; they’re just single messages in the Starbound protocol.) Here is my quick interpretation of what happens:

  1. The client and server briefly negotiate a connection. The password, if any, is sent with a challenge and response.

  2. The client sends a full description of its “ship world” — the player’s ship, which they take with them to other servers. The server sends a partial description of the planet the player is either on, or orbiting.

  3. From here, the server and client mostly communicate world state in the form of small delta updates. StarryPy doesn’t delve into the exact format here, unfortunately. The world basically freezes around you during a multiplayer lag spike, though, so it’s safe to assume that the vast bulk of game simulation happens server-side, and the effects are broadcast to clients.

The protocol has specific message types for various player actions: damaging tiles, dropping items, connecting wires, collecting liquids, moving your ship, and so on. So the basic model is that the player can attempt to do stuff with the chunk of the world they’re looking at, and they’ll get a reaction whenever the server gets back to them.

(I’m dimly aware that some subset of object interactions can happen client-side, but I don’t know exactly which ones. The implications for custom scripted objects are… interesting. Actually, those are slightly hellish in general; Starbound is very moddable, but last I checked it has no way to send mods from the server to the client or anything similar, and by default the server doesn’t even enforce that everyone’s using the same set of mods… so it’s possible that you’ll have an object on your ship that’s only provided by a mod you have but the server lacks, and then who knows what happens.)

IRC

Hang on, this isn’t a video game at all.

Starbound’s “fire and forget” approach reminds me a lot of IRC — a protocol I’ve even implemented, a little bit, kinda. IRC doesn’t have any way to match the messages you send to the responses you get back, and success is silent for some kinds of messages, so it’s impossible (in the general case) to know what caused an error. The most obvious fix for this would be to attach a message id to messages sent out by the client, and include the same id on responses from the server.

It doesn’t look like Starbound has message ids or any other solution to this problem — though StarryPy doesn’t document the protocol well enough for me to be sure. The server just sends a stream of stuff it thinks is important, and when it gets a request from the client, it queues up a response to that as well. It’s TCP, so the client should get all the right messages, eventually. Some of them might be slightly out of order depending on the order the client does stuff, but that’s not a big deal; anyway, the server knows the canonical state.

Some thoughts

I bring up IRC because I’m kind of at the limit of things that I know. But one of those things is that IRC is simultaneously very rickety and wildly successful: it’s a decade older than Google and still in use. (Some recent offerings are starting to eat its lunch, but those are really because clients are inaccessible to new users and the protocol hasn’t evolved much. The problems with the fundamental design of the protocol are only obvious to server and client authors.)

Doom’s cheery assumption that the game will play out the same way for every player feels similarly rickety. Obviously it works — well enough that you can go play multiplayer Doom with exactly the same approach right now, 24 years later — but for something as complex as an FPS it really doesn’t feel like it should.

So while I don’t have enough experience writing multiplayer games to give you a run-down of how to do it, I think the lesson here is that you can get pretty far with simple ideas. Maybe your game isn’t deterministic like Doom — although there’s no reason it couldn’t be — but you probably still have to save the game, or at least restore the state of the world on death/loss/restart, right? There you go: you already have a fragment of a concept of entity state outside the actual entities. Codify that, stick it on the network, and see what happens.

I don’t know if I’ll be doing any significant multiplayer development myself; I don’t even play many multiplayer games. But I’d always assumed it would be a nigh-impossible feat of architectural engineering, and I’m starting to think that maybe it’s no more difficult than anything else in game dev. Easy to fudge, hard to do well, impossible to truly get right so give up that train of thought right now.

Also now I am definitely thinking about how a multiplayer puzzle-platformer would work.

BEURK – Linux Userland Preload Rootkit

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/ocUJrwmh2Bk/

BEURK is an userland preload rootkit for GNU/Linux, heavily focused around anti-debugging and anti-detection. Being a userland rootkit it gives limited privileges (whatever the user has basically) vs a superuser or root level rootkit. Features Hide attacker files and directories Realtime log cleanup (on utmp/wtmp) Anti process and login detection…

Read the full post at darknet.org.uk

[$] A deadline scheduler update

Post Syndicated from corbet original https://lwn.net/Articles/716982/rss

The deadline CPU scheduler has come a long way, Juri Lelli said in his 2017
Linaro Connect session, but there is still quite a bit of work to be done.
While this scheduler was originally intended for realtime workloads, there is
reason to believe that it is well suited for other settings, including the
embedded and mobile world. In this talk, he gave a summary of what the
deadline scheduler provides now and the changes that are envisioned for the
near (and not-so-near) future.

AWS Week in Review – February 27, 2016

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-february-27-2016/

This edition includes all of our announcements, content from all of our blogs, and as much community-generated AWS content as I had time for. Going forward I hope to bring back the other sections, as soon as I get my tooling and automation into better shape.

Monday

February 27

Tuesday

February 28

Wednesday

March 1

Thursday

March 2

Friday

March 3

Saturday

March 4

Sunday

March 5

Jeff;

 

Congratulations to the Winners of the Serverless Chatbot Competition!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/congratulations-to-the-winners-of-the-serverless-chatbot-competition/

I announced the AWS Serverless Chatbot Competion in August and invited you to build a chatbot for Slack using AWS Lambda and Amazon API Gateway.

Last week I sat down with fellow judges Tim Wagner (General Manager of AWS Lambda) and Cecilia Deng (a Software Development Engineer on Tim’s team) to watch the videos and to evaluate all 62 submissions. We were impressed by the functionality and diversity of the entrees, as well as the efforts that the entrants put in to producing attractive videos to show their submissions in action.

After hours of intense deliberation we chose a total of 9 winners: 8 from individuals, teams & small organizations and one from a larger organization. Without further ado, here you go:

Individuals, Teams, and Small Organizations
Here are the winners of the Serverless Slackbot Hero Award. Each winner receives one ticket to AWS re:Invent, access to discounted hotel room rates, public announcement and promotion during the Serverless Computing keynote, some cool swag, and $100 in AWS Credits. You can find the code for many of these bots on GitHub. In alphabetical order, the winners are:

AWS Network Helper“The goal of this project is to provide an AWS network troubleshooting script that runs on a serverless architecture, and can be interacted with via Slack as a chat bot.GitHub repo.

B0pb0t – “Making Mealtime Awesome.” GitHub repo.

Borges – “Borges is a real-time translator for multilingual Slack teams.” GitHub repo.

CLIve – “CLIve makes managing your AWS EC2 instances a doddle. He understands natural language, so no need to learn a new CLI!”

Litlbot – “Litlbot is a Slack bot that enables realtime interaction with students in class, creating a more engaged classroom and learning experience.” GitHub repo.

Marbot – “Forward alerts from Amazon Web Services to your DevOps team.”

Opsidian – “Collaborate on your AWS infra from Slack using natural language.”

ServiceBot – “Communication platform between humans, machines, and enterprises.” GitHub repo.

Larger Organization
And here’s the winner of the Serverless Slackbot Large Organization Award:

Eva – “The virtual travel assistant for your team.” GitHub repo.

Thanks & Congratulations
I would like to personally thank each of the entrants for taking the time to submit their entries to the competition!

Congratulations to all of the winners; I hope to see you all at AWS re:Invent.


Jeff;

 

PS – If this list has given you an idea for a chatbot of your very own, please watch our Building Serverless Chatbots video and take advantage of our Serverless Chatbot Sample.

Friday’s security updates

Post Syndicated from n8willis original http://lwn.net/Articles/697707/rss

CentOS has updated python (C7; C6: multiple vulnerabilities).

Fedora has updated ca-certificates (F24: update to CA certificates) and spice (F23: multiple vulnerabilities).

Oracle has updated kernel
(O7: TCP injection) and python (O7; O6: multiple vulnerabilities).

Red Hat has updated kernel (RHEL7; RHEL6:
TCP injection),
kernel-rt (RHEL7: TCP injection), python (RHEL 6,7: multiple vulnerabilities), python27-python (RHSC: multiple vulnerabilities), python33-python (RHSC: multiple vulnerabilities), realtime-kernel (RHEM2.5: TCP injection), rh-mariadb101-mariadb (RHSC: multiple vulnerabilities), rh-python34-python (RHSC: multiple vulnerabilities), and rh-python35-python (RHSC: multiple vulnerabilities).

SUSE has updated the Linux
Kernel
(SLE12: multiple vulnerabilities) and xen (SLE11: multiple vulnerabilities).

Ubuntu has updated gnupg
(12.04, 14.04, 16.04: flawed random-number generation), libgcrypt11, libgcrypt20 (12.04, 14.04,
16.06: flawed random-number generation),
and postgresql-9.1, postgresql-9.3,
postgresql-9.5
(12.04, 14.04, 16.04: multiple vulnerabilities).

Xenomai project mourns Gilles Chanteperdrix

Post Syndicated from jake original http://lwn.net/Articles/697594/rss

The Xenomai project is mourning Gilles Chanteperdrix, a longtime maintainer of the realtime framework, who recently passed away. In the announcement, Philippe Gerum writes: “Gilles will forever be remembered as a true-hearted man, a brilliant mind always scratching beneath the surface, looking for elegance in the driest topics, never jaded from such accomplishment.

According to Paul Valéry, “death is a trick played by the inconceivable on the conceivable”. Gilles’s absence is inconceivable to me, I can only assume that for once, he just got rest from tirelessly helping all of us.”