Tag Archives: realtime

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/749087/rss

Security updates have been issued by CentOS (389-ds-base, dhcp, kernel, libreoffice, php, quagga, and ruby), Debian (ming, util-linux, vips, and zsh), Fedora (community-mysql, php, ruby, and transmission), Gentoo (newsbeuter), Mageia (libraw and mbedtls), openSUSE (php7 and python-Django), Red Hat (MRG Realtime 2.5), and SUSE (kernel).

[$] Deadline scheduling part 1 — overview and theory

Post Syndicated from corbet original https://lwn.net/Articles/743740/rss

The deadline scheduler enables the user to specify a realtime task’s requirements using well-defined realtime abstractions, allowing the system to make the best scheduling decisions, guaranteeing the scheduling of realtime tasks even in higher-load systems. This article, the first in a series of two, provides an introduction to realtime scheduling (deadline scheduling in particular) and some of the theory behind it.

Potential impact of the Intel ME vulnerability

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/49611.html

(Note: this is my personal opinion based on public knowledge around this issue. I have no knowledge of any non-public details of these vulnerabilities, and this should not be interpreted as the position or opinion of my employer)

Intel’s Management Engine (ME) is a small coprocessor built into the majority of Intel CPUs[0]. Older versions were based on the ARC architecture[1] running an embedded realtime operating system, but from version 11 onwards they’ve been small x86 cores running Minix. The precise capabilities of the ME have not been publicly disclosed, but it is at minimum capable of interacting with the network[2], display[3], USB, input devices and system flash. In other words, software running on the ME is capable of doing a lot, without requiring any OS permission in the process.

Back in May, Intel announced a vulnerability in the Active Management Technology (AMT) that runs on the ME. AMT offers functionality like providing a remote console to the system (so IT support can connect to your system and interact with it as if they were physically present), remote disk support (so IT support can reinstall your machine over the network) and various other bits of system management. The vulnerability meant that it was possible to log into AMT-enabled systems with an empty authentication token, without knowing the configured password.

This vulnerability was less serious than it could have been for a couple of reasons – the first is that “consumer”[4] systems don’t ship with AMT, and the second is that AMT is almost always disabled (Shodan found only a few thousand systems on the public internet with AMT enabled, out of many millions of laptops). I wrote more about it here at the time.

How does this compare to the newly announced vulnerabilities? Good question. Two of the announced vulnerabilities are in AMT. The previous AMT vulnerability allowed you to bypass authentication, but restricted you to doing what AMT was designed to let you do. While AMT gives an authenticated user a great deal of power, it’s also designed with some degree of privacy protection in mind – for instance, when the remote console is enabled, an animated warning border is drawn on the user’s screen to alert them.

This vulnerability is different in that it allows an authenticated attacker to execute arbitrary code within the AMT process. This means that the attacker shouldn’t have any capabilities that AMT doesn’t, but it’s unclear where various aspects of the privacy protection are implemented – for instance, if the warning border is implemented in AMT rather than in hardware, an attacker could duplicate that functionality without drawing the warning. If the USB storage emulation for remote booting is implemented as a generic USB passthrough, the attacker could pretend to be an arbitrary USB device and potentially exploit the operating system through bugs in USB device drivers. Unfortunately we don’t currently know.

Note that this exploit still requires two things – first, AMT has to be enabled, and second, the attacker has to be able to log into AMT. If the attacker has physical access to your system and you don’t have a BIOS password set, they will be able to enable it – however, if AMT isn’t enabled and the attacker isn’t physically present, you’re probably safe. But if AMT is enabled and you haven’t patched the previous vulnerability, the attacker will be able to access AMT over the network without a password and then proceed with the exploit. This is bad, so you should probably (1) ensure that you’ve updated your BIOS and (2) ensure that AMT is disabled unless you have a really good reason to use it.

The AMT vulnerability applies to a wide range of versions, everything from version 6 (which shipped around 2008) onwards. The other vulnerability that Intel describe is restricted to version 11 of the ME, which only applies to much more recent systems. This vulnerability allows an attacker to execute arbitrary code on the ME, which means they can do literally anything the ME is able to do. This probably also means that they are able to interfere with any other code running on the ME. While AMT has been the most frequently discussed part of this, various other Intel technologies are tied to ME functionality.

Intel’s Platform Trust Technology (PTT) is a software implementation of a Trusted Platform Module (TPM) that runs on the ME. TPMs are intended to protect access to secrets and encryption keys and record the state of the system as it boots, making it possible to determine whether a system has had part of its boot process modified and denying access to the secrets as a result. The most common usage of TPMs is to protect disk encryption keys – Microsoft Bitlocker defaults to storing its encryption key in the TPM, automatically unlocking the drive if the boot process is unmodified. In addition, TPMs support something called Remote Attestation (I wrote about that here), which allows the TPM to provide a signed copy of information about what the system booted to a remote site. This can be used for various purposes, such as not allowing a compute node to join a cloud unless it’s booted the correct version of the OS and is running the latest firmware version. Remote Attestation depends on the TPM having a unique cryptographic identity that is tied to the TPM and inaccessible to the OS.

PTT allows manufacturers to simply license some additional code from Intel and run it on the ME rather than having to pay for an additional chip on the system motherboard. This seems great, but if an attacker is able to run code on the ME then they potentially have the ability to tamper with PTT, which means they can obtain access to disk encryption secrets and circumvent Bitlocker. It also means that they can tamper with Remote Attestation, “attesting” that the system booted a set of software that it didn’t or copying the keys to another system and allowing that to impersonate the first. This is, uh, bad.

Intel also recently announced Intel Online Connect, a mechanism for providing the functionality of security keys directly in the operating system. Components of this are run on the ME in order to avoid scenarios where a compromised OS could be used to steal the identity secrets – if the ME is compromised, this may make it possible for an attacker to obtain those secrets and duplicate the keys.

It’s also not entirely clear how much of Intel’s Software Guard Extensions (SGX) functionality depends on the ME. The ME does appear to be required for SGX Remote Attestation (which allows an application using SGX to prove to a remote site that it’s the SGX app rather than something pretending to be it), and again if those secrets can be extracted from a compromised ME it may be possible to compromise some of the security assumptions around SGX. Again, it’s not clear how serious this is because it’s not publicly documented.

Various other things also run on the ME, including stuff like video DRM (ensuring that high resolution video streams can’t be intercepted by the OS). It may be possible to obtain encryption keys from a compromised ME that allow things like Netflix streams to be decoded and dumped. From a user privacy or security perspective, these things seem less serious.

The big problem at the moment is that we have no idea what the actual process of compromise is. Intel state that it requires local access, but don’t describe what kind. Local access in this case could simply require the ability to send commands to the ME (possible on any system that has the ME drivers installed), could require direct hardware access to the exposed ME (which would require either kernel access or the ability to install a custom driver) or even the ability to modify system flash (possible only if the attacker has physical access and enough time and skill to take the system apart and modify the flash contents with an SPI programmer). The other thing we don’t know is whether it’s possible for an attacker to modify the system such that the ME is persistently compromised or whether it needs to be re-compromised every time the ME reboots. Note that even the latter is more serious than you might think – the ME may only be rebooted if the system loses power completely, so even a “temporary” compromise could affect a system for a long period of time.

It’s also almost impossible to determine if a system is compromised. If the ME is compromised then it’s probably possible for it to roll back any firmware updates but still report that it’s been updated, giving admins a false sense of security. The only way to determine for sure would be to dump the system flash and compare it to a known good image. This is impractical to do at scale.

So, overall, given what we know right now it’s hard to say how serious this is in terms of real world impact. It’s unlikely that this is the kind of vulnerability that would be used to attack individual end users – anyone able to compromise a system like this could just backdoor your browser instead with much less effort, and that already gives them your banking details. The people who have the most to worry about here are potential targets of skilled attackers, which means activists, dissidents and companies with interesting personal or business data. It’s hard to make strong recommendations about what to do here without more insight into what the vulnerability actually is, and we may not know that until this presentation next month.

Summary: Worst case here is terrible, but unlikely to be relevant to the vast majority of users.

[0] Earlier versions of the ME were built into the motherboard chipset, but as portions of that were incorporated onto the CPU package the ME followed
[1] A descendant of the SuperFX chip used in Super Nintendo cartridges such as Star Fox, because why not
[2] Without any OS involvement for wired ethernet and for wireless networks in the system firmware, but requires OS support for wireless access once the OS drivers have loaded
[3] Assuming you’re using integrated Intel graphics
[4] “Consumer” is a bit of a misnomer here – “enterprise” laptops like Thinkpads ship with AMT, but are often bought by consumers.

[$] A report from the Realtime Summit

Post Syndicated from jake original https://lwn.net/Articles/738001/rss

The 2017 Realtime Summit (RT-Summit) was hosted by the Czech Technical University on Saturday, October 21 in Prague, just before the Embedded Linux Conference. It was attended by more than 50 individuals with backgrounds ranging from academic to industrial, and some local students daring enough to spend a day with that group. Guest author Mathieu Poirier provides summaries of some of the talks from the summit.

[$] The state of the realtime union

Post Syndicated from jake original https://lwn.net/Articles/737367/rss

The 2017 Realtime Summit was held October 21 at Czech Technical University in Prague to discuss all manner of topics related to realtime Linux. Nearly two years ago, a collaborative project was formed with the goal of mainlining the realtime patch set. At the summit, project lead Thomas Gleixner reported on the progress that has been made and the plans for the future.

[$] Safety-critical realtime with Linux

Post Syndicated from corbet original https://lwn.net/Articles/734694/rss

Doing realtime processing with a general-purpose operating system like
Linux can be a challenge by itself, but safety-critical realtime processing
ups the ante considerably. During a session at Open Source Summit North
America, Wolfgang Maurer discussed the difficulties involved in this kind
of work and what Linux has to offer.

[$] Notes from the LPC scheduler microconference

Post Syndicated from corbet original https://lwn.net/Articles/734039/rss

The scheduler workloads microconference at the 2017 Linux Plumbers Conference covered several aspects of the kernel’s CPU scheduler. While workloads were on the agenda, so were a rework of the realtime scheduler’s push/pull mechanism, a distinctly different approach to multi-core scheduling, and the use of tracing for workload simulation and analysis. As the following summary shows, CPU scheduling has not yet reached a point where all of the important questions have been answered.

A few tidbits on networking in games

Post Syndicated from Eevee original https://eev.ee/blog/2017/05/22/a-few-tidbits-on-networking-in-games/

Nova Dasterin asks, via Patreon:

How about do something on networking code, for some kind of realtime game (platformer or MMORPG or something). 😀

Ah, I see. You’re hoping for my usual detailed exploration of everything I know about networking code in games.

Well, joke’s on you! I don’t know anything about networking.

Wait… wait… maybe I know one thing.

Doom

Surprise! The thing I know is, roughly, how multiplayer Doom works.

Doom is 100% deterministic. Its random number generator is really a list of shuffled values; each request for a random number produces the next value in the list. There is no seed, either; a game always begins at the first value in the list. Thus, if you play the game twice with exactly identical input, you’ll see exactly the same playthrough: same damage, same monster behavior, and so on.
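To make the idea concrete, here is a minimal sketch (in Python rather than Doom’s C, with a made-up, shortened table — Doom’s real table has 256 entries) of a table-based “RNG” in that style:

RND_TABLE = [8, 109, 220, 222, 241, 149, 107, 75, 248, 254, 140, 16, 66, 74, 21, 211]
rnd_index = 0

def p_random():
    """Return the next 'random' value; identical call sequences give identical results."""
    global rnd_index
    rnd_index = (rnd_index + 1) % len(RND_TABLE)
    return RND_TABLE[rnd_index]

# Every run that makes the same sequence of calls sees the same values,
# which is what keeps demo playback and multiplayer peers in agreement.
print([p_random() for _ in range(5)])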

And that’s exactly what a Doom demo is: a file containing a recording of player input. To play back a demo, Doom runs the game as normal, except that it reads input from a file rather than the keyboard.

Multiplayer works the same way. Rather than passing around the entirety of the world state, Doom sends the player’s input to all the other players. Once a node has received input from every connected player, it advances the world by one tic. There’s no client or server; every peer talks to every other peer.

You can read the code if you want to, but at a glance, I don’t think there’s anything too surprising here. Only sending input means there’s not that much to send, and the receiving end just has to queue up packets from every peer and then play them back once it’s heard from everyone. The underlying transport was pluggable (this being the days before we’d even standardized on IP), which complicated things a bit, but the Unix port that’s on GitHub just uses UDP. The Doom Wiki has some further detail.
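As a rough illustration of that lockstep model (a sketch, not Doom’s actual netcode), each node can buffer commands per tic and refuse to advance until it has heard from every peer:

from collections import defaultdict

class LockstepNode:
    def __init__(self, peer_ids):
        self.peer_ids = set(peer_ids)
        self.tic = 0
        self.inputs = defaultdict(dict)   # tic -> {peer_id: command}

    def receive(self, peer_id, tic, command):
        self.inputs[tic][peer_id] = command

    def try_advance(self, world):
        """Advance exactly one tic, but only if a command from every peer has arrived."""
        pending = self.inputs.get(self.tic, {})
        if set(pending) != self.peer_ids:
            return False                  # still waiting on someone
        for peer_id in sorted(pending):   # same order on every node, so the same result
            world.apply(peer_id, pending[peer_id])
        del self.inputs[self.tic]
        self.tic += 1
        return True

class World:
    def apply(self, peer_id, command):
        print(f"{peer_id} does {command}")

node = LockstepNode({"player1", "player2"})
node.receive("player1", 0, "move forward")
print(node.try_advance(World()))   # False: still waiting on player2
node.receive("player2", 0, "fire")
print(node.try_advance(World()))   # True: both inputs arrived, the world advances one tic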

This approach is very clever and has a few significant advantages. Bandwidth requirements are fairly low, which is important if it happens to be 1993. Bandwidth and processing requirements are also completely unaffected by the size of the map, since map state never touches the network.

Unfortunately, it has some drawbacks as well. The biggest is that, well, sometimes you want to get the world state back in sync. What if a player drops and wants to reconnect? Everyone has to quit and reconnect to one another. What if an extra player wants to join in? It’s possible to load a saved game in multiplayer, but because the saved game won’t have an actor for the new player, you can’t really load it; you’d have to start fresh from the beginning of a map.

It’s fairly fundamental that Doom allows you to save your game at any moment… but there’s no way to load in the middle of a network game. Everyone has to quit and restart the game, loading the right save file from the command line. And if some players load the wrong save file… I’m not actually sure what happens! I’ve seen ZDoom detect the inconsistency and refuse to start the game, but I suspect that in vanilla Doom, players would have mismatched world states and their movements would look like nonsense when played back in each other’s worlds.

Ah, yes. Having the entire game state be generated independently by each peer leads to another big problem.

Cheating

Maybe this wasn’t as big a deal with Doom, where you’d probably be playing with friends or acquaintances (or coworkers). Modern games have matchmaking that pits you against strangers, and the trouble with strangers is that a nontrivial number of them are assholes.

Doom is a very moddable game, and it doesn’t check that everyone is using exactly the same game data. As long as you don’t change anything that would alter the shape of the world or change the number of RNG rolls (since those would completely desynchronize you from other players), you can modify your own game however you like, and no one will be the wiser. For example, you might change the light level in a dark map so you can see more easily than the other players. Lighting doesn’t affect the game simulation, only how it’s drawn, and it doesn’t go over the network, so the change is invisible to everyone else.

Or you could alter the executable itself! It knows everything about the game state, including the health and loadout of the other players; altering it to show you this information would give you an advantage. Also, all that’s sent is input; no one said the input had to come from a human. The game knows where all the other players are, so you could modify it to generate the right input to automatically aim at them. Congratulations; you’ve invented the aimbot.

I don’t know how you can reliably fix these issues. There seems to be an entire underground ecosystem built around playing cat and mouse with game developers. Perhaps the most infamous example is World of Warcraft, where people farm in-game gold as automatically as possible to sell to other players for real-world cash.

Egregious cheating in multiplayer really gets on my nerves; I couldn’t bear knowing that it was rampant in a game I’d made. So I will probably not be working on anything with random matchmaking anytime soon.

Starbound

Let’s jump to something a little more concrete and modern.

Starbound is a procedurally generated universe exploration game — like Terraria in space. Or, if you prefer, like Minecraft in space and also flat. Notably, it supports multiplayer, using the more familiar client/server approach. The server uses the same data files as single-player, but it runs as a separate process; if you want to run a server on your own machine, you run the server and then connect to localhost with the client.

I’ve run a server before, but that doesn’t tell me anything about how it works. Starbound is an interesting example because of the existence of StarryPy — a proxy server that can add some interesting extra behavior by intercepting packets going to and from the real server.

That means StarryPy necessarily knows what the protocol looks like, and perhaps we can glean some insights by poking around in it. Right off the bat there’s a list of all the packet types and rough shapes of their data.
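The proxy idea itself is simple. Here’s a minimal sketch (nothing like StarryPy’s actual code, and the port numbers are assumptions) that just relays bytes between the client and the real server and logs whatever passes through; StarryPy goes further and decodes that traffic into typed packets:

import asyncio

REAL_SERVER = ("localhost", 21025)   # assumed Starbound server address; adjust as needed
LISTEN_PORT = 21026                  # clients connect here instead of the real server

async def pump(label, reader, writer):
    # Copy bytes from one side to the other, logging a short hex preview.
    while data := await reader.read(4096):
        print(f"{label}: {len(data)} bytes: {data[:16].hex()}...")
        writer.write(data)
        await writer.drain()
    writer.close()

async def handle_client(client_reader, client_writer):
    server_reader, server_writer = await asyncio.open_connection(*REAL_SERVER)
    await asyncio.gather(
        pump("client -> server", client_reader, server_writer),
        pump("server -> client", server_reader, client_writer),
    )

async def main():
    proxy = await asyncio.start_server(handle_client, "0.0.0.0", LISTEN_PORT)
    async with proxy:
        await proxy.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())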

I modded StarryPy to print out every single decoded packet it received (from either the client or the server), then connected and immediately disconnected. (Note that these aren’t necessarily TCP packets; they’re just single messages in the Starbound protocol.) Here is my quick interpretation of what happens:

  1. The client and server briefly negotiate a connection. The password, if any, is sent with a challenge and response.

  2. The client sends a full description of its “ship world” — the player’s ship, which they take with them to other servers. The server sends a partial description of the planet the player is either on, or orbiting.

  3. From here, the server and client mostly communicate world state in the form of small delta updates. StarryPy doesn’t delve into the exact format here, unfortunately. The world basically freezes around you during a multiplayer lag spike, though, so it’s safe to assume that the vast bulk of game simulation happens server-side, and the effects are broadcast to clients.

The protocol has specific message types for various player actions: damaging tiles, dropping items, connecting wires, collecting liquids, moving your ship, and so on. So the basic model is that the player can attempt to do stuff with the chunk of the world they’re looking at, and they’ll get a reaction whenever the server gets back to them.

(I’m dimly aware that some subset of object interactions can happen client-side, but I don’t know exactly which ones. The implications for custom scripted objects are… interesting. Actually, those are slightly hellish in general; Starbound is very moddable, but last I checked it has no way to send mods from the server to the client or anything similar, and by default the server doesn’t even enforce that everyone’s using the same set of mods… so it’s possible that you’ll have an object on your ship that’s only provided by a mod you have but the server lacks, and then who knows what happens.)

IRC

Hang on, this isn’t a video game at all.

Starbound’s “fire and forget” approach reminds me a lot of IRC — a protocol I’ve even implemented, a little bit, kinda. IRC doesn’t have any way to match the messages you send to the responses you get back, and success is silent for some kinds of messages, so it’s impossible (in the general case) to know what caused an error. The most obvious fix for this would be to attach a message id to messages sent out by the client, and include the same id on responses from the server.
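Here’s a tiny sketch of that idea (a hypothetical message format, not IRC’s or Starbound’s): the client tags each request with an id, the server echoes the id back, and each reply can then be matched to whatever caused it:

import itertools

class TaggedClient:
    def __init__(self, send):
        self._send = send             # callable that ships a dict off to the server
        self._ids = itertools.count(1)
        self._pending = {}            # id -> the command we asked for

    def request(self, command, **args):
        msg_id = next(self._ids)
        self._pending[msg_id] = command
        self._send({"id": msg_id, "command": command, "args": args})
        return msg_id

    def handle_reply(self, reply):
        # The server is expected to copy "id" from the request into its reply.
        command = self._pending.pop(reply["id"], None)
        if command is None:
            return f"unsolicited message: {reply}"
        return f"{command} -> {reply['status']}"

# Usage: pretend the server answered our second request with an error.
outbox = []
client = TaggedClient(outbox.append)
client.request("JOIN", channel="#doom")
client.request("PRIVMSG", target="#doom", text="hello")
print(client.handle_reply({"id": 2, "status": "error: not in channel"}))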

It doesn’t look like Starbound has message ids or any other solution to this problem — though StarryPy doesn’t document the protocol well enough for me to be sure. The server just sends a stream of stuff it thinks is important, and when it gets a request from the client, it queues up a response to that as well. It’s TCP, so the client should get all the right messages, eventually. Some of them might be slightly out of order depending on the order the client does stuff, but that’s not a big deal; anyway, the server knows the canonical state.

Some thoughts

I bring up IRC because I’m kind of at the limit of things that I know. But one of those things is that IRC is simultaneously very rickety and wildly successful: it’s a decade older than Google and still in use. (Some recent offerings are starting to eat its lunch, but that’s mostly because the clients are inaccessible to new users and the protocol hasn’t evolved much. The problems with the fundamental design of the protocol are only obvious to server and client authors.)

Doom’s cheery assumption that the game will play out the same way for every player feels similarly rickety. Obviously it works — well enough that you can go play multiplayer Doom with exactly the same approach right now, 24 years later — but for something as complex as an FPS it really doesn’t feel like it should.

So while I don’t have enough experience writing multiplayer games to give you a run-down of how to do it, I think the lesson here is that you can get pretty far with simple ideas. Maybe your game isn’t deterministic like Doom — although there’s no reason it couldn’t be — but you probably still have to save the game, or at least restore the state of the world on death/loss/restart, right? There you go: you already have a fragment of a concept of entity state outside the actual entities. Codify that, stick it on the network, and see what happens.

I don’t know if I’ll be doing any significant multiplayer development myself; I don’t even play many multiplayer games. But I’d always assumed it would be a nigh-impossible feat of architectural engineering, and I’m starting to think that maybe it’s no more difficult than anything else in game dev. Easy to fudge, hard to do well, impossible to truly get right so give up that train of thought right now.

Also now I am definitely thinking about how a multiplayer puzzle-platformer would work.

BEURK – Linux Userland Preload Rootkit

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/ocUJrwmh2Bk/

BEURK is a userland preload rootkit for GNU/Linux, heavily focused on anti-debugging and anti-detection. Being a userland rootkit, it runs with limited privileges (basically whatever the user has) rather than the superuser access of a root-level rootkit. Features: hides attacker files and directories, realtime log cleanup (on utmp/wtmp), anti process and login detection…

Read the full post at darknet.org.uk

[$] A deadline scheduler update

Post Syndicated from corbet original https://lwn.net/Articles/716982/rss

The deadline CPU scheduler has come a long way, Juri Lelli said in his 2017
Linaro Connect session, but there is still quite a bit of work to be done.
While this scheduler was originally intended for realtime workloads, there is
reason to believe that it is well suited for other settings, including the
embedded and mobile world. In this talk, he gave a summary of what the
deadline scheduler provides now and the changes that are envisioned for the
near (and not-so-near) future.

AWS Week in Review – February 27, 2016

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-february-27-2016/

This edition includes all of our announcements, content from all of our blogs, and as much community-generated AWS content as I had time for. Going forward I hope to bring back the other sections, as soon as I get my tooling and automation into better shape.

Monday

February 27

Tuesday

February 28

Wednesday

March 1

Thursday

March 2

Friday

March 3

Saturday

March 4

Sunday

March 5

Jeff;

 

Congratulations to the Winners of the Serverless Chatbot Competition!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/congratulations-to-the-winners-of-the-serverless-chatbot-competition/

I announced the AWS Serverless Chatbot Competition in August and invited you to build a chatbot for Slack using AWS Lambda and Amazon API Gateway.

Last week I sat down with fellow judges Tim Wagner (General Manager of AWS Lambda) and Cecilia Deng (a Software Development Engineer on Tim’s team) to watch the videos and to evaluate all 62 submissions. We were impressed by the functionality and diversity of the entries, as well as the effort that the entrants put into producing attractive videos to show their submissions in action.

After hours of intense deliberation we chose a total of 9 winners: 8 from individuals, teams & small organizations and one from a larger organization. Without further ado, here you go:

Individuals, Teams, and Small Organizations
Here are the winners of the Serverless Slackbot Hero Award. Each winner receives one ticket to AWS re:Invent, access to discounted hotel room rates, public announcement and promotion during the Serverless Computing keynote, some cool swag, and $100 in AWS Credits. You can find the code for many of these bots on GitHub. In alphabetical order, the winners are:

AWS Network Helper – “The goal of this project is to provide an AWS network troubleshooting script that runs on a serverless architecture, and can be interacted with via Slack as a chat bot.” GitHub repo.

B0pb0t – “Making Mealtime Awesome.” GitHub repo.

Borges – “Borges is a real-time translator for multilingual Slack teams.” GitHub repo.

CLIve – “CLIve makes managing your AWS EC2 instances a doddle. He understands natural language, so no need to learn a new CLI!”

Litlbot – “Litlbot is a Slack bot that enables realtime interaction with students in class, creating a more engaged classroom and learning experience.” GitHub repo.

Marbot – “Forward alerts from Amazon Web Services to your DevOps team.”

Opsidian – “Collaborate on your AWS infra from Slack using natural language.”

ServiceBot – “Communication platform between humans, machines, and enterprises.” GitHub repo.

Larger Organization
And here’s the winner of the Serverless Slackbot Large Organization Award:

Eva – “The virtual travel assistant for your team.” GitHub repo.

Thanks & Congratulations
I would like to personally thank each of the entrants for taking the time to submit their entries to the competition!

Congratulations to all of the winners; I hope to see you all at AWS re:Invent.


Jeff;

 

PS – If this list has given you an idea for a chatbot of your very own, please watch our Building Serverless Chatbots video and take advantage of our Serverless Chatbot Sample.

Friday’s security updates

Post Syndicated from n8willis original http://lwn.net/Articles/697707/rss

CentOS has updated python (C7; C6: multiple vulnerabilities).

Fedora has updated ca-certificates (F24: update to CA certificates) and spice (F23: multiple vulnerabilities).

Oracle has updated kernel (O7: TCP injection) and python (O7; O6: multiple vulnerabilities).

Red Hat has updated kernel (RHEL7; RHEL6: TCP injection), kernel-rt (RHEL7: TCP injection), python (RHEL 6,7: multiple vulnerabilities), python27-python (RHSC: multiple vulnerabilities), python33-python (RHSC: multiple vulnerabilities), realtime-kernel (RHEM2.5: TCP injection), rh-mariadb101-mariadb (RHSC: multiple vulnerabilities), rh-python34-python (RHSC: multiple vulnerabilities), and rh-python35-python (RHSC: multiple vulnerabilities).

SUSE has updated the Linux Kernel (SLE12: multiple vulnerabilities) and xen (SLE11: multiple vulnerabilities).

Ubuntu has updated gnupg (12.04, 14.04, 16.04: flawed random-number generation), libgcrypt11, libgcrypt20 (12.04, 14.04, 16.04: flawed random-number generation), and postgresql-9.1, postgresql-9.3, postgresql-9.5 (12.04, 14.04, 16.04: multiple vulnerabilities).

Xenomai project mourns Gilles Chanteperdrix

Post Syndicated from jake original http://lwn.net/Articles/697594/rss

The Xenomai project is mourning Gilles Chanteperdrix, a longtime maintainer of the realtime framework, who recently passed away. In the announcement, Philippe Gerum writes: “Gilles will forever be remembered as a true-hearted man, a brilliant mind always scratching beneath the surface, looking for elegance in the driest topics, never jaded from such accomplishment.

According to Paul Valéry, “death is a trick played by the inconceivable on the conceivable”. Gilles’s absence is inconceivable to me, I can only assume that for once, he just got rest from tirelessly helping all of us.”

Readmission Prediction Through Patient Risk Stratification Using Amazon Machine Learning

Post Syndicated from Ujjwal Ratan original https://blogs.aws.amazon.com/bigdata/post/Tx1Z7AR9QTXIWA1/Readmission-Prediction-Through-Patient-Risk-Stratification-Using-Amazon-Machine

Ujjwal Ratan is a Solutions Architect with Amazon Web Services

The Hospital Readmissions Reduction Program (HRRP) was included as part of the Affordable Care Act to improve quality of care and lower healthcare spending. A patient visit to a hospital may be counted as a readmission if the patient in question is admitted to a hospital within 30 days of being discharged from an earlier hospital stay. This should be easy to measure, right? Wrong.

Unfortunately, it gets more complicated than this. Not all readmissions can be prevented, as some of them are part of an overall care plan for the patient. There are also factors beyond the hospital’s control that may cause a readmission. The Center for Medicare and Medicaid Services (CMS) recognized the complexities with measuring readmission rates and came up with a set of measures to evaluate providers.

There is still a long way to go for hospitals to be effective in preventing unplanned readmissions. Recognizing the factors affecting readmissions is an important first step, but it is also important to draw out patterns in readmission data by aggregating information from multiple clinical and non-clinical hospital systems.

Moreover, most analysis algorithms rely on financial data which omit the clinical nuances applicable to a readmission pattern. The data sets contain a lot of redundant information like patient demographics and historical data. All this creates a massive data analysis challenge that may take months to solve using conventional means.

In this post, I show how to apply advanced analytics concepts like pattern analysis and machine learning to do risk stratification for patient cohorts.

The role of Amazon ML

There have been multiple global scientific studies on scalable models for predicting readmissions with high accuracy. Some of them, like comparison of models for predicting early hospital readmissions and predicting hospital readmissions in the Medicare population, are great examples.

Readmission records demonstrate patterns in data that can be used in a prediction algorithm. These patterns can be separated as outliers that are used to identify patient cohorts with high risk. Attribute correlation helps to identify the significant features that affect readmission risk in a patient. This risk stratification is enabled by categorizing patient attributes into numerical, categorical, and text attributes and applying statistical methods like standard deviation, median analysis, and the chi-squared test. These data sets are used to build statistical models to identify patients demonstrating characteristics consistent with readmission, so that the necessary steps can be taken to prevent it.
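As a hedged illustration of the kind of test mentioned above (the counts below are made up), a chi-squared test can check whether a categorical attribute is associated with readmission:

from scipy.stats import chi2_contingency

#                 readmitted   not readmitted
contingency = [[   1200,          5400       ],   # attribute present (e.g., insulin prescribed)
               [    700,          6100       ]]   # attribute absent

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.1f}, p={p_value:.3g}")   # a small p-value suggests the attribute matters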

Amazon Machine Learning (Amazon ML) provides visual tools and wizards that guide users in creating complex ML models in minutes. You can also interact with it using the AWS CLI and API to integrate the power of ML with other applications. Based on the chosen target attribute, Amazon ML can build models such as a binary classification model, which predicts one of two states (0 or 1), or a numeric regression model, which predicts numerical values based on certain correlated attributes.

Creating an ML model for readmission prediction

The following diagram represents a reference architecture for building a scalable ML platform on AWS.

  1. The first step is to get the data into Amazon S3, the object storage service from AWS.
  2. Amazon Redshift acts as the database for the huge amounts of structured clinical data. The data is loaded into Amazon Redshift tables and is massaged to make it more meaningful as a data source for an ML model.
  3. A binary classification ML model is created using Amazon ML, with Amazon Redshift as the data source. A real-time endpoint is also created to allow real-time querying for the ML model.
  4. Amazon Cognito is used for secure federated access to the Amazon ML real-time endpoint.
  5. A static website is created on S3. This website hosts the end-user-facing application, which queries the Amazon ML endpoint in real time.

The architecture above is just one of the ways in which you can use AWS to build machine learning applications. You can vary this architecture and add services such as Amazon Elastic MapReduce (EMR) if your use case involves large volumes of unstructured data, or build a business intelligence (BI) reporting interface for analysis of the predicted metrics. AWS provides a range of services that act as building blocks for the use case you want to build.

 

Prerequisite: Start with a data set

The first step in creating an accurate model is to choose the right data set to build and train the model. For the purposes of this post, I am using a publicly available diabetes data set from the University of California, Irvine (UCI). The data set consists of 101,766 rows and represents 10 years of clinical care records from 130 US hospitals and integrated delivery networks. It includes over 50 features (attributes) representing patient and hospital outcomes, and can be downloaded from the UCI website. The hosted zip file consists of two CSV files: the first, diabetic_data.csv, is the actual data set, and the second, IDs_mapping.csv, is the master data for admission_type_id, discharge_disposition_id, and admission_source_id.

Amazon ML automatically splits source data sets into two parts. The first part is used to train the ML model and the second part is used to evaluate the ML model’s accuracy. In this case, seventy percent of the source data is used to train the ML model and thirty percent is used to evaluate it. This is represented in the data rearrangement attribute as shown below:

ML model training data set:

{
  "splitting": {
    "percentBegin": 0,
    "percentEnd": 70,
    "strategy": "random",
    "complement": false,
    "strategyParams": {
      "randomSeed": ""
    }
  }
}

ML model evaluation data set:

{
  "splitting": {
    "percentBegin": 70,
    "percentEnd": 100,
    "strategy": "random",
    "complement": false,
    "strategyParams": {
      "randomSeed": ""
    }
  }
}

An ML model’s accuracy improves as more data is used to train it. The data set I’m using in this post is very limited for building a comprehensive ML model, but the methodology can be replicated with larger data sets.

 

Prepare the data and move it into Amazon S3

For an ML model to be effective, you should prepare the data so that it provides the right patterns to the model. The data set should have good coverage for relevant features, be low in unwanted “noise” or variance, and be as complete as possible with correct labels.

Use the Amazon Redshift database to prepare the data set. To begin, copy the data into an S3 bucket named diabetesdata. The bucket consists of four CSV files:

You can LIST the bucket contents by running the following command in the AWS CLI:

aws s3 ls s3://diabetesdata

Following this, create the necessary tables in Amazon Redshift to process the data in the CSV files by creating three master tables and one transaction table.

The transaction table consists of lookup IDs that act as foreign keys (FK) referencing the master tables above. It also has a primary key, “encounter_id”, and multiple columns that act as features for the ML model. The createredshifttables.sql script is executed to create these tables.

After the necessary tables are created, start loading them with data. You can make use of the Amazon Redshift COPY command to copy the data from the files on S3 into the respective Amazon Redshift tables. The following script template details the format of the copy command used:

COPY diabetes_data from 's3://<S3 file path>' credentials 'aws_access_key_id=<AWS Access Key ID>;aws_secret_access_key=<AWS Secret Access Key>' delimiter ',' IGNOREHEADER 1;

The loaddata.sql script is executed for the data loading step.

 

Modify the data set in Amazon Redshift

The next step is to make some changes to the data set to make it less noisy and suitable for the ML model that you create later. There are various things you can do as part of this clean up, such as updating incomplete values and grouping attributes into categories. For example, age can be grouped into young, adult or old based on age ranges.

For the target attribute for your ML model, create a custom attribute called readmission_result, with a value of “Yes” or “No” based on conditions in the readmitted attribute. To see all the changes made to the data, see the ModifyData.sql script.

Finally, the complete modified data set is dumped into a new table, diabetes_data_modified, which acts as a source for the ML model. Notice the new custom column readmission_result, which is your target attribute for the ML model.

 

Create a data source for Amazon ML and build the ML model

Next, create an Amazon ML data source, choosing Amazon Redshift as the source. This can be done easily through the console or through the CreateDataSourceFromRedshift API operation by specifying Redshift parameters such as the cluster name, database name, username, password, role, and the SQL query. The IAM role for Amazon Redshift as a data source is easily populated, as shown in the screenshot below.
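The examples in this post use the console and the JavaScript SDK; purely for illustration, here is a hedged boto3 (Python) sketch of the same CreateDataSourceFromRedshift call. Every name, credential, and location below is a placeholder:

import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

ml.create_data_source_from_redshift(
    DataSourceId="ds-diabetes-readmission",             # placeholder ID
    DataSourceName="Diabetes readmission (full set)",
    DataSpec={
        "DatabaseInformation": {
            "DatabaseName": "diabetes",                  # placeholder database
            "ClusterIdentifier": "my-redshift-cluster",  # placeholder cluster
        },
        "SelectSqlQuery": "SELECT * FROM diabetes_data_modified",
        "DatabaseCredentials": {"Username": "awsuser", "Password": "<password>"},
        "S3StagingLocation": "s3://diabetesdata/staging/",
        "DataSchemaUri": "s3://diabetesdata/diabetes.csv.schema",
    },
    RoleARN="arn:aws:iam::123456789012:role/AmazonML-Redshift",  # placeholder role
    ComputeStatistics=True,
)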

You need the entire data set for the ML model, so use the following query for the data source:

SELECT * FROM diabetes_data_modified

This can be modified with column names and WHERE clauses to build different data sets for training the ML model.

The steps to create a binary classification ML model are covered in detail in the Building a Binary Classification Model with Amazon Machine Learning and Amazon Redshift blog post.

Amazon ML provides two types of predictions that you can try. The first is a batch prediction, which can be generated through the console or the CreateBatchPrediction API operation. The result of the batch prediction is stored in an Amazon S3 bucket and can be used to build reports for end users (such as a monthly actual value vs. predicted value report).

You can also use the ML model to generate a real-time prediction. To enable real-time predictions, create an endpoint for the ML model either through the console or using the CreateRealTimeEndpoint API operation.

After it’s created, you can query this endpoint in real time to get a response from Amazon ML, as shown in the following CLI screenshot.
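Here is the same create-and-query flow as a hedged boto3 (Python) sketch; the model ID and record attributes are placeholders, and the endpoint takes a little while to become ready before it accepts predictions:

import boto3

ml = boto3.client("machinelearning", region_name="us-east-1")

# Create the real-time endpoint for an existing model (placeholder ID).
endpoint = ml.create_realtime_endpoint(MLModelId="ml-readmission-model")
endpoint_url = endpoint["RealtimeEndpointInfo"]["EndpointUrl"]

# Once the endpoint is ready, query it with a record whose attribute names
# match the data source schema (the values here are placeholders).
response = ml.predict(
    MLModelId="ml-readmission-model",
    Record={"age": "old", "num_lab_procedures": "45", "insulin": "Yes"},
    PredictEndpoint=endpoint_url,
)
print(response["Prediction"]["predictedLabel"],
      response["Prediction"]["predictedScores"])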

Build the end user application

The Amazon ML endpoint created earlier can be invoked using an API call. This is very handy for building an application for end users who can interact with the ML model in real time.

Create a similar application and host it as a static website on Amazon S3. This feature of S3 allows you to host websites without any web servers and takes away the complexities of scaling hardware based on traffic routed to your application. The following is a screenshot from the application:

The application allows end users to select certain patient parameters and then makes a call to the predict API. The results are displayed in real time in the results pane.

I made use of the AWS SDK for JavaScript to build this application. The SDK can be added to your script using the following code:

<script src="https://sdk.amazonaws.com/js/aws-sdk-2.3.3.min.js"></script>

 

Use Amazon Cognito for secure access

To authenticate the Amazon ML API request, you can make use of Amazon Cognito, which allows secure access to the Amazon ML endpoint without embedding AWS security credentials in the application. To enable this, create an identity pool in Amazon Cognito.

Amazon Cognito creates a new role in IAM. You need to allow this new IAM role to interact with Amazon ML by attaching the AmazonMachineLearningRealTimePredictionOnlyAccess policy to the role. This IAM policy allows the application to query the Amazon ML endpoint.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "machinelearning:Predict"
      ],
      "Resource": "*"
    }
  ]
}

Next, initialize credential objects, as shown in the code below:

var parameters = {
    AccountId: "AWS Account ID",
    RoleArn: "ARN for the role created by Amazon Cognito",
    IdentityPoolId: "The identity pool ID created in Amazon Cognito"
};
// set the Amazon Cognito region
AWS.config.region = 'us-east-1';
// initialize the Credentials object with the parameters
AWS.config.credentials = new AWS.CognitoIdentityCredentials(parameters);

 

Call the AML Endpoint using the API

Create the function callApi() to make a call to the Amazon ML endpoint. The steps in the callApi() function involve building the object that forms part of the parameters sent to the Amazon ML endpoint, as shown in the code below:

var machinelearning = new AWS.MachineLearning({apiVersion: '2014-12-12'});
var params = {
    MLModelId: '<ML model ID>',
    PredictEndpoint: '<ML model real-time endpoint>',
    Record: record   // placeholder: an object of attribute name/value pairs gathered from the form
};
var request = machinelearning.predict(params);

The API call returns a JSON object that includes, among other things, the predictedLabel and predictedScores parameters, as shown in the code below:

{
    "Prediction": {
        "details": {
            "Algorithm": "SGD",
            "PredictiveModelType": "BINARY"
        },
        "predictedLabel": "1",
        "predictedScores": {
            "1": 0.5548262000083923
        }
    }
}

The predictedScores parameter generates a score between 0 and 1 which you can convert into a percentage:

if (!isNaN(predictedScore)) {
    finalScore = Math.round(predictedScore * 100);
    resultMessage = finalScore + "%";
}

The complete code for this sample application is uploaded to the PredictReadmission_AML GitHub repo for reference and can be used to create more sophisticated machine learning applications using Amazon ML.

 

Conclusion

The power of machine learning opens new avenues for advanced analytics in healthcare. With new means of gathering data that range from sensors mounted on medical devices to medical images and everything in between, the complexities demonstrated by these varied data sets are pushing the boundaries of conventional analysis techniques.

The advent of cloud computing has made it possible for researchers to take up the challenging task of synthesizing these data sets and draw insights that are providing us with information that we never knew existed.

We are still at the beginning of this journey and there are, of course, challenges that we have to overcome. The availability of quality data sets, which are the starting point of any good analysis, is still a major hurdle. Regulations like the Health Insurance Portability and Accountability Act of 1996 (HIPAA) make it difficult to obtain medical records containing Protected Health Information (PHI). The good news is that this is changing with initiatives like AWS Public Data Sets, which hosts a variety of public data sets that anyone can use.

At the end of the day, all this analysis and research is for one cause: To improve the quality of human lives. I hope this is, and will continue to be, the greatest motivation to overcome any challenge.

If you have any questions or suggestions, please comment below.
_ _ _ _ _

Do you want to be part of the conversation? Join AWS developers, enthusiasts, and healthcare professionals as we discuss building smart healthcare applications on AWS in Seattle on August 31.

Seattle AWS Big Data Meetup (Wednesday, August 31, 2016)


 

Related

Building a Multi-Class ML Model with Amazon Machine Learning
 

 

Amazon EMR 5.0.0 – Major App Updates, UI Improvements, Better Debugging, and More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-emr-5-0-0-major-app-updates-ui-improvements-better-debugging-and-more/

The Amazon EMR team has been cranking out new releases at a fast and furious pace! Here’s a quick recap of this year’s launches:

  • EMR 4.7.0 – Updates to Apache Tez, Apache Phoenix, Presto, HBase, and Mahout (June).
  • EMR 4.6.0 – HBase for realtime access to massive datasets (April).
  • EMR 4.5.0 – Updates to Hadoop, Presto; addition of Spark and EMRFS (April).
  • EMR 4.4.0 – Sqoop, HCatalog, Java 8, and more (March).
  • EMR 4.3.0 – Updates to Spark, Presto, and Ganglia (January).

Today the team is announcing and releasing EMR 5.0.0. This is a major release that includes support for 16 open source Hadoop ecosystem projects, major version upgrades for Spark and Hive, use of Tez by default for Hive and Pig, user interface improvements to Hue and Zeppelin, and enhanced debugging functionality.

Here’s a map that shows how EMR has progressed over the course of the past couple of releases:

Let’s check out the new features in EMR 5.0.0!

Support for 16 Open Source Hadoop Ecosystem Projects
We started using Apache Bigtop to manage the EMR build and packaging process during the development of EMR 4.0.0. The use of Bigtop helped us to accelerate the release cycle while we continued to add additional packages from the Hadoop ecosystem, with a goal of making the newest GA (generally available) open source versions accessible to you as quickly as possible.

In accord with our goal, EMR 5.0 includes support for 16 Hadoop ecosystem projects including Apache Hadoop, Apache Spark, Presto, Apache Hive, Apache HBase, and Apache Tez. You can choose the desired set of apps when you create a new EMR cluster:

Major Version Upgrade for Spark and Hive
This release of EMR updates Hive (a SQL-like interface for Tez and Hadoop MapReduce) from 1.0 to 2.1, accompanied by a move to Java 8. It also updates Spark (an engine for large-scale data processing) from 1.6.2 to 2.0, with a similar move to Scala 2.11. The Spark and Hive updates are both major releases and include new features, performance enhancements, and bug fixes. For example, Spark now includes a Structured Streaming API, better SQL support, and more. Be aware that the new versions of Spark and Hive are not 100% backward compatible with the old ones; check your code and upgrade to EMR 5.0.0 with care.

With this release, Tez is now the default execution engine for Hive 2.1 and Pig 0.16, replacing Hadoop MapReduce and resulting in better performance, including reduced query latency. With this update, EMR uses MapReduce only when running a Hadoop MapReduce job directly (Hive and Pig now use Tez; Spark has its own framework).

User Interface Improvements
EMR 5.0.0 also updates Apache Zeppelin (a notebook for interactive data analytics) from 0.5.6 to 0.6.1, and Hue (an interface for analyzing data with Hadoop) from 3.7.1 to 3.10. The new versions of both of these web-based tools include new features and lots of smaller improvements.

Zeppelin is often used with Spark; Hue works well with Hive, Pig, and HBase. The new version of Hue includes a notebooks feature that allows you to have multiple queries on the same page:

Hue can also help you to design Oozie workflows:

Enhanced Debugging Functionality
Finally, EMR 5.0.0 includes some better debugging functionality, making it easier for you to figure out why a particular step of your EMR job failed. The console now displays a partial stack trace and links to the log file (stored in Amazon S3) in order to help you to find, troubleshoot, and fix errors:

Launch a Cluster Today
You can launch an EMR 5.0.0 cluster today in any AWS Region! Open up the EMR Console, click on Create cluster, and choose emr-5.0.0 from the Release menu:

Learn More
To learn more about this powerful new release of EMR, plan to attend our webinar on August 23rd, Introducing Amazon EMR Release 5.0: Faster, Easier, Hadoop, Spark, and Presto.


Jeff;